
Introduction

Lesson Agenda
Introduction
Course Objectives
Course Agenda
Lab Environment
Additional resources
Introduction
Questions to invite interest and participation
How much do you know about big data?
What is the job market for big data analysts?
What professional skills does it take to be a big data analyst?
What do you hope to learn from this course?
Course Objectives
After taking this course, you should know or be able to:
Understand big data and its significance to the enterprise
Acquire raw data using Hadoop HDFS and Oracle NoSQL Database
Organize the collected data using MapReduce, Hive and Oracle Big Data Connectors
Analyze the data using Oracle connector and R Engine
Understand big data governance and big data best practices
Learn to write Hadoop jobs in different languages
Programming Languages: Java, Python
High-Level Languages: Apache Pig, Hive
Chapter Structure
Introduction: Ch 1 Overview of Big Data, Ch 2 Technical Foundation, Ch 3 Oracle Big Data Solution
Raw Data Acquisition: Ch 4 Using HDFS, Ch 5 Using Oracle NoSQL DB
Collected Data Organization: Ch 6 Using Hadoop MapReduce, Ch 7 Using Hive and Pig
Data Analysis: Ch 8 Fundamental R, Ch 9 Using ORCH & ORE
Enterprise Big Data Strategy: Ch 10 Integration, Ch 11 Governance, Ch 12 Best Practices
Course Materials
Each student computer includes:
Oracle BigDataLite 4.0 VM
Class Notes (slides)
Lab Activity Guide
Practice files
Overview of Big Data
Hottest technical topic since 2010
Data grows extremely rapidly
Examples of big data sources:
Stream data, social networks such as Facebook and Twitter, smartphone location-based services, web server logs, blogs, data from sensors
What Is Big Data?
Big Data is defined as voluminous, unstructured data from
many different sources, such as:
Social networks
Banking and financial services
E-commerce services
Web-centric services
Internet search indexes
Scientific searches
Document searches
Medical records
Web logs
And so on
Big Data Definition
No single standard definition

Big Data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it
Big Data Challenge
Why does data become big data?
With an enormous amount of unstructured data, it is harder to derive value from it.
Need for new processing platform:
Twitter (over 7 TB/day)
Facebook (over 10 TB/day)
Google (over 20 PB/day)
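To see why volumes like these call for a new processing platform, the daily figures above can be turned into the average ingest rate a system would have to sustain. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10**12 bytes) and an even spread of data over 24 hours, neither of which is stated on the slide.

```python
# Rough sustained ingest rates implied by the daily volumes on the slide.
# Assumptions: decimal units (1 TB = 10**12 bytes), data spread evenly over a day.
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_BYTES = {
    "Twitter": 7 * 10**12,    # over 7 TB/day
    "Facebook": 10 * 10**12,  # over 10 TB/day
    "Google": 20 * 10**15,    # over 20 PB/day
}


def bytes_per_second(daily_bytes):
    """Average bytes per second needed to keep up with a daily volume."""
    return daily_bytes / SECONDS_PER_DAY


for source, volume in DAILY_BYTES.items():
    rate_mb = bytes_per_second(volume) / 10**6
    print(f"{source}: at least {rate_mb:,.0f} MB/s sustained")
```

Real traffic is bursty, so peak rates would be higher than these averages.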
Challenges:
Use the raw data to gain more customers
Enhance the user experience
Improve profitability
Here's a problem. If only I could capture and analyze the data, then I could solve the problem.
Hypothesis?
Businesses need to combine unstructured data with structured data to make more competitive decisions.
When the big data solution mechanism is identified appropriately and implemented in an organization, the business value will eventually grow.
Importance of Big Data to Business
86% of interviewed executives considered unstructured data to be an important part of their business
The most important goal of using Big Data is to find hidden value
by the intelligent filtering of low-density, high-volume data.
To apply Big Data techniques to improve the business, you should understand how to:
Filter web logs to understand e-commerce behavior
Derive sentiment from social media and customer support interactions
Understand statistical correlation methods and their relevance for the business sector
There should be a strong business-driven context to ensure that
your company is on the right track.
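As an illustration of filtering web logs to understand e-commerce behavior, here is a minimal Python sketch that counts successful product page views. The sample lines, the Common Log Format layout, and the /product/ URL scheme are all assumptions made for the example; a real site would have its own log format and paths, and at big data scale the same filter would run as a distributed job rather than on a list in memory.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; real logs would be read from files.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2015:10:00:00 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2015:10:00:05 +0000] "GET /product/42 HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2015:10:00:09 +0000] "GET /cart/add/42 HTTP/1.1" 200 128',
    '10.0.0.3 - - [01/Jan/2015:10:01:00 +0000] "GET /product/7 HTTP/1.1" 404 0',
    '10.0.0.3 - - [01/Jan/2015:10:01:02 +0000] "GET /favicon.ico HTTP/1.1" 200 64',
]

# Captures: client IP, HTTP method, request path, status code.
LINE_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)


def product_views(lines):
    """Count successful GET requests per product page, skipping everything else."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # ignore lines that do not follow the assumed format
        _ip, method, path, status = match.groups()
        if method == "GET" and status == "200" and path.startswith("/product/"):
            counts[path] += 1
    return counts


for path, views in product_views(SAMPLE_LOG).most_common():
    print(path, views)
```

The filter discards low-value lines (icons, failed requests, non-product pages) and keeps only the signal of interest, which is the "intelligent filtering of low-density, high-volume data" idea in miniature.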
Use Cases of Big Data
Real-world use cases
Clickstream analysis
Sentiment analysis
Recommendation engines
Ad Targeting
Search Quality
Big Data Analyst
A big data analyst needs to be able to support the business and management with clear and insightful analyses of the data at hand.
This includes:
Data mining skills (including data auditing, aggregation, validation and reconciliation)
Advanced modelling techniques
Testing, creating and explaining results in clear and concise reports
The big data analyst should:
Have experience with real-time analytics and business intelligence platforms
Be able to work with SQL databases, several programming languages, and statistical software packages such as R, Java, MATLAB or SPSS
Have at least basic knowledge of working with Hadoop and MapReduce
Be able to use scripting languages
Big Data Scientist
The sexiest job of the 21st century
Skills needed:
statistical,
mathematical,
predictive modelling
business strategy skills
be able to communicate their findings, orally and visually
have a set of ethical responsibilities
programming languages such as Python, R, Java, Ruby, Clojure, MATLAB, Pig or SQL
understanding of Hadoop, Hive and/or MapReduce
a master's degree or even a PhD
Big Data, Big Money
Big Data Salaries:
https://datajobs.com/big-data-salary
http://diginomica.com/2014/04/23/data-scientist-debate-salary-data-definitions/
http://www.datasciencecentral.com/profiles/blogs/data-scientists-making-300-000-a-year-wall-street-journal
Lab Environment
Oracle VirtualBox
A cross-platform virtualization application
For example, you can run Windows and Linux on your Mac, run Windows Server 2008 on your Linux server, run Linux on your Windows PC, and so on
Running multiple operating systems simultaneously
Easier software installations
Infrastructure consolidation: instead of running many physical computers that are only partially used, one can pack many virtual machines onto a few powerful hosts and balance the loads between them
Terminology
Host operating system (host OS) -- the operating system of the physical computer on which VirtualBox was installed.
Guest operating system (guest OS) -- the operating system that is running inside the virtual machine.
Virtual machine (VM) -- the special environment that VirtualBox creates for your guest operating system while it is running.
Starting VirtualBox
Double-click a package file with the .vbox-extpack extension (an extension pack, which opens in VirtualBox for installation), or
In the "Programs" menu, click the item in the "VirtualBox" group.
VirtualBox Manager
This window is called the "VirtualBox Manager". On the left, you can see a pane that will later list all your virtual machines. Since you have not created any, the list is empty. A row of buttons above it allows you to create new VMs and work on existing VMs, once you have some. The pane on the right displays the properties of the virtual machine currently selected.
Running your virtual machine
Double-click its entry in the list within the Manager window, or
Select its entry in the list in the Manager window and press the "Start" button at the top, or
Navigate to the "VirtualBox VMs" folder in your system user's home directory, find the subdirectory of the machine you want to start and double-click the machine settings file (with a .vbox file extension).
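Besides the three graphical ways above, a registered VM can also be started from a script through VirtualBox's VBoxManage command-line tool. The sketch below is only one possible wrapper: the function names and the "MyVM" machine name are placeholders invented for the example, and it assumes VBoxManage is on the PATH.

```python
import shutil
import subprocess


def startvm_command(vm_name, headless=False):
    """Build the VBoxManage command line that starts a registered VM."""
    vm_type = "headless" if headless else "gui"
    return ["VBoxManage", "startvm", vm_name, "--type", vm_type]


def start_vm(vm_name, headless=False):
    """Start the VM if VBoxManage is available; return True on success."""
    if shutil.which("VBoxManage") is None:
        print("VBoxManage not found on PATH; is VirtualBox installed?")
        return False
    result = subprocess.run(startvm_command(vm_name, headless))
    return result.returncode == 0


if __name__ == "__main__":
    # "MyVM" is a placeholder; registered names are shown by: VBoxManage list vms
    print(" ".join(startvm_command("MyVM")))
```

Calling start_vm("MyVM") would actually launch the machine; the main block only prints the command so the sketch is safe to run on a computer without VirtualBox.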
Starting the BigDataLite VM
Double-click the Oracle VM VirtualBox icon on the desktop
In the VirtualBox Manager window, select bigdatalite 4.1 and click Start
Click the user Oracle and type the password welcome1
Click Login
Shutdown VM
Click System Menu and select Shut Down
Saving the state of the machine
When you click the "Close" button of your virtual machine window, you will see a dialog with the options below.
Options
Save the machine state: VirtualBox "freezes" the virtual machine by completely saving its state to your local disk. When you start the VM again later, you will find that the VM continues exactly where it was left off. All your programs will still be open, and your computer resumes operation. Saving the state of a virtual machine is thus in some ways similar to suspending a laptop computer (e.g. by closing its lid).
Send the shutdown signal: This will send an ACPI shutdown signal to the virtual machine, which has the same effect as if you click shutdown on a real computer.
Power off the machine: VirtualBox stops running the virtual machine, but without saving its state. Not recommended.
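The three close options above each have a counterpart in the VBoxManage controlvm subcommand: savestate, acpipowerbutton and poweroff. A minimal sketch of that mapping follows; the option keys ("save", "shutdown", "poweroff"), the function name and the "MyVM" machine name are labels chosen for this example, not VirtualBox terms.

```python
# Map each close option from the dialog to its VBoxManage controlvm action.
# The dictionary keys are illustrative labels, not VirtualBox terminology.
CLOSE_ACTIONS = {
    "save": "savestate",            # Save the machine state
    "shutdown": "acpipowerbutton",  # Send the shutdown signal (ACPI)
    "poweroff": "poweroff",         # Power off without saving state
}


def close_command(vm_name, option):
    """Build the VBoxManage command line for one of the three close options."""
    if option not in CLOSE_ACTIONS:
        raise ValueError(f"unknown close option: {option!r}")
    return ["VBoxManage", "controlvm", vm_name, CLOSE_ACTIONS[option]]


if __name__ == "__main__":
    # "MyVM" is a placeholder machine name.
    for option in CLOSE_ACTIONS:
        print(option, "->", " ".join(close_command("MyVM", option)))
```

The sketch only builds the command lines; passing one of them to subprocess.run would apply it to a running VM.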
Importing and exporting virtual machines
Appliances in OVF format can come in several files: one or several disk images, typically in the widely used VMDK format (see Section 5.2, "Disk image files (VDI, VMDK, VHD, HDD)"), and a textual description file in an XML dialect with an .ovf extension. These files must then reside in the same directory for VirtualBox to be able to import them.
Alternatively, the above files can be packed together into a single archive file, typically with an .ova extension.

To import an appliance in one of the above formats, simply double-click the OVF/OVA file.
Alternatively, select "File" -> "Import appliance" from the Manager window.
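An appliance can also be imported from the command line with VBoxManage import, which accepts a --dry-run flag to preview the import without changing anything. The helper below is a hedged sketch: the function name and the "appliance.ova" file name are hypothetical, and it only builds the command rather than running it.

```python
from pathlib import Path


def import_command(appliance_path, dry_run=False):
    """Build the VBoxManage command line that imports an OVF/OVA appliance."""
    path = Path(appliance_path)
    if path.suffix.lower() not in {".ovf", ".ova"}:
        raise ValueError(f"expected an .ovf or .ova file, got: {path.name}")
    command = ["VBoxManage", "import", str(path)]
    if dry_run:
        command.append("--dry-run")  # show what would be imported, change nothing
    return command


if __name__ == "__main__":
    # "appliance.ova" is a placeholder file name.
    print(" ".join(import_command("appliance.ova", dry_run=True)))
```

Checking the extension first mirrors the text above: only .ovf descriptions (with their disk images alongside) or packed .ova archives are importable.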
