Graph mining
www.kellytechno.com
HDFS
Hadoop YARN
Hadoop
MapReduce
www.kellytechno.com
www.kellytechno.com
CRAY-1
www.kellytechno.com
ADVANCES IN CPUS
Moores Law
The
www.kellytechno.com
SINGLE-CORE LIMITATION
www.kellytechno.com
DISTRIBUTED SYSTEMS
Allows developers to
use multiple
machines for a
single task
www.kellytechno.com
data exchanges
Managing a finite bandwidth
Controlling computation timing is complicated
www.kellytechno.com
www.kellytechno.com
Facebook
500
Yahoo
Over
170 PB
eBay
Over
TB per day
6 PB
Must be scalable
www.kellytechno.com
PARTIAL FAILURES
www.kellytechno.com
COMPONENT RECOVERY
www.kellytechno.com
SCALABILITY
system failure
www.kellytechno.com
HADOOP
HIGH-LEVEL OVERVIEW
FAULT TOLERANCE
Failures are detected by the master program
which reassigns the work to a different node
Restarting a task does not affect the nodes
working on other portions of the data
If a failed node restarts, it is added back to the
system and assigned new tasks
The master can redundantly execute the same
task to avoid slow running nodes
www.kellytechno.com
www.kellytechno.com
OVERVIEW
www.kellytechno.com
Different
www.kellytechno.com
DATA REPLICATION
www.kellytechno.com
DATA RETRIEVAL
Then
MAPREDUCE
www.kellytechno.com
MAPREDUCE OVERVIEW
MAPREDUCE FEATURES
Fault-Tolerance
www.kellytechno.com
THE MAPPER
www.kellytechno.com
www.kellytechno.com
THE REDUCER
www.kellytechno.com
ANATOMY OF A CLUSTER
www.kellytechno.com
OVERVIEW
NameNode
Secondary NameNode
JobTracker
DataNode
TaskTracker
THE NAMENODE
JobTracker
Determines
TaskTracker
Keeps
HADOOP ECOSYSTEM
www.kellytechno.com
www.kellytechno.com
OTHER TOOLS
Hive
Pig
HBase
Cascading
Flume
Thank You
Presented By