Presented By: M. Syamprasad
MTECH-CSE Student
12P66D5816
Topics Covered
What is Hadoop?
Why, Where, When?
Benefits of Hadoop
How Hadoop Works?
Hadoop Architecture
Hadoop Common
HDFS
Hadoop MapReduce
Installation & Execution
Demo of installation
Hadoop Community
What is Hadoop?
Hadoop was created by Douglas Reed Cutting, who named it after his son's toy elephant.
Hadoop, Why?
Need to process 100TB datasets
On 1 node:
scanning @ 50MB/s = 23 days
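The 23-day figure is dataset size divided by scan rate. The sketch below reproduces that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). The `nodes` parameter is an illustrative addition that assumes the scan divides perfectly evenly across machines, which a real cluster only approximates.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def scan_days(dataset_tb, rate_mb_per_s, nodes=1):
    """Days needed to scan a dataset once, assuming every node reads at the same rate."""
    total_bytes = dataset_tb * 10**12
    bytes_per_second = rate_mb_per_s * 10**6 * nodes  # idealized aggregate rate
    return total_bytes / bytes_per_second / SECONDS_PER_DAY

print(round(scan_days(100, 50), 1))  # 23.1
```

Under the same idealized assumption, `scan_days(100, 50, nodes=1000)` comes to roughly 33 minutes, which is the motivation for spreading the scan across many machines.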
Benefits of Hadoop
Hadoop is designed to run on cheap commodity hardware
Hadoop Architecture
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing
Hadoop consists of:
Hadoop Common
HDFS
Hadoop MapReduce
Data Flow
[Diagram: Web Servers, Scribe Servers, Network Storage, Hadoop Cluster, Oracle RAC, MySQL]
Hadoop Common
HDFS
HDFS Architecture
Hadoop MapReduce
The Map-Reduce programming model
Framework for distributed processing of large data sets
Pluggable user code runs in generic framework
Common design pattern in data processing
input | map | shuffle | reduce | output
Natural for:
Log processing
Web search indexing
Ad-hoc queries
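The `input | map | shuffle | reduce | output` pattern can be sketched in a single process. This is only a toy model of the programming model, not Hadoop's API: `run_mapreduce`, `map_status`, `reduce_count`, and the `"<path> <status>"` log format are all hypothetical names and assumptions chosen for illustration.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """input | map | shuffle | reduce | output, all in one process."""
    # map: each input record yields zero or more (key, value) pairs
    pairs = [kv for record in records for kv in map_fn(record)]
    # shuffle: group all values that share a key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # reduce: one call per key, visited in sorted key order
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}

# Hypothetical log-processing job: count requests per HTTP status code.
def map_status(line):
    # assumed log format: "<path> <status>"
    path, status = line.split()
    yield status, 1

def reduce_count(key, values):
    return sum(values)

logs = ["/index 200", "/missing 404", "/about 200"]
print(run_mapreduce(logs, map_status, reduce_count))  # {'200': 2, '404': 1}
```

Only `map_status` and `reduce_count` are specific to the job; the surrounding function plays the role of the generic framework that the user code plugs into.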
MapReduce Implementation
1. Input files split (M splits)
2. Assign Master & Workers
3. Map tasks
4. Writing intermediate data to disk (R regions)
5. Intermediate data read & sort
6. Reduce tasks
7. Return
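The seven steps above can be traced in a small single-process sketch. It assumes hash partitioning of keys into the R regions and keeps intermediate data in memory instead of on disk; `split_input`, `region_of`, `run_job`, and the sample job are hypothetical names, not part of any Hadoop API.

```python
import zlib
from collections import defaultdict

def split_input(records, m):
    """Step 1: divide the input into M contiguous splits (some may be empty)."""
    size = -(-len(records) // m)  # ceiling division
    return [records[i * size:(i + 1) * size] for i in range(m)]

def region_of(key, r):
    """Step 4: pick one of R regions for a key using a stable hash (an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % r

def run_job(records, map_fn, reduce_fn, m=3, r=2):
    # Steps 2-3: this loop stands in for the master handing each split to a map task.
    regions = [defaultdict(list) for _ in range(r)]
    for split in split_input(records, m):
        for record in split:
            for key, value in map_fn(record):
                # Step 4: intermediate pairs are partitioned into R regions
                # (held in memory here; a real cluster writes them to local disk).
                regions[region_of(key, r)][key].append(value)
    # Steps 5-6: each reduce task reads its region, sorts by key, and reduces.
    outputs = []
    for region in regions:
        outputs.append([(key, reduce_fn(key, region[key])) for key in sorted(region)])
    # Step 7: return one output list per reduce task.
    return outputs

# Hypothetical job: count words by length.
def map_len(word):
    yield str(len(word)), 1

def sum_values(key, values):
    return sum(values)

outputs = run_job(["a", "bb", "cc", "ddd"], map_len, sum_values, m=3, r=2)
print(len(outputs))  # 2, one output list per reduce task
```

Because every occurrence of a key hashes to the same region, each key is reduced exactly once, and the R output lists together hold the complete result.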
MapReduce Cluster Implementation
[Diagram: input files (split 0 to split 4) feed M map tasks, which write intermediate files; R reduce tasks read them and write output files (Output 0, Output 1)]
Several map or reduce tasks can run on a single computer
Examples of MapReduce: Word Count
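A minimal word-count sketch in the line-oriented style used by streaming jobs, where the mapper and reducer exchange tab-separated `key<TAB>value` lines. It runs locally with `sorted()` standing in for the shuffle-and-sort phase; the function names and sample text are assumptions made for illustration.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts for each run of identical words; input must be sorted by word."""
    parsed = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

text = ["Hadoop runs MapReduce", "MapReduce runs on Hadoop"]
# sorted() stands in for the shuffle-and-sort phase between map and reduce.
for line in reducer(sorted(mapper(text))):
    print(line)
```

The reducer relies on sorted input so that all lines for one word arrive together, which is why it can sum each word's count in a single pass without holding the whole dataset in memory.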
Hadoop Community
Hadoop Users
Adobe
Alibaba
Amazon
AOL
Facebook
Google
IBM
Major Contributors
Apache
Cloudera
Yahoo