Shivnath Babu
Lifecycle of a MapReduce Job
Map function
Reduce function
Map function
Reduce function
Fault tolerance is of
high priority in the
MapReduce framework
HDFS Architecture
Lifecycle of a MapReduce Job
Time
1D projection for
io.sort.factor=500
Automatic Optimization? (Not yet in Hadoop)
Shuffle
Map Map Map Reduce Reduce
Wave 1 Wave 2 Wave 3 Wave 1 Wave 2
What if
#reduces
increased
to 9?