
Assignment Week 1

Let's say you have 100 TB of data to store, and you want to run MapReduce on it.
Configuration of each datanode:

8 GB RAM
10 TB HDD
100 MB/s read-write speed

Let's say the replication factor is 3 and the block size is 64 MB.


By a simple calculation, the number of datanodes needed for storage is:
= Total amount of data * Replication factor / Disk space of 1 datanode
= 100 TB * 3 / 10 TB
= 30 datanodes
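
The storage calculation above can be sketched in Python. This is only an illustration: the function name datanodes_for_storage and its parameter names are made up for this example, and it assumes the whole disk of every datanode is available for HDFS data (no space reserved for the OS or intermediate output).

```python
import math

def datanodes_for_storage(data_tb, replication_factor, disk_tb_per_node):
    """Minimum number of datanodes needed to hold every replica of the data.

    Assumption: the full disk of each node is usable for HDFS blocks.
    """
    raw_tb = data_tb * replication_factor          # total space incl. replicas
    return math.ceil(raw_tb / disk_tb_per_node)    # round up to whole nodes

print(datanodes_for_storage(100, 3, 10))  # 30
```

Rounding up matters when the numbers do not divide evenly: for example, 105 TB of data with the same settings would need 32 datanodes, not 31.5.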
Now let's say you need to run a MapReduce program on this 100 TB of data.
Reading 100 TB of data at a speed of 100 MB/s using only 1 node will take:
= Total data / Read-write speed
= (100 * 1024 * 1024 MB) / (100 MB/s)
= 1048576 seconds
= 291.27 hours

With 30 datanodes reading in parallel, you will be able to finish this job in:

= 291.27 / 30
= 9.71 hours
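
The read-time arithmetic above can be checked with a short Python sketch. The function name read_time_hours is hypothetical, and the model is deliberately simple: it assumes the data is split perfectly evenly across the nodes, that each node sustains the full 100 MB/s, that only one copy of the data is read, and that there is no scheduling, network, or shuffle overhead.

```python
MB_PER_TB = 1024 * 1024  # 1 TB = 1024 GB = 1024 * 1024 MB

def read_time_hours(data_tb, speed_mb_per_s, nodes=1):
    """Hours needed to read the data once, split evenly across the nodes.

    Assumption: ideal parallelism with no overhead of any kind.
    """
    total_mb = data_tb * MB_PER_TB
    seconds = total_mb / speed_mb_per_s / nodes
    return seconds / 3600

print(round(read_time_hours(100, 100), 1))            # 291.3 (one node)
print(round(read_time_hours(100, 100, nodes=30), 1))  # 9.7 (thirty nodes)
```

Because the model is linear, doubling the number of nodes halves the time; a real cluster will be slower than this estimate.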

------------------- Task for you -------------------
Q. How many datanodes will you need to complete the MapReduce job in 5 minutes?

[Send your answer to hadoop@edureka.in]
