
CPS216: Advanced Database Systems

(Data-intensive Computing Systems)

How MapReduce Works (in Hadoop)

Shivnath Babu
Lifecycle of a MapReduce Job

Map function

Reduce function

Run this program as a MapReduce job
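Concretely, the map and reduce functions are plain Java methods. A minimal sketch in the classic org.apache.hadoop.mapred API, using the standard word-count program as the example (the Hadoop identifiers are real; the word-count logic is just for illustration):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {
      // Map function: emit (word, 1) for every word in an input line
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> out,
                        Reporter reporter) throws IOException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            out.collect(word, ONE);
          }
        }
      }

      // Reduce function: sum the per-word counts produced by the maps
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> out,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          out.collect(key, new IntWritable(sum));
        }
      }
    }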
Lifecycle of a MapReduce Job

[Figure: job execution timeline -- input splits feed map tasks running in two waves, followed by reduce tasks running in two waves]
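A "wave" here is one round of tasks that the cluster's task slots can execute concurrently, so the number of waves is the number of tasks divided by the number of slots, rounded up. An illustrative calculation (the numbers are made up, not from the slides): a job with 128 map tasks on a cluster offering 64 concurrent map slots runs its maps in ceil(128/64) = 2 waves, and 64 reduce tasks on 32 reduce slots likewise finish in 2 reduce waves.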
Components in a Hadoop MR Workflow

Next few slides are from: http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig


Job Submission
Initialization
Scheduling
Execution
Map Task
Sort Buffer
Reduce Tasks
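The job-submission and initialization steps are driven by a small piece of driver code. A minimal sketch using the classic JobClient API, wiring in the word-count Map and Reduce classes from the earlier sketch (input/output paths come from the command line; the class and method names are the standard Hadoop ones):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(WordCount.Map.class);
        conf.setReducerClass(WordCount.Reduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submission: the client computes input splits and hands the job
        // to the JobTracker, which initializes it and schedules its tasks
        JobClient.runJob(conf);
      }
    }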
Quick Overview of Other Topics (Will Revisit Them Later in the Course)
Dealing with failures
Hadoop Distributed FileSystem (HDFS)
Optimizing a MapReduce job
Dealing with Failures and Slow Tasks

What to do when a task fails?
Try again (retries are possible because tasks are idempotent)
Try again somewhere else
Report failure

What about slow tasks (stragglers)?
Run another copy of the same task in parallel, and take the results from whichever copy finishes first
What are the pros and cons of this approach?

Fault tolerance is a high priority in the MapReduce framework.
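In Hadoop, both behaviors are exposed as job configuration settings; speculative execution is the mechanism behind the run-a-second-copy strategy. A sketch using real JobConf setters (the values shown are the usual classic-Hadoop defaults):

    import org.apache.hadoop.mapred.JobConf;

    public class FaultToleranceSettings {
      public static void apply(JobConf conf) {
        // A failed task attempt is rescheduled, possibly on another node,
        // up to this many attempts before the job as a whole fails
        conf.setMaxMapAttempts(4);
        conf.setMaxReduceAttempts(4);

        // Allow speculative (duplicate) attempts of straggler tasks;
        // the first attempt to finish wins and the others are killed
        conf.setMapSpeculativeExecution(true);
        conf.setReduceSpeculativeExecution(true);
      }
    }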
HDFS Architecture

[Figure: HDFS architecture diagram]
Lifecycle of a MapReduce Job

[Figure: the job execution timeline again -- input splits, two map waves, two reduce waves]

How are the number of splits, the number of map and reduce tasks, memory allocation to tasks, etc., determined?
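For the number of splits in particular: with file input, the number of map tasks equals the number of input splits, and (in the newer FileInputFormat APIs) the split size is computed as

    splitSize = max(minSplitSize, min(maxSplitSize, blockSize))

An illustrative calculation under default settings, where the split size comes out to the HDFS block size of 64 MB: a 50 GB input yields roughly 50 * 1024 / 64 = 800 splits, hence about 800 map tasks.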
Job Configuration Parameters

190+ parameters in Hadoop
Set manually, or defaults are used
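A parameter can be set cluster-wide in the XML configuration files (e.g., mapred-site.xml), per invocation with a -D flag (when the driver uses ToolRunner), or per job in the driver code. A sketch of the driver-code route, with a real parameter name and an illustrative value:

    import org.apache.hadoop.mapred.JobConf;

    public class ConfSettingExample {
      public static void apply(JobConf conf) {
        // Anything left unset falls back to its default
        // (io.sort.factor, for instance, defaults to 10 in classic Hadoop)
        conf.setInt("io.sort.factor", 100);
      }
    }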
Hadoop Job Configuration Parameters

[Image of the parameter list omitted; source: http://www.jaso.co.kr/265]
Tuning Hadoop Job Conf. Parameters

Do their settings impact performance?
What are ways to set these parameters?
Defaults -- are they good enough?
Best practices -- the best setting can depend on data, job, and cluster properties
Automatic setting
Experimental Setting

Hadoop cluster with 1 master + 16 workers
Each node:
2GHz AMD processor, 1.8GB RAM, 30GB local disk
Relatively ill-provisioned!
Xen VM running Debian Linux
Max 4 concurrent maps & 2 reduces
Maximum map wave size = 16 x 4 = 64
Maximum reduce wave size = 16 x 2 = 32

Not all users can run large Hadoop clusters:
Can Hadoop be made competitive in the 10-25 node, multi-GB to TB data-size range?
Parameters Varied in Experiments

Hadoop 50GB TeraSort

[Figure: performance when varying the number of reduce tasks, the number of concurrent sorted streams for merging, and the fraction of the map-side sort buffer devoted to metadata storage]
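These three knobs correspond to job configuration parameters. A sketch of how one grid point in such an experiment might be configured (the parameter names are the classic Hadoop ones, matching the later slides; the specific values are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class TeraSortTuningPoint {
      public static void apply(JobConf conf) {
        conf.setNumReduceTasks(200);                    // number of reduce tasks
        conf.setInt("io.sort.factor", 500);             // concurrent sorted streams for merging
        conf.setFloat("io.sort.record.percent", 0.05f); // fraction of sort buffer for record metadata
      }
    }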
Hadoop 50GB TeraSort

[Figure: varying the number of reduce tasks for different values of the fraction of the map-side sort buffer devoted to metadata storage (io.sort.factor = 500)]
Hadoop 50GB TeraSort

[Figure: varying the number of reduce tasks for different values of io.sort.factor (io.sort.record.percent = 0.05, the default)]
Hadoop 75GB TeraSort

[Figure: 1D projection for io.sort.factor = 500]
Automatic Optimization? (Not yet in Hadoop)

[Figure: timeline with shuffle -- three map waves followed by two reduce waves]

What if the number of reduces is increased to 9?

[Figure: the same job with three map waves and three reduce waves]
