
Intel Cloud Builders Guide to Cloud Design

and Deployment on Intel Platforms


Apache* Hadoop*
Audience and Purpose
This reference architecture is for companies that are looking to build their own cloud computing infrastructure, including enterprise IT organizations as well as cloud service providers or cloud hosting providers. The decision to use a cloud for the delivery of IT services is best made by starting with the knowledge and experience gained from previous work. This reference architecture gathers into one place the essentials of an Apache* Hadoop* cluster build-out, complete with benchmarking using the TeraSort workload. This paper defines easy-to-use steps to replicate the deployment in your data center or lab environment. The installation is based on Intel-powered servers and creates a multi-node, optimized Hadoop environment. The reference architecture contains details on the Hadoop topology, hardware and software deployed, installation and configuration steps, and tests for real-world use cases that should significantly reduce the learning curve for building and operating your first Hadoop infrastructure.
It is not expected that this paper can be used as-is. For example, adapting to an existing network and identifying specific management requirements are out of scope for this paper. Therefore, it is expected that the user of this paper will make significant adjustments to the design presented, as required to meet the specific requirements of their own data center or lab environment. This paper also assumes that the reader has basic knowledge of computing infrastructure components and services. Intermediate knowledge of the Linux* operating system, Python*, the Hadoop framework, and basic system administration skills is assumed.
February 2012
Intel Cloud Builders Guide
Intel Xeon Processor-based Servers
Apache* Hadoop*
Intel Xeon Processor 5600 Series
Table of Contents
Executive Summary
Hadoop* Overview
Hadoop System Architecture
Operation of a Hadoop Cluster
TeraSort Workload
TeraSort Workflow
Test Methodology
Intel Benchmark Install and Test Tool (Intel BITT)
Intel BITT Features
Configuring the Setup
Running TeraSort
Results
Conclusion
Executive Summary
MapReduce technology is gaining popularity among enterprises for a variety of large-scale, data-intensive jobs. MapReduce based on Apache* Hadoop* is rapidly emerging as a preferred technology for big data processing and management. Enterprises are deploying commodity standard server clusters and using business intelligence tools along with Apache Hadoop to obtain high-performing solutions for their large-scale data processing requirements. The motivation to deploy Hadoop comes from the fact that enterprises are gathering huge unstructured data sets generated by their business processes, and they are looking to extract the most value from this data to support their decision-making processes. Hadoop infrastructure moves compute closer to the data to achieve high processing throughput. In this paper we created a small commodity server cluster based on an Apache Hadoop distribution and ran the sort benchmark to measure how fast the cluster can process data. This reference architecture gives an understanding of how to set up the cluster, tune parameters, and run the sort benchmark. It provides a blueprint for building a cluster with Intel Xeon processor-based standard server platforms and the open source Apache Hadoop distribution. The paper further describes parameters for tuning and execution of the sort benchmark to measure performance.
Hadoop* Overview
Apache Hadoop is a framework for running applications on large clusters built using standard hardware. The
Hadoop framework transparently
provides applications both reliability
and data motion. Hadoop implements
a computational paradigm named
MapReduce, where the application is
divided into many small fragments of
work, each of which may be executed or
re-executed on any node in the cluster.
In addition, it provides a distributed file
system that stores data on the compute
nodes, providing very high aggregate
bandwidth across the cluster. Both
MapReduce and the Hadoop Distributed
File System (HDFS) are designed so that
node failures are automatically tolerated
by the framework.
The Hadoop framework consists of three
major components:
Common: Hadoop Common is a set
of utilities that support the Hadoop
subprojects. Hadoop Common includes
FileSystem, RPC, and serialization
libraries.
HDFS: The Hadoop Distributed File
System (HDFS) is a distributed file
system designed to run on commodity
hardware. It has many similarities
with existing distributed file systems.
However, the differences from other
distributed file systems are significant.
HDFS is highly fault-tolerant and
is designed to be deployed on low-
cost hardware. HDFS provides high
throughput access to application data
and is suitable for applications that
have large data sets. HDFS provides streaming access to file system data.
MapReduce: MapReduce was first
developed by Google to process large
datasets. MapReduce has two functions,
map and reduce, and a framework for
running a large number of instances
of these programs on commodity
hardware. The map function reads
a set of records from an input file,
processes these records, and outputs
a set of intermediate records. As part
of the map function, a split function
distributes the intermediate records
across many buckets using a hash
function. The reduce function then
processes the intermediate records.
The MapReduce Framework consists
of a single master JobTracker and one
slave TaskTracker per cluster node. The
master is responsible for scheduling the
jobs' component tasks on the slaves,
monitoring them, and re-executing the
failed tasks. The slaves execute the
tasks as directed by the master.
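As a rough illustration of the map-shuffle-reduce flow (this is only an analogy using standard shell tools against a hypothetical input.txt, not Hadoop itself), a word count can be expressed as a map step that emits (word, 1) pairs, a sort step that plays the role of the shuffle, and a reduce step that sums the counts per key:
# map: emit one "word<TAB>1" record per word
# shuffle: sort brings identical keys together
# reduce: sum the counts for each key
cat input.txt \
  | tr -s '[:space:]' '\n' \
  | awk 'NF {print $1 "\t1"}' \
  | sort -k1,1 \
  | awk -F'\t' '{count[$1] += $2} END {for (w in count) print w, count[w]}'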
Figure 1: Hadoop* stack
Figure 2: Hadoop* deployment on standard server nodes
Hadoop* System Architecture
The Hadoop framework works on the principle of "moving compute closer to the data." Figure 2 shows a typical deployment of the Hadoop framework on multiple standard server nodes. The computation occurs on the same node where the data resides, which enables Hadoop to deliver better performance than designs that pull data across the network to the compute nodes. A combination of standard server platforms and Hadoop infrastructure provides a cost-efficient and high-performance platform for data-parallel applications. Each Hadoop cluster has one Master node and multiple slave nodes. The Master node runs the NameNode and JobTracker functions, coordinating with the slave nodes to complete the jobs submitted to the cluster. The slave nodes run the TaskTracker, store data in HDFS, and execute the Map and Reduce functions that perform the data computations.
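As a minimal sketch of how these roles are typically wired up in a Hadoop 0.20.x installation (host names here are placeholders, and HADOOP_HOME is assumed to point at the Hadoop installation directory), the master and slave roles are declared in the conf/masters and conf/slaves files and the daemons are started from the Master node:
echo "master.domain.com" > $HADOOP_HOME/conf/masters   # host that runs the SecondaryNameNode
printf "node1.domain.com\nnode2.domain.com\n" > $HADOOP_HOME/conf/slaves   # one slave per line
$HADOOP_HOME/bin/hadoop namenode -format   # one-time HDFS format
$HADOOP_HOME/bin/start-dfs.sh              # NameNode on the master, DataNodes on the slaves
$HADOOP_HOME/bin/start-mapred.sh           # JobTracker on the master, TaskTrackers on the slaves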
Operation of a Hadoop* Cluster
Figure 3 shows the operation of a Hadoop cluster. The client submits the job to the Master node, which acts as an orchestrator, working with the Slave nodes to complete the job. The JobTracker on the Master node is responsible for controlling the MapReduce job. The slaves run the TaskTracker, which keeps track of the MapReduce job and reports the job status to the JobTracker at frequent intervals. In the event of a task failure, the JobTracker reschedules the task on the same slave node or a different slave node. HDFS is a location-aware (rack-aware) file system that manages the data in a Hadoop cluster. HDFS replicates the data on various nodes in the cluster to attain data reliability; however, HDFS has a single point of failure in the NameNode function. If the NameNode fails, the file system and data become inaccessible. Because the JobTracker knows where the data blocks reside, it efficiently schedules each task on the node where its data is located, decreasing the need to move data from one node to another and saving network bandwidth. Once the map function is complete, the intermediate data is transferred to a different node to perform the reduce function. The MapReduce framework provides an efficient way to scale the size of the cluster through a modular scale-out strategy: one or more nodes are added, and the HDFS and MapReduce functions take the new nodes into service as they are added, as sketched below.
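A hedged sketch of that scale-out step, again assuming a Hadoop 0.20.x layout and a hypothetical new host name, is to register the node and start its daemons:
echo "node18.domain.com" >> $HADOOP_HOME/conf/slaves   # also add it to hadoopNodeList for Intel BITT
# on the new node itself:
$HADOOP_HOME/bin/hadoop-daemon.sh start datanode
$HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
# optionally redistribute existing HDFS blocks across the enlarged cluster:
$HADOOP_HOME/bin/start-balancer.sh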
Figure 3: Operation of Hadoop* cluster
Cluster hardware setup:
Total 17 nodes in the cluster. One
Master node and 16 Slave nodes.
Data Network: Arista 7124 switch
connected to Intel Ethernet Server
Adapter X520-DA2 dual 10GbE NIC on
every node.
Each server has an internal private
Intel dual 1GbE NIC connected to a
top-of-rack switch that is used for
management tasks.
Each node has a disk enclosure
populated with SATA II 7.2K RPM, 2TB hard disk drives for a total of 24TB of raw storage per enclosure.
Dual socket Intel 5520 Chipset
platform.
Two Intel Xeon processor X5680 at
3.33GHz, 12MB cache.
48GB 1333MHz DDR3 memory
Red Hat Enterprise Linux* 6.0 (RHEL 6.0) (Kernel: 2.6.32-71.el6.x86_64)
Hadoop* Framework v0.20.1
Figure 4: Cluster hardware setup
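The following commands are one hedged way to confirm that each node matches this bill of materials before installing Hadoop; the network interface name is a placeholder and will differ per system:
grep "model name" /proc/cpuinfo | sort -u      # expect Intel Xeon X5680 @ 3.33GHz
grep MemTotal /proc/meminfo                    # expect roughly 48GB
fdisk -l 2>/dev/null | grep "^Disk /dev/sd"    # expect twelve 2TB data drives per enclosure
ethtool eth2 | grep Speed                      # expect 10000Mb/s on the 10GbE data interface
cat /etc/redhat-release; uname -r              # expect RHEL 6.0, kernel 2.6.32-71.el6.x86_64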
TeraSort Workload
TeraSort is a popular Hadoop benchmarking workload that, by default, sorts 1TB of data. The 1TB size is not a hard-set limit, since TeraSort allows the user to sort a dataset of any size by changing various parameters. The TeraSort benchmark exercises the HDFS and MapReduce functions of the Hadoop cluster. TeraSort is part of the Hadoop framework and is included in the standard Apache Hadoop installation package.
TeraSort is widely used to benchmark and
tune large Hadoop clusters with hundreds
of nodes.
TeraSort works in two steps:
TeraGen: This generates random data
based on the dataset size set by the
user. This dataset is used as input data
for the sort benchmark.
TeraSort: TeraSort sorts the input data
generated by TeraGen and stores the
output data on HDFS.
An optional third step, called TeraValidate, allows validation of the sorted data. This paper does not discuss this optional third step; example invocations of all three steps are sketched below.
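For reference, the three steps look roughly as follows when invoked by hand against the Hadoop 0.20.2 examples jar (the HDFS paths and record count here are illustrative; in our testing the equivalent steps are driven by the Intel BITT scripts described later):
hadoop jar hadoop-0.20.2-examples.jar teragen 10000000000 /tera/in         # ~1TB of 100-byte records
hadoop jar hadoop-0.20.2-examples.jar terasort /tera/in /tera/out
hadoop jar hadoop-0.20.2-examples.jar teravalidate /tera/out /tera/report  # optional validation step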
TeraSort Workflow
Figure 5 shows the workflow of the TeraSort workload tested on our cluster. The flow chart depicts the start of the workload at one control node, with one master node kick-starting the job and 16 slave nodes dividing the 8,192 map tasks among them. Once the map phase is complete, the cluster starts the reduce phase with 243 tasks. When the reduce phase is completed, the data output is stored on the file system.
Test Methodology
To run the workload we used the Intel Benchmark Install and Test Tool (Intel BITT). The workload was scripted to kick-start the job on the cluster, run TeraGen to generate the test data, and run the TeraSort task to sort the generated data. The script also kicks off a series of counters on the slave nodes to gather performance metrics on each of the nodes. Key hardware metrics such as processor utilization, network bandwidth consumption, memory utilization, and disk bandwidth consumption are captured on each node at 30-second intervals. Once the job is complete, the counters are stopped on all slave nodes and the log files containing performance data are copied to the master node for calculating utilization of the cluster. This data is plotted into graphs using gnuplot and presented for further analysis. We also noted the time taken to complete the job, as reported by the Hadoop management user interface. The lower the time measurement, the better the performance.
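The counters started by the script boil down to standard sar and iostat invocations. A simplified sketch of what runs on each slave node is shown below; the output paths follow the monDir setting in hadoopCloudConf.xml, and the exact flags used by the monCli helper scripts may differ:
sar -u 30 > /tmp/monHadLoc/cpu.sar &            # processor utilization, 30-second samples
sar -r 30 > /tmp/monHadLoc/mem.sar &            # memory utilization
sar -n DEV 30 > /tmp/monHadLoc/net.sar &        # network throughput per interface
iostat -dxm 30 > /tmp/monHadLoc/disk.iostat &   # disk bandwidth per device, in MB/s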

Figure 5: TeraSort workflow
Intel Benchmark Install and Test
Tool
Intel Benchmark Install and Test Tool (Intel BITT) provides tools to install, configure, run, and analyze benchmark programs on small test clusters. The installCli tool is used to install tar files on a cluster. monCli is used to monitor performance of the cluster nodes and provides options to start monitoring, stop monitoring, and generate CPU, disk I/O, memory, and network performance plots for the nodes and cluster. hadoopCli provides an automated Hadoop test environment. The Intel BITT templates enable configurable plot generation. Intel BITT command scripts enable configurable scripts to control monitoring actions. Benchmark configuration is implemented by using XML files. Configurable properties include the location of installation, monitoring directories, monitoring sampling duration, the list of the cluster nodes, and the list of the tar files that need to be installed. Intel BITT is implemented by using Python* and uses gnuplot to generate performance plots. Intel BITT currently runs on Linux*.
Intel BITT Features
Intel Benchmark Install and Test Tool
provides the following tools:
installCli: Used to install a specified list
of tar files to a specified list of nodes
monCli: Used to monitor performance
metrics locally and/or remotely. It can
be used to monitor the performance of
a cluster. The tool currently supports
sar and iostat monitoring tools.
hadoopCli: Used to install, configure,
and test Hadoop clusters.
Intel BITT is implemented in an object-oriented fashion. It can be extended to support other performance monitoring tools, such as vmstat and mpstat, if needed. The toolkit includes the following
building blocks:
XML parser: Parses the XML properties
including name, value, and description
fields. The install and monitor
configuration is defined by using XML
properties. Tool specific options are
passed through command line options.
Log file parser: Log files in the form of tables containing rows and columns are parsed, and CSV files are generated for each column. The column
items on each row are separated using
whitespace. The column header names
are used to create CSV file names.
Plot generator: gnuplot is used to plot
the contents of the CSV files by using
templates. The templates define the list
of CSV files that are used as inputs to
generate the plots. The templates also
define labels and titles of the plots.
Sar monitoring tool
Iostat monitoring tool
VTune™ monitoring tool
Emon monitoring tool
installCli is used to install Intel BITT
monCli is used to monitor local or cluster
nodes
hadoopCli is implemented by using the
building blocks defined above and it is
used to create and test Hadoop clusters
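The following invocations, taken from the terasort.sh sequence shown later in this paper, illustrate how the tools are typically driven from the scripts directory:
# install, format, and start a Hadoop cluster on the nodes listed in hadoopNodeList
../scripts/hadoopCli -a install -c ../conf/hadoopCloudConf.xml
../scripts/hadoopCli -a format -c ../conf/hadoopCloudConf.xml
../scripts/hadoopCli -a start -c ../conf/hadoopCloudConf.xml
# start and stop sar-based monitoring on the cluster nodes
../scripts/monCli -m sar -a run -c ../conf/hadoopCloudConf.xml -s run_sar2.sh
../scripts/monCli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopCloudConf.xml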
Configuring the Setup
We installed RHEL 6.0 on all 17 nodes with the default configuration and configured passphraseless SSH access between the nodes so that they can communicate without having to log in with a password every time there is a transaction between them.
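A typical way to set this up, sketched below with a placeholder user name and the node names used later in hadoopNodeList, is to generate a key pair on the master node and push the public key to every node:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa            # create a passphraseless key pair on the master
for node in node{1..17}.domain.com; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$node   # append the public key on each node
done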
1. Install Intel BITT tar file
cd
mkdir bitt
cp bitt-1.0.tar bitt
cd bitt
tar -xvf bitt-1.0.tar
cd bitt-1.0
The following is the list of subdirectories under the Intel BITT home directory:
cmd
conf
samples
scripts
templates
2. Create a release directory under Intel BITT home to copy tar files.
mkdir -p bitt/bitt-1.0/release
cp bitt-1.0.tar bitt/bitt-1.0/release
You can also download and copy the Hadoop tar file to the release directory if you are planning to test Hadoop.
cp hadoop-0.21.0.tar.gz ~/bitt/bitt-1.0/release
3. Download jdk and create a tar file from the installed jdk tar.
For example:
mkdir jdk
cp jdk-6u23-linux-x64.bin jdk
cd jdk
chmod +x jdk-6u23-linux-x64.bin
./jdk-6u23-linux-x64.bin
rm jdk-6u23-linux-x64.bin
tar -cvf ~/bitt/bitt-1.0/release/jdk1.6.0_23.tar jdk1.6.0_23
4. Download gnuplot and create a tar file from the installed gnuplot tree.
For example:
mkdir myinstall
cp gnuplot-4.4.0-rc1.tar myinstall
cd myinstall/
tar -xvf gnuplot-4.4.0-rc1.tar
mkdir -p install/gnuplot-4.4.0
cd gnuplot-4.4.0-rc1
./configure --prefix=/home/<user>/myinstall/install/gnuplot-4.4.0
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/gnuplot-4.4.0.tar .
5. Download Python and create a tar file from the installed python tar for your platform.
For example:
mkdir myinstall
cp Python-3.1.3.tgz myinstall
cd myinstall/
tar -xvf Python-3.1.3.tgz
mkdir -p install/Python-3.1.3
cd Python-3.1.3
./configure --prefix=/home/<user>/myinstall/install/Python-3.1.3
make
make install
cd ../install
tar -cvf ~/bitt/bitt-1.0/release/Python-3.1.3.tar .
6. Run TeraSort.
Run terasort.sh. You need to update the corresponding configuration files as described below.
For example:
cd ~/bitt/bitt-1.0/conf
Install gnuplot on your client system.
Install Python on your client system.
Make sure python3 and gnuplot are on your PATH on the client system.
cd ~/bitt/bitt-1.0/scripts
./terasort.sh
7. Configuration file edits. All configuration files are found under ~/bitt/bitt-1.0/conf
a. hadoopNodeList: Configuration file which contains the cluster nodes. Any addition or removal of nodes from the cluster must be registered here to be recognized by the load generator tool.
node1.domain.com
node2.domain.com
node3.domain.com
node4.domain.com
.
.
node17.domain.com
b. hadoopTarList: Configuration file listing the tar files to be installed.
../release/bitt-1.0.tar.gz
../release/Python-3.2.tar.gz
../release/jdk1.6.0_25.tar.gz
../release/hadoop-0.20.2.tar.gz
../release/gnuplot-4.4.3.tar.gz
c. hadoop-env.sh: Main Hadoop environment configuration file.
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
d. hadoopCloudConf.xml: Custom XML configuration file used to define key parameters on how the test is executed and where
the data is stored.
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<property>
<name>cloudTemplateLoc</name>
<value>/home/Hadoop/bitt/bitt-1.0/conf</value>
<description>cloud conf template file location</description>
</property>
<property>
<name>cloudTemplateVars</name>
<value>all</value>
<description>the list of template variables to copy</description>
</property>
<property>
<name>jobTrackerPort</name>
<value>8021</value>
<description>jobtracker port</description>
</property>
<property>
<name>nameNodePort</name>
<value>8020</value>
<description>namenode port</description>
</property>
<property>
<name>cloudConfDir</name>
<value>/tmp/hadoopConf</value>
<description>generated cloud conf file</description>
</property>
<property>
<name>cloudTmpDir</name>
<value>hadoop-${user.name}</value>
<description>cloud tmp dir</description>
</property>
<property>
<name>cloudInstallDir</name>
<value>/usr/local/hadoop/install</value>
<description>cloud install dir</description>
</property>
<property>
<name>cloudNodeList</name>
<value>/home/Hadoop/bitt/bitt-1.0/conf/hadoopNodeList</value>
<description>cluster nodes</description>
</property>
<property>
<name>monNodeList</name>
<value>/home/Hadoop/bitt/bitt-1.0/conf/hadoopMonNodeList</value>
<description>cluster monitor nodes</description>
</property>
<property>
<name>cloudTarList</name>
<value>/home/Hadoop/bitt/bitt-1.0/conf/hadoopTarList</value>
<description>cluster nodes</description>
</property>
<property>
<name>monInterval</name>
<value>30</value>
<description>sampling duration</description>
</property>
<property>
<name>monCount</name>
<value>0</value>
<description>number of samples</description>
</property>
<property>
<name>monResults</name>
<value>/tmp/monHadRes</value>
<description>cloud monitor log files location</description>
</property>
<property>
<name>monSummary</name>
<value>/tmp/monHadSum</value>
<description>cloud monitor log files location</description>
</property>
<property>
<name>monDir</name>
<value>/tmp/monHadLoc</value>
<description>cloud monitor log files location</description>
</property>
<property>
<name>gnuCmd</name>
<value>/usr/local/hadoop/install/gnuplot-4.4.3/bin/gnuplot</value>
<description>none</description>
</property>
</configuration>
e. hdfs-site-template.xml: Hadoop configuration file where HDFS parameters are set. Please note that the optimization values we used to run the test are shown in bold font.
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified at create time.
</description>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>655360</value>
<description>number of files Hadoop serves at one time</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data,/mnt/disk11/hdfs/data,/mnt/disk12/hdfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
<description>The default block size for new files.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description> </description>
</property>
<property>
<name>ipc.server.tcpnodelay</name>
<value>true</value>
<description> </description>
</property>
<property>
<name>ipc.client.tcpnodelay</name>
<value>true</value>
<description> </description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>40</value>
<description> </description>
</property>
<property>
<name>io.sort.factor</name>
<value>100</value>
<description> </description>
</property>
<property>
<name>io.sort.mb</name>
<value>220</value>
<description> </description>
</property>
</configuration>
f. mapred-site-template.xml: Hadoop configuration file which defines key MapReduce parameters. Values used in our testing
are highlighted in bold font.
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>24</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>io.sort.record.percent</name>
<value>0.3</value>
<description>added as per ssg reco
</description>
</property>
<property>
<name>io.sort.spill.percent</name>
<value>0.9</value>
<description>added as per ssg reco
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>12</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>64</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local". Assume 10 nodes, 10*2 - 2
</description>
</property>
<property>
<name>mapred.local.dir</name>
<value>/mnt/disk1/hdfs/mapred,/mnt/disk2/hdfs/mapred,/mnt/disk3/hdfs/mapred,/mnt/disk4/hdfs/mapred,/mnt/disk5/hdfs/mapred,/mnt/disk6/hdfs/mapred,/mnt/disk7/hdfs/mapred,/mnt/disk8/hdfs/mapred,/mnt/disk9/hdfs/mapred,/mnt/disk10/hdfs/mapred,/mnt/disk11/hdfs/mapred,/mnt/disk12/hdfs/mapred</value>
<description>The local directory where MapReduce stores intermediate
data files. May be a comma-separated list of
directories on different devices in order to spread disk i/o.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m -Djava.net.preferIPv4Stack=true</value>
<description>Java opts for the task tracker child processes.
The following symbol, if present, will be interpolated: @taskid@ is replaced
by current TaskID. Any other occurrences of '@' will go unchanged.
For example, to enable verbose gc logging to a file named for the taskid in
/tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
The configuration variable mapred.child.ulimit can be used to control the
maximum virtual memory of the child processes.
</description>
</property>
<property>
<name>mapred.output.compress</name>
<value>false</value>
<description>Should the job outputs be compressed?
</description>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>false</value>
<description>Should the outputs of the maps be compressed before being
sent across the network. Uses SequenceFile compression.
</description>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the job outputs are compressed, how should they be compressed?
</description>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.DefaultCodec</value>
<description>If the map outputs are compressed, how should they be compressed?
</description>
</property>
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>true</value>
<description> </description>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>true</value>
<description> </description>
</property>
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>1</value>
<description> </description>
</property>
<property>
<name>mapred.reduce.parallel.copies</name>
<value>20</value>
<description> </description>
</property>
<property>
<name>mapred.min.split.size</name>
<value>65536</value>
<description> </description>
</property>
<property>
<name>mapred.reduce.copy.backoff</name>
<value>5</value>
<description> </description>
</property>
<property>
<name>mapred.job.shuffle.merge.percent</name>
<value>0.7</value>
<description> </description>
</property>
<property>
<name>mapred.job.shuffle.input.buffer.percent</name>
<value>0.66</value>
<description> </description>
</property>
<property>
<name>mapred.job.reduce.input.buffer.percent</name>
<value>0.90</value>
<description> </description>
</property>
</configuration>
g. hadoop-terasort.xml: Intel BITT configuration file from which the parameters are read before the test runs. Parameters in
this configuration file override values in the other configuration files mentioned above. This configuration file helps to quickly
change the parameter values for different test runs without editing individual configuration files.
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
<property>
<name>mapred.map.tasks</name>
<value>8192</value>
<description> Total Map task number </description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>243</value>
<description> Total Reduce task number </description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description> Number of copies to replicate </description>
</property>
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
<description> compress map output </description>
</property>
<property>
<name>mapred.job.shuffle.input.buffer.percent</name>
<value>0.66</value>
<description> none </description>
</property>
<property>
<name>dataSetSizeSmall</name>
<value>100000000</value>
<description> Total record number for a small run, about 10GB of data. 1 record has 100 bytes </description>
</property>
<property>
<name>dataSetSize</name>
<value>10000000000</value>
<description> Total record number, about 1TB of data. 1 record has 100 bytes </description>
</property>
<property>
<name>dataSetName</name>
<value>tera</value>
<description> none </description>
</property>
<property>
<name>outputDataName</name>
<value>tera-sort2</value>
<description> none </description>
</property>
<property>
<name>jarFile</name>
<value>hadoop-0.20.2-examples.jar</value>
<description> none </description>
</property>
</configuration>
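Before kicking off a run, two quick sanity checks are worth doing. The first confirms the dataset sizes implied by dataSetSize and dataSetSizeSmall (TeraGen records are 100 bytes each); the second, a hedged example that assumes the cluster daemons are already running, confirms that the HDFS settings above have taken effect:
echo $(( 10000000000 * 100 ))     # 1,000,000,000,000 bytes, roughly 1TB for the full run
echo $(( 100000000 * 100 ))       # 10,000,000,000 bytes, roughly 10GB for the small run
hadoop dfsadmin -report           # DataNode count plus raw and used capacity per node
hadoop fsck / -blocks -locations  # replication factor, block size, and block placement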
Running TeraSort
TeraSort can be started by running terasort.sh. The script runs the various commands involved in starting the test, starting performance counters, ending the test, and gathering performance counter data for analysis. Below is the list of commands executed when the script runs, with a brief explanation of what each command does.
#!/usr/bin/env bash
###########################################################
#Intel Benchmark Install and Test Tool (BITT) Use Cases
#Typical sequence for hadoop terasort benchmark:
###########################################################
echo "START: terasort benchmark..............................."
date
# Stop any current running test on the cluster.
../scripts/hadoopCli -a stop -c ../conf/hadoopCloudConf.xml
# Kill Java* processes on the nodes
./runkill.sh
# Install fresh copy of executables on the slave nodes.
../scripts/hadoopCli -a install -c ../conf/hadoopCloudConf.xml
# Format the HDFS to store the data
../scripts/hadoopCli -a format -c ../conf/hadoopCloudConf.xml
# Start Java processes on all slave nodes.
../scripts/hadoopCli -a start -c ../conf/hadoopCloudConf.xml
# 2-minute delay to allow the processes to start on the slave nodes.
sleep 120
# Generate 1TB of data which will be used for sorting.
../scripts/hadoopCli -a data -c ../conf/hadoopCloudConf.xml
# Create monitoring directories.
../scripts/monCli -r clean -c ../conf/hadoopCloudConf.xml
# Start iostat utility to monitor disk usage on the slave nodes.
../scripts/monCli -m iostat -a run -c ../conf/hadoopCloudConf.xml -s run_iostat.sh
# Start sar utility on all the slave nodes to monitor CPU, network, and memory utilization
../scripts/monCli -m sar -a run -c ../conf/hadoopCloudConf.xml -s run_sar2.sh
# Start the sort activity on the 1TB data generated in the earlier step.
../scripts/hadoopCli -a run -c ../conf/hadoopCloudConf.xml
# Stop sar monitoring utility.
../scripts/monCli -m sar -a kill -s run_sar_kill.sh -c ../conf/hadoopCloudConf.xml
# Stop iostat utility.
../scripts/monCli -m iostat -a kill -s run_iostat_kill.sh -c ../conf/hadoopCloudConf.xml
# Convert iostat generated data to CSV file format.
../scripts/monCli -m iostat -a csv -c ../conf/hadoopCloudConf.xml
# Convert data generated from sar utility to CSV format.
../scripts/monCli -m sar -a csv -c ../conf/hadoopCloudConf.xml -s run_sar_gen.sh
# Using gnuplot to generate image containing graph of iostat data.
../scripts/monCli -m iostat -a plot -t iostat -c ../conf/hadoopCloudConf.xml
# Using gnuplot to generate image containing CPU graph from sar data.
../scripts/monCli -m sar -a plot -t cpu -c ../conf/hadoopCloudConf.xml
# Using gnuplot to generate image containing memory graph from sar data.
../scripts/monCli -m sar -a plot -t mem -c ../conf/hadoopCloudConf.xml
# Using gnuplot to generate image containing network graph from sar data.
../scripts/monCli -m sar -a plot -t nw -c ../conf/hadoopCloudConf.xml
# Archive log files on all the slave nodes.
../scripts/monCli -r tar -c ../conf/hadoopCloudConf.xml
# Copy archived log files from slave nodes to master node.
../scripts/monCli -r collect -c ../conf/hadoopCloudConf.xml
# Stop running processes on slave nodes.
../scripts/hadoopCli -a stop -c ../conf/hadoopCloudConf.xml
# Create a folder called cluster under /tmp/monHadSum on the head node.
../scripts/monCli -r cluster -c ../conf/hadoopCloudConf.xml
# Calculate average CPU utilization of the cluster.
../scripts/monCli -r average -m sar -t cpu -c ../conf/hadoopCloudConf.xml
# Calculate average memory throughput of the cluster.
../scripts/monCli -r throughput -m sar -t mem -c ../conf/hadoopCloudConf.xml
# Calculate average network utilization for the cluster.
../scripts/monCli -r throughput -m sar -t nw -c ../conf/hadoopCloudConf.xml
# Calculate average disk throughput of the cluster.
../scripts/monCli -r throughput -m iostat -t iostat -c ../conf/hadoopCloudConf.xml
# Copy contents of hadoopconf folder.
cp -r /tmp/hadoopConf /tmp/monHadSum
# Copy performance data gathering templates.
cp -r ../templates /tmp/monHadSum
# Create archive with all the log files and graph images.
tar -cvf /tmp/baselineMapRtera.tar /tmp/monHadSum
# End script.
echo "END: terasort benchmark..............................."
date
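After the script finishes, the job completion time reported in the Results section can be read from the JobTracker web user interface (by default on port 50030 in Hadoop 0.20.x) or, as a hedged command-line alternative, from the job history kept alongside the job output directory named by outputDataName:
# http://<master-node>:50030/jobtracker.jsp shows the elapsed time for each completed job
hadoop job -history tera-sort2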
Results
Figure 6 shows the test results from running TeraSort on the 17-node Hadoop cluster. The images show two rounds of testing, one with data compression turned ON and one with data compression turned OFF. The lower the time to complete, which is measured in seconds, the better the result. The images also show resource utilization in terms of processor, memory, disk throughput, and network throughput for both test runs.
Figure 6: Time taken to complete TeraSort.
In Figure 6, the blue bar shows the time taken by TeraSort when the output of the map phase was compressed before the data was stored for the reduce phase. In our test the cluster sorted 1TB of data in 1207 seconds with data compression. The red bar shows the time taken by the TeraSort run without data compression. As we can see in the graph, TeraSort completes in 1040 seconds and is faster than the run with data compression.
The following graphs show resource utilization with data compression enabled.
Figure 7: Processor utilization with Data compression enabled
Figure 7 shows the average processor utilization of the cluster with data compression enabled. The Intel Xeon processor X5680 handles the additional task of compressing the data, making it an excellent choice for setting up the Hadoop cluster.
Figure 8 shows average network throughput of the cluster with data compression enabled. Since the data is compressed before
getting transmitted over the network, the amount of data sent over the network is reduced.
Figure 8: Network throughput with data compression enabled
Figure 9 shows the average percentage of cluster memory used. Since we allocated almost 2GB of memory per task, the entire 48GB
of memory on the server is utilized when the TeraSort benchmark is running.
Figure 9: Memory usage
Figure 10 shows the average disk throughput of the cluster. Since the data is compressed, the writes are minimal during the map phase and peak at nearly 600MB/s when the sorted data is committed to disk.
Figure 10: Disk throughput with data compression enabled
The following graphs show resource utilization with data compression disabled.
Figure 11: Processor utilization with Data compression disabled
Figure 12: Network throughput with data compression disabled
Figure 12 shows the network throughput reaching almost 300MB/s, or close to 3Gb/s, when TeraSort is run with data compression disabled. The Intel Ethernet Server Adapter X520, based on 10GbE, efficiently handles the data throughput of the cluster and provides ample bandwidth for the data transfer between the nodes.
Figure 13: Memory usage
Figure 14: Disk throughput with data compression disabled
With compression disabled, disk usage is higher, as is network usage. Peak writes were 620MB/s and remained above 400MB/s for the entire run. The total cluster throughput, including reads and writes, was closer to 1GB/s at the peaks.
Conclusion
Hadoop clusters benefit a great deal from servers based on the Intel Xeon processor X5680; the dual socket servers are optimal for any Hadoop deployment ranging from a few nodes to hundreds of nodes. In our test runs we were able to drive the cluster to its maximum utilization. With the cluster 100 percent utilized, jobs complete faster, making way for other job sets to run on the cluster. With data centers aiming to get the most performance per watt, the energy-efficient Intel Xeon processor 5600 series provides cost benefits on a per-node basis. In distributed workloads it is key to have high-throughput network connections to handle workloads with large datasets. In the test, Intel Ethernet Server Adapters X520-DA2 based on 10GbE were able to achieve data rates of 3Gb/s during the workload execution. While compressing the data substantially reduces data transfer over the network, the time to complete increases compared to test runs without compression. System administrators and application developers have to decide whether to enable data compression based on their specific requirements. Intel has published a set of guidelines on tuning Hadoop clusters, which can be found at http://software.intel.com/file/31124. Using LZO-based compression codecs may alleviate some of the bottlenecks found with the default zlib compression codec.
Disclaimers
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See www.intel.com/products/processor_number for details.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked reserved or undefined. Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web site at www.intel.com.
Copyright 2012 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon inside, and Intel Intelligent Power Node Manager are trademarks of Intel
Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
For more information:
http://hadoop.apache.org/common/docs/r0.20.1/
http://software.intel.com/file/31124/
http://software.intel.com/en-us/articles/intel-benchmark-install-and-test-tool-intel-bitt-tools/
www.intel.com/cloudbuilders