By Gopal Krishna
-----------------------------------------------------------
Before starting the multi-node Hadoop cluster setup on CentOS, a fully working single-node Hadoop cluster setup on CentOS has to be completed on all the nodes.
NOTE 2: We will call machine1 the master from now on and machine2 the slave. The master may also act as a slave to share some of the load, while the slave-only machine will be a pure slave. We will change the hostnames of these machines in their networking setup, most notably in /etc/sysconfig/network .
Change the HOSTNAME to your preferred master name and save the file (press Ctrl+X, then y, then Enter). It should look like the below:
[hduser@master /]$ cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master
[hduser@master /]$
Next, update /etc/hosts on both machines so that these hostnames resolve to the right IP addresses:
[hduser@master /]$ cat /etc/hosts
192.168.131.139 master
192.168.131.140 slave
::1 localhost6.localdomain6 localhost6
[hduser@master /]$
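To confirm that the names resolve correctly, you can ping each machine from the other (a quick sanity check using the hostnames above):
hduser@master:~$ ping -c 1 slave
hduser@slave:~$ ping -c 1 master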
NOTE: You can get the IP address of a node by running the ifconfig command on that node.
The hduser user on the master must be able to connect a) to its own user account on the master, i.e. ssh master in this context and not necessarily ssh localhost, and b) to the hduser user account on the slave (aka hduser@slave) via a password-less SSH login.
If you followed my Setup Single Node Hadoop Cluster on CentOS, you just have to add hduser@master's public SSH key (which should be in ~/.ssh/id_rsa.pub) to the authorized_keys file of hduser@slave (in this user's ~/.ssh/authorized_keys).
Once the authorized_keys file has been generated, make sure it has the correct permissions, either manually with the commands below or via ssh-copy-id, which fixes them for you.
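If you prefer to set the permissions manually on the slave, the standard OpenSSH requirements are as follows (a minimal sketch, run as hduser on the slave):
hduser@slave:~$ chmod 700 $HOME/.ssh
hduser@slave:~$ chmod 600 $HOME/.ssh/authorized_keys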
-----------------------------------
Copy the public key from master to slave by running the command below:
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
This command will prompt you for the login password for user hduser on slave, then
copy the public SSH key for you, creating the correct directory and fixing the
permissions as necessary.
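After copying the key, it is worth confirming that both logins described in a) and b) above now work without a password. If either command still prompts for one, re-check the permissions on ~/.ssh:
hduser@master:~$ ssh master
hduser@master:~$ ssh hduser@slave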
You need to stop iptables (the firewall) so that the nodes can communicate with each other.
#Stop firewall on master
hduser@master:~$ sudo service iptables stop
#Stop firewall on slave
hduser@slave:~$ sudo service iptables stop
Run the below commands to keep iptables disabled even after a system reboot.
hduser@master:~$ sudo chkconfig iptables off
hduser@slave:~$ sudo chkconfig iptables off
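You can verify that the firewall is stopped now and disabled for future reboots (the same check applies on the slave):
hduser@master:~$ sudo service iptables status
hduser@master:~$ chkconfig --list iptables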
#Do the following on master node
Open the core-site.xml file using below command and change the value for fs.default.name
hduser@master:~$ vi /usr/local/hadoop/conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation. The URI's
  scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The URI's authority is used to
  determine the host, port, etc. for a filesystem.
  </description>
</property>
#Do the following on slave node
Open the core-site.xml file using below command and change the value for fs.default.name
hduser@slave:~$ vi /usr/local/hadoop/conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation. The URI's
  scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The URI's authority is used to
  determine the host, port, etc. for a filesystem.
  </description>
</property>
#Do the following on master node
Open the mapred-site.xml file using below command and change the value for mapred.job.tracker
hduser@master:~$ vi /usr/local/hadoop/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
#Do the following on slave node
Open the mapred-site.xml file using below command and change the value for mapred.job.tracker
hduser@slave:~$ vi /usr/local/hadoop/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
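NOTE: If your conf/masters and conf/slaves files still reflect the single-node setup, update them on the master as well, so that start-all.sh knows where to launch the daemons. A sketch assuming the hostnames used above, with the master also acting as a slave as described in NOTE 2:
[hduser@master /]$ cat /usr/local/hadoop/conf/masters
master
[hduser@master /]$ cat /usr/local/hadoop/conf/slaves
master
slave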
On the master node, you need to format the HDFS. Run the below command to do the HDFS format.
hduser@master:~$ hadoop namenode -format
The above command should format the HDFS on both master and slave. If the format fails, you can remove the dfs and mapred folders in /app/hadoop/tmp on both the nodes manually and then try to format again.
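For example, assuming the hadoop.tmp.dir configured above:
hduser@master:~$ rm -rf /app/hadoop/tmp/dfs /app/hadoop/tmp/mapred
hduser@slave:~$ rm -rf /app/hadoop/tmp/dfs /app/hadoop/tmp/mapred
hduser@master:~$ hadoop namenode -format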
STEP 14: Start the cluster
On the master node, run the below command.
hduser@master:~$ start-all.sh
Verify that the expected daemons are running by using the jps command:
[hduser@master /]$ jps
3458 JobTracker
3128 NameNode
3254 DataNode
5876 Jps
3595 TaskTracker
3377 SecondaryNameNode
[hduser@master /]$
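On the slave node, jps should show only the worker daemons, something like the below (your process IDs will differ):
[hduser@slave /]$ jps
2412 DataNode
2538 TaskTracker
2683 Jps
[hduser@slave /]$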
NOTE: If the given daemons are not running on the respective nodes, you need to check the log files for possible errors. The default log file location is /usr/local/hadoop/logs
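For example, to inspect the NameNode log on the master (the exact file name includes the user and host names, so it may differ on your system):
hduser@master:~$ tail -n 50 /usr/local/hadoop/logs/hadoop-hduser-namenode-master.log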