
Hadoop Training ---- By Sasidhar M (sasis937@gmail.com)
Module 1 Introduction and Basics

Why Hadoop?
Efficient processing and querying of vast amounts of data
Security
Automatic distribution of data and work across machines
Open source
Parallel processing

History of Hadoop

Standalone Applications

Application Server <-- Network --> Database
(connection pooling, data manipulation, updating, synchronization)

Challenges in a Standalone Application

Network latency: GBs, TBs, or even PBs of data must be moved over the network
What is the size of the application? What if the data size is huge?

Changing Mindset
Should the application be moved, or the data?

Parallel Processing
Apart from data storage, performance becomes a major concern.
Parallel processing approaches: Multithreading, OpenMP, MPI (Message Passing Interface)

Building Scalable Systems


What is the need for scaling?
Storage and processing

Vertical Scaling
Adding extra hardware (CPU, RAM, etc.) to an existing machine

Horizontal Scaling
Adding more nodes to the system

Can you scale your existing system? Elastic scalability

Distributed Framework

Data Localization
Moving the application to where the data resides

Data Availability
Data stored across nodes should be available and accessible to all other nodes. Even if nodes fail, data should not be lost.

Data Consistency
Data should be consistent at all times

Data Reliability

Challenges in a Distributed Framework

How to reduce network latency?
How to make sure that data is not lost?
How to design a programming model to access the data?

Chunking an input file
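As a toy illustration of chunking (this is not how HDFS does it internally, but the idea of fixed-size blocks is the same), a file can be split into equal-size pieces with GNU coreutils; the file name, sizes, and `/tmp` paths below are just examples:

```shell
# Create a 1 MB sample file, then split it into 256 KB chunks --
# a small-scale analogue of HDFS splitting a file into blocks.
rm -f /tmp/chunk_*
dd if=/dev/zero of=/tmp/sample.dat bs=1024 count=1024 2>/dev/null
split -b 256k -d /tmp/sample.dat /tmp/chunk_
ls /tmp/chunk_*    # four chunks: /tmp/chunk_00 .. /tmp/chunk_03
```

In HDFS the default block size is much larger (64 MB in Hadoop 1.x), and each block is replicated across DataNodes.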

Assignments - Prerequisites
Linux OS
Sun JDK 6 (>= 1.6)
Hadoop 1.0.3
Eclipse
Apache Maven 3.0.4

Hadoop Training Module-2


Hadoop Installation

Next Module Overview

Installing Hadoop
Running Hadoop in pseudo-distributed mode
Understanding configuration
Understanding Hadoop processes: NameNode (NN), DataNode (DN), Secondary NameNode (SNN), JobTracker (JT), TaskTracker (TT)
Running sample MapReduce programs
Overview of basic commands and their usage

Agenda
Hadoop Installation
Running a Sample MapReduce Program
HDFS Commands

Step 1: Installing Java on RHEL, CentOS, or Fedora

Download the Sun JDK from the Oracle website (rpm.bin file)
Make it executable with chmod +x <jdk_rpm.bin_filename> and run it; Java is installed under /usr/java
Set the JAVA_HOME environment variable in .bashrc or .bash_profile:
export JAVA_HOME=/usr/java/jdk1.6.0_31
export PATH=$PATH:$JAVA_HOME/bin
Reload the file: source .bashrc
Run java -version; it should show the version of the JDK you installed

Installing Java on Ubuntu (contd.)

tar -xvzf jdk-7u9-linux-x64.tar.gz
sudo mv jdk1.7.0_09 /usr/lib/jvm/
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0_09/bin/javac 1
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0_09/bin/java 1
sudo update-alternatives --install /usr/bin/javaws javaws /usr/lib/jvm/jdk1.7.0_09/bin/javaws 1

Step 2: Disabling IPv6

Check the current state:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A value of 1 indicates that IPv6 is disabled; 0 means it is still enabled.

If IPv6 is not disabled, open /etc/sysctl.conf and add the following lines:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
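On a standard sysctl-based Linux system, the new settings can be applied without a reboot and then re-checked (requires root):

```shell
# Reload /etc/sysctl.conf and verify the new value
sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # should now print 1
```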

Step 3: Creating a new user for Hadoop

This is not a mandatory step, but in cluster mode make sure Hadoop is run as the same user on every node:
useradd hadoop
passwd hadoop

Step 4: Configuring SSH

Nodes in the cluster communicate with each other via SSH.
The NameNode should be able to reach the DataNodes in a password-less manner.
Run the following command to generate a public/private key pair with an empty passphrase:
ssh-keygen -t rsa -P ""
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
Set the permissions of authorized_keys:
chmod 755 authorized_keys
Now ssh localhost should not ask for a password.
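The key-generation sequence can be tried end to end in a scratch directory first; the `/tmp/sshdemo` path below is just for illustration (on the real machine the files live under `/home/hadoop/.ssh`). Note that sshd also accepts the stricter, conventional mode 600 for `authorized_keys`:

```shell
# Generate a password-less RSA key pair and authorize the public key.
mkdir -p /tmp/sshdemo
ssh-keygen -t rsa -P "" -f /tmp/sshdemo/id_rsa -q
cat /tmp/sshdemo/id_rsa.pub >> /tmp/sshdemo/authorized_keys
chmod 600 /tmp/sshdemo/authorized_keys   # must not be group/world-writable
```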

Possible errors while configuring SSH

ssh: connect to host localhost port 22: Connection refused
Check whether sshd is running: ps -ef | grep sshd
Check whether the SSH server and client are installed; if not, install them:
sudo apt-get install openssh-client openssh-server   (on Ubuntu)
yum -y install openssh-server openssh-client   (on CentOS)
Then enable and start the service:
chkconfig sshd on
service sshd start

Step 5: Installing Hadoop

Untar the file:
tar zxf hadoop-1.0.3.tar.gz
Create the following environment variables in the .bashrc file:
export HADOOP_HOME=/home/hadoop/hadoop-1.0.3
export HADOOP_LIB=$HADOOP_HOME/lib

Step 6: Configuring Hadoop

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>

hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

Masters and Slaves files

In the masters file, specify the IP or hostname of the NameNode.
In the slaves file, specify the IPs or hostnames of the slave nodes (DataNodes).
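For a pseudo-distributed (single-node) setup, where all daemons run on one machine, both files typically contain a single line:

```
conf/masters:
localhost

conf/slaves:
localhost
```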

hadoop-env.sh
Specify the JAVA_HOME path
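The line to edit in conf/hadoop-env.sh ships commented out; the path below assumes the JDK location from Step 1, so adjust it to your own install:

```shell
# conf/hadoop-env.sh -- uncomment and set the Java installation root
export JAVA_HOME=/usr/java/jdk1.6.0_31
```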

Step 7: Formatting the NameNode and starting the Hadoop cluster

Formatting is required to build the file system. Where is it created? By default, under ${hadoop.tmp.dir}/dfs/name (see core-site.xml). Run the following commands to format the NameNode:
cd $HADOOP_HOME
bin/hadoop namenode -format

The output should show that the NameNode was successfully formatted.
Run bin/start-all.sh to start the Hadoop cluster.
Execute ps -ef | grep hadoop; it should show all five of the processes below running:
NameNode, DataNode, Secondary NameNode, TaskTracker, JobTracker
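Besides ps, the jps tool that ships with the JDK lists running Java processes by class name; on a healthy pseudo-distributed cluster it shows all five daemons (the PIDs below are illustrative):

```
$ jps
4520 NameNode
4671 DataNode
4829 SecondaryNameNode
4934 JobTracker
5092 TaskTracker
5210 Jps
```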

JobTracker UI
http://localhost:50030

NameNode UI
http://localhost:50070
