What is Cloud Computing
What is Grid Computing
What is Virtualization
How the above three are interrelated
What is Big Data
Introduction to Analytics and the need for big data analytics
Hadoop Solutions - Big Picture
Hadoop distributions
Comparing Hadoop vs. Traditional Systems
Volunteer Computing
Data Retrieval - Random Access vs. Sequential Access
NoSQL Databases
The Motivation for Hadoop
Problems with traditional large-scale systems
Requirements for a new approach
Hadoop: Basic Concepts
What is Hadoop?
The Hadoop Distributed File System
How MapReduce Works
Anatomy of a Hadoop Cluster
Hadoop Daemons
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
HDFS in detail
Blocks and Splits
Replication
Data high availability
Data Integrity
Cluster architecture and block placement
Programming Practices & Performance Tuning
Developing MapReduce Programs in:
Local Mode
Pseudo-Distributed Mode
Fully Distributed Mode
Writing a MapReduce Program
Examining a Sample MapReduce Program
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
Setup Hadoop cluster
Install and configure Apache Hadoop
Make a fully distributed Hadoop cluster on a single laptop/desktop
Install and configure the Cloudera Hadoop distribution in fully distributed mode
Install and configure the Hortonworks Hadoop distribution in fully distributed mode
Monitoring the cluster
Getting used to the management consoles of Cloudera and Hortonworks
Hadoop Security
Why Hadoop Security Is Important
Hadoop's Security System Concepts
What Kerberos Is and How it Works
Configuring Kerberos Security
Integrating a Secure Cluster with Other Systems
Managing and Scheduling Jobs
Managing Running Jobs
Hands-On Exercise
The FIFO Scheduler
The FairScheduler
Configuring the FairScheduler
Hands-On Exercise
Cluster Maintenance
Checking HDFS Status
Hands-On Exercise
Copying Data Between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Hands-On Exercise
NameNode Metadata Backup
Cluster Monitoring and Troubleshooting
General System Monitoring
Managing Hadoop's Log Files
Using the NameNode and JobTracker Web UIs
Hands-On Exercise
Cluster Monitoring with Ganglia
Common Troubleshooting Issues
Benchmarking Your Cluster
Hadoop Ecosystem covered as part of Hadoop Administrator
Ecosystem component: Ganglia
Install and configure Ganglia on a cluster
Configure and use Ganglia
Use Ganglia for graphs
Ecosystem component: Nagios
Nagios concepts
Install and configure Nagios on a cluster
Use Nagios for sample alerts and monitoring
Ecosystem component: Hive
Hive concepts
Install and configure Hive on a cluster
Create a database and access it from the console
Develop and run sample applications in Java/Python to access Hive
Ecosystem component: Sqoop
Install and configure Sqoop on a cluster
Import data from Oracle/MySQL to Hive
Overview of other ecosystem components: Oozie, Avro, Thrift, REST, Mahout, Cassandra, YARN, MR2, etc.