Big Data and Hadoop For Developers - Syllabus

Big Data and Hadoop for Developers Level 1
Description
Gartner predicts that, by 2015, 4.4 million IT jobs will be created globally to support Big Data. Big Data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. It is imperative that organizations and IT leaders focus on the ever-increasing volume, variety and velocity of the information that makes up Big Data.
Hadoop is a core platform for storing and processing Big Data, making it usable for analytics. This course teaches you what you need to know to use Hadoop for Big Data analysis and gives you a clear understanding of how Big Data is processed with Hadoop.
Why learn to process Big Data with Hadoop?
Businesses are now aware of the large volumes of data they generate in their day-to-day transactions. They have also realized that, once analyzed, this Big Data can provide very valuable insights.
The massive volume of Big Data and its largely unstructured format make it difficult to analyze with traditional tools. Hadoop makes it possible to process large amounts of data cheaply, regardless of its structure.
If you are an IT professional who wants to stay up to date with one of today's most talked-about technologies, this is the course for you.
Knowledge of processing Big Data with Hadoop is also a strong resume builder for students preparing for campus placements.
If you are a developer who is uncertain about how Hadoop works, this course will clear things up and save you a lot of time and effort.
If your business is planning to move to Hadoop, this is the right course to train your employees.
Learning to process Big Data with Hadoop addresses all of these needs at once.
Sessions are led by experienced trainers who combine deep technical knowledge with extensive practical experience.
Objectives
What Hadoop is and how it can help process large data sets.
How to write MapReduce programs using the Hadoop API.
How to use HDFS (the Hadoop Distributed File System) from the command line and through its API to load and process data in Hadoop effectively.
How to ingest data from an RDBMS or a data warehouse into Hadoop.
Best practices for building, debugging and optimizing Hadoop solutions.
An introduction to tools such as Pig, Hive, HBase and Amazon Elastic MapReduce, and how they can help in Big Data projects.
Who should attend
A developer who wants to learn Hadoop but doesn't know where to start
A team that is struggling to extract insights from large-scale, fast-growing data in traditional systems
A team that has decided to migrate from an RDBMS or a traditional data warehouse to Hadoop, but needs help getting started
Course Outline
Day 1 and 2
Introduction
Big Data
What is Big Data?
Trends across industries.
Opportunities to disrupt business models across industries.
Industry specific Use Cases.
Some brief Case Studies.
Data Science
An emerging new discipline.
Skills required to be a Data Scientist.
Hadoop
What is Hadoop?
Why do we need a new tool? / Motivations for Hadoop
A comparison with traditional databases (RDBMS) and data warehouses.
Data Hub/Lake/Reservoir: The role of Hadoop in a modern data architecture.
Apache Hadoop
Hadoop distributions: Cloudera, Hortonworks, MapR, IBM, Pivotal and Intel.
An overview of a typical Hadoop cluster.
Hadoop Deployment
Commodity Hardware
Hadoop Appliances
Hadoop on the Cloud
Hadoop as a Service
Lab: Install and configure a multi-node Hadoop cluster with Ambari
Data Storage
File System Abstraction
Big Data and Distributed File Systems
Hadoop Distributed File System (HDFS)
HDFS Architecture
Architectural assumptions and goals
How data is stored in HDFS
How data is read from HDFS
Namenodes and Datanodes
Blocks
Data Replication
Fault Tolerance
Data Integrity
Namespaces
Federation in Hadoop 2.0
High Availability in Hadoop 2.0
Security and Encryption

HDFS Interfaces: FileSystem API, FsShell, WebHDFS, FUSE, etc.
Lab: Manipulating files in HDFS using hadoop fs commands.
Lab: Manipulating files in HDFS programmatically using the FileSystem API.
Alternative Hadoop File Systems: IBM GPFS, MapR-FS, Lustre, Amazon S3 etc.
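To make the block and replication topics concrete, here is a small Python calculation. It is a sketch assuming the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable, so a given cluster may differ. The function name `hdfs_block_layout` is hypothetical.

```python
# Assumed Hadoop 2.x defaults; clusters and individual files can override both.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize, in bytes
DEFAULT_REPLICATION = 3                 # dfs.replication


def hdfs_block_layout(file_size, block_size=DEFAULT_BLOCK_SIZE,
                      replication=DEFAULT_REPLICATION):
    """Return (list of block sizes, raw bytes stored cluster-wide) for one file.

    HDFS splits a file into fixed-size blocks; only the last block may be
    smaller, and it uses only as much disk as the data it holds. Each block
    is stored `replication` times on different DataNodes.
    """
    if file_size < 0 or block_size <= 0 or replication < 1:
        raise ValueError("invalid arguments")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    raw_bytes = file_size * replication
    return sizes, raw_bytes


if __name__ == "__main__":
    MB = 1024 * 1024
    sizes, raw = hdfs_block_layout(300 * MB)
    print([s // MB for s in sizes])  # [128, 128, 44]
    print(raw // MB)                 # 900
```

A 300 MB file therefore occupies three blocks and about 900 MB of raw disk across the cluster, which is why replication factor is a capacity-planning decision as well as a fault-tolerance one.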
Data Processing
MapReduce
The fundamentals: map() and reduce()
Data Locality
Architecture of the MapReduce framework.
Phases of a MapReduce Job

Lab: Write a simple log analysis MapReduce application
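The map() / shuffle / reduce() flow behind the log analysis lab can be simulated in plain Python. This is an in-process sketch, not Hadoop API code; it assumes access-log lines in Apache Common Log Format, where the HTTP status code is the second-to-last field, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_log_line(line):
    """map(): emit (status_code, 1) for one access-log line.

    Assumes Common Log Format, where the HTTP status code is the
    second-to-last whitespace-separated field. Malformed lines are skipped.
    """
    fields = line.split()
    if len(fields) >= 2 and fields[-2].isdigit():
        yield fields[-2], 1


def shuffle(pairs):
    """Group intermediate values by key and sort the keys, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_counts(key, values):
    """reduce(): total the counts for one status code."""
    yield key, sum(values)


def run_job(lines):
    """Run map -> shuffle -> reduce over an iterable of log lines."""
    intermediate = chain.from_iterable(map_log_line(line) for line in lines)
    output = chain.from_iterable(
        reduce_counts(key, values) for key, values in shuffle(intermediate)
    )
    return dict(output)


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.0" 404 -',
        '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b.html HTTP/1.0" 200 512',
    ]
    print(run_job(sample))  # {'200': 2, '404': 1}
```

In a real job the framework performs the shuffle across the network, and map() and reduce() run as separate tasks on different nodes; the data flow is the same.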
Job Execution
Partitioners
Combiners
The flow of <key, value> pairs in a MapReduce Job

Lab: Write an Inverted Index MapReduce Application with custom Partitioner and Combiner
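An inverted index maps each word to the documents that contain it. The in-process Python sketch below (hypothetical names, not the Hadoop Java API) shows where a combiner and a custom partitioner fit: the combiner removes duplicate (word, document) pairs before the shuffle, and the partitioner sends every pair for a given word to the same reducer. In a real job these roles would be classes registered with `Job.setCombinerClass` and `Job.setPartitionerClass`.

```python
import zlib
from collections import defaultdict


def map_document(doc_id, text):
    """map(): emit (word, doc_id) for every word occurrence in a document."""
    for word in text.lower().split():
        yield word, doc_id


def combine(pairs):
    """Combiner: deduplicate (word, doc_id) pairs from one map task, so a word
    repeated in a document crosses the network once. It is idempotent, which
    matters because the framework may run a combiner zero or more times."""
    return sorted(set(pairs))


def partition(word, num_reducers):
    """Custom partitioner: route all pairs for a word to the same reducer.

    Uses a stable CRC32 hash; Python's built-in hash() for str is randomized
    per process, so it would not give a deterministic assignment.
    """
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def reduce_postings(word, doc_ids):
    """reduce(): build the sorted, duplicate-free posting list for one word."""
    return word, sorted(set(doc_ids))


def build_index(docs, num_reducers=2):
    # One bucket per reducer, each grouping incoming values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for doc_id, text in docs.items():
        for word, d in combine(map_document(doc_id, text)):
            buckets[partition(word, num_reducers)][word].append(d)
    index = {}
    for bucket in buckets:
        for word in sorted(bucket):
            key, postings = reduce_postings(word, bucket[word])
            index[key] = postings
    return index
```

Each reducer ends up owning a disjoint set of words, so the final index is simply the union of the reducers' outputs.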
Custom types and Composite Keys
Custom Comparators
InputFormats and OutputFormats
Distributed Cache
MapReduce Design Patterns
Sorting
Joins
Streaming Job: Writing MapReduce programs in languages other than Java

Lab: Writing a streaming MapReduce job in Python
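Hadoop Streaming runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output. The following word-count script is a minimal sketch of such a job in Python; the file name `wordcount.py` and the `map`/`reduce` command-line arguments are assumptions of this example.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run as `wordcount.py map` or `wordcount.py reduce`."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' per word; Streaming treats text before the first tab as the key."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per word. Streaming sorts mapper output by key, so all lines
    for one word arrive contiguously and groupby() can collect them."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    steps = {"map": mapper, "reduce": reducer}
    if sys.argv[1] not in steps:
        sys.exit("usage: wordcount.py map|reduce")
    for out in steps[sys.argv[1]](sys.stdin):
        print(out)
```

The pipeline can be tested locally without a cluster, with `sort` standing in for the shuffle: `cat input.txt | python3 wordcount.py map | LC_ALL=C sort | python3 wordcount.py reduce`. On a cluster, the same script is passed to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options; the JAR's location depends on the distribution.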
YARN and Hadoop 2.0
Separating resource management and processing
YARN Applications: MapReduce, Tez, HBase, Storm, Spark, Giraph etc.
YARN Architecture
ResourceManager
NodeManagers
ApplicationMasters
Containers
Fault Tolerance
Tez: Accelerating processing of data stored in HDFS
Data Integration
Integrating Hadoop into your existing enterprise.
Introduction to Sqoop
Lab: Importing data from an RDBMS to HDFS using Sqoop
Lab: Exporting data from HDFS to an RDBMS
Other data integration tools: Flume, Kafka, Informatica, Talend etc.
Higher Level Tools
Defining workflows with Oozie
An introduction to Hive
Architecture
Interfaces: Hive Shell, Thrift, JDBC, ODBC etc.
HiveQL: A dialect of SQL
Data Types and File Formats
Creating Tables and Loading Data
Schema on Read
Querying Data
User Defined Functions
An introduction to Pig
Grunt Shell
Pig's Data Model
Pig Latin
User Defined Functions
An introduction to HBase
Architecture
Client API
MapReduce Integration
Schema Design
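Hive applies a table's schema when data is read, not when it is loaded; this is the schema-on-read idea among the Hive topics above. The Python sketch below imitates that behaviour for comma-delimited text: the raw rows are stored untouched, and a hypothetical schema is applied at query time, with missing or unparseable fields returned as None where Hive would return NULL.

```python
import csv
import io

# Raw data is stored as-is; nothing is validated when it is "loaded".
RAW = "1,alice,34\n2,bob,not_a_number\n3,carol\n"

# A hypothetical table definition, applied only when the data is read.
SCHEMA = [("id", int), ("name", str), ("age", int)]


def read_with_schema(raw_text, schema):
    """Apply a schema at read time. Missing fields and values that fail type
    conversion become None instead of causing the row to be rejected."""
    rows = []
    for record in csv.reader(io.StringIO(raw_text)):
        row = {}
        for i, (column, cast) in enumerate(schema):
            try:
                row[column] = cast(record[i])
            except (IndexError, ValueError):
                row[column] = None
        rows.append(row)
    return rows
```

Because the schema lives outside the data, the same files can be read through a different table definition later without rewriting them, which is the main contrast with an RDBMS's schema-on-write.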
Day 3 (optional)
MapReduce
Lab: Writing custom InputFormat and OutputFormat
Lab: Implementing Total Sort
Lab: Implementing Secondary Sort with Composite Keys and Custom Comparators
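Secondary sort combines three pieces: a composite key, a partitioner and a grouping comparator that both look only at the natural key, and a sort order driven by the full composite key. This in-process Python sketch simulates that arrangement on hypothetical (station, year, temperature) readings so that each station's temperatures reach the reducer in descending order; in Java these pieces correspond to `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```python
import zlib
from itertools import groupby


def composite_key(record):
    """Composite key = (natural key, secondary field). Negating the temperature
    yields descending temperatures under an ascending sort."""
    station, _year, temp = record
    return (station, -temp)


def partition(key, num_reducers):
    """Partition on the natural key only, so every reading for a station
    reaches the same reducer regardless of its temperature."""
    station, _ = key
    return zlib.crc32(station.encode("utf-8")) % num_reducers


def secondary_sort(records, num_reducers=2):
    results = {}
    for r in range(num_reducers):
        # Shuffle: take this reducer's records and sort them by the full
        # composite key (the role of the sort comparator).
        mine = sorted(
            (composite_key(rec), rec)
            for rec in records
            if partition(composite_key(rec), num_reducers) == r
        )
        # Grouping comparator: group by natural key only, so one reduce()
        # call sees all values for a station, already in secondary order.
        for station, group in groupby(mine, key=lambda kv: kv[0][0]):
            results[station] = [rec[2] for _, rec in group]
    return results
```

The reducer never sorts anything itself; the ordering comes for free from the framework's sort, which is what lets secondary sort scale to groups too large to hold in memory.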
Hive
Lab: Writing Hive Queries: Managed/External Tables, Formats, Partitions etc.
Lab: Writing a User Defined Hive Function
Lab: Accessing data in Hive from Excel over ODBC
Pig
Lab: Writing and executing a Pig Latin script
Lab: Writing a Pig User Defined Function
HBase
Lab: Importing data into HBase
Lab: Writing an HBase MapReduce Job
Other Details
Questions?
For the latest batch dates, fees, location and general inquiries, contact our sales team at +91 8880002200 or by email at sales@cloudthat.in
For purely technical queries about the course please contact Bhavesh at bhavesh@cloudthat.in
