Master Big Data, NoSQL and Data Science together and become a successful Big Data Scientist
with lifetime access to 16 courses. Start your journey now!
Courses included in this combo pack
Hadoop Architect Training: All in 1 Combo Course: Hadoop Developer, Hadoop Analyst, Hadoop Administration
and Hadoop Testing
R Programming Training
Mahout Training
Data Science Training: Building Recommender Systems
Statistics and Probability Training
Apache Solr Training
Splunk Training
Apache Storm Training
Splunk Admin Training
HBase Training
Cassandra Training
MongoDB Training
Apache Spark, Scala Training
Key Features:
A comprehensive, in-depth combo of Big Data + Data Science + NoSQL training covering as many as
16 niche, highly endorsed and top-paying technology courses
Intensive learning through the Hadoop Architect Training All in 1 Combo Course, which includes
Hadoop Developer, Hadoop Analyst, Hadoop Administration and Hadoop Testing, plus R Programming Training,
Mahout Training, Data Science Training: Building Recommender Systems, Statistics and Probability Training,
Apache Solr Training, Splunk Training, Apache Storm Training, Splunk Admin Training, HBase Training,
Cassandra Training, MongoDB Training and Apache Spark and Scala Training
Intellipaat Proprietary VM for Lifetime and free cloud access for 6 months for performing exercises.
70% of extensive learning through Hands-on exercises, Project Work, Assignments and Quizzes
The training will prepare you for multiple Professional Certification Exams:
Cloudera Certification:
CCA Spark and Hadoop Developer, CCAH, R Certification, Mahout Certification, Cloudera Certification
(CCP:DS), Apache Storm Certification, Cloudera Apache HBase Certification, Apache Cassandra Professional
Certification, MongoDB Certification, Apache Spark Certification
Hadoop Projects
1. Project
Working with Map Reduce, Hive, Sqoop
Problem Statement
It describes how to import MySQL data using Sqoop and query it using Hive, and also how to run the
word count MapReduce job.
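The word count job mentioned above can be sketched in plain Python to show the map, shuffle and reduce steps. This is only an illustration of the idea, not the Hadoop job used in the course; the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data science"]))
```

In a real cluster the map and reduce functions run on different nodes and the shuffle is handled by the framework; here everything runs in one process for clarity.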
2. Project
Work on MovieLens data for finding top records
Data
MovieLens Dataset
Problem Statement
It includes:
Write a MapReduce program to find the top 10 movies from the u.data file
Create the same top 10 movies using Pig by loading u.data into Pig
Create the same top 10 movies using Hive by loading u.data into Hive
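As a rough reference for what the three implementations should agree on, the sketch below computes a top 10 in plain Python. It assumes u.data is tab-separated with the fields user id, movie id, rating and timestamp, and it assumes "top" means "most rated"; the brochure does not say which ranking is intended, and the function name is hypothetical.

```python
from collections import Counter


def top_movies(lines, n=10):
    """Return the n (movie_id, rating_count) pairs with the most ratings."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows
        _user_id, movie_id, _rating, _timestamp = fields
        counts[movie_id] += 1
    # Sort by count descending, then by numeric movie id for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], int(kv[0])))[:n]


if __name__ == "__main__":
    # Hypothetical sample rows in the assumed u.data layout.
    sample = ["1\t50\t5\t881250949", "2\t50\t4\t881250950", "3\t172\t5\t881250951"]
    print(top_movies(sample, n=2))
```

To run it against the real file, pass an open file handle, for example `top_movies(open("u.data"))`.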
3. Project
Hadoop Yarn Project End to End PoC
Problem Statement
It includes:
How to use Sqoop commands to bring the data into HDFS
How to process real-world data or a huge amount of data (for example, movie data) using a
MapReduce program
4. Project
Partitioning Tables
Problem Statement
It describes partitioning and how to perform it. It includes:
Manual Partitioning
Dynamic Partitioning
Bucketing
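The difference between partitioning and bucketing can be sketched in a few lines of Python. This is a conceptual illustration only: partitioning groups rows by the distinct values of a column, while bucketing assigns each row to one of a fixed number of buckets by hashing a column. The function names are hypothetical, and `zlib.crc32` is a stand-in hash chosen because it is stable across runs, not the hash any particular engine uses.

```python
import zlib
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows by the value of the partition column (one group per distinct value)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_rows(rows, column, num_buckets):
    """Assign each row to one of num_buckets buckets by hashing the bucketing column."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Stand-in hash: rows with the same column value always land in the same bucket.
        index = zlib.crc32(str(row[column]).encode("utf-8")) % num_buckets
        buckets[index].append(row)
    return buckets


if __name__ == "__main__":
    # Hypothetical sample rows.
    rows = [
        {"country": "IN", "amount": 10},
        {"country": "US", "amount": 20},
        {"country": "IN", "amount": 30},
    ]
    print(partition_rows(rows, "country"))
    print(bucket_rows(rows, "country", 2))
```

The number of partitions grows with the number of distinct values, whereas the number of buckets is fixed up front, which is why bucketing suits high-cardinality columns.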
5. Project
Sales Commission
Data
Sales
Problem Statement
This project calculates the commission according to sales.
SPT Project
Data Analysis Project
Data Sales
Problem Statement
It includes the following actions:
Data Cleaning
Splunk Project:
The Splunk Project, after finishing this training course, will let you create a report and dashboard with
the text file having employee details.
You will perform various row operations to fetch data as per your requirements and use important
Splunk commands on the file to extract certain fields.
Other significant aspects of this project are editing the event, adding tags, searching event with tag
names and saving tag search.
Curriculum
Hadoop
Module 1 Introduction to Big Data & Hadoop, Hadoop Ecosystem, Map Reduce and
HDFS
MapReduce: Concepts of Map, Reduce, Ordering, Concurrency, Shuffle, Reducing
Deep Dive in MapReduce Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
HDFS Deep Dive Architecture, Data Replication, Name Node, Data Node, Data Flow
Assignment 1
Module 2 Hands-on Exercises
Installing Hadoop in Pseudo Distributed Mode, Understanding Important configuration files, their
Properties and Daemon Threads
Introduction to Yarn
Assignment -2 and 3
Mini Project: Importing MySQL Data using Sqoop and Querying it using Hive
MapReduce Combiner
MapReduce Partitioner
Job Scheduler
Joining of Files/Datasets
Distributed Cache
Reduce Joins
Counters
Input Format
Custom Input Format
Inverted Indexing
MapReduce Inverted Indexing
Hadoop APIs
Explanation of MapReduce organization
Module 3.1
Project 2- Hands on exercise end to end PoC using Yarn or Hadoop 2.7
Running Map Reduce Code for Movie Rating and finding their fans and average rating
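A minimal Python sketch of the movie rating exercise is shown here for orientation. It assumes tab-separated rows of user id, movie id, rating and timestamp, and it treats a "fan" as a user who rated the movie at or above a threshold of 4; both the threshold and the function name are assumptions, since the brochure does not define them.

```python
from collections import defaultdict


def movie_stats(lines, fan_threshold=4):
    """Return {movie_id: {"average": float, "fans": [user ids]}} for the given rows."""
    totals = defaultdict(lambda: [0, 0])  # movie_id -> [rating_sum, rating_count]
    fans = defaultdict(set)               # movie_id -> users rating >= fan_threshold
    for line in lines:
        user_id, movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
        rating = int(rating)
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
        if rating >= fan_threshold:
            fans[movie_id].add(user_id)
    return {
        movie_id: {"average": rating_sum / count, "fans": sorted(fans[movie_id])}
        for movie_id, (rating_sum, count) in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical sample rows.
    sample = ["1\t50\t5\t1", "2\t50\t3\t2", "3\t172\t4\t3"]
    print(movie_stats(sample))
```

In a MapReduce version, the mapper would emit the movie id as the key and the reducer would compute the average and collect the fans for each key.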
Assignment -4 and 5
Module 4 Deep Dive in Pig
Introduction to Pig
What Is Pig?
Pig's Features
Loading Data
Field Definitions
Data Output
Commonly-Used Functions
Grouping
Set Operations
Hands-On Exercise
Extending Pig
UDFs
Pig Jobs
Case studies of Fortune 500 companies (Electronic Arts and Walmart) with real
data sets.
Assignment 6
Module 5 Deep Dive in Hive
Introduction to Hive
What Is Hive?
Data Types
Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
Self-Managed Tables
Hive Optimization
Partitioning
Bucketing
Indexing Data
Extending Hive
User-Defined Functions
Optimizing Queries, Tips and Tricks for performance tuning
Assignment 7
Module 6 Impala
Introduction to Impala
What is Impala?
HCatalog
Data Partitioning
Partitioning Overview
Avro Schemas
Compression
What is HBase
What is NoSQL
Assignment -8
Apache Spark
Module 9 Why Spark? Explain Spark and Hadoop Distributed File System
What is Spark
Components of Spark
Spark Components
Module 11 Running Spark on a Cluster, Writing Spark Applications using Python, Java,
Scala
Advantages of Spark
Hadoop Multi Node Cluster Setup using Amazon ec2 Creating 4 node cluster setup
Working with Large data sets, Steps involved in analyzing large data
Assignment 9, 10
Module 14 Advanced MapReduce
More Advanced Map Reduce Programming, Joining Data Sets in Map Reduce
Assignment 11, 12
Module 15 ETL Connectivity with Hadoop Ecosystem
Connecting to HDFS from ETL tool and moving data from Local system to HDFS
End to End ETL PoC showing Hadoop integration with ETL tool.
Safe Mode
Module 20 Hadoop Multi Node Cluster Setup and Running Map Reduce Jobs on
Amazon EC2
Hadoop Multi Node Cluster Setup using Amazon EC2: creating a 4-node cluster setup
Module 21 ZooKeeper
ZooKeeper Introduction
ZooKeeper Services
Znode operations
Znode watches
Consistency Guarantees
Cluster management
Leader Election
Important points
Why Oozie?
Installing Oozie
Running an example
Workflow application
Workflow submission
Coordinator
Bundle
Layers of abstraction
Architecture
Apache Flume
Closer look
Anatomy of Flume
Core concepts
Event
Clients
Agents
Source
Channels
Sinks
Interceptors
Channel selector
Sink processor
Data ingest
Agent pipeline
Why channels?
Hue introduction
Hue ecosystem
What is Hue?
Advantages of Hue
Integrating users
Integrating HDFS
Impala architecture
Testing
Module 26 Hadoop Stack Integration Testing
Unit testing
Integration testing
Performance testing
Diagnostics
Nightly QA test
Functional testing
Security testing
Scalability Testing
Reliability testing
Release testing
Understanding the requirement, preparing the testing estimation, test cases, test data, test
bed creation, test execution, defect reporting, defect retest, daily status report delivery, test completion.
ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records, etc.) using
Sqoop/Flume, which includes, but is not limited to, data verification and reconciliation.
Reporting defects to the development team or manager and driving them to closure.
Using the MRUnit testing framework to test MapReduce programs.
Major Project on Big Data and Hadoop, Hadoop Development, Cloudera Certification Tips and
Guidance and Mock Interview Preparation, Practical Development Tips and Techniques, certification
preparation
Project Work
1. Project
Working with Map Reduce, Hive, Sqoop
Problem Statement
It describes how to import MySQL data using Sqoop and query it using Hive, and also how to
run the word count MapReduce job.
2. Project
Work on MovieLens data for finding top records
Data
MovieLens Dataset
Problem Statement
It includes:
Write a MapReduce program to find the top 10 movies from the u.data file
Create the same top 10 movies using Pig by loading u.data into Pig
Create the same top 10 movies using Hive by loading u.data into Hive
3. Project
Hadoop Yarn Project End to End PoC
Problem Statement
It includes:
How to use Sqoop commands to bring the data into HDFS
How to process real-world data or a huge amount of data (for example, movie data) using a
MapReduce program
4. Project
Partitioning Tables
Problem Statement
It describes partitioning and how to perform it. It includes:
Manual Partitioning
Dynamic Partitioning
Bucketing
5. Project
Sales Commission
Data
Sales
Problem Statement
This project calculates the commission according to sales.
6. Project
Connecting Pentaho with Hadoop Eco-system
Problem Statement
It includes:
7. Project
Multinode Cluster Setup
Problem Statement
It includes the following actions:
Hadoop Multi Node Cluster Setup using Amazon EC2: creating a 4-node cluster setup
8. Project
Hadoop Testing using MRUnit
Problem Statement
It describes how to test MapReduce code with MRUnit.
9. Project
Hadoop Weblog Analytics
Data
Weblogs
Problem Statement
The goal is to give participants a feel for actual data sets in a production environment and for
how to load the data into a Hadoop cluster using various techniques. Once the data is loaded, the next goal is
to perform basic analytics on it.
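As an illustration of "basic analytics" on weblogs, the sketch below parses log lines and counts HTTP status codes and requested paths in plain Python. It assumes the logs follow the Common Log Format; the brochure does not specify the format, and the sample lines and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count requests per status code and per path, skipping unparseable lines."""
    status_counts = Counter()
    path_counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        status_counts[record["status"]] += 1
        path_counts[record["path"]] += 1
    return status_counts, path_counts


if __name__ == "__main__":
    # Hypothetical sample lines.
    sample = [
        '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing HTTP/1.0" 404 512',
    ]
    statuses, paths = summarize(sample)
    print(statuses.most_common(), paths.most_common())
```

On a cluster the same per-line parsing would sit in the map step, with the counting done in the reduce step or in a Hive or Pig query.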
R Programming
Module 1 How R Works
R-Calculator
Vector Creation
Generating Repeats
Sorting Process
Merge Function
Strsplit Function
Matrices
Matrix Manipulation
Row Sums
Line Plots
Bar Plots
Histogram
K-Means Clustering
Linear Regression
Scatter Plots
Logistic Regression
Logistic Regression in R
Prediction
Confusion Matrix
ROC Curve in R
Function (Mean)
Examples Of Function
Methods to integrate two popular open source software tools for Big Data analytics: R and Hadoop
Project
Restaurant Revenue Prediction
Data
Revenue Data set
Problem Statement
It predicts annual restaurant sales based on objective measurements. It uses the following data fields:
Id
Opening Date
Revenue
It also includes:
Data Overview
Data Fields
Mahout
Module 1 Mahout Overview
Clustering in Mahout
Pattern Mining
Data flow
Concept of Recommendation
Defining Clustering
User-to-user similarity
Clustering Illustration
Document clustering
Sequence-to-sparse Utility
K-Means Clustering
Terminology
Classifiable Data
Classification Examples
Clustering
Clustering Process
Transaction Clustering
Distance measure
Clustering Algorithm: K-Means
Clustering Application-1
Clustering Application-2
Sentiment Analyzer
Pearson Coefficient
Collaborative Filtering
Similarity Algorithms
Pearson Correlation
Project Lifecycle
Data Acquirement
Transforming Data
Data Acquisition
Data Collection
Uniform Distribution
Skewed Distribution
Transformation
Data Request
What is Strategy
Univariate analysis
Multivariate analysis
Bivariate analysis
Standardize Variables
What is Hypothesis?
Negative Correlation
Machine Learning
Contingency Table
What is Mean?
Degree of Freedom
Linear Regression
What is sampling?
Sampling Distribution
Systematic Sampling
Cluster Sampling
Linear regression
Hypothesis testing
Sample
performance measure
alternative hypothesis
Threshold value
Null Hypothesis
Alternative Hypothesis
Probability
Machine Learning
Importance of Algorithms
Predict Algorithms
Population data
sampling
Disproportionate Sampling
What is K?
Training Data
Test Data
Validation data
Model Building
Rules
Iteration
Linear regression
Clustering
Manual Profiling
Clustering Algorithm
Graphical Example
Probabilistic Clustering
Pattern Learning
R introduction
Features of R
R+Hadoop
Products
Case Study
Architecture
Projects
Project 1-Understanding Cold Start Problem in Data Science
Ways of Recommendation
Matrix Factorization
Collaborative Filtering
Descriptive Statistics
SPT
Module 1 Information of Statistics
What is statistics
Descriptive statistics
Variable
Module 4 Plots
Dot Plots
Histogram
Stemplots
Outlier detection from box plots and Box and whisker plots
What is probability
Bayes Theorem
Module 6 Distributions
Probability Distributions
Few Examples
Student's t-Distribution
Sampling Distribution
Poisson Distribution
Module 7 Sampling
Stratified Sampling
Proportionate Sampling
Systematic Sampling
P Value
Stratified Sampling
Cross Tables
Bivariate Analysis
Analysis of Variance
Project
Data Analysis Project
Data
Sales
Problem Statement
It includes the following actions:
Data Cleaning
Apache Solr
Module 1. The Fundamentals
About Solr
Sorting results
Query parsers
More queries
Faceting
Result grouping
Module 3. Indexing
Analyzing text
Module 5. Relevance
Field weighting
Phrase queries
Function queries
Fuzzier search
Sounds-like
Module 6. Extended features
More-like-this
Geospatial
Spell checking
Suggestions
Highlighting
Pseudo-fields
Pseudo-joins
Multilanguage
Module 7. Multicore
Introduction
Commit strategies
ZooKeeper
Project
Function Queries
Problem Statement
It describes how to use function queries in Solr. Suppose an index stores the dimensions in meters (x,
y, z) of some hypothetical boxes, with arbitrary names stored in the field boxname. Suppose we want to search
for boxes matching the name findbox, ranked according to their volume.
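The ranking described above can be sketched in plain Python to make the intent concrete: filter by name, then order by the computed volume x * y * z. This is an illustration of the logic only, not Solr code; in Solr itself the volume is typically expressed with a function query such as `product(x,y,z)`. The sample records and the function name below are hypothetical.

```python
def rank_boxes_by_volume(boxes, name):
    """Return boxes whose boxname equals name, largest volume first."""
    matches = [box for box in boxes if box["boxname"] == name]
    # The sort key plays the role of the function query: volume = x * y * z.
    return sorted(matches, key=lambda box: box["x"] * box["y"] * box["z"], reverse=True)


if __name__ == "__main__":
    # Hypothetical index contents; dimensions are in meters.
    boxes = [
        {"id": 1, "boxname": "findbox", "x": 1.0, "y": 2.0, "z": 3.0},
        {"id": 2, "boxname": "findbox", "x": 2.0, "y": 2.0, "z": 2.0},
        {"id": 3, "boxname": "otherbox", "x": 9.0, "y": 9.0, "z": 9.0},
    ]
    for box in rank_boxes_by_volume(boxes, "findbox"):
        print(box["id"], box["x"] * box["y"] * box["z"])
```

The point of a function query is that the score comes from field values computed at query time rather than from text relevance alone.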
Splunk
Module 1 Basic Concepts of Splunk Development
Saving searches
Search scheduling
Describing alerts
Alert Creation
Understanding tags
Module 7 Visualizations
Perform calculations
Value Conversion
Round values
Format values
Conditional statements
Overview of Transactions
Search Transactions
Defining a lookup
Extraction of Fields
Project
The Splunk Project, after finishing this training course, will let you create a report and dashboard
with the text file having employee details.
You will perform various row operations to fetch data as per your requirements and use important
Splunk commands on the file to extract certain fields.
Other significant aspects of this project are editing the event, adding tags, searching events by tag
name and saving the tag search.
Splunk Admin
Module 1- Simple Splunk Environment
Installing Splunk
License Management
Data Inputs
App management
Universal Forwarder
Forwarder Management
Extraction of Fields
Project
Field Extraction
Problem Statement
It includes:
Apache Storm
Module 1 Understanding Architecture of Storm
Bayesian Law
Storm Topology
Stream Grouping
Tuple
Spout
Bolt-normalization bolt
Module 3 Grouping
Bolt Lifecycle
Concepts of Storm
Projects
Real-time Project on Storm
The Project Bolt Blue Print
HBase
Module 1 HBase Overview
Why HBase?
What is NoSQL?
HDFS vs. HBase
HBase Architecture
HBase Shell
HBase API
Primary Operations
Advanced Operations
Load Utility
Putting Folder to VM
Project
Integrate Hive and Java with HBase
Problem Statement
This project describes how to integrate Hive and Java with HBase. It includes the following actions:
Installation of HBase
Creation of Table
Cassandra
Module 1-Advantages and Usage of Cassandra
Replication in RDBMS
Schema
No SQL Category
Advantages & Limitations
CAP Theorem
Consistency
What is Cassandra?
Non relational
Installation
Token calculation
Configuration overview
Node tool
Validators
Comparators
Expiring column
QA
Column family
Partitioners
Partitioners strategies
Replication
Gossip protocols
Read operation
Consistency
Comparison
Node settings
Read operation
System keyspace
Commands overview
Column family
VNodes
Thrift
AVRO
JSON
Hector client
Hector tag
Management of Cassandra
Secondary index
API
Java code
Summarization
Thrift
MongoDB
Module 1 Getting started with NoSQL, MongoDB and their Installation
What is MongoDB
JSON/BSON Introduction
Example of JSON
Installation of MongoDB
Database Type
OLTP
OLAP
NoSQL
Why NoSQL
ACID property
CAP Theorem
BASE property
Unacknowledged
Acknowledged
Journaled
Fsynced
Replica Acknowledged
Installation Rent
used ppt
CRUD Introduction
Operational strategies
Backup strategies
Monitoring
Monitoring Commands
Data Management
Introduction to replica
Replica set
Type of Replica
Hidden Replica
Arbiter Replica
Sharding
Hands on Exercise
Introduction to Indexes
Type of Indexes
Index Property
Introduction to Aggregation
Type of Aggregation
Hands on Exercise
Access Control
Module 6 MongoDB Integration with Jaspersoft, Load and Manage Unstructured Data
(Videos, Images, Logs, Resumes etc.)
Loading and Managing Unstructured Data (Videos, Images, Logs, Resumes etc.)
Project
Java MongoDB Integration
Problem Statement
It creates a table and inserts a video file using a Java program. For this, it performs the following actions:
Installation of Java
Apache Spark
Module 1-Why Spark? Explain Spark and Hadoop Distributed File System
What is Spark
Components of Spark
Spark Components
Module 3-Running Spark on a Cluster, Writing Spark Applications using Python, Java,
Scala
Advantages of Spark
Define count
Define Filter
Define Fold
Define Factors
Module 5-Spark, Hadoop, and the Enterprise Data Centre, Common Spark Algorithms
Apache BookKeeper
Define DStream
Explain Parquet
Scala ORM
Define MLlib
Persistence
Motivation
Example
Transformation
Examples K-means
Motivation
Broadcast Variables
Example: Join
Accumulators motivation
Example: Join
Accumulator Rules
Custom accumulators
Introduction
Spark SQL
Module 10-Operations/Accumulators/Traits
Accumulators
Traits
Module 11-Scheduling/Partitioning
Static Partitioning
Dynamic Sharing
Fair Scheduling
Concurrency in Java
Concurrency in Scala
Array Buffers
Compact Buffer
Protocol Buffer
Mini Projects
Project 1. List the items
Project 2. Sorting of Records
Project 3. Show a histogram of date vs users created. Optionally, use a rich visualization like
Project 4. Prepare a map of tags vs # of questions in each tag and display it.
Major Projects
Project 1 Movie Recommendation
Project 2 Twitter API Integration for tweet Analysis
Project 3 Data Exploration Using Spark SQL Wikipedia dataset
Scala
Module 1-Introduction of Scala
Scala Overview
Advantages of Scala
Language Features
Type Inference
Option
Pattern Matching
Collection
Currying
Traits
Application Space
Recursion in scala
Classes in scala
Constructor
Constructor overloading
Properties
Abstract classes
Object equality
Sealed traits
Case classes
Variable pattern
Constructor pattern
Tuple pattern
Java equivalents
Advantages of traits
Linearization of traits
Iterable
Array in scala
List in scala
Array buffer
Queue in scala
Dequeue in scala
Stacks in scala
Tuples
Selective imports
Testing-Assertions
SBT