A Comprehensive Guide to Ensemble Learning (with Python codes)

AISHWARYA SINGH, JUNE 18, 2018

Introduction

When you want to purchase a new car, will you walk up to the first car shop and purchase one based on the advice of the dealer? It's highly unlikely.

You would likely browse a few web portals where people have posted their reviews and compare different car models, checking their features and prices. You will probably also ask your friends and colleagues for their opinion. In short, you wouldn't reach a conclusion directly, but would instead make a decision after considering the opinions of other people as well.

Ensemble models in machine learning operate on a similar idea. They


combine the decisions from multiple models to improve the overall
performance. This can be achieved in various ways, which you will
discover in this article.

The objective of this article is to introduce the concept of ensemble


learning and understand the algorithms which use this technique. To
cement your understanding of this diverse topic, we will explain the
advanced algorithms in Python using a hands-on case study on a real-
life problem.

Note: This article assumes a basic understanding of Machine Learning


algorithms. I would recommend going through this article to
familiarize yourself with these concepts.

Table of Contents

1. Introduction to Ensemble Learning
2. Simple Ensemble Techniques
    2.1 Max Voting
    2.2 Averaging
    2.3 Weighted Average
3. Advanced Ensemble Techniques
    3.1 Stacking
    3.2 Blending
    3.3 Bagging
    3.4 Boosting
4. Algorithms based on Bagging and Boosting
    4.1 Bagging meta-estimator
    4.2 Random Forest
    4.3 AdaBoost
    4.4 GBM
    4.5 XGBoost
    4.6 Light GBM
    4.7 CatBoost

1. Introduction to Ensemble Learning

Let's understand the concept of ensemble learning with an example. Suppose you are a movie director and you have created a short movie on a very important and interesting topic. Now you want to take preliminary feedback (ratings) on the movie before making it public. What are the possible ways by which you can do that?

A: You may ask one of your friends to rate the movie for you.
Now it's entirely possible that the person you have chosen loves you very much and doesn't want to break your heart by giving a 1-star rating to the horrible work you have created.

B: Another way could be to ask 5 of your colleagues to rate the movie.
This should give you a better idea of the movie. This method may provide honest ratings for your movie. But a problem still exists: these 5 people may not be "Subject Matter Experts" on the topic of your movie. Sure, they might understand the cinematography, the shots, or the audio, but at the same time they may not be the best judges of dark humour.

C: How about asking 50 people to rate the movie?


Some of them can be your friends, some can be your colleagues, and some may even be total strangers.

The responses, in this case, would be more generalized and diversified, since you now have people with different sets of skills. And as it turns out, this is a better approach to get honest ratings than the previous cases we saw.

From these examples, you can infer that a diverse group of people is likely to make better decisions than individuals. The same holds for a diverse set of models compared to single models. This diversification in Machine Learning is achieved by a technique called Ensemble Learning.
Now that you have got a gist of what ensemble learning is, let us look at the various techniques in ensemble learning along with their implementations.

2. Simple Ensemble Techniques

In this section, we will look at a few simple but powerful techniques, namely:

1. Max Voting
2. Averaging
3. Weighted Averaging

2.1 Max Voting

The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The prediction made by each model is counted as a 'vote', and the prediction we get from the majority of the models is used as the final prediction.

For example, suppose you asked 5 of your colleagues to rate your movie (out of 5), and three of them rated it as 4 while two of them gave it a 5. Since the majority gave a rating of 4, the final rating will be taken as 4. You can consider this as taking the mode of all the predictions.

The result of max voting would be something like this:

Colleague 1 Colleague 2 Colleague 3 Colleague 4 Colleague 5 Final rating

5 4 5 4 4 4

Sample Code:

Here, x_train consists of the independent variables of the training data and y_train is the corresponding target variable. The validation set consists of x_test (independent variables) and y_test (target variable).

from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from statistics import mode  # mode() is assumed to come from the statistics module
import numpy as np

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

pred1 = model1.predict(x_test)
pred2 = model2.predict(x_test)
pred3 = model3.predict(x_test)

# majority vote: take the mode of the three predictions for each test observation
final_pred = np.array([])
for i in range(len(x_test)):
    final_pred = np.append(final_pred, mode([pred1[i], pred2[i], pred3[i]]))

Alternatively, you can use the VotingClassifier module from sklearn as follows:

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import tree

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
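As a side note, the same VotingClassifier can also average the predicted class probabilities instead of counting votes, which is essentially the averaging idea described in the next section. Here is a minimal sketch, assuming the same model1, model2 and training data as above (both estimators must support predict_proba):

# 'soft' voting averages predict_proba outputs instead of counting class votes
model_soft = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='soft')
model_soft.fit(x_train, y_train)
model_soft.score(x_test, y_test)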

2.2 Averaging

Similar to the max voting technique, multiple predictions are made for
each data point in averaging. In this method, we take an average of
predictions from all the models and use it to make the final prediction.
Averaging can be used for making predictions in regression problems
or while calculating probabilities for classification problems.

For example, in the below case, the averaging method would take the
average of all the values.

i.e. (5+4+5+4+4)/5 = 4.4

Colleague 1 Colleague 2 Colleague 3 Colleague 4 Colleague 5 Final rating

5 4 5 4 4 4.4

Sample Code:

model1 = tree.DecisionTreeClassifier()

model2 = KNeighborsClassifier()

model3= LogisticRegression()

model1.fit(x_train,y_train)

model2.fit(x_train,y_train)

model3.fit(x_train,y_train)

pred1=model1.predict_proba(x_test)

pred2=model2.predict_proba(x_test)

pred3=model3.predict_proba(x_test)

finalpred=(pred1+pred2+pred3)/3

2.3 Weighted Average

This is an extension of the averaging method. All models are assigned different weights that define the importance of each model for the prediction. For instance, if two of your colleagues are critics, while the others have no prior experience in this field, then the answers from these two friends are given more importance than those from the other people.

The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) +


(4*0.18)] = 4.41.

Colleague 1 Colleague 2 Colleague 3 Colleague 4 Colleague 5 Final rating

weight 0.23 0.23 0.18 0.18 0.18

rating 5 4 5 4 4 4.41

Sample Code:

model1 = tree.DecisionTreeClassifier()

model2 = KNeighborsClassifier()

model3= LogisticRegression()

model1.fit(x_train,y_train)

model2.fit(x_train,y_train)

model3.fit(x_train,y_train)

pred1=model1.predict_proba(x_test)

pred2=model2.predict_proba(x_test)

pred3=model3.predict_proba(x_test)

finalpred=(pred1*0.3+pred2*0.3+pred3*0.4)

3. Advanced Ensemble Techniques

Now that we have covered the basic ensemble techniques, let’s move
on to understanding the advanced techniques.

3.1 Stacking

Stacking is an ensemble learning technique that uses predictions from


multiple models (for example decision tree, knn or svm) to build a new
model. This model is used for making predictions on the test set.
Below is a step-wise explanation for a simple stacked ensemble:

1. The train set is split into 10 parts.

2. A base model (suppose a decision tree) is fitted on 9 parts and


predictions are made for the 10th part. This is done for each part
of the train set.

3. The base model (in this case, decision tree) is then fitted on the
whole train dataset.
4. Using this model, predictions are made on the test set.

5. Steps 2 to 4 are repeated for another base model (say knn)


resulting in another set of predictions for the train set and test
set.

6. The predictions from the train set are used as features to build a
new model.

7. This model is used to make the final predictions on the test set, with the test-set predictions from the base models as its features.

Sample code:

We first define a function to make predictions on n-folds of train and


test dataset. This function returns the predictions for train and test for
each model.

from sklearn.model_selection import StratifiedKFold

def Stacking(model, train, y, test, n_fold):
    # shuffle=True is required by newer versions of sklearn when random_state is set
    folds = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=1)
    train_pred = np.empty((0, 1), float)
    for train_indices, val_indices in folds.split(train, y.values):
        x_train, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_train, y_val = y.iloc[train_indices], y.iloc[val_indices]
        model.fit(X=x_train, y=y_train)
        # out-of-fold predictions become the training features for the next level
        train_pred = np.append(train_pred, model.predict(x_val))
    # predict on the test set once (here with the model fitted on the last fold),
    # so the returned array has one row per test observation
    test_pred = model.predict(test)
    return test_pred.reshape(-1, 1), train_pred

Now we’ll create two base models – decision tree and knn.

model1 = tree.DecisionTreeClassifier(random_state=1)

test_pred1, train_pred1 = Stacking(model=model1, n_fold=10, train=x_train, test=x_test, y=y_train)

train_pred1 = pd.DataFrame(train_pred1)
test_pred1 = pd.DataFrame(test_pred1)

model2 = KNeighborsClassifier()

test_pred2, train_pred2 = Stacking(model=model2, n_fold=10, train=x_train, test=x_test, y=y_train)

train_pred2 = pd.DataFrame(train_pred2)
test_pred2 = pd.DataFrame(test_pred2)

Create a third model, logistic regression, on the predictions of the


decision tree and knn models.

df = pd.concat([train_pred1, train_pred2], axis=1)

df_test = pd.concat([test_pred1, test_pred2], axis=1)

model = LogisticRegression(random_state=1)

model.fit(df,y_train)

model.score(df_test, y_test)

In order to simplify the above explanation, the stacking model we have


created has only two levels. The decision tree and knn models are
built at level zero, while a logistic regression model is built at level one.
Feel free to create multiple levels in a stacking model.

3.2 Blending

Blending follows the same approach as stacking but uses only a


holdout (validation) set from the train set to make predictions. In other
words, unlike stacking, the predictions are made on the holdout set
only. The holdout set and the predictions are used to build a model
which is run on the test set. Here is a detailed explanation of the
blending process:

1. The train set is split into training and validation sets.

2. Model(s) are fitted on the training set.


3. The predictions are made on the validation set and the test set.

4. The validation set and its predictions are used as features to


build a new model.
5. This model and the test-set meta-features are used to make the final predictions on the test set.

Sample Code:

We’ll build two models, decision tree and knn, on the train set in order
to make predictions on the validation set.

# x_val / y_val (the holdout set) are not created in the common preprocessing code used
# later in this article, so we first carve a validation set out of the training data
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3, random_state=1)

model1 = tree.DecisionTreeClassifier()
model1.fit(x_train, y_train)
val_pred1 = model1.predict(x_val)
test_pred1 = model1.predict(x_test)
val_pred1 = pd.DataFrame(val_pred1)
test_pred1 = pd.DataFrame(test_pred1)

model2 = KNeighborsClassifier()
model2.fit(x_train, y_train)
val_pred2 = model2.predict(x_val)
test_pred2 = model2.predict(x_test)
val_pred2 = pd.DataFrame(val_pred2)
test_pred2 = pd.DataFrame(test_pred2)

Combining the meta-features and the validation set, a logistic


regression model is built to make predictions on the test set.

df_val=pd.concat([x_val, val_pred1,val_pred2],axis=1)

df_test=pd.concat([x_test, test_pred1,test_pred2],axis=1)

model = LogisticRegression()

model.fit(df_val,y_val)

model.score(df_test,y_test)

3.3 Bagging

The idea behind bagging is to combine the results of multiple models (for instance, all decision trees) to get a generalized result. Here's a question: if you create all the models on the same set of data and combine them, will it be useful? There is a high chance that these models will give the same result, since they are getting the same input. So how can we solve this problem? One of the techniques is bootstrapping.

Bootstrapping is a sampling technique in which we create subsets of


observations from the original dataset, with replacement. The size of
the subsets is the same as the size of the original set.

The Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a fair idea of the distribution of the complete set. The size of the subsets created for bagging may be less than the size of the original set.

1. Multiple subsets are created from the original dataset, selecting


observations with replacement.
2. A base model (weak model) is created on each of these
subsets.
3. The models run in parallel and are independent of each other.
4. The final predictions are determined by combining the
predictions from all the models.
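To make the bootstrapping step concrete, here is a minimal sketch of drawing bootstrap samples with NumPy and pandas (the toy data and the n_bags value are made up purely for illustration):

import numpy as np
import pandas as pd

# toy data: 10 observations with a single feature
data = pd.DataFrame({'x': np.arange(10)})

n_bags = 3  # number of bootstrap samples (illustrative value)
bags = []
for _ in range(n_bags):
    # sample row indices with replacement; each bag has the same size as the
    # original data, so some rows repeat and some are left out entirely
    idx = np.random.choice(len(data), size=len(data), replace=True)
    bags.append(data.iloc[idx])

print(bags[0]['x'].values)  # e.g. [3 7 7 0 9 2 2 5 1 8]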

3.4 Boosting

Before we go further, here’s another question for you: If a data point is


incorrectly predicted by the first model, and then the next (probably all
models), will combining the predictions provide better results? Such
situations are taken care of by boosting.

Boosting is a sequential process, where each subsequent model


attempts to correct the errors of the previous model. The succeeding
models are dependent on the previous model. Let’s understand the
way boosting works in the below steps.

1. A subset is created from the original dataset.

2. Initially, all data points are given equal weights.


3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.

5. Errors are calculated using the actual values and predicted


values.
6. The observations which are incorrectly predicted are given higher weights.
(Here, the three misclassified blue-plus points will be given
higher weights)
7. Another model is created and predictions are made on the
dataset.
(This model tries to correct the errors from the previous model)

8. Similarly, multiple models are created, each correcting the errors


of the previous model.
9. The final model (strong learner) is the weighted mean of all the
models (weak learners).

Thus, the boosting algorithm combines a number of weak


learners to form a strong learner. The individual models would
not perform well on the entire dataset, but they work well for
some part of the dataset. Thus, each model actually boosts the
performance of the ensemble.
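To make the re-weighting idea concrete, here is a minimal, schematic sketch using decision stumps from scikit-learn. Note that this is not the exact AdaBoost update rule (real boosting algorithms also weight each learner by its error, as covered in section 4.3); the data, the doubling factor and the number of rounds are all made up for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=1)

n_rounds = 5                                   # illustrative number of boosting rounds
weights = np.ones(len(y)) / len(y)             # step 2: all points start with equal weight
learners = []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)    # weak base model (step 3)
    stump.fit(X, y, sample_weight=weights)         # steps 3-4: fit and predict
    wrong = stump.predict(X) != y                  # step 5: find the errors
    weights[wrong] *= 2.0                          # step 6: up-weight the misclassified points
    weights /= weights.sum()                       # keep the weights a probability distribution
    learners.append(stump)

# step 9 (simplified): plain majority vote of the weak learners
votes = np.mean([m.predict(X) for m in learners], axis=0)
final_pred = (votes >= 0.5).astype(int)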

4. Algorithms based on Bagging and Boosting

Bagging and Boosting are two of the most commonly used techniques
in machine learning. In this section, we will look at them in detail.
Following are the algorithms we will be focusing on:

Bagging algorithms:

Bagging meta-estimator
Random forest

Boosting algorithms:

AdaBoost
GBM
XGBoost
Light GBM
CatBoost

For all the algorithms discussed in this section, we will follow this
procedure:

Introduction to the algorithm


Sample code
Parameters

For this article, I have used the Loan Prediction Problem. You can download the dataset from here. Please note that a few lines of code (reading the data, splitting into train and test sets, etc.) will be the same for each algorithm. To avoid repetition, I have written that common code below, and further on I discuss only the code specific to each algorithm.

#importing important packages

import pandas as pd

import numpy as np

#reading the dataset

df=pd.read_csv("/home/user/Desktop/train.csv")

#filling missing values

df['Gender'].fillna('Male', inplace=True)

Similarly, fill in the values for all the other columns. EDA, missing value treatment and outlier treatment have been skipped for the purposes of this article. To understand these topics, you can go through this article: Ultimate guide for Data Exploration in Python using NumPy, Matplotlib and Pandas.

#split dataset into train and test

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.3, random_state=0)

x_train=train.drop('Loan_Status',axis=1)

y_train=train['Loan_Status']

x_test=test.drop('Loan_Status',axis=1)

y_test=test['Loan_Status']

#create dummies

x_train=pd.get_dummies(x_train)

x_test=pd.get_dummies(x_test)

Let’s jump into the bagging and boosting algorithms!

4.1 Bagging meta-estimator

Bagging meta-estimator is an ensembling algorithm that can be used


for both classification (BaggingClassifier) and regression
(BaggingRegressor) problems. It follows the typical bagging technique to make predictions. Following are the steps for the bagging meta-estimator algorithm:

1. Random subsets are created from the original dataset


(Bootstrapping).
2. The subset of the dataset includes all features.
3. A user-specified base estimator is fitted on each of these smaller
sets.
4. Predictions from each model are combined to get the final result.

Code:

from sklearn.ensemble import BaggingClassifier

from sklearn import tree

model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))

model.fit(x_train, y_train)

model.score(x_test,y_test)

0.75135135135135134

Sample code for regression problem:

from sklearn.ensemble import BaggingRegressor

model = BaggingRegressor(tree.DecisionTreeRegressor(random_state=1))

model.fit(x_train, y_train)

model.score(x_test,y_test)

Parameters used in the algorithms:

base_estimator:
It defines the base estimator to fit on random subsets of
the dataset.
When nothing is specified, the base estimator is a decision
tree.

n_estimators:
It is the number of base estimators to be created.
The number of estimators should be carefully tuned as a
large number would take a very long time to run, while a
very small number might not provide the best results.

max_samples:
This parameter controls the size of the subsets.
It is the maximum number of samples to train each base
estimator.

max_features:
Controls the number of features to draw from the whole
dataset.
It defines the maximum number of features required to
train each base estimator.

n_jobs:
The number of jobs to run in parallel.
Set this value equal to the cores in your system.
If -1, the number of jobs is set to the number of cores.

random_state:
It specifies the method of random split. When random
state value is same for two models, the random selection
is same for both models.
This parameter is useful when you want to compare
different models.
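As a rough illustration of how these parameters fit together, here is a sketch with arbitrary, untuned values (note that recent scikit-learn versions name the first argument estimator instead of base_estimator):

model = BaggingClassifier(
    base_estimator=tree.DecisionTreeClassifier(random_state=1),  # estimator fitted on each subset
    n_estimators=50,     # number of base estimators
    max_samples=0.8,     # each bag uses 80% of the training rows
    max_features=0.8,    # ...and 80% of the columns
    n_jobs=-1,           # use all available cores
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)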

4.2 Random Forest

Random Forest is another ensemble machine learning algorithm that


follows the bagging technique. It is an extension of the bagging
estimator algorithm. The base estimators in random forest are decision
trees. Unlike bagging meta estimator, random forest randomly selects
a set of features which are used to decide the best split at each node
of the decision tree.

Looking at it step-by-step, this is what a random forest model does:

1. Random subsets are created from the original dataset


(bootstrapping).
2. At each node in the decision tree, only a random set of features
are considered to decide the best split.
3. A decision tree model is fitted on each of the subsets.
4. The final prediction is calculated by averaging the predictions
from all decision trees.

Note: The decision trees in a random forest can be built on a subset of data and a subset of features. In particular, sklearn's random forest uses all features for each decision tree, and a random subset of features is considered for splitting at each node.

To sum up, random forest randomly selects data points and features, and builds multiple trees (a forest).

Code:

from sklearn.ensemble import RandomForestClassifier

model= RandomForestClassifier(random_state=1)

model.fit(x_train, y_train)

model.score(x_test,y_test)

0.77297297297297296

You can see feature importance by using model.feature_importances_


in random forest.

for i, j in sorted(zip(x_train.columns, model.feature_importances_)):
    print(i, j)

The result is as below:

ApplicantIncome 0.180924483743

CoapplicantIncome 0.135979758733

Credit_History 0.186436670523

Property_Area_Urban 0.0167025290557

Self_Employed_No 0.0165385567137

Self_Employed_Yes 0.0134763695267

Sample code for regression problem:

from sklearn.ensemble import RandomForestRegressor

model= RandomForestRegressor()

model.fit(x_train, y_train)

model.score(x_test,y_test)

Parameters

n_estimators:
It defines the number of decision trees to be created in a
random forest.
Generally, a higher number makes the predictions
stronger and more stable, but a very large number can
result in higher training time.

criterion:
It defines the function that is to be used for splitting.
The function measures the quality of a split for each
feature and chooses the best split.

max_features :
It defines the maximum number of features allowed for the
split in each decision tree.
Increasing max features usually improve performance but
a very high number can decrease the diversity of each
tree.

max_depth:
Random forest has multiple decision trees. This parameter
defines the maximum depth of the trees.

min_samples_split:
Used to define the minimum number of samples required in an internal node before a split is attempted.
If the number of samples in a node is less than the required number, the node is not split.

min_samples_leaf:
This defines the minimum number of samples required to
be at a leaf node.
Smaller leaf size makes the model more prone to
capturing noise in train data.

max_leaf_nodes:
This parameter specifies the maximum number of leaf nodes for each tree.
The tree stops splitting when the number of leaf nodes becomes equal to the maximum.

n_jobs:
This indicates the number of jobs to run in parallel.
Set value to -1 if you want it to run on all cores in the
system.

random_state:
This parameter is used to define the random selection.
It is used for comparison between various models.
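Putting a few of these parameters together in one call (illustrative values rather than tuned ones):

model = RandomForestClassifier(
    n_estimators=100,        # number of trees in the forest
    criterion='gini',        # split-quality measure
    max_features='sqrt',     # features considered at each split
    max_depth=8,             # limit the depth of each tree
    min_samples_split=10,    # an internal node needs at least 10 samples to be split
    min_samples_leaf=5,      # every leaf keeps at least 5 samples
    n_jobs=-1,
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)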

   

4.3 AdaBoost

Adaptive boosting or AdaBoost is one of the simplest boosting


algorithms. Usually, decision trees are used for modelling. Multiple
sequential models are created, each correcting the errors from the last
model. AdaBoost assigns weights to the observations which are
incorrectly predicted and the subsequent model works to predict these
values correctly.

Below are the steps for performing the AdaBoost algorithm:

1. Initially, all observations in the dataset are given equal weights.


2. A model is built on a subset of data.
3. Using this model, predictions are made on the whole dataset.
4. Errors are calculated by comparing the predictions and actual
values.
5. While creating the next model, higher weights are given to the
data points which were predicted incorrectly.
6. Weights can be determined using the error value. For instance, the higher the error, the larger the weight assigned to the observation.
7. This process is repeated until the error function does not
change, or the maximum limit of the number of estimators is
reached.

Code:

from sklearn.ensemble import AdaBoostClassifier

model = AdaBoostClassifier(random_state=1)

model.fit(x_train, y_train)

model.score(x_test,y_test)

0.81081081081081086

Sample code for regression problem:

from sklearn.ensemble import AdaBoostRegressor

model = AdaBoostRegressor()

model.fit(x_train, y_train)

model.score(x_test,y_test)

Parameters

base_estimator:
It helps to specify the type of base estimator, that is, the machine learning algorithm to be used as the base learner.

n_estimators:
It defines the number of base estimators.
The default value is 50, but you can tune it to a higher value for better performance.

learning_rate:
This parameter controls the contribution of the estimators
in the final combination.
There is a trade-off between learning_rate and
n_estimators.

max_depth:
Defines the maximum depth of the individual estimator.
Tune this parameter for best performance.

n_jobs
Specifies the number of processors it is allowed to use.
Set value to -1 for maximum processors allowed.

random_state :
An integer value to specify the random data split.
A definite value of random_state will always produce same
results if given with same parameters and training data.
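A small illustrative configuration with arbitrary values (as above, newer scikit-learn versions call the first argument estimator rather than base_estimator):

model = AdaBoostClassifier(
    base_estimator=tree.DecisionTreeClassifier(max_depth=1),  # a decision stump as the weak learner
    n_estimators=100,      # number of sequential weak learners
    learning_rate=0.05,    # shrinks the contribution of each learner
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)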

4.4 Gradient Boosting (GBM)

Gradient Boosting or GBM is another ensemble machine learning algorithm that works for both regression and classification problems. GBM uses the boosting technique, combining a number of weak learners to form a strong learner. Regression trees are used as the base learners; each subsequent tree in the series is built on the errors calculated by the previous tree.

We will use a simple example to understand the GBM algorithm. We have to predict the age of a group of people, given a small table of their actual ages:

1. The mean age is assumed to be the predicted value for all observations in the dataset.

2. The errors are calculated using this mean prediction and the actual values of age.

3. A tree model is created using the errors calculated above as the target variable. Our objective is to find the best split to minimize the error.

4. The predictions from this model are combined with the initial predictions from step 1 (the mean).

5. This value calculated above is the new prediction.


6. New errors are calculated using this predicted value and actual
value.

7. Steps 2 to 6 are repeated till the maximum number of iterations


is reached (or error function does not change).
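Since the original data table is not reproduced here, below is a minimal numeric sketch of the first boosting iteration with made-up ages, just to show how the mean prediction, the residuals and the residual tree interact:

import numpy as np

age = np.array([14, 16, 25, 38, 49, 60])            # made-up target values

pred = np.full(len(age), age.mean())                 # step 1: start from the mean (~33.7)
residual = age - pred                                # step 2: errors of the mean prediction

# step 3 (schematic): suppose a single-split tree separates ages below 30 from the rest
# and predicts the average residual of each group
tree_pred = np.where(age < 30, residual[age < 30].mean(), residual[age >= 30].mean())

pred = pred + tree_pred                              # steps 4-5: the updated prediction
residual = age - pred                                # step 6: new, smaller errors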

Code:

from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(learning_rate=0.01, random_state=1)

model.fit(x_train, y_train)

model.score(x_test,y_test)

0.81621621621621621

Sample code for regression problem:

from sklearn.ensemble import GradientBoostingRegressor

model= GradientBoostingRegressor()

model.fit(x_train, y_train)

model.score(x_test,y_test)

Parameters

min_samples_split
Defines the minimum number of samples (or observations)
which are required in a node to be considered for splitting.
Used to control over-fitting. Higher values prevent a model
from learning relations which might be highly specific to
the particular sample selected for a tree.

min_samples_leaf
Defines the minimum samples required in a terminal or
leaf node.
Generally, lower values should be chosen for imbalanced
class problems because the regions in which the minority
class will be in the majority will be very small.

min_weight_fraction_leaf
Similar to min_samples_leaf but defined as a fraction of
the total number of observations instead of an integer.

max_depth
The maximum depth of a tree.
Used to control over-fitting as higher depth will allow the
model to learn relations very specific to a particular
sample.
Should be tuned using CV.

max_leaf_nodes
The maximum number of terminal nodes or leaves in a
tree.
Can be defined in place of max_depth. Since binary trees
are created, a depth of ‘n’ would produce a maximum of
2^n leaves.
If this is defined, GBM will ignore max_depth.

max_features
The number of features to consider while searching for the
best split. These will be randomly selected.
As a thumb-rule, the square root of the total number of
features works great but we should check up to 30-40% of
the total number of features.
Higher values can lead to over-fitting but it generally
depends on a case to case scenario.
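A sketch of these parameters combined in one estimator (the values are illustrative, not tuned for the loan dataset):

model = GradientBoostingClassifier(
    learning_rate=0.05,
    n_estimators=200,
    max_depth=3,              # shallow trees are typical for boosting
    min_samples_split=20,
    min_samples_leaf=10,
    max_features='sqrt',      # roughly the square-root rule of thumb
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)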

4.5 XGBoost

XGBoost (extreme Gradient Boosting) is an advanced implementation


of the gradient boosting algorithm. XGBoost has proved to be a highly
effective ML algorithm, extensively used in machine learning
competitions and hackathons. XGBoost has high predictive power and
is almost 10 times faster than the other gradient boosting techniques.
It also includes a variety of regularization which reduces overfitting and
improves overall performance. Hence it is also known as ‘regularized
boosting‘ technique.

Let us see how XGBoost is comparatively better than other


techniques:

1. Regularization:
A standard GBM implementation has no regularization like XGBoost.
Thus, XGBoost also helps to reduce overfitting.

2. Parallel Processing:
XGBoost implements parallel processing and is faster than
GBM .
XGBoost also supports implementation on Hadoop.

3. High Flexibility:
XGBoost allows users to define custom optimization
objectives and evaluation criteria adding a whole new
dimension to the model.

4. Handling Missing Values:


XGBoost has an in-built routine to handle missing values.

5. Tree Pruning:
XGBoost makes splits up to the max_depth specified and
then starts pruning the tree backwards and removes splits
beyond which there is no positive gain.

6. Built-in Cross-Validation:
XGBoost allows a user to run a cross-validation at each
iteration of the boosting process and thus it is easy to get
the exact optimum number of boosting iterations in a
single run.

Code:

Since XGBoost takes care of the missing values itself, you do not have
to impute the missing values. You can skip the step for missing value
imputation from the code mentioned above. Follow the remaining
steps as always and then apply xgboost as below.

import xgboost as xgb

model=xgb.XGBClassifier(random_state=1,learning_rate=0.01)

model.fit(x_train, y_train)

model.score(x_test,y_test)

0.82702702702702702

Sample code for regression problem:

import xgboost as xgb

model=xgb.XGBRegressor()

model.fit(x_train, y_train)

model.score(x_test,y_test)

Parameters

nthread
This is used for parallel processing; the number of cores in the system should be entered.
If you wish to run on all cores, do not specify a value; the algorithm will detect it automatically.

eta
Analogous to learning rate in GBM.
Makes the model more robust by shrinking the weights on
each step.

min_child_weight
Defines the minimum sum of weights of all observations
required in a child.
Used to control over-fitting. Higher values prevent a model
from learning relations which might be highly specific to
the particular sample selected for a tree.

max_depth
It is used to define the maximum depth.
Higher depth will allow the model to learn relations very specific to a particular sample.

max_leaf_nodes
The maximum number of terminal nodes or leaves in a
tree.
Can be defined in place of max_depth. Since binary trees
are created, a depth of ‘n’ would produce a maximum of
2^n leaves.
If this is defined, max_depth will be ignored.

gamma
A node is split only when the resulting split gives a positive
reduction in the loss function. Gamma specifies the
minimum loss reduction required to make a split.
Makes the algorithm conservative. The values can vary
depending on the loss function and should be tuned.

subsample
Same as the subsample of GBM. Denotes the fraction of
observations to be randomly sampled for each tree.
Lower values make the algorithm more conservative and
prevent overfitting but values that are too small might lead
to under-fitting.

colsample_bytree
It is similar to max_features in GBM.
Denotes the fraction of columns to be randomly sampled
for each tree.
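An illustrative configuration using the scikit-learn wrapper (the parameter values are arbitrary; in this API, eta is exposed as learning_rate):

model = xgb.XGBClassifier(
    learning_rate=0.05,      # 'eta' in the native API
    n_estimators=200,
    max_depth=4,
    min_child_weight=5,
    gamma=0.1,               # minimum loss reduction required to make a split
    subsample=0.8,           # row sampling per tree
    colsample_bytree=0.8,    # column sampling per tree
    random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)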

4.6 Light GBM

Before discussing how Light GBM works, let’s first understand why we
need this algorithm when we have so many others (like the ones we
have seen above). Light GBM beats all the other algorithms when
the dataset is extremely large. Compared to the other algorithms,
Light GBM takes lesser time to run on a huge dataset.

LightGBM is a gradient boosting framework that uses tree-based algorithms and grows trees leaf-wise, while most other boosting algorithms grow them level-wise: at each step it splits the leaf with the largest loss reduction instead of expanding all leaves at the current depth.

Leaf-wise growth may cause over-fitting on smaller datasets, but that can be avoided by using the 'max_depth' parameter for learning. You can read more about Light GBM and its comparison with XGBoost in this article.

Code:

import lightgbm as lgb

train_data = lgb.Dataset(x_train, label=y_train)

# define parameters
params = {'learning_rate': 0.001}
model = lgb.train(params, train_data, 100)

y_pred = model.predict(x_test)

# convert the predicted probabilities into class labels
for i in range(len(y_pred)):
    if y_pred[i] >= 0.5:
        y_pred[i] = 1
    else:
        y_pred[i] = 0

# accuracy on the test set (the metric computation is assumed; it was not shown in the original code)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.81621621621621621

Sample code for regression problem:

import lightgbm as lgb

train_data = lgb.Dataset(x_train, label=y_train)
params = {'learning_rate': 0.001}
model = lgb.train(params, train_data, 100)

# predict on the test set before computing the error
y_pred = model.predict(x_test)

from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_pred, y_test)**0.5

Parameters

num_iterations:
It defines the number of boosting iterations to be
performed.

num_leaves :
This parameter is used to set the number of leaves to be
formed in a tree.
In case of Light GBM, since splitting takes place leaf-wise
rather than depth-wise, num_leaves must be smaller than
2^(max_depth), otherwise, it may lead to overfitting.

min_data_in_leaf :
A very small value may cause overfitting.
It is also one of the most important parameters in dealing
with overfitting.

max_depth:
It specifies the maximum depth or level up to which a tree
can grow.
A very high value for this parameter can cause overfitting.

bagging_fraction:
It is used to specify the fraction of data to be used for each
iteration.
This parameter is generally used to speed up the training.

max_bin :
Defines the max number of bins that feature values will be
bucketed in.
A smaller value of max_bin can save a lot of time as it
buckets the feature values in discrete bins which is
computationally inexpensive.
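An illustrative parameter dictionary for the loan problem (the values are arbitrary; 'objective' and 'bagging_freq' are added here because bagging_fraction only takes effect when a bagging frequency is set):

params = {
    'objective': 'binary',       # binary classification
    'learning_rate': 0.05,
    'num_leaves': 31,            # keep this below 2^(max_depth)
    'max_depth': 6,
    'min_data_in_leaf': 20,
    'bagging_fraction': 0.8,     # row subsampling per iteration
    'bagging_freq': 1,           # perform bagging at every iteration
    'max_bin': 255,
}
model = lgb.train(params, train_data, num_boost_round=200)   # num_iterations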

4.7 CatBoost

Handling categorical variables is a tedious process, especially when


you have a large number of such variables. When your categorical
variables have too many labels (i.e. they have high cardinality),
performing one-hot-encoding on them exponentially increases the
dimensionality and it becomes really difficult to work with the dataset.

CatBoost can automatically deal with categorical variables and does


not require extensive data preprocessing like other machine learning
algorithms. Here is an article that explains CatBoost in detail.

Code:

CatBoost algorithm effectively deals with categorical variables. Thus,


you should not perform one-hot encoding for categorical variables.
Just load the files, impute missing values, and you’re good to go.

from catboost import CatBoostClassifier

model = CatBoostClassifier()

# indices of the categorical columns (np.float is written as plain float for newer NumPy versions)
categorical_features_indices = np.where(df.dtypes != float)[0]

model.fit(x_train, y_train, cat_features=([0, 1, 2, 3, 4, 10]), eval_set=(x_test, y_test))
model.score(x_test, y_test)

0.80540540540540539

Sample code for regression problem:

from catboost import CatBoostRegressor

model = CatBoostRegressor()

categorical_features_indices = np.where(df.dtypes != float)[0]

model.fit(x_train, y_train, cat_features=([0, 1, 2, 3, 4, 10]), eval_set=(x_test, y_test))
model.score(x_test, y_test)

Parameters

loss_function:
Defines the metric to be used for training.

iterations:
The maximum number of trees that can be built.
The final number of trees may be less than or equal to this
number.

learning_rate:
Defines the learning rate.
Used for reducing the gradient step.

border_count:
It specifies the number of splits for numerical features.
It is similar to the max_bin parameter.

depth:
Defines the depth of the trees.

random_seed:
This parameter is similar to the ‘random_state’ parameter
we have seen previously.
It is an integer value to define the random seed for
training.
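A sketch combining these parameters (the values are arbitrary; verbose=False is added only to silence the per-iteration log):

model = CatBoostClassifier(
    loss_function='Logloss',   # metric optimised during training
    iterations=500,            # maximum number of trees
    learning_rate=0.05,
    depth=6,
    border_count=128,          # number of splits for numerical features
    random_seed=1,
    verbose=False)
model.fit(x_train, y_train, cat_features=[0, 1, 2, 3, 4, 10])
model.score(x_test, y_test)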

This brings us to the end of the ensemble algorithms section. We have


covered quite a lot in this article!

End Notes

Ensemble modeling can exponentially boost the performance of your


model and can sometimes be the deciding factor between first place
and second! In this article, we covered various ensemble learning
techniques and saw how these techniques are applied in machine
learning algorithms. Further, we implemented the algorithms on our
loan prediction dataset.

I hope this article has given you a solid understanding of the topic. If you have any suggestions or questions, do share them in the comments section below. Also, I encourage you to implement these algorithms yourself and share your results with us!

Aishwarya Singh

An avid reader and blogger who loves exploring the endless


world of data science and artificial intelligence. Fascinated by
the limitless applications of ML and AI; eager to learn and
discover the depths of data science.
