
Outline

Introduction

Statement of the Problem

Objectives

Bagging

Random Forest

Boosting

AdaBoost
 Machine learning studies automatic techniques for learning to make accurate
predictions based on past observations.
 Machine learning was defined by Arthur Samuel in 1959 as the field of study that gives
computers the ability to learn without being explicitly programmed.
 Supervised machine learning is one of the methods associated with machine
learning which involves allocating labeled data so that a certain pattern or
function can be deduced from that data.
 A decision tree partitions the feature space into regions and predicts within each region:
◦ Classification: majority vote within the region.
◦ Regression: mean of the training data within the region.
◦ CART: Classification And Regression Trees.
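As a rough illustration of these ideas, here is a minimal CART sketch using scikit-learn; the synthetic dataset and all parameter values are illustrative assumptions, not part of the original slides.

    # Illustrative CART sketch (scikit-learn); data and settings are arbitrary.
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)

    # A classification tree partitions the feature space into regions and
    # predicts the majority class of the training points in each region
    # (a regression tree would predict the mean instead).
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X, y)
    print(tree.predict(X[:5]))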
 Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the
variance of a statistical learning method; we introduce it here because it is
particularly useful and frequently used in the context of decision trees.
 Bagging is short for Bootstrap Aggregating (bootstrap + aggregating).
 Bagging can drastically improve the performance of CART.
 Bagging seems to work especially well for high variance, low bias
procedures such as trees.
 Bagging (which stands for Bootstrap Aggregating) is a way to decrease the variance
of your prediction by generating additional training data from your original
dataset, using sampling with replacement (combinations with repetitions) to produce
multisets of the same cardinality/size as the original data.
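A minimal sketch of drawing one such bootstrap multiset with NumPy; the toy array data is a placeholder, not something from the slides.

    # Draw one bootstrap sample: same size as the original data, with repetitions.
    import numpy as np

    rng = np.random.default_rng(0)
    data = np.arange(10)               # placeholder dataset of size n
    n = len(data)

    idx = rng.integers(0, n, size=n)   # sample n indices with replacement
    bootstrap_sample = data[idx]       # some points repeat, others are left out
    print(bootstrap_sample)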
 Bagging (Breiman, 1996) fits many large trees to bootstrap-resampled versions of the training data and classifies by
majority vote.
 Bagging can be characterized as follows:
 A parallel ensemble: each model is built independently.
 It aims to decrease variance, not bias.
 It is suitable for high-variance, low-bias models (complex models).

 Method
 Train multiple (k) models on different bootstrap samples (data splits) and average their predictions.
 Predict (at test time) by averaging the results of the k models.

 Goal
 Improve the accuracy of one model by using multiple copies of it.
 Averaging the misclassification errors on different data splits gives a better estimate of the predictive
ability of a learning method.
 Ideally we would take many training sets from the population, build a separate prediction model
using each training set, and average the resulting predictions.
 While bagging can improve predictions for many regression methods, it is particularly
useful for regression trees.
 To apply bagging to regression trees we:
 Construct B regression trees using B bootstrapped training sets.
 Average their predictions.
 These trees are grown deep and are not pruned.
 Each tree has high variance and low bias; averaging the B trees brings down the variance.
 Bagging has been shown to give impressive improvements [[cite]] by
combining hundreds or thousands of trees in a single procedure.
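A minimal sketch of this recipe, assuming scikit-learn and a synthetic regression dataset; B and the other settings are illustrative choices.

    # Bagging regression trees: B deep, unpruned trees on bootstrap samples,
    # predictions averaged. Data and B are illustrative.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
    rng = np.random.default_rng(0)

    B = 100
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrapped training set
        tree = DecisionTreeRegressor()               # grown deep, not pruned
        tree.fit(X[idx], y[idx])
        trees.append(tree)

    # Each tree has high variance and low bias; averaging brings the variance down.
    y_hat = np.mean([t.predict(X) for t in trees], axis=0)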
 Random Forest is a supervised learning algorithm.
 Random forests provide an improvement over bagged trees by way of a small tweak that
decorrelates the trees.
 Random forests is a substantial modification of bagging that builds a large collection of de-
correlated trees and then averages them.
 Random Forest is also used for regression; it is a very flexible, easy-to-use
machine learning algorithm that produces good results even without hyper-parameter tuning.
 This algorithm is widely used because of its simplicity and because it can be used for both
regression and classification tasks.

 Random Forests work as follows:
 Fit a decision tree to each of several bootstrap samples.
 When growing a tree, select a random sample of m < p predictors to consider at each split. This leads to very different
(decorrelated) trees across samples.
 Finally, average the predictions of the trees.

 The optimal m is usually around √p, but m can be treated as a tuning parameter (sketched below).
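A small sketch of treating m as a tuning parameter with scikit-learn's RandomForestRegressor, where max_features plays the role of m; the dataset and the candidate values are assumptions made for illustration.

    # max_features controls m, the number of predictors considered at each split.
    # "sqrt" gives m ~ sqrt(p); None means m = p, which reduces to plain bagging.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=300, n_features=16, noise=10.0, random_state=1)

    for m in ["sqrt", 0.5, None]:
        rf = RandomForestRegressor(n_estimators=200, max_features=m, random_state=1)
        score = cross_val_score(rf, X, y, cv=5).mean()
        print(f"max_features={m}: mean CV R^2 = {score:.3f}")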
 Statistically, Random Forests are appealing because of the additional features they provide, such as:
 measures of variable importance
 differential class weighting
 missing value imputation
 visualization
 outlier detection
 unsupervised learning.
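Two of these extras (variable importance and differential class weighting) are exposed directly in scikit-learn; the imbalanced synthetic dataset below is an assumed example.

    # feature_importances_ gives a built-in measure of variable importance;
    # class_weight="balanced" applies differential class weighting.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                               weights=[0.9, 0.1], random_state=0)

    rf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                random_state=0)
    rf.fit(X, y)
    print(rf.feature_importances_)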
 As in bagging, we build a number of decision trees on bootstrapped training samples.
 Random Forest has tremendous potential to become a popular technique for
future classifiers because its performance has been found to be comparable with that of the
ensemble techniques bagging and boosting.
 It is a collection of unpruned CARTs that uses a rule to combine the individual tree
decisions.
 It is used to improve prediction accuracy.
 Hyper-parameters are the arguments that can be set before training and which define how the
training is done.
 The main hyper-parameters in Random Forests are:
 The number of decision trees to be combined
 The maximum depth of the trees
 The maximum number of features considered at each split
 Whether bagging/bootstrapping is performed with or without replacement

 Random Forest implementations are available in many machine learning libraries for R and
Python, such as caret (R, which imports randomForest and other RF packages), Scikit-learn (Python),
and H2O (R and Python).
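A minimal scikit-learn sketch that sets the hyper-parameters listed above; the particular values are arbitrary examples, not recommendations.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=400, n_features=10, random_state=2)

    rf = RandomForestClassifier(
        n_estimators=300,      # number of decision trees to be combined
        max_depth=8,           # maximum depth of the trees
        max_features="sqrt",   # maximum number of features considered at each split
        bootstrap=True,        # whether bootstrap sampling is performed
        random_state=2,
    )
    rf.fit(X, y)
    print(rf.predict_proba(X[:3]))   # prediction probabilities for a few rows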

 The pros of Random Forests are that they are a relatively fast and powerful
algorithm for classification and regression learning, the calculations can be
parallelized, they perform well on many problems (even with small datasets), and
the output returns prediction probabilities.
 Like bagging, boosting is a general approach that can be applied to many statistical learning
methods for regression or classification.
 Boosting works in a similar way, except that the trees are grown sequentially: each tree is
grown using information from previously grown trees.
 Boosting is a two-step approach: one first uses subsets of the original data to produce a
series of averagely performing models and then boosts their performance by combining them
using a particular cost function (e.g. majority vote).

 Unlike bagging, in classical boosting the subset creation is not random but depends on
the performance of the previous models: every new subset contains the elements that were
(likely to be) misclassified by previous models.

 Boosting can reduce variance (as bagging does), but it can also eliminate the effect
of the high bias of the weak learner (unlike bagging).

 Boosting works by primarily reducing bias in the early stages and primarily
reducing variance in later stages.
 Boosting can be characterized as follows:
 A sequential ensemble: new models are added that do well where previous models fall short.
 It aims to decrease bias, not variance.
 It is suitable for low-variance, high-bias models.
 An example of a tree-based boosting method is gradient boosting (sketched below).
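As a concrete tree-based example, here is a minimal gradient boosting sketch with scikit-learn; the synthetic data and the settings (shallow trees, learning rate) are illustrative assumptions.

    # Trees are grown sequentially: each new shallow (high-bias) tree is fit to
    # correct the errors of the current ensemble, reducing bias step by step.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=400, n_features=10, random_state=3)

    gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                    max_depth=2, random_state=3)
    gb.fit(X, y)
    print(gb.score(X, y))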

 Boosting is an ensemble technique that attempts to create a strong classifier from
a number of weak classifiers.
 AdaBoost is the best starting point for understanding boosting. Modern boosting methods
build on AdaBoost, most notably stochastic gradient boosting machines.
 Boosting is often robust to overfitting: the test set error can keep decreasing even after
the training error reaches zero. Boosting is all about combining weak classifiers, each only
slightly better than random on the training data, into a very strong classifier that can
eventually achieve zero training error.
 AdaBoost is boosting by sampling; the AdaBoost algorithm, introduced in 1995 by Freund and
Schapire, solved many of the practical difficulties of the earlier boosting algorithms.
 Instead of resampling, it reweights misclassified training examples.
 AdaBoost can be described as follows:
 Generate a sequence of base learners, each focusing on the previous one's errors.
 The probability (weight) of a correctly classified instance is decreased and the probability of a misclassified instance
is increased. This has the effect that the next classifier focuses more on the instances misclassified by the previous
classifier.

 AdaBoost is typically used with short decision trees. After the first tree is created, the
performance of the tree on each training instance is used to weight how much
attention the next tree should pay to each training instance.
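A minimal AdaBoost sketch with scikit-learn using decision stumps (depth-1 trees) as the weak learners; the data and settings are illustrative, and older scikit-learn versions name the estimator argument base_estimator instead of estimator.

    # AdaBoost with decision stumps: each new stump is trained with higher weight
    # on the instances the previous stumps misclassified.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=400, n_features=10, random_state=4)

    stump = DecisionTreeClassifier(max_depth=1)   # weak learner
    ada = AdaBoostClassifier(estimator=stump,     # base_estimator in older versions
                             n_estimators=200, random_state=4)
    ada.fit(X, y)
    print(ada.score(X, y))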
 Bagging, random forests and boosting are good methods for improving the
prediction accuracy of trees.
 They work by growing many trees on the training data and then combining the
predictions of the resulting ensemble of trees.
 The latter two methods, random forests and boosting, are among the state-of-the-art
methods for supervised learning.
 Combining multiple learners has been a popular topic in machine learning
since the early 1990s and research has been going on ever since.
