
Random Forests

Classification, variable selection and consistency

Mikhail Traskin
The Wharton School, Department of Statistics
University of Pennsylvania

Stat 900, November 26, 2007


Random Forests
Ensemble classification (and regression) algorithm
Proposed by Leo Breiman in 1999
Easy to implement
Very effective in applications, with good generalization properties
Outputs more information than just the class label



Breiman’s Experiments
Test-set error rates (%)

Dataset          AdaBoost    RF     Time ratio
Votes               4.8      4.1       N/A
German credit      23.5     24.4       N/A
Letters             3.4      3.5       N/A
Sat-images          8.8      8.6       N/A
Zip-code            6.2      6.3       0.025
Waveform           17.8     17.2       N/A
Twonorm             4.9      3.9       N/A


Classification or Regression Problem
We are given
$S_n = \{(X_i, Y_i)\}_{i=1}^n$ — a set of i.i.d. observations distributed as $P$
$X_i \in \mathcal{X}$ — predictors
$Y_i \in \mathcal{Y}$ — responses
Goal: find $f_n = A(S_n)$ such that $E[\ell(f_n(X), Y)]$ is minimized.



Abstract Definition
Breiman (2001) defines random forest as follows.
Definition 1. A random forest is a classifier consisting of a collection of
tree-structured classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$, where the
$\Theta_k$ are independent, identically distributed random vectors and each
tree casts a unit vote for the most popular class at input $x$.



The Random Forests Algorithm
1. Choose T, the number of trees to grow.
2. Choose m, the number of variables used to split each node, with m ≪ M,
   where M is the number of input variables. m is held constant while growing
   the forest.
3. Grow T trees. When growing each tree do the following.
   (a) Construct a bootstrap sample of size n drawn from $S_n$ with replacement
       and grow a tree from this bootstrap sample.
   (b) At each node of the tree, select m variables at random and use them to
       find the best split.
   (c) Grow the tree to maximal extent. There is no pruning.
4. To classify a point x, collect the votes from every tree in the forest and
   use majority voting to decide on the class label.
(A minimal code sketch of these steps follows below.)

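To make those steps concrete, here is a minimal Python sketch of the training
loop and the voting rule, assuming scikit-learn is available for the per-tree
fitting. The function names fit_forest and predict_forest, and the default
m = sqrt(M), are illustrative choices, not part of the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, T=100, m=None, random_state=0):
    """Grow T fully grown trees, each on a bootstrap sample of size n,
    trying m randomly chosen variables at every node (no pruning)."""
    rng = np.random.default_rng(random_state)
    n, M = X.shape
    m = m if m is not None else max(1, int(np.sqrt(M)))  # a common default, m << M
    trees = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)                 # bootstrap sample, with replacement
        tree = DecisionTreeClassifier(max_features=m)    # m variables tried at each split
        tree.fit(X[idx], y[idx])                         # grown to maximal extent
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Majority vote over the trees; assumes class labels are non-negative integers."""
    votes = np.stack([t.predict(X) for t in trees])      # shape (T, n_points)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```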


Compare to: Bagging
Breiman, 1996
Works with any classification algorithm
Like Random Forests, it uses bootstrap resampling
Treats the underlying classification algorithm as a "black box"
Variance reduction technique

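For contrast with the Random Forests loop above, here is a minimal sketch of
bagging with an arbitrary base classifier treated as a black box; it assumes
scikit-learn's clone() for copying the estimator, and the name bag_fit is
illustrative.

```python
import numpy as np
from sklearn.base import clone

def bag_fit(base_estimator, X, y, B=50, random_state=0):
    """Fit B copies of any classifier, each on a bootstrap sample of the data."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    models = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                  # bootstrap sample
        models.append(clone(base_estimator).fit(X[idx], y[idx]))
    return models
```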


Compare to: Random Split Selection
Dietterich, 2000
Grow multiple trees
When splitting a node, choose the split uniformly at random from the K best splits
Can be used with or without pruning



Compare to: Random Subspace
Ho, 1998
Grow multiple trees
Each tree is grown using a fixed subset of variables
Do a majority vote or averaging to combine the votes from the different trees

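A minimal sketch of the Random Subspace idea, assuming scikit-learn decision
trees as the base learner; note that, unlike Random Forests, the variable
subset is fixed per tree rather than re-drawn at every node. The function
names and the default subset size are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_fit(X, y, T=100, d=None, random_state=0):
    """Grow T trees, each on a fixed random subset of d variables."""
    rng = np.random.default_rng(random_state)
    M = X.shape[1]
    d = d if d is not None else max(1, M // 2)            # subset size (illustrative default)
    ensemble = []
    for _ in range(T):
        cols = rng.choice(M, size=d, replace=False)       # fixed subset for this tree
        ensemble.append((DecisionTreeClassifier().fit(X[:, cols], y), cols))
    return ensemble

def random_subspace_predict(ensemble, X):
    """Majority vote; assumes class labels are non-negative integers."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in ensemble])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```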


RF and Error Estimation
1. For each pair $(x_i, y_i)$ in the training sample
   Select only the trees whose bootstrap sample does not contain the pair
   Classify the pair with each of the selected trees
   Compute the misclassification rate for the pair
2. Average over the computed estimates to obtain the out-of-bag (OOB) error
   estimate (a code sketch follows below)

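A minimal sketch of this out-of-bag procedure, following the per-pair
misclassification-rate formulation above. It re-fits the forest while keeping
the bootstrap index sets; the names fit_forest_with_bags and oob_error are
illustrative, not a library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest_with_bags(X, y, T=100, m=None, random_state=0):
    """Same training loop as before, but also return each tree's bootstrap set."""
    rng = np.random.default_rng(random_state)
    n, M = X.shape
    m = m if m is not None else max(1, int(np.sqrt(M)))
    trees, bags = [], []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)                      # bootstrap indices
        trees.append(DecisionTreeClassifier(max_features=m).fit(X[idx], y[idx]))
        bags.append(set(idx.tolist()))                        # points this tree saw
    return trees, bags

def oob_error(trees, bags, X, y):
    """Average, over training points, of the misclassification rate among the
    trees that did not have that point in their bootstrap sample."""
    rates = []
    for i in range(len(y)):
        preds = [t.predict(X[i:i + 1])[0]
                 for t, bag in zip(trees, bags) if i not in bag]
        if preds:                                             # skip points seen by every tree
            rates.append(np.mean([p != y[i] for p in preds]))
    return float(np.mean(rates))
```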


RF and Variable Selection
1. For each tree in the forest
   Classify the out-of-bag cases and count the number of correct votes
   Permute variable m in the out-of-bag sample
   Classify the permuted out-of-bag sample and count the number of correct votes
   Compute the difference between the unpermuted and permuted counts
2. Compute the average and standard deviation of the differences over the trees
3. Compute the z-statistic (a code sketch follows below)

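A sketch of this permutation-importance computation for a single variable j,
using the trees and bootstrap sets returned by the fit_forest_with_bags sketch
above. The name variable_importance_z and the exact z-statistic form (mean
difference over its standard error) are assumptions made for illustration.

```python
import numpy as np

def variable_importance_z(trees, bags, X, y, j, random_state=0):
    """Per-tree drop in correct out-of-bag votes after permuting variable j,
    summarized as mean(diffs) divided by its standard error."""
    rng = np.random.default_rng(random_state)
    diffs = []
    for tree, bag in zip(trees, bags):
        oob = np.array([i for i in range(len(y)) if i not in bag])
        if oob.size == 0:
            continue
        X_oob = X[oob]
        correct = np.sum(tree.predict(X_oob) == y[oob])        # unpermuted count
        X_perm = X_oob.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])           # permute variable j
        correct_perm = np.sum(tree.predict(X_perm) == y[oob])  # permuted count
        diffs.append(correct - correct_perm)
    diffs = np.array(diffs, dtype=float)
    return diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs)))
```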


RF and Interactions
Compute the Gini importance for each variable
Rank the Gini importance scores within each tree
For each pair of variables, compute the average rank difference over all trees



Unsupervised Learning
(Dis)similarity measure
For each tree, put the whole training sample down the tree
For each pair of observations, compute the fraction of trees $s_{ij}$ in which they end up in the same node
Compute the dissimilarity as $d_{ij} = \sqrt{1 - s_{ij}}$

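A sketch of this dissimilarity computation, assuming the trees come from one
of the fitting sketches above and using scikit-learn's tree apply() method to
get the leaf that each observation falls into; the function name is
illustrative.

```python
import numpy as np

def dissimilarity_matrix(trees, X):
    """d_ij = sqrt(1 - s_ij), with s_ij the fraction of trees in which
    observations i and j land in the same leaf."""
    n = X.shape[0]
    same = np.zeros((n, n))
    for tree in trees:
        leaves = tree.apply(X)                          # leaf index per observation
        same += (leaves[:, None] == leaves[None, :])    # 1 where i and j share a leaf
    s = same / len(trees)
    return np.sqrt(1.0 - s)
```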


Unsupervised Learning
Synthetic datasets
Mark the observed data as “observed”
Generate a synthetic sample from the product of the marginals of the observed data
Mark the generated data as “unobserved”

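A minimal sketch of this synthetic-data construction: sampling from the
product of the marginals is done here by resampling each column of the
observed data independently. The function name make_two_class_data and the
0/1 label coding are illustrative assumptions.

```python
import numpy as np

def make_two_class_data(X_obs, random_state=0):
    """Label the observed rows 1 ("observed") and append a same-sized synthetic
    sample, labeled 0 ("unobserved"), drawn column-wise from the marginals."""
    rng = np.random.default_rng(random_state)
    n, M = X_obs.shape
    X_syn = np.column_stack([rng.choice(X_obs[:, j], size=n, replace=True)
                             for j in range(M)])
    X = np.vstack([X_obs, X_syn])
    y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    return X, y
```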


Unsupervised Learning
Clustering
Train a random forest to separate the observed from the synthetic (“unobserved”) data
Use the forest to compute the dissimilarity measure for the observed data only
Use any clustering algorithm with the computed dissimilarity measure

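A sketch tying the previous steps together, reusing make_two_class_data,
fit_forest and dissimilarity_matrix from the earlier sketches and handing the
dissimilarities to an off-the-shelf hierarchical clustering routine from SciPy;
the wrapper name rf_cluster is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def rf_cluster(X_obs, n_clusters=2, T=200):
    X, y = make_two_class_data(X_obs)            # observed vs. synthetic labels
    trees = fit_forest(X, y, T=T)                # forest from the earlier sketch
    d = dissimilarity_matrix(trees, X_obs)       # dissimilarities, observed rows only
    np.fill_diagonal(d, 0.0)                     # squareform expects a zero diagonal
    Z = linkage(squareform(d, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```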


Universal Consistency
Assume i.i.d. data $(X, Y)$, $S_n = \{(X_i, Y_i)\}_{i=1}^n$ from $\mathcal{X} \times \mathcal{Y}$, with $\mathcal{Y} = \{-1, 1\}$.
Consider a method $f_n = A(S_n)$, for example $f_n = \mathrm{AdaBoost}(S_n, t_n)$.
Definition 2. A method is universally consistent if for any distribution $P$
$$L(f_n) \xrightarrow{\text{a.s.}} L^*,$$
where $L$ is the risk and $L^*$ is the Bayes risk:
$$L(f_n) = P(\operatorname{sign}(f_n(X)) \neq Y \mid S_n), \qquad L^* = \inf_f L(f).$$



Is Random Forests Consistent?
Breiman (2001) wrote:
“Section 2 gives some theoretical background for random forests. Use of the Strong Law of Large Numbers shows that they always converge so that overfitting is not a problem.
[...]
This result explains why random forests do not overfit as more trees are added, but produce a limiting value of the generalization error.”



One-Dimensional Case
Theorem 3. Consider a binary classification problem. If $\mathcal{X} = \mathbb{R}$, then the classification Random Forests algorithm is equivalent to the 1-nearest-neighbor classifier and hence is not consistent.
Theorem 4. Consider a binary classification problem. If $\mathcal{X} = \mathbb{R}$ and the bootstrap sample size $k \to \infty$ such that $k = o(n)$, then the classification Random Forests algorithm is consistent.



One-Dimensional Case
$\mathcal{X} = [0, 1]$, $\eta(x) = P(Y = 1 \mid x) = 0.25 + 0.5\,I\{x \ge 0.5\}$, $L_{1\mathrm{NN}} = 0.375$

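For reference, the 1-NN value can be checked with the standard asymptotic
1-NN risk formula; under this model $\eta(x)(1-\eta(x)) = 0.25 \cdot 0.75$ for
every $x$, so the value does not depend on the distribution of $X$:

$$L_{1\mathrm{NN}} = E\bigl[2\,\eta(X)\bigl(1-\eta(X)\bigr)\bigr] = 2 \cdot 0.25 \cdot 0.75 = 0.375, \qquad L^* = E\bigl[\min\{\eta(X),\, 1-\eta(X)\}\bigr] = 0.25.$$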


One-Dimensional Case
[figure]


Two-Dimensional Case
[figures]


Four-Dimensional Case
[figure]


Eight-Dimensional Case
[figure]


Four-Dimensional Case
Decision boundary: hyperplane



Other versions of ensemble classifiers
Biau et al. (2007)
Consistency of purely random forest
Consistency of bagged nearest neighbor rules
Consistency of forests of trees based on partitioning the space into nested rectangles

