Non-technical guide to tree-based methods:
Regression and classification trees
Bagging
Random Forest
Boosting

Jun 04, 2014

© All Rights Reserved
Tree-Based Methods

Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions. Since the set of splitting rules used to segment the predictor space can be summarized in a tree, these types of approaches are known as decision-tree methods.

Tree-based methods are easy to interpret, but not competitive with the best supervised learning approaches in terms of prediction accuracy. They can be made competitive when they are combined in ensembles of trees (bagging, random forests, and boosting). Combining a large number of trees can often result in dramatic improvements in prediction accuracy, at the expense of some loss of interpretability.

An example of how we would stratify some data, and the resulting tree:

[Figure: a scatter of the data split into rectangular regions, and the corresponding decision tree being built.]

The final nodes (red and grey) are called terminal nodes. The points along the tree where the predictor space is split are referred to as internal nodes (X<60, Y<20, X<85). The variable that is split on first is the most important factor.

How does the automatic tree growing algorithm work?

Regression Tree

The idea is that we want to divide the predictor space into J distinct, non-overlapping regions. For every observation that falls into the region R_j, we make the same prediction, which is simply the mean of the response values for the training observations in R_j.

Each region R_j represents one of the terminal leaves, where we represent the observations by an average; we have J such terminal leaves, and we are going to choose the set of boxes so that the total variation of


observations around their mean in a box is as small as possible, i.e. we want to minimize the RSS:

    RSS = \sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \hat{y}_{R_j})^2

where \hat{y}_{R_j} is the mean response for the training observations within the jth box.

It is computationally infeasible to consider every possible partition of the feature space into J boxes, so trees use an approximation known as the top-down greedy approach.

It is top-down because it starts at the top with the whole set of observations and then splits them into two pieces, one split at a time at each level. It is greedy because it does not find the best split among all possible sequences of splits, but only the best split at that particular step.

Tree-growing process:

We start at the top with the full set of data and all the predictors, and we look for the predictor and the split that lead to the greatest possible reduction in RSS, and make a split to produce two nodes.

Next, we repeat the process, looking for the best predictor and best cutpoint in order to split the data further so as to minimize the RSS within each of the resulting regions. The process continues until a stopping criterion is reached; for instance, we may continue until no region contains more than five observations.
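The greedy split search described above can be sketched in a few lines of Python. The tiny two-cluster dataset and the helper names (rss, best_split) are hypothetical, chosen only for illustration; for each predictor the sketch simply tries every observed value as a cutpoint and keeps the split with the smallest total RSS:

```python
def rss(ys):
    """Residual sum of squares of ys around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(X, y):
    """Greedy search: return (feature_index, cutpoint, total_rss) of the best split."""
    best = None
    n_features = len(X[0])
    for j in range(n_features):
        for s in sorted({row[j] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[j] < s]
            right = [y[i] for i, row in enumerate(X) if row[j] >= s]
            if not left or not right:
                continue  # a split must produce two non-empty regions
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best

# Hypothetical data: two clear clusters in the single predictor
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
print(best_split(X, y))   # the greedy split lands at x = 10, between the clusters
```

A full tree-growing algorithm would apply this same search recursively to each of the two resulting regions until the stopping criterion is reached.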

Prediction

Once a tree is built, we predict the response for a test observation by passing it down the tree, obeying each of the splits. It will end up in a terminal node, and we then use the mean of the training observations in that region as the prediction.

One question that arises is how large the tree should be. When should we stop growing it? One possibility would be to grow as large a tree as possible; at the extreme, we could have a tree with one observation in each terminal node. But that would overfit the data, producing poor test set performance. A tree so large that each observation has its own terminal node will have a training error of 0, but it will be adapted too closely to that particular data and will overfit.

One possible alternative to the process described above is to grow the tree only so long as the decrease in the RSS due to each split exceeds some (high) threshold. This strategy will result in smaller trees, but is too short-sighted: a seemingly worthless split early on in the tree might be followed by a very good split, that is, a split that leads to a large reduction in RSS later on.

A better strategy is to grow a very large tree T_0, and then prune it back in order to obtain a subtree. Cost-complexity (weakest-link) pruning is used for this.


We consider a sequence of trees indexed by a nonnegative tuning parameter \alpha. For each value of \alpha there corresponds a subtree T \subset T_0 such that

    \sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|

is as small as possible.

|T| - the number of terminal nodes in the tree T.
R_m - the subset of predictor space corresponding to the mth terminal node.
\hat{y}_{R_m} - the mean of the training observations in R_m.

The tuning parameter \alpha controls a trade-off between the subtree's complexity and its fit to the training data. We select an optimal value \hat{\alpha} using cross-validation. We then return to the full data set and obtain the subtree corresponding to \hat{\alpha}.
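The criterion above can be illustrated numerically. The candidate subtrees below are hypothetical, summarized only by their training RSS and number of terminal nodes |T|; the point is that as \alpha grows, the penalty \alpha|T| shifts the winner toward smaller trees:

```python
# Hypothetical candidate subtrees of one large tree T0,
# each summarized by (number of terminal nodes, training RSS).
subtrees = [
    {"leaves": 8, "rss": 10.0},   # the full tree: best fit, most complex
    {"leaves": 4, "rss": 14.0},
    {"leaves": 2, "rss": 25.0},
]

def best_subtree(alpha):
    """Pick the subtree minimizing the cost-complexity criterion RSS + alpha*|T|."""
    return min(subtrees, key=lambda t: t["rss"] + alpha * t["leaves"])

print(best_subtree(0.0)["leaves"])   # alpha = 0: the full tree wins
print(best_subtree(2.0)["leaves"])   # moderate alpha: the mid-size tree wins
print(best_subtree(10.0)["leaves"])  # large alpha: the smallest tree wins
```

In practice one would not enumerate subtrees by hand; weakest-link pruning produces the relevant nested sequence, and cross-validation selects \alpha.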

Classification Tree

Very similar to a regression tree, except that it is used to predict a qualitative response rather than a quantitative one. For a classification tree, we predict that each observation belongs to the most commonly occurring class of training observations in the region to which it belongs.

We grow the tree in the very same way as we did for regression trees, but we do not use the residual sum of squares as the criterion for making the splits. We need a criterion that is more geared towards classification.

One such measure is the Gini index, a measure of variance across the classes:

    G = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})

\hat{p}_{mk} - the proportion of training observations in the mth region that are from the kth class.

If the Gini index is small, one class dominates. In the extreme case, if the region is pure, so that all observations are in one class, the Gini index is zero. For this reason the Gini index is referred to as a measure of node purity: a small value indicates that a node contains predominantly observations from a single class.

An alternative to the Gini index is cross-entropy, which gives very similar results. Another is the classification error rate, but in practice the two other measures are preferable.
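The Gini formula is easy to compute directly from the class counts in a node. This small helper is a hypothetical illustration, not a library function:

```python
def gini(counts):
    """Gini index G = sum_k p_k * (1 - p_k) from per-class counts in one region."""
    n = sum(counts)
    props = [c / n for c in counts]
    return sum(p * (1 - p) for p in props)

print(gini([10, 0]))   # a pure node: 0.0
print(gini([5, 5]))    # evenly mixed two classes: 0.5, the maximum for K = 2
```

This shows the node-purity interpretation: the pure node scores 0, and the index grows as the classes become more evenly mixed.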

Trees are easy to interpret, but generally do not have the same level of predictive accuracy as some of the other regression and classification approaches. However, by aggregating many decision trees, the predictive


performance of trees can be substantially improved.

Bagging

Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method, frequently used in the context of decision trees.

At the heart of the bagging idea is the fact that averaging a set of observations reduces variance. In reality, however, we are likely to have access to only one training set and will not be able to draw fresh samples.

Bagging therefore achieves variance reduction by taking bootstrap samples from the training set, each of the same size as our original sample, to create pseudo training sets. We then grow a tree on each one and take the average of their predictions for regression trees, or a majority vote for classification trees.

The trees do not need to be pruned, because bagging itself reduces variance. The whole idea of pruning a tree back was to reduce variance, but pruning increases bias. Bagged deep trees have low bias, and bagging reduces their variance by averaging.

Out-of-Bag Error Estimation

There is a very straightforward way to estimate the test error of a bagged model. On average, each bagged tree makes use of around two-thirds of the observations. The remaining one-third of the observations not used to fit a given bagged tree are referred to as the out-of-bag (OOB) observations.

We can predict the response for the ith observation using each of the trees in which that observation was OOB. This will yield around (number of trees)/3 predictions for the ith observation, which we average. If the number of trees is large, this estimate is essentially the leave-one-out (LOO) cross-validation error for bagging.

Random Forests

The idea of random forests is to build the trees in such a way as to make the correlation between them smaller. This reduces the variance when we average the trees.

In a random forest, every time we consider a split in a tree we do not consider all the possible predictors as we normally would; rather, we select m predictors at random from the p total predictors, with m typically about √p. So, if there are 100 predictors in our data set, every time we go to make a split we do not consider all 100, but rather take a random subset of 10 of them; the other 90 are not considered. A new random selection is made at every split.

Random forests improve on the bagging test error by one or two percent. By throwing away predictors we have decorrelated the trees, and the resulting average is a little better. The out-of-bag estimate is also a little better than it was for bagging.
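The one change random forests make to the split search is easy to isolate. This hypothetical helper draws the m ≈ √p candidate predictors for a single split; the rest of the split search (not shown) would then proceed exactly as before, but restricted to these columns:

```python
import math
import random

def candidate_features(p, rng):
    """Random subset of m = round(sqrt(p)) predictor indices for one split."""
    m = max(1, round(math.sqrt(p)))
    return rng.sample(range(p), m)

rng = random.Random(0)
for _ in range(3):
    # A fresh draw is made at every split, e.g. 10 of 100 predictors each time
    print(sorted(candidate_features(100, rng)))
```

Because strong predictors are regularly excluded from contention, the bootstrap trees stop looking alike, which is precisely the decorrelation that makes the average better than plain bagging.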


Boosting

Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification. And like bagging and random forests, it gives prediction models that are combinations of many trees.

But there is a fundamental difference. For random forests and bagging, the trees that are averaged are all equivalent, and the averaging is just used to reduce variance.

With boosting, trees are grown sequentially: each tree is grown using information from previously grown trees, and each tree added into the mix is added to improve on the performance of the previous collection of trees.

Boosting Algorithm for Regression Trees

\hat{f}(x) is a function that is a sum of trees. We start with \hat{f}(x) = 0, so the residuals are just the data observations: r_i = y_i for all i in the training set.

For B trees (b = 1, 2, ..., B), repeat:

1) fit a tree \hat{f}^b with d splits (d+1 terminal nodes) to the training data (X, r), where r is the current residual

2) update \hat{f} by adding in a shrunken version of the new tree (shrunken by some factor \lambda):

    \hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)

3) update the residuals:

    r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)

Output the boosted model:

    \hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)

which is the sum of B shrunken trees.
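The three steps above can be sketched directly. As a simplifying assumption the base learner is a one-split stump (d = 1) on a hypothetical single-feature dataset, and the helper names are invented for illustration:

```python
def fit_stump(X, y):
    """One-split regression stump on a single numeric feature."""
    best = None
    for s in sorted(set(X)):
        left = [yi for xi, yi in zip(X, y) if xi < s]
        right = [yi for xi, yi in zip(X, y) if xi >= s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        if best is None or err < best[0]:
            best = (err, s, ml, mr)
    _, s, ml, mr = best
    return lambda x: ml if x < s else mr

def boost(X, y, B=200, lam=0.1):
    """Boosting for regression: B stumps, each fit to the current residuals."""
    residuals = list(y)                    # f(x) = 0 at the start, so r_i = y_i
    trees = []
    for _ in range(B):
        tree = fit_stump(X, residuals)     # step 1: fit to (X, r)
        trees.append(tree)
        residuals = [r - lam * tree(x)     # step 3: shrink and subtract
                     for x, r in zip(X, residuals)]
    return lambda x: sum(lam * t(x) for t in trees)   # sum of B shrunken trees

# Hypothetical step-function data
X = [1, 2, 3, 4, 10, 11, 12, 13]
y = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
f = boost(X, y)
print(f(2), f(11))   # close to the two cluster levels (about 1 and about 5)
```

Each round removes only a shrunken fraction lam of what the new stump explains, so the model creeps toward the signal over many rounds rather than grabbing it all at once.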

Idea behind this procedure?

With a single tree, we can fit a large tree to the data and overfit. In contrast, the idea of boosting is to learn more slowly. We start with a tree, but rather than accept the full tree we shrink it back by, for example, a factor


of 0.01, and then take the residuals and repeat. The idea is that instead of overfitting, we are fitting very slowly.

At each step we pick up a small piece of the signal. Instead of trying to grab a lot of signal with a large amount of fitting, boosting fits very slowly in small parts, shrinking each time, in order to approximate the signal without overfitting. Boosting also works for classification; it is similar in spirit but slightly more complex.

Tuning Parameters

1. The number of trees B. Unlike bagging and random forests, boosting can overfit if B is too large, although this overfitting tends to occur slowly if at all. We use cross-validation to select B.

2. The shrinkage parameter \lambda, a small positive number. This controls the rate at which boosting learns. Typical values are 0.01 or 0.001, and the right choice can depend on the problem. A very small \lambda can require a very large value of B in order to achieve good performance.

3. The number of splits d in each tree, which controls the complexity of the boosted ensemble. Often d = 1 (a single split) works well, in which case each tree involves a single variable, resulting in an additive model with no interactions.

   If d = 2, each tree can involve at most two variables, giving pairwise interactions.

   More generally, d is the interaction depth, and controls the interaction order of the boosted model, since d splits can involve at most d variables. Typically you would try a few values of d (say 1, 2, 4, and 8).

Variable importance measure

For bagged and random forest regression trees, we use the total drop in RSS (residual sum of squares) for a given predictor over all splits in all the trees. We look at every split to see whether the variable was involved in it; if it was, we measure how much it dropped the RSS, and we average over all the trees.

Similarly, for bagged/RF classification trees, we add up the total amount that the Gini index is decreased by splits over a given predictor, averaged over all B trees.
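The bookkeeping behind this measure is a simple tally. The split records below are entirely hypothetical (two imaginary trees, three imaginary predictors); the point is only the mechanics of summing each split's RSS drop per predictor and averaging over trees:

```python
from collections import defaultdict

# Hypothetical (tree_id, predictor_name, rss_drop) records collected while
# growing two trees of an ensemble.
splits = [
    (0, "x1", 40.0), (0, "x2", 5.0),
    (1, "x1", 35.0), (1, "x3", 2.0),
]

def importance(splits, n_trees):
    """Total RSS drop per predictor over all splits, averaged over the trees."""
    totals = defaultdict(float)
    for _, feat, drop in splits:
        totals[feat] += drop
    return {feat: total / n_trees for feat, total in totals.items()}

print(importance(splits, n_trees=2))   # x1 clearly dominates the other two
```

For classification trees the same tally would accumulate Gini decreases instead of RSS drops.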

Tree Based Methods

Gabriela Hromis

Notes are based on different books and class notes from various universities, especially the Statistical Learning course by Hastie & Tibshirani.
