
Decision Trees

What is a tree in CS?

• A tree is a non-linear data structure
• It has a unique node called the root
• Every non-trivial tree has one or more leaf
nodes, arranged in different levels
• Trees are conventionally drawn with the root at the
top or on the left
• Nodes at one level are connected to nodes at the
next higher (parent) level or lower (child) level
• There are no loops in a tree
Decision Trees
• A decision tree (DT) is a hierarchical
classification and prediction model
• It is organized as a rooted tree with 2 types
of nodes called decision nodes and class nodes
• It is a supervised data mining model used for
classification or prediction
An Example Data Set and Decision Tree

#    Outlook  Company  Sailboat  Sail? (class)
1    sunny    big      small     yes
2    sunny    med      small     yes
3    sunny    med      big       yes
4    sunny    no       small     yes
5    sunny    big      big       yes
6    rainy    no       small     no
7    rainy    med      small     yes
8    rainy    big      big       yes
9    rainy    no       big       no
10   rainy    med      big       no

Decision tree:

outlook
  sunny -> yes
  rainy -> company
    no  -> no
    big -> yes
    med -> sailboat
      small -> yes
      big   -> no

• What is classification?
• What are some applications of Decision Tree
Classifiers (DTC)?
• What is a BDTC?
• Misclassification errors

#    Outlook  Company  Sailboat  Sail? (class)
1    sunny    no       big       ?
2    rainy    big      small     ?

[Figure: the decision tree of the sailing example, repeated]
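Classifying the two unlabeled instances means walking the sailing tree from the root to a leaf. The sketch below is one possible encoding, not something the slides prescribe: the nested-dict layout and the names `SAIL_TREE` and `classify` are assumptions made for illustration.

```python
# Decision tree from the sailing example, written as nested dicts
# (an illustrative representation, assumed for this sketch).
# A decision node is {"attribute": name, "branches": {value: subtree}};
# a class node is just the class label string.
SAIL_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": "yes",
        "rainy": {
            "attribute": "company",
            "branches": {
                "no": "no",
                "big": "yes",
                "med": {
                    "attribute": "sailboat",
                    "branches": {"small": "yes", "big": "no"},
                },
            },
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, following the branch that matches
    the instance's value for each tested attribute."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


unlabeled = [
    {"outlook": "sunny", "company": "no", "sailboat": "big"},
    {"outlook": "rainy", "company": "big", "sailboat": "small"},
]
predictions = [classify(SAIL_TREE, row) for row in unlabeled]
print(predictions)  # ['yes', 'yes']
```

An attribute value that has no branch in the tree raises a `KeyError` here; a real classifier would need a policy for such cases.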
Chance and Terminal nodes
• Each internal node of a DT is a decision point,
where some condition is tested
• The result of this condition determines which
branch of the tree is to be taken next
• Thus such a node is called a decision node, chance
node or non-terminal node
• Chance nodes partition the data available at
that point so as to best separate the values of the
dependent (class) variable
Terminal nodes
• The leaf nodes of a DT are called terminal nodes
• They indicate the class into which a data
instance will be classified
• They have just one incoming edge (from the parent node)
• They do not have child nodes (outgoing edges)
• There are no conditions tested at terminal nodes
• Tree traversal from the root to a leaf
produces the production rule for that class
Advantages of DT
• Easy to understand and interpret
• Works for categorical and quantitative data
• DT can grow to any depth
• Attributes can be chosen in any desired order
• Pruning a DT is very easy
• Can handle missing or null values
Advantages contd.
• Can be used to identify outliers
• Production rules can be obtained directly
from the built DT
• They are relatively fast compared with other
classification models
• DT can be used even when domain experts are
not available
Disadvantages of DT
• A DT induces sequential decisions
• Class-overlap problem
• Correlated data
• Complex production rules
• A DT can be sub-optimal
Quinlan’s classical example

#    Outlook   Temperature  Humidity  Windy  Play (class)
1    sunny     hot          high      no     N
2    sunny     hot          high      yes    N
3    overcast  hot          high      no     P
4    rainy     moderate     high      no     P
5    rainy     cold         normal    no     P
6    rainy     cold         normal    yes    N
7    overcast  cold         normal    yes    P
8    sunny     moderate     high      no     N
9    sunny     cold         normal    no     P
10   rainy     moderate     normal    no     P
11   sunny     moderate     normal    yes    P
12   overcast  moderate     high      yes    P
13   overcast  hot          normal    no     P
14   rainy     moderate     high      yes    N
Simple Tree


Outlook
  sunny    -> Humidity
    high   -> N
    normal -> P
  overcast -> P
  rainy    -> Windy
    yes -> N
    no  -> P

Complicated Tree


[Figure: a larger tree for the same data. It splits on Temperature first and then tests Outlook, Windy and Humidity at lower levels; some branches end in null leaves.]
Production rules
• Rules abstracted by a DT can be converted
into production rules
• These are obtained by traversing each branch
of the DT from root to each of the leaves
• A DT can be reconstructed if all production
rules are known
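Each root-to-leaf path becomes one IF-THEN rule. The sketch below shows this on the simple tree for Quinlan's example (root Outlook, as read from the data table). The nested-dict encoding and the names `PLAY_TREE` and `production_rules` are assumptions for illustration only.

```python
# Simple tree for Quinlan's example, as nested dicts (assumed encoding):
# decision node = {"attribute": name, "branches": {value: subtree}},
# class node = class label string.
PLAY_TREE = {
    "attribute": "Outlook",
    "branches": {
        "sunny": {
            "attribute": "Humidity",
            "branches": {"high": "N", "normal": "P"},
        },
        "overcast": "P",
        "rainy": {
            "attribute": "Windy",
            "branches": {"yes": "N", "no": "P"},
        },
    },
}


def production_rules(node, conditions=()):
    """Return one rule string per leaf, built from the tests met on the
    path from the root to that leaf."""
    if not isinstance(node, dict):
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        path = conditions + ((node["attribute"], value),)
        rules.extend(production_rules(child, path))
    return rules


rules = production_rules(PLAY_TREE)
for rule in rules:
    print(rule)
```

The tree has five leaves, so five rules are produced, for example `IF Outlook = overcast THEN class = P`.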
General View of DT Induction
ID3 induction algorithm
• ID3 (Iterative Dichotomiser 3)
• Introduced in 1986 by Quinlan
• Uses greedy tree-growing method
• Works on categorical (discrete) attributes
• Uses entropy measure
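The greedy tree-growing idea can be sketched as follows: at each node, pick the attribute with the largest entropy reduction, split on it, and recurse. This is a minimal illustration in the spirit of ID3, not Quinlan's original code. The function names, the nested-dict output and the majority-vote fallback are assumptions, and the data are the 14 rows of Quinlan's example.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Rows of Quinlan's table: one value per attribute, then the class (P or N).
ROWS = [
    ("sunny", "hot", "high", "no", "N"),
    ("sunny", "hot", "high", "yes", "N"),
    ("overcast", "hot", "high", "no", "P"),
    ("rainy", "moderate", "high", "no", "P"),
    ("rainy", "cold", "normal", "no", "P"),
    ("rainy", "cold", "normal", "yes", "N"),
    ("overcast", "cold", "normal", "yes", "P"),
    ("sunny", "moderate", "high", "no", "N"),
    ("sunny", "cold", "normal", "no", "P"),
    ("rainy", "moderate", "normal", "no", "P"),
    ("sunny", "moderate", "normal", "yes", "P"),
    ("overcast", "moderate", "high", "yes", "P"),
    ("overcast", "hot", "normal", "no", "P"),
    ("rainy", "moderate", "high", "yes", "N"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def gain(rows, index):
    """Entropy reduction obtained by splitting rows on attribute `index`."""
    remainder = 0.0
    for value in {row[index] for row in rows}:
        subset = [row[-1] for row in rows if row[index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder


def id3(rows, candidates):
    labels = [row[-1] for row in rows]
    # Stop when the node is pure or no attribute is left; label by majority.
    if len(set(labels)) == 1 or not candidates:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: choose the best attribute for this node only.
    best = max(candidates, key=lambda i: gain(rows, i))
    remaining = [i for i in candidates if i != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining)
    return {"attribute": ATTRIBUTES[best], "branches": branches}


tree = id3(ROWS, list(range(len(ATTRIBUTES))))
print(tree)
```

On these rows the sketch places Outlook at the root, Humidity under sunny and Windy under rainy. Because each choice is made locally, the result is not guaranteed to be the smallest possible tree in general.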
C4.5 induction algorithm

• Invented by Quinlan in 1993
• Is an extension of ID3 algorithm
• Uses greedy tree-growing method
• Works on general attributes
• Uses an entropy-based measure (gain ratio)
• Uses multi-way splits
CART induction algorithm
• Introduced by Breiman et al. in 1984
• Uses binary recursive partitioning method
• Works on general attributes
• Uses Gini measure
• Uses two-way splits
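A two-way split scored with the Gini measure can be sketched as below. This is only a simplified illustration, not the full CART procedure: each candidate split is "attribute equals one value" versus "all other values", evaluated on Quinlan's 14-row example, and the helper names are hypothetical.

```python
from collections import Counter

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]
# Rows of Quinlan's table: one value per attribute, then the class (P or N).
ROWS = [
    ("sunny", "hot", "high", "no", "N"),
    ("sunny", "hot", "high", "yes", "N"),
    ("overcast", "hot", "high", "no", "P"),
    ("rainy", "moderate", "high", "no", "P"),
    ("rainy", "cold", "normal", "no", "P"),
    ("rainy", "cold", "normal", "yes", "N"),
    ("overcast", "cold", "normal", "yes", "P"),
    ("sunny", "moderate", "high", "no", "N"),
    ("sunny", "cold", "normal", "no", "P"),
    ("rainy", "moderate", "normal", "no", "P"),
    ("sunny", "moderate", "normal", "yes", "P"),
    ("overcast", "moderate", "high", "yes", "P"),
    ("overcast", "hot", "normal", "no", "P"),
    ("rainy", "moderate", "high", "yes", "N"),
]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(rows, index, value):
    """Weighted Gini impurity of the two-way split
    (attribute == value) versus (attribute != value)."""
    left = [row[-1] for row in rows if row[index] == value]
    right = [row[-1] for row in rows if row[index] != value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Enumerate every one-value-versus-rest split and keep the purest one.
candidates = [
    (index, value)
    for index in range(len(ATTRIBUTES))
    for value in sorted({row[index] for row in ROWS})
]
best_index, best_value = min(candidates, key=lambda c: split_impurity(ROWS, *c))
best_score = split_impurity(ROWS, best_index, best_value)
print(ATTRIBUTES[best_index], best_value, f"{best_score:.3f}")  # Outlook overcast 0.357
```

Recursive partitioning would then repeat the same search inside each of the two resulting subsets.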
Measures for node splitting
• Gini index measure
• Modified Gini Index
• Normalized, symmetric and asymmetric Gini
Index measure
• Shannon’s entropy measure
• Minimum classification error measure
• Chi-square statistic

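• The average amount of information I needed
to classify an object is given by the entropy
  $I = -\sum_{i} p_i \log_2 p_i$
  where $p_i$ is the proportion of objects in class i
• For a two-class problem:
  $I = -p \log_2 p - (1 - p) \log_2 (1 - p)$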
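As a worked illustration of the entropy measure, the sketch below uses class counts taken from Quinlan's example (9 P and 5 N overall, then broken down by Outlook). The function names are assumptions, and terms with zero probability are skipped by the usual convention.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits for a tuple of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def binary_entropy(p):
    """Two-class special case, with p the proportion of one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


# (P, N) counts for the whole table and for each Outlook value.
whole = (9, 5)
by_outlook = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}

total = sum(whole)
# Expected entropy after the split: each branch weighted by its share of rows.
after_split = sum(sum(c) / total * entropy(c) for c in by_outlook.values())
information_gain = entropy(whole) - after_split

print(f"{entropy(whole):.3f}")      # 0.940
print(f"{information_gain:.3f}")    # 0.247
```

The overcast branch contains only class P, so its entropy is zero and it contributes nothing to the expected entropy after the split.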

Chi-squared Automatic Interaction Detector

• As the name implies, this is a statistical
technique for tree induction that uses Karl
Pearson's χ² (chi-squared) test for contingency tables.
• It works for categorical variables (with 2 or
more categories), and can be used as an
alternative to logistic regression.
• There is no pruning step as it stops growing
the DT when a certain condition is met.
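The statistic behind this technique can be illustrated on a small contingency table. The sketch below computes Pearson's chi-squared statistic for Outlook versus Play, with counts taken from Quinlan's example. It shows the statistic only and is not a full tree-induction procedure; `chi_square` is a hypothetical helper name.

```python
def chi_square(table):
    """Pearson's chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the attribute and the class were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Rows: Outlook = sunny, overcast, rainy; columns: class P, class N.
observed = [
    [2, 3],
    [4, 0],
    [3, 2],
]
statistic, dof = chi_square(observed)
print(f"{statistic:.3f}", dof)  # 3.547 2
```

A larger statistic for the same degrees of freedom indicates a stronger association between the attribute and the class, which is the basis for choosing a split.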
Pruning DT
• Once the decision tree has been constructed, a
sensitivity analysis should be performed to test the
suitability of the model to variations in the data
instances. Expected values of each alternative are
evaluated to determine the optimal model. But the
decision maker's attitude towards high-risk
alternatives can negatively influence the outcome of a
sensitivity analysis. Most decision tree
software packages allow the user to carry out
sensitivity analysis.
Pre Vs Post-pruning
• There are two approaches to pruning a DT: pre-pruning
and post-pruning. In pre-pruning, tree growing is
halted when a stopping condition is met.
• Post-pruning works with a completely grown tree. In
post-pruning, test cases are used to prune the DT to
minimize the classification error or to adjust the
tree to data changes.
• Tree pruning is usually a post-processing step with the
intention to minimize overfitting, and to remove
Decision Tables
• A decision table is a hierarchical structure akin to
decision trees, except that data are enumerated into
a table using a pair of attributes, rather than a single
attribute
• Quantitative variables should be categorized using
the discretisation technique discussed in chapter 1.
Fraud Detection
• Fraud detection is increasingly becoming a
necessity due to the large number of
uncaught frauds. Fraudulent financial
transactions amount to billions of dollars
every year throughout the world. Fraud
prevention is different from fraud detection,
as the former is pre-transaction safety, and
the latter is used during or immediately after
a transaction.
Software for DT
• DTREG is a powerful statistical analysis program that
generates classification and regression trees
• GATree
• Weka (University of Waikato, NZ)
• TreeAge Pro
• YaDT
