Gain of an attribute
Calculate Gain(S, A) for each attribute A
- the expected reduction in entropy due to sorting on A
Choose the attribute with highest gain as root of tree

Gain(S, A) = Entropy(S) - Σ_{i=1..n} (|S_i| / |S|) * Entropy(S_i)
{S_1, ..., S_i, ..., S_n} = partitions of S according to values of attribute A
n = number of values of attribute A
|S_i| = number of cases in the partition S_i
|S| = total number of cases in S
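These definitions translate directly into code. A minimal sketch in Python (the function names entropy and gain are my own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(cases, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum(|S_i|/|S| * Entropy(S_i)),
    where the S_i partition S by the values of attribute A."""
    total = len(cases)
    partitions = {}
    for case, label in zip(cases, labels):
        partitions.setdefault(case[attribute], []).append(label)
    expectation = sum(len(part) / total * entropy(part)
                      for part in partitions.values())
    return entropy(labels) - expectation
```

Each partition's entropy is weighted by its share of the cases; the gain is what is left of Entropy(S) after that weighted sum is subtracted.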
Which attribute is root?
If Outlook is made root of the tree, there are 3 partitions of the cases:
S_1 for Sunny, S_2 for Cloudy, S_3 for Rainy
S_1 (Sunny) = {cases 1, 2, 8, 9, 11}; |S_1| = 5
In these 5 cases the values for Play are 3 No and 2 Yes
Entropy(S_1) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
Similarly
Entropy(S_2) = 0
Entropy(S_3) = 0.97
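The arithmetic above can be checked directly:

```python
import math

# Sunny partition: 3 No and 2 Yes out of 5 cases
e_sunny = -(2/5) * math.log2(2/5) - (3/5) * math.log2(3/5)
print(round(e_sunny, 2))   # 0.97

# The Cloudy partition is pure (all Yes), so Entropy(S_2) = 0
# The Rainy partition is again a 3-to-2 split, so Entropy(S_3) = 0.97
```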
Outlook | Temperature | Humidity | Windy | Play
--------|-------------|----------|-------|-----
Sunny   | Hot         | High     | False | No
Sunny   | Hot         | High     | True  | No
Cloudy  | Hot         | High     | False | Yes
Rainy   | Mild        | High     | False | Yes
Rainy   | Cool        | Normal   | False | Yes
Rainy   | Cool        | Normal   | True  | No
Cloudy  | Cool        | Normal   | True  | Yes
Sunny   | Mild        | High     | False | No
Sunny   | Cool        | Normal   | False | Yes
Rainy   | Mild        | Normal   | False | Yes
Sunny   | Mild        | Normal   | True  | Yes
Cloudy  | Mild        | High     | True  | Yes
Cloudy  | Hot         | Normal   | False | Yes
Rainy   | Mild        | High     | True  | No
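The whole table can be encoded and the gain of every attribute computed from it (a sketch; the attribute and class names follow the table above):

```python
import math
from collections import Counter

# The 14 Playing Tennis cases: (Outlook, Temperature, Humidity, Windy, Play)
rows = [
    ("Sunny","Hot","High","False","No"),    ("Sunny","Hot","High","True","No"),
    ("Cloudy","Hot","High","False","Yes"),  ("Rainy","Mild","High","False","Yes"),
    ("Rainy","Cool","Normal","False","Yes"),("Rainy","Cool","Normal","True","No"),
    ("Cloudy","Cool","Normal","True","Yes"),("Sunny","Mild","High","False","No"),
    ("Sunny","Cool","Normal","False","Yes"),("Rainy","Mild","Normal","False","Yes"),
    ("Sunny","Mild","Normal","True","Yes"), ("Cloudy","Mild","High","True","Yes"),
    ("Cloudy","Hot","Normal","False","Yes"),("Rainy","Mild","High","True","No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    t = len(labels)
    return -sum(n/t * math.log2(n/t) for n in Counter(labels).values())

def gain(col):
    parts = {}
    for r in rows:
        parts.setdefault(r[col], []).append(r[-1])  # group Play values by attribute value
    s = entropy([r[-1] for r in rows])
    return s - sum(len(p)/len(rows) * entropy(p) for p in parts.values())

for i, a in enumerate(attrs):
    print(a, round(gain(i), 3))   # Outlook has the highest gain (~0.247)
```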
Choosing the Root Attribute
[Figure: candidate root splits on humidity, temperature, outlook, and windy, each branch labelled with the Yes/No distribution of its cases]
Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)
Which attribute is root?
Gain(S, Outlook) = Entropy(S) - Expectation(Outlook)
  = 0.94 - [5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97]
  = 0.247
Similarly
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Windy) = 0.048
Gain(S, Outlook) is the highest gain
Outlook should be the root of the decision tree (index)
Gain(S, Outlook) = Entropy(S) - (|S_1|/|S|) * Entropy(S_1) - (|S_2|/|S|) * Entropy(S_2) - (|S_3|/|S|) * Entropy(S_3)
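Plugging the partition entropies into this expansion checks the arithmetic:

```python
# Entropy(S) = 0.94; Sunny and Rainy partitions each have entropy 0.97,
# the Cloudy partition has entropy 0
gain_outlook = 0.94 - (5/14) * 0.97 - (4/14) * 0 - (5/14) * 0.97
print(round(gain_outlook, 3))   # 0.247
```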
Repeat for Sunny Node
[Figure: for the Sunny branch, candidate sub-splits on temperature (hot/mild/cold), windy (false/true), and humidity (high/normal); only humidity yields pure Yes/No partitions]
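For the five Sunny cases, the humidity split can be verified to give pure partitions (a sketch; the variable names are my own):

```python
import math
from collections import Counter

# The five Sunny cases, reduced to (Humidity, Play)
sunny = [("High", "No"), ("High", "No"), ("High", "No"),
         ("Normal", "Yes"), ("Normal", "Yes")]

def entropy(labels):
    t = len(labels)
    return sum(-(n/t) * math.log2(n/t) for n in Counter(labels).values())

high   = [play for hum, play in sunny if hum == "High"]
normal = [play for hum, play in sunny if hum == "Normal"]
print(entropy(high), entropy(normal))   # 0.0 0.0 -> both partitions are pure
```

Since both sub-partitions have entropy 0, splitting the Sunny node on humidity gives the maximum possible gain there, so no further splitting is needed below it.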
Repeat for Rainy Node
[Figure: partial tree with outlook at the root; the sunny branch is resolved by humidity (high -> No, normal -> Yes), the rainy branch is still open]

The 5 Rainy cases:
Temperature | Humidity | Windy | Play
Mild        | High     | False | Yes
Cool        | Normal   | False | Yes
Cool        | Normal   | True  | No
Mild        | Normal   | False | Yes
Mild        | High     | True  | No
Decision Tree (Index) for Playing Tennis

outlook
  sunny  -> humidity
              high   -> No
              normal -> Yes
  cloudy -> Yes
  rainy  -> windy
              true  -> No
              false -> Yes
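The finished tree translates directly into nested conditionals (the function name classify is my own):

```python
def classify(case):
    """Decision tree (index) for Playing Tennis.
    case is a dict with keys Outlook, Humidity, Windy."""
    if case["Outlook"] == "Sunny":
        return "No" if case["Humidity"] == "High" else "Yes"
    if case["Outlook"] == "Cloudy":
        return "Yes"
    # Rainy branch is decided by Windy
    return "No" if case["Windy"] == "True" else "Yes"

print(classify({"Outlook": "Sunny", "Humidity": "Normal", "Windy": "False"}))  # Yes
```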
Case Retrieval via DTree Index
Typical implementation:
- Case-base indexed using a decision tree
- Cases are stored in the leaves of the index
- DTree created from cases
- Automated indexing of the case-base
Summary
- A decision tree is built from cases
- Decision trees are often used for problem-solving
- In CBR, a decision tree is used to partition the cases
- Similarity matching is applied to the cases in a leaf node
- Indexing pre-selects relevant cases for k-NN retrieval
BRING CALCULATOR on MONDAY