Decision trees
Rule-based induction
Neural networks
Memory-based (case-based) reasoning
Genetic algorithms
Bayesian networks
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
 1   Sunny     Hot          High      Weak    No
 2   Sunny     Hot          High      Strong  No
 3   Overcast  Hot          High      Weak    Yes
 4   Rain      Mild         High      Weak    Yes
 5   Rain      Cool         Normal    Weak    Yes
 6   Rain      Cool         Normal    Strong  No
 7   Overcast  Cool         Normal    Strong  Yes
 8   Sunny     Mild         High      Weak    No
 9   Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No
Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
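Read as a program, the learned PlayTennis tree is just nested conditionals. A minimal sketch (the function name and dict keys are illustrative):

```python
def classify(example):
    """Apply the learned PlayTennis tree to one example (a dict of attributes)."""
    if example["Outlook"] == "Sunny":
        # Sunny: humidity decides
        return "No" if example["Humidity"] == "High" else "Yes"
    if example["Outlook"] == "Overcast":
        return "Yes"                      # Overcast is always Yes
    # Rain: wind decides
    return "No" if example["Wind"] == "Strong" else "Yes"

print(classify({"Outlook": "Rain", "Wind": "Weak"}))   # → Yes
```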
Definition: a decision-tree induction algorithm has three components:
- selection criterion: used to partition the training data
- termination condition: determines when to stop partitioning
- pruning algorithm: attempts to prevent overfitting
Entropy of node n over the c classes:

E(n) = -\sum_{i=1}^{c} p(c_i \mid n) \log_2 p(c_i \mid n)

Information gain of attribute A at node n:

G(n, A) = E(n) - \sum_{v \in Values(A)} \frac{|n_v|}{|n|} E(n_v)
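As a worked example (variable and function names are illustrative), the entropy of the full PlayTennis table — 9 Yes and 5 No — and the gain of splitting on Outlook can be computed directly from the two formulas:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(n): entropy of the class distribution at a node."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def gain(values, labels):
    """G(n, A): entropy reduction from splitting on an attribute's values."""
    n = len(labels)
    g = entropy(labels)
    for v in set(values):
        subset = [l for a, l in zip(values, labels) if a == v]
        g -= len(subset) / n * entropy(subset)
    return g

# The 14 PlayTennis labels (9 Yes, 5 No) and the Outlook column, in day order.
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

print(round(entropy(labels), 3))        # → 0.94
print(round(gain(outlook, labels), 3))  # → 0.247
```

Outlook's gain (0.247) is the largest of the four attributes, which is why it becomes the root of the tree.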
ID3 and C4.5 branch on every value of the selected attribute and use an entropy-minimisation criterion (equivalently, maximum information gain) to choose it.
Tree Induction: first split on Outlook
  Sunny -> {1, 2, 8, 9, 11}  (?)
  Overcast -> Yes
  Rain -> {4, 5, 6, 10, 14}  (?)
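The induction loop can be sketched recursively. A minimal ID3 (helper names are illustrative; no termination condition beyond node purity, and no pruning):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def gain(examples, attr):
    """Information gain of attr over (features-dict, label) pairs."""
    labels = [y for _, y in examples]
    g, n = entropy(labels), len(examples)
    for v in {x[attr] for x, _ in examples}:
        sub = [y for x, y in examples if x[attr] == v]
        g -= len(sub) / n * entropy(sub)
    return g

def id3(examples, attrs):
    """Grow a nested-dict tree {attr: {value: subtree-or-label}}."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # pure node: make a leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority leaf
    best = max(attrs, key=lambda a: gain(examples, a))
    node = {best: {}}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        node[best][v] = id3(subset, [a for a in attrs if a != best])
    return node
```

Run on the 14 PlayTennis examples, this picks Outlook at the root and reproduces the tree shown earlier.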
Overfitting
Consider the error of hypothesis h over the
- training data: error_training(h)
- entire distribution D of data: error_D(h)
Hypothesis h overfits the training data if there is an
alternative hypothesis h' such that
error_training(h) < error_training(h')
and
error_D(h) > error_D(h')
Preventing Overfitting
Problem: we don't want these algorithms to fit to "noise" in the training data.
Reduced-error pruning :
breaks the samples into a training set and a test set.
The tree is induced completely on the training set.
Working backwards from the bottom of the tree, the
subtree starting at each nonterminal node is
examined.
If pruning a subtree improves the error rate on the test cases, the
subtree is removed. The process continues until no further
improvement can be made by pruning any subtree.
The error rate of the final tree on the test cases is used as
an estimate of the true error rate.
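The procedure above can be sketched on a nested-dict tree. This is a sketch, not the exact classical algorithm: for brevity the candidate leaf label is the majority among the subtree's leaf labels, whereas the classical formulation uses the majority class of the training examples reaching the node.

```python
from collections import Counter

def predict(tree, x):
    """Walk a nested-dict tree {attr: {value: subtree-or-label}}."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][x[attr]]
    return tree

def errors(tree, test_set):
    """Number of misclassified (example, label) pairs."""
    return sum(predict(tree, x) != y for x, y in test_set)

def leaf_labels(tree):
    if not isinstance(tree, dict):
        return [tree]
    attr = next(iter(tree))
    return [l for sub in tree[attr].values() for l in leaf_labels(sub)]

def prune(tree, test_set, root=None):
    """Reduced-error pruning: work bottom-up, keeping a prune only if
    the whole tree's error on the held-out test set does not worsen."""
    root = root if root is not None else tree
    if not isinstance(tree, dict):
        return
    attr = next(iter(tree))
    for v, sub in list(tree[attr].items()):
        if isinstance(sub, dict):
            prune(sub, test_set, root)            # prune children first
            before = errors(root, test_set)
            leaf = Counter(leaf_labels(sub)).most_common(1)[0][0]
            tree[attr][v] = leaf                  # try replacing the subtree
            if errors(root, test_set) > before:
                tree[attr][v] = sub               # pruning hurt: revert
```

`prune` mutates the tree in place; the final `errors(tree, test_set) / len(test_set)` is then the estimate of the true error rate described above.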
                      Predicted
                      Negative          Positive
Actual   Negative     True Negatives    False Positives
         Positive     False Negatives   True Positives
Accuracy: percentage of examples in
the test set that are classified correctly.
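These counts and the accuracy follow directly from the predictions; a small sketch (function names and the example labels are illustrative):

```python
def confusion(y_true, y_pred, positive="Yes"):
    """Return (TP, FP, FN, TN) for a binary prediction task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction classified correctly: (TP + TN) / total."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["Yes", "No", "Yes", "No"]
y_pred = ["Yes", "Yes", "Yes", "No"]
print(confusion(y_true, y_pred))   # → (2, 1, 0, 1)
print(accuracy(y_true, y_pred))    # → 0.75
```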