School of EECS
Washington State University
CptS 570 - Machine Learning 1
Nonparametric method
Representation
Learning algorithm
Entropy and information gain
Overfitting
Enhancements
3
Each internal node tests one (univariate) or
more (multivariate) features
Each branch corresponds to a test result
Yes/no for numeric features or multivariate
One for each value of discrete feature
Leaf node
Classification: Class value
Regression: Real value
CptS 570 - Machine Learning 6
CptS 570 - Machine Learning 7
w
11
x
1
+w
12
x
2
+w
10
> 0
Create root node with all examples X
Call GenerateTree (root, X)
CptS 570 - Machine Learning 8
GenerateTree (node, X)
If node is pure enough
Then assign class to node and return
Else Choose best split feature F
Foreach value v of F
X = examples in X where F = v
Create childNode with examples X
GenerateTree (childNode, X)
S = set of training
examples
p = proportion of
positive examples in S
p
= proportion of
negative examples in S
Entropy measures the
impurity of S
Entropy(S) p log
2
p
p
log
2
p
( log
2
p
)
Entropy(S) p log
2
p p
log
2
p