In this course
1. How should objects to be classified be represented?
2. What algorithms can be used for recognition (or matching)?
3. How should learning (training) be done?
Classification in Statistical PR
A class is a set of objects having some important properties in common.
A feature extractor is a program that inputs the data (image) and extracts features that can be used in classification.
A classifier is a program that inputs the feature vector and assigns it to one of a set of designated classes or to the reject class.
With what kinds of classes do you work?
Some Terminology
Classes: a set of m known categories of objects.
Discriminant functions: functions f(x, K) that perform some computation on feature vector x, using knowledge K obtained from training or programming; a final stage determines the class.
Nearest class mean: compute the Euclidean distance between feature vector X and the mean of each class, and choose the closest class.
This can fail when a class has two modes: where is its mean?
But if the modes are detected, two subclass mean vectors can be used instead.
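As a sketch, the nearest-class-mean rule above might be implemented as follows. The training data here is hypothetical, made up purely for illustration.

```python
import math

# Hypothetical training data: two classes, each a list of feature vectors.
train = {
    "class1": [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2)],
    "class2": [(5.0, 6.0), (5.5, 5.5), (4.5, 6.5)],
}

def class_means(train):
    """Compute the mean feature vector of each class."""
    means = {}
    for label, vecs in train.items():
        n = len(vecs)
        means[label] = tuple(sum(v[i] for v in vecs) / n
                             for i in range(len(vecs[0])))
    return means

def nearest_mean(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(means, key=lambda c: math.dist(x, means[c]))

means = class_means(train)
print(nearest_mean((1.0, 2.0), means))  # -> class1
```

A rejection threshold (reject if even the closest mean is too far) could be added in the same spirit; handling a two-mode class amounts to keeping two subclass means and taking the minimum distance over both.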
A plot of detection rate versus false alarm rate shows the trade-off: generally, false alarms go up with attempts to detect higher percentages of known objects.
Bayesian decision-making
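The slide body here did not survive extraction. The general idea is to assign a feature vector to the class with the highest posterior, i.e. the one maximizing p(x | c) * P(c). A minimal one-dimensional sketch, with made-up Gaussian class-conditional densities and priors:

```python
import math

# Hypothetical 1-D example: each class has a prior P(c) and a Gaussian
# class-conditional density p(x | c) with the given mean and std. dev.
classes = {
    "class1": {"prior": 0.5, "mean": 0.0, "std": 1.0},
    "class2": {"prior": 0.5, "mean": 3.0, "std": 1.0},
}

def gaussian(x, mean, std):
    """Gaussian density value at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_decide(x, classes):
    """Choose the class maximizing p(x | c) * P(c)."""
    return max(classes,
               key=lambda c: gaussian(x, classes[c]["mean"], classes[c]["std"])
                             * classes[c]["prior"])
```

With equal priors and equal variances, this rule reduces to choosing the nearer class mean; unequal priors shift the decision boundary toward the less likely class.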
Decision Trees
(Figure: a decision tree for character recognition. The root tests #holes; lower nodes test moment of inertia, best axis direction, and #strokes; leaves are characters such as t, x, and w.)
Entropy-Based Automatic
Decision Tree Construction
Training set S of n labeled feature vectors:
x1 = (f11, f12, ..., f1m)
x2 = (f21, f22, ..., f2m)
.
.
xn = (fn1, fn2, ..., fnm)
At each node (starting with node 1): what feature should be used, and what values?
Entropy
Given a set S of training vectors, if there are c classes,
Entropy(S) = - Σ_{i=1}^{c} p_i log2(p_i)
where p_i is the proportion of vectors in S belonging to class i.
Information Gain
The information gain of an attribute A is the expected reduction in entropy caused by partitioning on this attribute:
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) * Entropy(Sv)
where Sv is the subset of S for which attribute A has value v.
Choose the attribute A that gives the maximum information gain.
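A small sketch of both formulas, on a hypothetical four-sample training set (the attribute names and data are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i log2(p_i) over the class proportions in S."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(samples, labels, attr):
    """Gain(S,A) = Entropy(S) - sum_v (|Sv|/|S|) * Entropy(Sv)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(s[attr] for s in samples):
        sub = [lab for s, lab in zip(samples, labels) if s[attr] == v]
        gain -= (len(sub) / n) * entropy(sub)
    return gain

# Hypothetical training set: "holes" separates the two classes perfectly,
# so its gain equals the full entropy (1 bit); "strokes" gains nothing.
samples = [{"holes": 0, "strokes": 1}, {"holes": 0, "strokes": 2},
           {"holes": 1, "strokes": 1}, {"holes": 1, "strokes": 2}]
labels = ["t", "t", "a", "a"]
```

Here `info_gain(samples, labels, "holes")` is 1.0 bit while `info_gain(samples, labels, "strokes")` is 0.0, so the tree builder would test "holes" first.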
(Figure: the chosen attribute A partitions S into one subset per value v1, ..., vk, e.g. Sv1 = {s ∈ S | value(A) = v1}; the construction then repeats recursively on each subset.)
Information gain has the disadvantage that it prefers attributes with a large number of values, which split the data into small, pure subsets.
Gain Ratio
Gain ratio is an alternative metric from Quinlan's 1986 paper, used in the popular C4.5 package (free!):
GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)
SplitInfo(S, A) = - Σ_{i=1}^{ni} (|Si| / |S|) * log2(|Si| / |S|)
where Si is the subset of S in which attribute A has its ith value.
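A sketch showing how SplitInfo penalizes many-valued attributes. The data is hypothetical: an "id"-like attribute with a unique value per sample and a binary "holes" attribute both achieve the same raw gain, but gain ratio prefers the binary one.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def partition(samples, labels, attr):
    """Group the labels by the value of the given attribute."""
    parts = {}
    for s, lab in zip(samples, labels):
        parts.setdefault(s[attr], []).append(lab)
    return parts

def gain_ratio(samples, labels, attr):
    """GainRatio(S,A) = Gain(S,A) / SplitInfo(S,A)."""
    n = len(labels)
    parts = partition(samples, labels, attr)
    gain = entropy(labels) - sum((len(p) / n) * entropy(p) for p in parts.values())
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical data: "id" is unique per sample, "holes" is binary.
samples = [{"id": i, "holes": i % 2} for i in range(4)]
labels = ["a", "b", "a", "b"]
```

Both attributes have Gain = 1 bit here, but SplitInfo("id") = 2 while SplitInfo("holes") = 1, so gain ratio ranks "holes" (1.0) above "id" (0.5).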
Information Content
Note: a related method of decision tree construction, using a measure called Information Content, is given in the text, with a full numeric example of its use.
(Figure: a neural network; input units on the left feed forward to output units on the right.)
Node Functions
A neuron i receives inputs a1, ..., aj, ..., an on connections with weights w(1,i), ..., w(j,i), ..., w(n,i) and produces
output = g( Σ_j aj * w(j,i) )
Function g is commonly a step function, sign function, or sigmoid function (see text).
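The node function above can be sketched in a few lines; the sigmoid is used here as the example choice of g.

```python
import math

def sigmoid(s):
    """A common choice for g: squashes any real s into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, g=sigmoid):
    """output = g( sum_j a_j * w(j,i) ) for a single neuron i."""
    return g(sum(a * w for a, w in zip(inputs, weights)))
```

For example, `neuron_output([1.0, 1.0], [0.0, 0.0])` is g(0) = 0.5, and a large positive weighted sum drives the output toward 1.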
A kernel trick.
Maximal Margin
(Figure: two classes of points, labeled 0 and 1, separated by a hyperplane; the margin is the distance from the hyperplane to the closest points on either side, and the maximal-margin hyperplane maximizes it.)
Non-separable data
(Figure: points labeled 0 and 1 that are not linearly separable in the original space; the kernel trick maps them into a feature space R^n where a separating hyperplane exists.)
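The kernel trick can be illustrated concretely with the degree-2 polynomial kernel on 2-D inputs: k(x, z) = (x · z)^2 computed in the input space equals an ordinary dot product after an explicit (here hand-derived) feature map phi, so a classifier never needs to construct phi(x) at all.

```python
import math

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2, computed directly in the input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x):
    """Explicit feature map for this kernel on 2-D inputs:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

# The kernel value and the dot product in the mapped space agree.
x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly2_kernel(x, z)
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))
```

Data that is not linearly separable in the original 2-D space may become separable in the 3-D space of phi, which is exactly what the figure's "kernel trick" arrow depicts.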