Preview
What is a support vector machine?
The perceptron revisited
Kernels
Weight optimization
Handling noisy data
The Perceptron Revisited

The perceptron classifies a query point $x_q$ using a weighted sum of its features:

$$f(x_q) = \mathrm{sign}\Big(\sum_j w_j x_{qj}\Big)$$

But the learned weight vector is itself a sum over training examples:

$$w_j = \sum_i \alpha_i y_i x_{ij}$$

So

$$f(x_q) = \mathrm{sign}\Big(\sum_j \sum_i \alpha_i y_i x_{ij} x_{qj}\Big) = \mathrm{sign}\Big(\sum_i \alpha_i y_i (x_q \cdot x_i)\Big)$$

Replacing the dot product with a kernel function gives

$$f(x_q) = \mathrm{sign}\Big(\sum_i \alpha_i y_i K(x_q, x_i)\Big)$$
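To make the dual form concrete, here is a minimal kernel-perceptron sketch in Python; the mistake-counting update and all function names are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def kernel_perceptron_train(X, y, kernel, epochs=10):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    n = len(y)
    alpha = np.zeros(n)
    # Precompute the Gram matrix K[i, j] = kernel(x_i, x_j)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for i in range(n):
            # f(x_i) = sign(sum_j alpha_j y_j K(x_i, x_j))
            if np.sign(np.sum(alpha * y * K[i])) != y[i]:
                alpha[i] += 1  # mistake: increase this example's weight
    return alpha

def kernel_perceptron_predict(X, y, alpha, kernel, x_q):
    # f(x_q) = sign(sum_i alpha_i y_i K(x_q, x_i))
    return np.sign(sum(a * yi * kernel(x_q, xi)
                       for a, yi, xi in zip(alpha, y, X)))
```

Only examples that are ever misclassified end up with $\alpha_i > 0$; the rest drop out of the sum, which foreshadows the role of support vectors.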
Examples of Kernels
Linear: $K(x, x') = x \cdot x'$
Polynomial: $K(x, x') = (x \cdot x')^d$

For $d = 2$ in two dimensions, the polynomial kernel computes a dot product in an implicit feature space:

$$(u \cdot v)^2 = (u_1 v_1 + u_2 v_2)^2 = u_1^2 v_1^2 + u_2^2 v_2^2 + 2 u_1 v_1 u_2 v_2 = (u_1^2,\ u_2^2,\ \sqrt{2}\, u_1 u_2) \cdot (v_1^2,\ v_2^2,\ \sqrt{2}\, v_1 v_2) = \Phi(u) \cdot \Phi(v)$$
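A quick numerical check of this identity (a throwaway sketch; the names `poly2_kernel` and `phi` are made up):

```python
import numpy as np

def poly2_kernel(u, v):
    return np.dot(u, v) ** 2

def phi(u):
    # Explicit feature map matching the d = 2 polynomial kernel in 2-D
    return np.array([u[0]**2, u[1]**2, np.sqrt(2) * u[0] * u[1]])

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(poly2_kernel(u, v), phi(u) @ phi(v))  # (u.v)^2 == Phi(u).Phi(v)
```

The kernel evaluates the feature-space dot product without ever constructing $\Phi(u)$, which is the point: for larger $d$ and higher input dimension, the explicit feature space grows combinatorially.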
Learning SVMs
So how do we:
Choose the kernel? Black art
Choose the examples? Side effect of choosing weights
Choose the weights? Maximize the margin
Maximizing the margin is a constrained optimization problem:

Minimize $\|w\|^2 / 2$
Subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$

Inequality Constraints

The general form:

Minimize $f(w)$
Subject to $g_i(w) \le 0$, for $i = 1, 2, \ldots$
and $h_i(w) = 0$, for $i = 1, 2, \ldots$

Lagrange multipliers for the inequalities: $\alpha_i$ (and $\beta_i$ for the equalities):

$$L(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_i \beta_i h_i(w)$$

KKT Conditions:

$$\frac{\partial L(w^*, \alpha^*, \beta^*)}{\partial w_i} = 0 \qquad \alpha_i^* \ge 0 \qquad g_i(w^*) \le 0 \qquad \alpha_i^* g_i(w^*) = 0$$
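The hard-margin SVM fits this template with $f(w) = \|w\|^2/2$ and $g_i(w) = 1 - y_i(w \cdot x_i + b)$; its standard dual is to maximize $\sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$. Below is a minimal sketch feeding that dual to a generic solver (scipy's SLSQP); the toy data set and variable names are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data, made up for illustration
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)  # Q[i,j] = y_i y_j (x_i . x_j)

def neg_dual(alpha):
    # Negated dual objective, since minimize() minimizes
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),                        # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum alpha_i y_i = 0
alpha = res.x
w = (alpha * y) @ X                  # recover w = sum_i alpha_i y_i x_i
sv = np.flatnonzero(alpha > 1e-6)    # complementary slackness: alpha_i* g_i(w*) = 0
b = y[sv[0]] - X[sv[0]] @ w          # any support vector lies exactly on the margin
print("support vectors:", sv, "w:", w, "b:", round(b, 3))
```

Complementary slackness is visible in the output: only examples on the margin get $\alpha_i > 0$, and those are the support vectors.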
Solution Techniques
Use generic quadratic programming solver
Use specialized optimization algorithm
E.g.: SMO (Sequential Minimal Optimization)
Simplest method: Update one $\alpha_i$ at a time
But this violates the constraint $\sum_i \alpha_i y_i = 0$, so pairs must be updated jointly
Iterate until convergence:
1. Find example xi that violates KKT conditions
2. Select second example xj heuristically
3. Jointly optimize $\alpha_i$ and $\alpha_j$
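A sketch of the "simplified SMO" variant often used for teaching, which replaces the step-2 heuristic with a random choice of the second example; everything here (names, tolerances, the soft-margin parameter C) is an illustrative assumption:

```python
import numpy as np

def smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=5):
    """Simplified SMO: optimize pairs (alpha_i, alpha_j), keeping
    sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C feasible throughout."""
    n = len(y)
    K = X @ X.T                      # linear-kernel Gram matrix
    alpha, b = np.zeros(n), 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[i] + b - y[i]
            # Step 1: does x_i violate the KKT conditions?
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                # Step 2: pick the second example at random (heuristic omitted)
                j = np.random.choice([k for k in range(n) if k != i])
                Ej = (alpha * y) @ K[j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                # Feasible box for alpha_j given the equality constraint
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                # Step 3: analytic joint update of alpha_j, then alpha_i
                alpha[j] = np.clip(aj + y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    alpha[j] = aj
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # Update the threshold b from the changed pair
                b1 = b - Ei - y[i]*(alpha[i]-ai)*K[i,i] - y[j]*(alpha[j]-aj)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-ai)*K[i,j] - y[j]*(alpha[j]-aj)*K[j,j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```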
Bounds
Margin bound:
Bound on VC dimension decreases with margin
Leave-one-out bound (leaving out a non-support vector does not change the learned hypothesis, so only support vectors can contribute leave-one-out errors):

$$E[\mathrm{error}_D(h)] \le \frac{E[\#\ \text{support vectors}]}{\#\ \text{examples}}$$
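A quick way to see the leave-one-out bound in action (an illustrative sketch using scikit-learn's SVC; the data and parameters are made up):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy linearly separable labels

clf = SVC(kernel="linear", C=10.0).fit(X, y)
n_sv = len(clf.support_)                      # indices of the support vectors
print(f"{n_sv} support vectors -> LOO error bound {n_sv / len(y):.2f}")
```

Few support vectors means the bound, and typically the generalization error, is small.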