jerryzhu@cs.wisc.edu
Computer Sciences
University of Wisconsin-Madison
$c(f(x), y)$: convex loss function, e.g., the hinge loss.
Batch risk
\[
J(f) = \frac{1}{l} \sum_{t=1}^{T} \delta(y_t)\, c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \frac{\lambda_2}{2T} \sum_{s,t=1}^{T} (f(x_s) - f(x_t))^2 w_{st}
\]
Instantaneous risk
\[
J_t(f) = \frac{T}{l}\, \delta(y_t)\, c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \lambda_2 \sum_{i=1}^{t} (f(x_i) - f(x_t))^2 w_{it}
\]
\[
f_t(\cdot) = \sum_{i=1}^{t-1} \alpha_i^{(t)} K(x_i, \cdot)
\]
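To make the two risks concrete, the following minimal sketch evaluates both for a fixed kernel expansion $f = \sum_i \alpha_i K(x_i, \cdot)$ over all $T$ points and checks numerically that the average instantaneous risk equals the batch risk for that fixed $f$. It assumes that $\delta(y_t)$ indicates whether $y_t$ was observed, that $l$ is the number of labeled points, and that the weights $w_{st}$ are symmetric. The Gaussian kernel, the hinge loss, and all function names are illustrative choices, not taken from the source.

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def hinge(fx, y):
    return np.maximum(0.0, 1.0 - y * fx)

def batch_risk(alpha, K, W, y, labeled, lam1, lam2):
    """J(f) for f = sum_i alpha_i K(x_i, .); `labeled` plays the role of delta(y_t)."""
    T, l = len(alpha), labeled.sum()
    f = K @ alpha                                   # f(x_t) for every t
    loss = hinge(f[labeled], y[labeled]).sum() / l
    reg = 0.5 * lam1 * (alpha @ K @ alpha)          # (lambda_1 / 2) ||f||_K^2
    graph = lam2 / (2.0 * T) * (((f[:, None] - f[None, :]) ** 2) * W).sum()
    return loss + reg + graph

def instantaneous_risk(t, alpha, K, W, y, labeled, lam1, lam2):
    """J_t(f) for the same f; t is a 0-based index here."""
    T, l = len(alpha), labeled.sum()
    f = K @ alpha
    loss = (T / l) * hinge(f[t], y[t]) if labeled[t] else 0.0
    reg = 0.5 * lam1 * (alpha @ K @ alpha)
    graph = lam2 * (((f[: t + 1] - f[t]) ** 2) * W[: t + 1, t]).sum()
    return loss + reg + graph

# Synthetic data: 20 points, 3 of them labeled (illustrative values only).
rng = np.random.default_rng(0)
T = 20
X = rng.normal(size=(T, 2))
labeled = np.zeros(T, dtype=bool)
labeled[[0, 7, 13]] = True
y = np.where(labeled, np.sign(X[:, 0]), 0.0)
alpha = rng.normal(scale=0.1, size=T)
K = W = rbf(X, X)                                   # same Gaussian for K and w (assumption)

J = batch_risk(alpha, K, W, y, labeled, lam1=0.1, lam2=0.5)
J_avg = np.mean([instantaneous_risk(t, alpha, K, W, y, labeled, 0.1, 0.5)
                 for t in range(T)])
print(J, J_avg)
```

The two printed values agree because each unordered pair $(i, t)$ appears once in the instantaneous sums and twice in the batch double sum, which the factor $1/2$ compensates.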
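Init: $t = 1$, $f_1 = 0$
Repeat
1. receive $x_t$, predict $f_t(x_t) = \sum_{i=1}^{t-1} \alpha_i^{(t)} K(x_i, x_t)$
2. occasionally receive $y_t$
3. update $f_t$ to $f_{t+1}$ by
\[
\alpha_i^{(t+1)} = (1 - \eta_t \lambda_1)\, \alpha_i^{(t)} - 2 \eta_t \lambda_2 (f_t(x_i) - f_t(x_t))\, w_{it}, \quad i < t
\]
\[
\alpha_t^{(t+1)} = 2 \eta_t \lambda_2 \sum_{i=1}^{t} (f_t(x_i) - f_t(x_t))\, w_{it} - \eta_t \frac{T}{l}\, \delta(y_t)\, c'(f(x_t), y_t)
\]
4. store $x_t$, let $t = t + 1$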
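The loop above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not stated in the source: the loss is the hinge loss, the kernel $K$ and the edge weights $w_{it}$ are Gaussians sharing one bandwidth `sigma`, the step size is $\eta_t = \eta_0 / \sqrt{t}$, and the ratio $T/l$ (unknown in advance online) is replaced by a hypothetical hyperparameter `label_weight`. The class and method names are illustrative.

```python
import numpy as np

class OnlineMR:
    """Sketch of online manifold regularization by gradient steps on J_t."""

    def __init__(self, lam1=0.01, lam2=0.1, sigma=1.0, label_weight=10.0, eta0=0.5):
        self.lam1, self.lam2, self.sigma = lam1, lam2, sigma
        self.label_weight = label_weight   # stands in for T / l (assumption)
        self.eta0 = eta0
        self.X = []                        # stored points x_1 .. x_{t-1}
        self.alpha = []                    # coefficients alpha_1 .. alpha_{t-1}

    def _gauss(self, A, b):
        # Gaussian similarity between each row of A and the point b.
        return np.exp(-np.sum((A - b) ** 2, axis=1) / (2.0 * self.sigma ** 2))

    def predict(self, x):
        if not self.X:
            return 0.0                     # f_1 = 0
        k = self._gauss(np.array(self.X), np.asarray(x, dtype=float))
        return float(np.dot(self.alpha, k))

    def step(self, x_t, y_t=None):
        """Process one point; y_t is None when no label is received."""
        x_t = np.asarray(x_t, dtype=float)
        t = len(self.X) + 1
        eta = self.eta0 / np.sqrt(t)       # step size schedule (assumption)
        f_t = self.predict(x_t)            # step 1: predict before updating
        alpha_t = 0.0
        if self.X:
            Xs = np.array(self.X)
            f_i = np.array([self.predict(x) for x in self.X])
            g = (f_i - f_t) * self._gauss(Xs, x_t)   # (f_t(x_i) - f_t(x_t)) w_it
            old = np.array(self.alpha)
            self.alpha = list((1.0 - eta * self.lam1) * old - 2.0 * eta * self.lam2 * g)
            alpha_t = 2.0 * eta * self.lam2 * g.sum()
        # Hinge subgradient: c'(f, y) = -y when y * f < 1, else 0.
        if y_t is not None and y_t * f_t < 1:
            alpha_t += eta * self.label_weight * y_t
        self.X.append(x_t)                 # step 4: store x_t
        self.alpha.append(float(alpha_t))
        return f_t

model = OnlineMR()
p1 = model.step([0.0, 0.0], y_t=+1)   # labeled point
p2 = model.step([0.5, 0.0])           # unlabeled neighbor
print(p1, p2, model.alpha)
```

Each call to `step` recomputes $f_t(x_i)$ for every stored point, so the cost per step grows with $t$; this is what motivates limiting the number of stored representers.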
Reduce to $\tau$ representers $f' = \sum_{i=t-\tau+1}^{t} \alpha_i^{\prime (t+1)} K(x_i, \cdot)$ by
\[
\min_{\alpha'^{\,(t+1)}} \|f' - f_{t+1}\|^2
\]
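One way to carry out this reduction, assuming the norm is the RKHS norm, is to solve the normal equations $K_{BB}\, \alpha' = K_{BA}\, \alpha$, where $B$ indexes the most recent $\tau$ points and $A$ indexes all stored points. The sketch below does this with a Gaussian kernel and compares the result against simply truncating the coefficient vector. The kernel choice, the small `ridge` term added for numerical stability, and the function names are assumptions made for illustration.

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Z."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def reduce_representers(X, alpha, tau, sigma=1.0, ridge=1e-10):
    """Project f = sum_i alpha_i K(x_i, .) onto the span of the last tau points."""
    X_buf = X[-tau:]
    K_bb = rbf(X_buf, X_buf, sigma)
    K_ba = rbf(X_buf, X, sigma)
    alpha_new = np.linalg.solve(K_bb + ridge * np.eye(tau), K_ba @ alpha)
    return X_buf, alpha_new

def rkhs_sq_dist(X, alpha, X_buf, alpha_buf, sigma=1.0):
    """||f' - f||_K^2 for f' on the buffer points and f on all points."""
    return (alpha_buf @ rbf(X_buf, X_buf, sigma) @ alpha_buf
            - 2.0 * alpha_buf @ rbf(X_buf, X, sigma) @ alpha
            + alpha @ rbf(X, X, sigma) @ alpha)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
alpha = rng.normal(size=8)
tau = 5

X_buf, alpha_proj = reduce_representers(X, alpha, tau)
err_proj = rkhs_sq_dist(X, alpha, X_buf, alpha_proj)
err_trunc = rkhs_sq_dist(X, alpha, X_buf, alpha[-tau:])   # just drop old terms
print(err_proj, err_trunc)
```

The projected coefficients never do worse than truncation in this norm, since truncation is one feasible choice of $\alpha'$.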
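\[
w_{\mu_i t} = \mathbb{E}_{x \sim N(\mu_i, \Sigma_i)} \exp\left( -\frac{\|x - x_t\|^2}{2\sigma^2} \right)
\]
\[
= (2\pi)^{-\frac{d}{2}} |\Sigma_i|^{-\frac{1}{2}} |\Sigma_0|^{-\frac{1}{2}} |\tilde{\Sigma}|^{\frac{1}{2}} \exp\left( -\frac{1}{2} \left( \mu_i^\top \Sigma_i^{-1} \mu_i + x_t^\top \Sigma_0^{-1} x_t - \tilde{\mu}^\top \tilde{\Sigma}^{-1} \tilde{\mu} \right) \right)
\]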
A further approximation is
\[
w_{\mu_i t} = e^{-\|\mu_i - x_t\|^2 / 2\sigma^2}
\]
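To get a feel for when replacing the expectation by its value at the mean $\mu_i$ is reasonable, the sketch below compares the point approximation with a Monte Carlo estimate of $\mathbb{E}_{x \sim N(\mu_i, \Sigma_i)} \exp(-\|x - x_t\|^2 / 2\sigma^2)$. The function names, sample size, and test values are illustrative assumptions, not part of the source.

```python
import numpy as np

def approx_weight(mu, x_t, sigma):
    """Point approximation: evaluate the Gaussian weight at the mean mu."""
    return float(np.exp(-np.sum((mu - x_t) ** 2) / (2.0 * sigma ** 2)))

def mc_weight(mu, cov, x_t, sigma, n=20000, seed=0):
    """Monte Carlo estimate of the expected weight under x ~ N(mu, cov)."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, cov, size=n)
    vals = np.exp(-np.sum((xs - x_t) ** 2, axis=1) / (2.0 * sigma ** 2))
    return float(np.mean(vals))

mu = np.array([0.0, 0.0])
x_t = np.array([1.0, 0.5])
sigma = 1.0

point = approx_weight(mu, x_t, sigma)
tight = mc_weight(mu, 1e-4 * np.eye(2), x_t, sigma)   # concentrated Gaussian
wide = mc_weight(mu, 1.0 * np.eye(2), x_t, sigma)     # spread-out Gaussian
print(point, tight, wide)
```

When the Gaussian is concentrated relative to $\sigma$, the two values nearly coincide; when it is spread out, the point approximation overstates the weight in this example.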
[Figure: two panels for Online RPtree; x-axis: T.]
Online MR risk $J_{air}(T) \equiv \frac{1}{T} \sum_{t=1}^{T} J_t(f_t)$ approaches batch risk $J(f^*)$ as $T$ increases.
[Figure: Risk vs. T. Curves: J(f*) Batch MR; Jair(T) Online MR.]
[Figure: four panels, including Online RPtree; x-axis: T.]
[Figure: Generalization error rate vs. T. Curves: Batch MR; Online MR (buffer).]