The Hebrew University, Jerusalem, Israel

Online Learning with a Memory Harness using the Forgetron

Shai Shalev-Shwartz
joint work with Ofer Dekel and Yoram Singer

Large Scale Kernel Machines, NIPS'05, Whistler

Slide 1

Forgetron

Overview
• Online learning with kernels
• Goal: a strict limit on the number of "support vectors"
• The Forgetron algorithm
• Analysis
• Experiments

Slide 2

Kernel-based Perceptron for Online Learning
• Prediction: ŷ_t = sign(f_t(x_t))
• Current classifier: f_t(x) = Σ_{i∈I} y_i K(x_i, x)
• Current active set: the indices of past mistake rounds, e.g. I = {1, 3, 4}
• |I| equals the number of mistakes M made so far

Slides 3–4
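The kernel-based Perceptron loop on these slides can be sketched in a few lines of Python (an illustrative sketch, not the authors' code; the RBF kernel and the toy stream are my own choices):

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel on plain tuples (an illustrative choice; any PSD kernel works)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

class KernelPerceptron:
    """Kernel Perceptron: f_t(x) = sum_{i in I} y_i K(x_i, x), where the active
    set I holds exactly the examples on which a prediction mistake was made."""

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.active = []      # the "support vectors": (x_i, y_i) pairs
        self.mistakes = 0     # M, the mistake count

    def predict(self, x):
        score = sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.active)
        return 1 if score >= 0 else -1

    def observe(self, x, y):
        """One online round: predict, receive the true label, update on a mistake."""
        if self.predict(x) != y:
            self.active.append((x, y))   # |I| grows by one per mistake
            self.mistakes += 1

# toy stream: two well-separated classes
stream = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((2.0, 0.1), 1), ((0.1, 2.0), -1)]
learner = KernelPerceptron()
for x, y in stream:
    learner.observe(x, y)
```

Note that |I| (here `len(learner.active)`) equals the mistake count, which is exactly the memory problem the budget constraint on the next slide addresses.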

Learning on a Budget
• |I| = number of mistakes up to round t
• Memory- and time-inefficient: |I| may grow without bound
• Goal: construct a kernel-based online algorithm for which:
  • |I| ≤ B on every round t
  • it still performs "well", i.e. comes with a performance guarantee

Slide 5

Mistake Bound for the Perceptron
• (x_1, y_1), …, (x_T, y_T): a sequence of examples
• A kernel K s.t. K(x_t, x_t) ≤ 1
• g: a fixed competitor classifier in the RKHS
• Define ℓ_t(g) = max(0, 1 − y_t g(x_t))
• Then the number of mistakes M satisfies M ≤ ‖g‖² + 2 Σ_t ℓ_t(g)

Slide 6

Previous Work
• Crammer, Kandola, Singer (2003)
• Kivinen, Smola, Williamson (2004)
• Weston, Bordes, Bottou (2005)
Previous online budget algorithms do not provide a mistake bound. Is our goal attainable?

Slide 7

Mission Impossible
• Input space: {e_1, …, e_{B+1}}
• Linear kernel: K(e_i, e_j) = e_i · e_j = δ_{i,j}
• Budget constraint: |I| ≤ B. Therefore there exists j s.t. Σ_{i∈I} α_i K(e_i, e_j) = 0
• So the budgeted learner might err on every round
• But the competitor g = Σ_i e_i never errs!
• The (unbudgeted) Perceptron makes only B+1 mistakes on this sequence

Slide 8
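The counterexample can be simulated directly (a sketch under my own conventions: a non-positive score on a +1-labeled example counts as a mistake, and the hypothetical budget learner here simply forgets its oldest stored vector):

```python
B = 5               # budget on stored "support vectors"
dim = B + 1         # input space {e_1, ..., e_{B+1}}; linear kernel K(e_i, e_j) = delta_{ij}

active = []         # indices of stored basis vectors, oldest first (all with label +1)

def score(j):
    # with the linear kernel, only stored copies of e_j itself contribute to f(e_j)
    return sum(1.0 for i in active if i == j)

learner_mistakes = 0
competitor_errs = 0  # competitor g = e_1 + ... + e_{B+1}, so g . e_j = 1 for every j

for t in range(100):
    # adversary: |active| <= B < B+1, so some basis vector is absent; present it with label +1
    j = next(k for k in range(dim) if k not in active)
    if score(j) <= 0:            # the learner's score is 0 here: a mistake every round
        learner_mistakes += 1
        active.append(j)
        if len(active) > B:
            active.pop(0)        # the budget forces us to forget a vector
    g_margin = 1.0               # y * (g . e_j) = 1 on every round
    if g_margin <= 0:
        competitor_errs += 1
```

Note that ‖g‖ = (B+1)^{1/2} here, which is exactly the norm threshold that the next slide shows cannot be exceeded.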

Redefine the Goal
• We must restrict the competitor g somehow; one way is to restrict ‖g‖
• The counterexample implies that we cannot compete with ‖g‖ ≥ (B+1)^{1/2}
• Main result: the Forgetron algorithm can compete with any classifier g whose norm is at most of order (B / log B)^{1/2}

Slide 9

The Forgetron
• Current classifier: f_t(x) = Σ_{i∈I} σ_i y_i K(x_i, x)
• Step (1) – Perceptron: I′ = I ∪ {t}
• Step (2) – Shrinking: σ_i ← φ_t σ_i for every i ∈ I′
• Step (3) – Remove Oldest: discard r = min I′, so the oldest example leaves the active set

Slide 10
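The three-step update can be sketched as follows (a minimal sketch: I use a fixed shrinking factor φ for illustration, whereas the actual Forgetron chooses φ_t per round, and I apply the removal step whenever the budget is exceeded):

```python
class ForgetronSketch:
    """Sketch of the Forgetron's update on a mistake round:
    (1) Perceptron step, (2) shrinking, (3) removal of the oldest example."""

    def __init__(self, budget, kernel, phi=0.9):
        self.B = budget
        self.kernel = kernel
        self.phi = phi        # fixed here; the real algorithm tunes phi_t each round
        self.active = []      # oldest first: (x_i, y_i, sigma_i)

    def score(self, x):
        return sum(s * y * self.kernel(xi, x) for xi, y, s in self.active)

    def observe(self, x, y):
        if (1 if self.score(x) >= 0 else -1) == y:
            return False                       # correct prediction: no update
        # Step (1) - Perceptron: I' = I u {t}
        self.active.append((x, y, 1.0))
        # Step (2) - Shrinking: sigma_i <- phi_t * sigma_i for all i
        self.active = [(xi, yi, self.phi * s) for xi, yi, s in self.active]
        # Step (3) - Remove Oldest: drop r = min I' once |I'| > B
        if len(self.active) > self.B:
            self.active.pop(0)
        return True

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

f = ForgetronSketch(budget=2, kernel=dot)
stream = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((1.0, 1.0), 1),
          ((-1.0, 0.0), -1), ((0.5, 0.5), 1)]
mistakes = sum(f.observe(x, y) for x, y in stream)
```

The budget |I| ≤ B now holds on every round, at the price of the shrinking and removal "damage" quantified on the following slides.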

Quantifying Deviation
• "Progress" measure: Δ_t = ‖f_t − g‖² − ‖f_{t+1} − g‖²
• Decompose the progress over the three update steps, with f′ the classifier after the Perceptron step and f″ the classifier after shrinking:
  Δ_t = (‖f_t − g‖² − ‖f′ − g‖²) + (‖f′ − g‖² − ‖f″ − g‖²) + (‖f″ − g‖² − ‖f_{t+1} − g‖²)
• "Deviation" is measured by negative progress

Slide 12

Quantifying Deviation
• Gain from the Perceptron step
• Damage from the shrinking step
• Damage from the removal step
• The Forgetron chooses the shrinking coefficient φ_t to balance the gain against the two damage terms

Slide 13
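A standard step behind such progress arguments (a sketch of the usual telescoping reasoning, not the talk's exact derivation): summing the per-round progress over all rounds telescopes, so the total budget for gain minus damage is ‖g‖²:

```latex
\sum_{t=1}^{T} \Delta_t
  \;=\; \sum_{t=1}^{T} \Big( \|f_t - g\|^2 - \|f_{t+1} - g\|^2 \Big)
  \;=\; \|f_1 - g\|^2 - \|f_{T+1} - g\|^2
  \;\le\; \|g\|^2 ,
```

since f_1 ≡ 0. Lower-bounding the Perceptron gain on mistake rounds and upper-bounding the shrinking and removal damage then turns this identity into a mistake bound.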

Resulting Mistake Bound
For any g whose norm satisfies the restriction above, the number of prediction mistakes the Forgetron makes is at most a bound of the form O(‖g‖² + Σ_t ℓ_t(g))

Slide 14

Experiment I: MNIST dataset
[Figure: average error vs. budget size B, comparing the Forgetron and CKS]

Slide 22

Experiment II: Census-income (adult)
[Figure: average error vs. budget size B (1000–6000), comparing the Forgetron and CKS; the Perceptron makes 16,000 mistakes]

Slide 23

Experiment III: Synthetic Data with Label Noise
[Figure: average error vs. budget size B (0–2000), comparing the Forgetron and CKS]

Slide 24

Summary
• No budget algorithm can compete with arbitrary hypotheses
• The Forgetron can compete with norm-bounded hypotheses
• Works well in practice
• Does not require parameter tuning
• Future work: the Forgetron for batch learning

Slide 25
