# Learning to make specific predictions using Slow Feature Analysis
## Memory/prediction hierarchy with temporal invariances

- Slow: temporally invariant abstractions
- Fast: quickly changing input

But… how does each module work: how does it learn, map, and predict?
## My (old) module

1. Quantize the high-dimensional input space
2. Map to a low-dimensional output space
3. Discover temporal sequences in the input space
4. Map sequences to a low-dimensional sequence language
5. Feedback = the same map run backwards

Problems:

- Sequence-mapping (step 4) depends on several previous steps, so it is brittle, not robust
- Sequence-mapping is not well-defined statistically
## New module design: Slow Feature Analysis (SFA)

Pros of SFA:

- Nearly guaranteed to find some slow features
- No quantization
- Defined over the entire input space
- Hierarchical "stacking" is easy
- Statistically robust building blocks (simple polynomials, Principal Components Analysis, variance reduction, etc.)

SFA is a great way to find invariant functions, and invariants change slowly, hence are easily predictable.

BUT… no feedback!

- You can't get a specific output from an invariant input
- It is hard to take a low-dimensional signal and turn it into the right high-dimensional one (the problem is underdetermined)

## Solution: feedback with a separate map

Here's my solution (straightforward, probably done before somewhere): do the feedback with a separate map. First, I'll show it working… then show how and why.

## Demo setup

- Input space: 20-dimensional "retina"
- Input shapes: Gaussian blurs (wrapped) of 3 different widths
- Input sequences: constant-velocity motion (0.3 pixels/step)

*(Figure: example input frames at T = 0, 2, 4 and at T = 23, 25, 27; the retina wraps around, so pixel 21 = pixel 1.)*
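
Here is a minimal sketch of how such a stimulus could be generated (the function name and defaults are illustrative, not from the slides):

```python
import numpy as np

def make_stimulus(n_steps=200, n_pix=20, width=2.0, velocity=0.3):
    """Wrapped Gaussian blur drifting across a circular 'retina' at
    constant velocity, matching the demo setup described above."""
    t = np.arange(n_steps)
    centers = (velocity * t) % n_pix            # blur center at each step
    pix = np.arange(n_pix)
    # Circular (wrapped) distance from each pixel to the blur center.
    gap = np.abs(pix[None, :] - centers[:, None])
    d = np.minimum(gap, n_pix - gap)
    return np.exp(-0.5 * (d / width) ** 2)      # (n_steps, n_pix) frames
```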
## Sanity check

The slow features extracted match the generating parameters:

| Slow feature | Interpretation | Generating parameter |
| --- | --- | --- |
| #1 | "What" | Gaussian std. dev. |
| #2 | "Where" | Gaussian center position |

(… so far, this is plain vanilla SFA, nothing new…)
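
Since everything to this point is standard SFA, a minimal linear-SFA sketch may help fix ideas (expanding X with quadratic terms before this step would give the polynomial variant; all names here are illustrative):

```python
import numpy as np

def linear_sfa(X, n_features=2):
    """Find weights W so that the outputs (X - mean) @ W have unit variance
    and change as slowly as possible in time.  X: (T, D) time series."""
    Xc = X - X.mean(axis=0)
    # Whiten the input so all directions have unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    whiten = eigvec / np.sqrt(np.clip(eigval, 1e-10, None))
    Z = Xc @ whiten
    # Slowest features = smallest-eigenvalue directions of the
    # covariance of the temporal differences.
    dval, dvec = np.linalg.eigh(np.cov(np.diff(Z, axis=0), rowvar=False))
    W = whiten @ dvec[:, :n_features]
    return W, X.mean(axis=0)            # slow features: (X - mean) @ W
```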
## New contribution

Predict all pixels of the next image, given the previous images.

*(Figure: input frames at T = 0, 2, 4; the frame at T = 5 is the unknown to be predicted.)*

The reference prediction is to use the previous image ("tomorrow's weather is just like today's"): predict the frame at T = 5 to be the frame at T = 4.

Plot the ratio

$$\text{ratio} = \frac{\text{mean-squared prediction error}}{\text{mean-squared reference error}}.$$

- Median ratio over all points = 0.06 (including discontinuities)
- Median ratio over high-confidence points = 0.03 (tossing the worst 20%)
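
A small sketch of how these per-frame ratios could be computed (names hypothetical; inputs are NumPy arrays):

```python
def error_ratios(pred, ref, actual):
    """Ratio of mean-squared prediction error to mean-squared reference
    error for each frame; pred/ref/actual: (n_frames, n_pix) arrays."""
    mse_pred = ((pred - actual) ** 2).mean(axis=1)
    mse_ref = ((ref - actual) ** 2).mean(axis=1)
    return mse_pred / mse_ref

# The median of error_ratios(...) over all frames, or over only the
# highest-confidence frames, gives the two summary numbers quoted above.
```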
## Take-home messages

- SFA can be inverted
- SFA can be used to make specific predictions
- The prediction works very well
- The prediction can be further improved by using confidence estimates
## So why is it hard, and how is it done?

Why it's hard: going from the high-dimensional input $x_1, x_2, \dots, x_{20}$ to the low-dimensional slow features is easy; each slow feature is just a polynomial in the inputs, e.g.

$$S_1 = 0.3\,x_1 + 0.1\,x_1^2 + 1.4\,x_2 x_3 + 1.1\,x_4^2 + \dots + 0.5\,x_5 x_9 + \dots$$

But going the other way is HARD. Given, say, $S_1 = 1.4$ and $S_2 = -0.33$, what are $x_1, x_2, \dots, x_{20}$?

- There are infinitely many possibilities for the x's
- The problem is vastly under-determined
- There is no simple polynomial-inverse formula (no analogue of the "quadratic formula")
## A very simple, graphable example

Map the 2-dimensional input $(x_1, x_2)$ to the 1-dimensional slow feature

$$S_1(t) = x_1^2 + x_2^2 \approx \text{constant, i.e. slow,}$$

where $x_1(t), x_2(t)$ trace out approximately circular motion in the plane.
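
The slides leave the slowness implicit; for exactly circular motion it is easy to verify:

$$x_1(t) = r\cos\theta(t),\quad x_2(t) = r\sin\theta(t) \;\Longrightarrow\; S_1(t) = r^2\cos^2\theta(t) + r^2\sin^2\theta(t) = r^2,$$

which is constant no matter how quickly $\theta(t)$ changes.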

## Six clue/trick pairs for learning the specific-prediction mapping
Clue #1: The actual input data is a small subset of all possible input data, i.e. it lies on a "manifold" (actual ≠ possible).

Trick #1: Find a set of points which represent where the actual input data is: 20-80 "anchor points" $A_i$. (Found using k-means, k-medoids, etc.; this is quantization, but only for the feedback. A sketch follows.)
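
A minimal anchor-finding sketch using k-means (library choice and parameters are assumptions; the slides also allow k-medoids):

```python
from sklearn.cluster import KMeans

def find_anchors(X, n_anchors=40):
    """Pick anchor points A_i summarizing where the data actually lives.
    X: (T, D) observed inputs; n_anchors: 20-80 per the slides."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(X)
    return km.cluster_centers_          # (n_anchors, D)
```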
Clue #2: The actual input data is not distributed evenly.

Trick #2: Calculate the covariance matrix $C_i$ of the data around each $A_i$; the eigenvectors of $C_i$ describe the local scatter of the data (a sketch follows).
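
A sketch of the local-covariance step, assigning each data point to its nearest anchor (helper name assumed):

```python
import numpy as np

def local_covariances(X, anchors):
    """Covariance matrix C_i of the data nearest each anchor A_i."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)           # nearest anchor for each point
    covs = []
    for i in range(len(anchors)):
        pts = X[nearest == i]
        if len(pts) > 1:
            covs.append(np.cov(pts, rowvar=False))
        else:
            covs.append(np.eye(X.shape[1]))   # degenerate cluster fallback
    return covs
```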
Clue #3: $S(x)$ is locally linear about each anchor point.

Trick #3: Construct linear (affine) Taylor-series mappings $SL_i$ approximating $S(x)$ about each $A_i$. (NB: this doesn't require polynomial SFA, just a differentiable $S$.)

Good news: the linear $SL_i$ can be pseudo-inverted via SVD (a sketch follows).
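
One way to build the local linear map is a numerical Jacobian (a sketch; the slides don't specify how $SL_i$ is constructed):

```python
import numpy as np

def local_jacobian(S, anchor, eps=1e-5):
    """Jacobian J of the slow-feature map S about an anchor, giving the
    affine Taylor approximation S(x) ~ S(anchor) + J @ (x - anchor)."""
    s0 = S(anchor)
    J = np.zeros((len(s0), len(anchor)))
    for j in range(len(anchor)):
        step = np.zeros_like(anchor)
        step[j] = eps
        J[:, j] = (S(anchor + step) - s0) / eps
    return J            # pseudo-invert later with np.linalg.pinv(J)
```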
Bad news: we don't want just any $(x_1, x_2)$; we want an $(x_1, x_2)$ on the data manifold.

Clue #4: The covariance eigenvectors tell us about the local data manifold.

Trick #4 (sketched below):

1. Get the SVD pseudo-inverse: $\Delta X = SL_i^{-1}\,(S_\text{new} - S(A_i))$
2. Then stretch $\Delta X$ onto the manifold by multiplying by the chopped* $C_i$

*Chopped: a projection matrix keeping only as many eigenvectors of $C_i$ as $S$ has dimensions.

*(Figure: the step $\Delta S = S_\text{new} - S(A_i)$ in S-space maps to $\Delta X$, which is then stretched onto the manifold.)*
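
A sketch of Trick #4, reading "multiplying by the chopped $C_i$" as projection onto the leading covariance eigenvectors:

```python
import numpy as np

def invert_and_stretch(J, C, S_new, S_anchor, n_keep):
    """Pseudo-invert the local linear map, then project the step onto the
    top n_keep eigenvectors of the local covariance ('chopped' C)."""
    dX = np.linalg.pinv(J) @ (S_new - S_anchor)   # SVD pseudo-inverse step
    eigval, eigvec = np.linalg.eigh(C)            # ascending eigenvalues
    Vk = eigvec[:, -n_keep:]                      # keep dim(S) eigenvectors
    return Vk @ (Vk.T @ dX)                       # 'stretched' dX
```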
Good news: given $A_i$ and $C_i$, we can invert $S_\text{new} \to X_\text{new}$.

Bad news: how do we choose which $A_i$ and $SL_i^{-1}$ to use? Several different points can all have the same value of $S_\text{new}$.
Clue #5:

a) We need an anchor $A_i$ such that $S(A_i)$ is close to $S_\text{new}$ (the close candidates in S-space).

b) We need a "hint" of which anchors are close in X-space (the hint region).

Trick #5: Choose the anchor $A_i$ such that $A_i$ is close to the hint AND $S(A_i)$ is close to $S_\text{new}$ (a sketch follows).
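
A simple way to combine the two closeness criteria is by rank; the combination rule here is an assumption, since the slides only say the chosen anchor must rank high on both lists:

```python
def choose_anchor(anchors, S_anchors, S_new, hint):
    """Rank anchors by closeness of S(A_i) to S_new and of A_i to the hint
    (e.g. the current input X(t)); return the best combined rank."""
    dS = ((S_anchors - S_new) ** 2).sum(axis=1)
    dX = ((anchors - hint) ** 2).sum(axis=1)
    rank_S = dS.argsort().argsort()     # rank of each anchor in S-space
    rank_X = dX.argsort().argsort()     # rank of each anchor in X-space
    return int((rank_S + rank_X).argmin())
```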
*(Figure: all tricks together; anchors $A_i$ marked "+" and their $S(A_i)$ neighbors marked "x" for each anchor point.)*
Clue #6: The local data scatter can decide whether a given point is probable ("on the manifold") or not.

Trick #6: Use Gaussian hyper-ellipsoid probabilities about the closest $A_i$; this can tell whether a prediction makes sense or not. The estimated uncertainty, $-\log(P)$, increases away from the anchor points (a sketch follows).
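
A sketch of the hyper-ellipsoid confidence as a Gaussian negative log-probability (the regularization constant is an assumption):

```python
import numpy as np

def neg_log_prob(x, anchor, C, eps=1e-6):
    """-log P of x under a Gaussian centered on the anchor with local
    covariance C; larger values mean less probable, lower confidence."""
    d = x - anchor
    Creg = C + eps * np.eye(len(C))         # regularize for invertibility
    maha = d @ np.linalg.solve(Creg, d)     # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(2 * np.pi * Creg)
    return 0.5 * (maha + logdet)
```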
## Summary of the SFA inverse/prediction method

We have $X(t-2)$, $X(t-1)$, $X(t)$… and we want $X(t+1)$. (An end-to-end sketch follows the list.)

1. Calculate the slow features $S(t-2)$, $S(t-1)$, $S(t)$.
2. Extrapolate that trend linearly to $S_\text{new}$. (NB: $S$ varies slowly/smoothly in time.)
3. Find candidate $S(A_i)$'s close to $S_\text{new}$ among all the $S(A_i)$, e.g. candidate $i = \{1, 16, 3, 7\}$.
4. Take $X(t)$ as the "hint," and find candidate $A_i$'s close to it, e.g. candidate $i = \{8, 3, 5, 17\}$.
5. Find the "best" candidate $A_i$, whose index is high on both candidate lists:

   | $S(A_i)$ close to $S_\text{new}$: $i$ | $A_i$ close to $X(t)$: $i$ |
   | --- | --- |
   | 1 | 8 |
   | 16 | 3 |
   | 3 | 5 |
   | 7 | 17 |

   (Here $i = 3$ ranks high on both lists.)

6. Use the chosen $A_i$ and the pseudo-inverse, i.e. $\Delta X = SL_i^{-1}\,(S_\text{new} - S(A_i))$ via SVD, to get $\Delta X$.
7. Stretch $\Delta X$ onto the low-dimensional manifold using the chopped $C_i$.
8. Add the stretched $\Delta X$ back onto $A_i$ to get the final prediction.
9. Use the covariance hyper-ellipsoids to estimate confidence in this prediction.

This method uses virtually everything we know about the data; any improvements presumably would need further clues:

- Discrete sub-manifolds
- Discrete sequence steps
- Better nonlinear mappings
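
Tying the pieces together, here is an end-to-end sketch of steps 1-9 built from the helper sketches above (all names are illustrative, not from the slides):

```python
def predict_next(X_hist, S, anchors, S_anchors, covs, n_keep):
    """Predict X(t+1) from recent inputs X_hist = [..., X(t-1), X(t)],
    a slow-feature map S, anchors A_i, their S(A_i), and covariances C_i."""
    # Steps 1-2: slow features of recent inputs, extrapolated linearly.
    S_t, S_prev = S(X_hist[-1]), S(X_hist[-2])
    S_new = S_t + (S_t - S_prev)
    # Steps 3-5: anchor close to S_new in S-space and to the hint X(t).
    i = choose_anchor(anchors, S_anchors, S_new, hint=X_hist[-1])
    # Steps 6-7: local pseudo-inverse, then stretch onto the manifold.
    J = local_jacobian(S, anchors[i])
    dX = invert_and_stretch(J, covs[i], S_new, S_anchors[i], n_keep)
    # Step 8: final prediction; step 9: hyper-ellipsoid confidence.
    X_pred = anchors[i] + dX
    return X_pred, neg_log_prob(X_pred, anchors[i], covs[i])
```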
## Next steps

- Online learning
  - Adjust anchor points and covariances as new data arrive
  - Use weighted k-medoid clusters to mix old data with new
- Hierarchy
  - Set the output of one layer as the input to the next
  - Enforce ever-slower features up the hierarchy
- Test with more complex stimuli and natural movies
- Let feedback from above modify the slow-feature polynomials
- Find slow features in the unpredicted input (input − prediction)