Softky plot ratio
Shared by: alicejenny
-
Stats
- views:
- 0
- posted:
- 10/17/2012
- language:
- English
- pages:
- 26
Document Sample


Learning to make specific
predictions using Slow Feature
Analysis
Memory/prediction hierarchy with temporal invariances
Slow: temporally invariant abstractions
Fast: quickly changing input
But… how does each module work: learn, map, and predict?
My (old) module:
1. Quantize high-dim input space
2. Map to low-dim output space
3. Discover temporal sequences in input space
4. Map sequences to low-dim sequence language
5. Feedback = same map run backwards
Problems:
• Sequence-mapping (step #4) depends on several previous
steps brittle, not robust
• Sequence-mapping not well-defined statistically
New module design: Slow Feature Analysis (SFA)
Pro’s of SFA:
•Nearly guaranteed to find some slow features
•No quantization
•Defined over entire input space
•Hierarchical “stacking” is easy
•Statistically robust building blocks (simple
polynomials, Principal Components Analysis,
variance reduction, etc)
a great way to find invariant functions
invariants change slowly, hence easily
predictable
BUT…
….No feedback!
• Can’t get specific output from invariant
input
• It’s hard to take a low-dim signal and
turn it into the right high-dim one
(underdetermined)
Here’s my solution (straightforward,
probably done before somewhere):
Do feedback with separate map
First, show it working…
… then, show how & why
Input space: 20-dim “retina”
Input shapes: Gaussian blurs (wrapped) of 3 different widths
Input sequences: constant-velocity motion (0.3 pixels/step)
T=0
…
T=2
…
T=4
Pixel 21 = pixel 1
T = 23
…
T=25
…
T=27
Sanity-check: slow features extracted match generating parameters:
Gaussian std dev.
“What” Slow feature #1
“Where” Gaussian center pos’n
Slow feature #2
(… so far, this is plain vanilla SFA, nothing new…)
New contribution:
Predict all pixels of next image, given previous images…
T=0
…
T=2
…
T=4
T=5
? ???????????????????
Reference prediction is to use previous image
(“tomorrow’s weather is just like today’s”)
T=4
T=5
Plot ratio:
(mean-squared prediction error )
(mean-squared reference error)
Reference prediction
Median ratio over all points = 0.06
(including discontinuities)
…over high-confidence points = 0.03
(toss worst 20%)
Take-home messages:
– SFA can be inverted
– SFA can be used to make specific predictions
– The prediction works very well
– The prediction can be further improved by using
confidence estimates
So why is it hard, and how is it done?....
Why it’s hard:
Low-dim slow features: S1 = 0.3 x1 + 0.1 x12 + 1.4 x2 x3 + 1.1 x42 +…. + 0.5 x5 x9 + …
easy
High-dim: x1 x2 x3 ……………………………………………..…………………..x20
But given S1 = 1.4 S2 = -0.33
x1= ?
x2=? HARD
x3=?
x4=?
x5=?
x6=?
. •Infinitely many possibilities of x’s
. •Vastly under-determined
. •No simple polynomial-inverse formula (e.g. “quadratic formula”)
x20=?
Very simple, graphable example:
(x1, x2) 2-dim S1 1-dim
S1(t) = x12 + x22 nearly constant, i.e. slow
x1(t), x2(t) approx circular motion in plane
Illustrate a series of six clue/trick pairs for learning specific-prediction mapping
Clue #1: The actual input data is a small subset of all
possible input data (i.e. on a “manifold”)
actual
≠
possible
Trick #1: Find a set of points which represent where the
actual input data is
20-80 “anchor points” Ai
(Found using k-means, k-medoids, etc. This is quantization, but only for feedback)
Clue #2: The actual input data is not distributed evenly
about those anchor-points
yes no
Trick #2: Calculate covariance matrix Ci of data around Ai
data Eigenvectors of Ci
Clue #3: S(x) is locally linear about each anchor point
Trick #3: Construct linear (affine) Taylor-series mappings
SLi approximating S(x) about each Ai
(NB: this doesn’t require polynomial SFA, just differentiable)
Good news: Linear SLi can be pseudo-inverted (SVD)
Bad news: We don’t want any old (x1,x2), we want (x1,x2) on
the data manifold
Clue #4: Covariance eigenvectors tell us about the local
data manifold
Trick #4:
1. Get SVD pseudo-inverse DX = SLi-1(Snew – S(Ai))
2. Then stretch DX onto manifold by multiplying by chopped* Ci
Snew DS
S(Ai) Stretched DX
DX
DX …stretch…
* Projection matrix, keeping only as many eigenvectors as dimensions of S
Good news: Given Ai and Ci, we can invert Snew Xnew
Bad news: How do we choose which Ai and SLi-1 to
use?
? ?
These three all have the
same value of Snew
?
Clue #5:
a) We need an anchor Ai such that S(Ai) is close to Snew
Snew
Close
candidates
S(Ai)
b) Need a “hint” of which anchors are close in X-space
Hint region
Trick #5: Choose anchor Ai such that
– Ai is “close to” the hint AND
– S(Ai) is close to Snew
All tricks together:
Map local linear inverse about
each anchor point
S(Ai) neighbors x
Anchors +
Clue #6: The local data scatter can decide if a given point is
probable (“on the manifold”) or not
improbable
probable
Trick #6: Use Gaussian hyper-ellipsoid probabilities about
closest Ai (this can tell if a prediction makes sense or not)
improbable
probable
Estimated uncertainty increases away from anchor points
-log(P)
Summary of SFA inverse/prediction method:
We have X(t-2), X(t-1), X(t)… we want X(t+1)
S
1. Calculate slow features t
S(t-2), S(t-1), S(t)
2. Extrapolate that trend linearly to Snew (NB: S varies slowly/smoothly in
time)
S S new
t
3. Find candidate S(Ai)’s close to Snew
all S(Ai) Snew
e.g. candidate i = {1, 16, 3, 7}
Summary cont’d
4. Take X(t) as “hint,” and find candidate Ai’s close to it
e.g. candidate i = {8, 3, 5, 17}
5. Find “best” candidate Ai , whose index is high on both
candidate lists:
S(Ai)’s close to Ai close to X(t)
Snew
i i
1 8
16 3
3 5
6 17
6. Use chosen Ai and pseudo-inverse S(Ai)
(i.e. SLi-1(Snew – S(Ai) ) with SVD) to get DX
DX
7. Stretch DX onto low-dim manifold using chopped Ci
Stretched DX
DX
…stretch…
8. Add stretched DX back onto Ai to get final prediction
Ai
Stretched DX
9. Use covariance hyper-ellipsoids to estimate confidence in this
prediction
improbable
probable
This method uses virtually everything we know about the data;
any improvements presumably would need further clues…
– Discrete sub-manifolds
– Discrete sequence steps
– Better nonlinear mappings
Next steps
• Online learning
– Adjust anchor points and covariance as new data arrive
– Use weighted k-medoid clusters to mix in old with new data
• Hierarchy
– Set output of one layer as input to next
– Enforce ever-slower features up the hierarchy
• Test with more complex stimuli and natural movies
• Let feedback from above modify slow feature polynomials
• Find slow features in the unpredicted input (input – prediction)
Get documents about "