Kalman Filter

580.691 Learning Theory
Reza Shadmehr


State estimation theory
Subject was instructed to pull on a knob that was fixed on a rigid wall. A) EMG recordings from arm and leg muscles.
Before biceps is activated, the brain activates the leg muscles to stabilize the lower body and prevent sway due to the
anticipated pulling force on the upper body. B) When a rigid bar is placed on the upper body, the leg muscles are not
activated when biceps is activated. (Cordo and Nashner, 1982)
[Figure: panels A and B; firing-rate scale bar, 100 Hz.]
Effect of eye movement on the memory of a visual stimulus. In the top panel, the filled circle represents the fixation
point, the asterisk indicates the location of the visual stimulus, and the dashed circle indicates the receptive field of a cell
in the LIP region of the parietal cortex. A) Discharge to the onset and offset of a visual stimulus in the cell's receptive
field. Abbreviations: H. eye, horizontal eye position; Stim, stimulus; V. eye, vertical eye position. B) Discharge during
the time period in which a saccade brings the stimulus into the cell's receptive field. The cell's discharge increased
before the saccade brought the stimulus into the cell's receptive field. (From Duhamel et al., 1992.)
Why predict sensory consequences of motor commands?
The subject looked at a moving cursor while a group of dots appeared on the screen for 300 ms. In some trials the dots
remained still (A), while in other trials they moved together left or right with a constant speed (B). The subject indicated the
direction of motion of the dots. From this result, the authors estimated the speed of subjective stationarity, i.e., the speed of
dots for which the subject perceived them to be stationary. C) The unfilled circles represent performance of control subjects.
Regardless of the speed of the cursor, they perceived the dots to be stationary only if their speed was near zero. The filled
triangles represent performance of subject RW. As the speed of the cursor increased, RW perceived the dots to be stationary if
their speed was near the speed of the cursor. (Haarmeier et al., 1997)
Disorders of agency in schizophrenia relate to an inability to compensate for sensory consequences of self-generated
motor commands. In a paradigm similar to that shown in the last figure, volunteers estimated whether, during motion of a
cursor, the background moved to the right or left. By varying the background speed, at each cursor speed the
experimenters estimated the speed of perceptual stationarity, i.e., the speed of background motion for which the subject
saw the background as stationary. They then computed a compensation index as the difference between the speed of
eye movement and the speed of the background when perceived to be stationary, divided by the speed of eye movement. The
subset of schizophrenic patients who had delusional symptoms showed a greater deficit than controls in their ability to
compensate for sensory consequences of self-generated motor commands. (From Lindner et al., 2005.)
    Combining predictions with observations



[Figure: panels A–D.]
       Parameter variance depends only on input selection and noise

A noisy process produces n data points and we form an ML estimate of w:

    y^{*(i)} = w^{*T} x^{(i)}
    y^{(i)} = y^{*(i)} + \epsilon,    \epsilon \sim N(0, \sigma^2)

    D^{(1)} = \{ (x^{(1)}, y^{(1,1)}), (x^{(2)}, y^{(1,2)}), \ldots, (x^{(n)}, y^{(1,n)}) \}
    \hat{w}_{ML} = (X^T X)^{-1} X^T y^{(1)}

We run the noisy process again with the same sequence of x's and re-estimate w:

    D^{(2)} = \{ (x^{(1)}, y^{(2,1)}), (x^{(2)}, y^{(2,2)}), \ldots, (x^{(n)}, y^{(2,n)}) \}
    \hat{w}_{ML} = (X^T X)^{-1} X^T y^{(2)}

The distribution of the resulting estimates will have a var-cov matrix that depends only on the sequence of inputs, the bases that encode those inputs, and the noise sigma:

    \hat{w}_{ML} \sim N\left( w^*, \sigma^2 (X^T X)^{-1} \right)
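As a numerical check of this claim, the following sketch (Python/NumPy; the particular inputs, weights, and noise level are made up for illustration) reruns the noisy process many times with a fixed input sequence and compares the empirical var-cov of the estimates with \sigma^2 (X^T X)^{-1}:

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma = 100, 0.5
    w_true = np.array([1.0, -0.5])                    # hypothetical true weights
    X = rng.normal(size=(n, 2))                       # fixed input sequence (same x's every run)

    estimates = []
    for _ in range(5000):                             # rerun the noisy process many times
        y = X @ w_true + sigma * rng.normal(size=n)   # y = X w* + noise
        w_ml = np.linalg.lstsq(X, y, rcond=None)[0]   # ML (least-squares) estimate of w
        estimates.append(w_ml)

    emp_cov = np.cov(np.array(estimates).T)           # empirical var-cov of the estimates
    theory  = sigma**2 * np.linalg.inv(X.T @ X)       # predicted var-cov: sigma^2 (X^T X)^-1
    print(np.round(emp_cov, 4))
    print(np.round(theory, 4))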
The Gaussian distribution and its var-cov matrix

A 1-D Gaussian distribution is defined as

    p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

In n dimensions, it generalizes to

    p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |C|}} \exp\left( -\frac{1}{2} (\mathbf{x}-\bar{\mathbf{x}})^T C^{-1} (\mathbf{x}-\bar{\mathbf{x}}) \right)

When x is a vector, the variance is expressed in terms of a covariance matrix C, where \rho_{ij} corresponds to the degree of correlation between variables x_i and x_j:

    c_{ij} = E\left[ (x_i - \bar{x}_i)(x_j - \bar{x}_j) \right] = E[x_i x_j] - \bar{x}_i \bar{x}_j

    C = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho_{2n}\sigma_2\sigma_n \\ \vdots & & \ddots & \vdots \\ \rho_{1n}\sigma_1\sigma_n & \rho_{2n}\sigma_2\sigma_n & \cdots & \sigma_n^2 \end{pmatrix}

    \rho_{xy} = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_i (x_i - \mu_x)^2 \sum_i (y_i - \mu_y)^2}} = \frac{C_{xy}}{\sqrt{C_{xx} C_{yy}}} = \frac{C_{xy}}{\sigma_x \sigma_y}
For a bivariate Gaussian x \sim N(\mu, C) with

    C = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}

x_1 and x_2 are positively correlated:   x \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.9\sqrt{2} \\ 0.9\sqrt{2} & 2 \end{pmatrix} \right)

x_1 and x_2 are not correlated:          x \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0.1\sqrt{2} \\ 0.1\sqrt{2} & 2 \end{pmatrix} \right)

x_1 and x_2 are negatively correlated:   x \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & -0.9\sqrt{2} \\ -0.9\sqrt{2} & 2 \end{pmatrix} \right)

[Figure: scatter plots of samples (x_1, x_2) drawn from each of the three covariance matrices above.]
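The scatter plots summarized above can be regenerated with a few lines of code. A sketch (Python/NumPy) that draws samples from each of the three covariance matrices and reports the empirical correlation:

    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.zeros(2)
    for rho in (0.9, 0.1, -0.9):                            # the three example correlations
        C = np.array([[1.0,              rho*np.sqrt(2)],
                      [rho*np.sqrt(2),   2.0           ]])  # sigma1^2 = 1, sigma2^2 = 2
        x = rng.multivariate_normal(mu, C, size=2000)       # samples (x1, x2)
        r = np.corrcoef(x.T)[0, 1]                          # empirical correlation
        print(f"rho = {rho:+.1f}  ->  empirical correlation = {r:+.2f}")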
Parameter uncertainty: Example 1

• Input history (model: \hat{y} = w_1 x_1 + w_2 x_2 = x^T w):

    x_1   x_2   y*
     1     0    0.5
     1     0    0.5
     1     0    0.5
     1     0    0.5
     0     1    0.5

    X = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}

    w_{ML} \sim N\left( E[w], \begin{pmatrix} \mathrm{var}(w_1) & \mathrm{cov}(w_1,w_2) \\ \mathrm{cov}(w_2,w_1) & \mathrm{var}(w_2) \end{pmatrix} \right)
           = N\left( w^*, \sigma^2 (X^T X)^{-1} \right)
           = N\left( \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \sigma^2 \begin{pmatrix} 0.25 & 0 \\ 0 & 1 \end{pmatrix} \right)

x_1 was "on" most of the time, so I am pretty certain about w_1. However, x_2 was "on" only once, so I am uncertain about w_2.

[Figure: distribution of the estimates in the (w_1, w_2) plane, centered at (0.5, 0.5) and elongated along w_2.]
Parameter uncertainty: Example 2

• Input history:

    x_1   x_2   y*
     1     1    1
     1     1    1
     1     1    1
     1     1    1
     1     0    0.5

    X = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix}

    w_{ML} \sim N\left( E[w], \begin{pmatrix} \mathrm{var}(w_1) & \mathrm{cov}(w_1,w_2) \\ \mathrm{cov}(w_2,w_1) & \mathrm{var}(w_2) \end{pmatrix} \right)
           = N\left( w^*, \sigma^2 (X^T X)^{-1} \right)
           = N\left( \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \sigma^2 \begin{pmatrix} 1 & -1 \\ -1 & 1.25 \end{pmatrix} \right)

x_1 and x_2 were "on" mostly together. The weight var-cov matrix shows that what I learned is that w_1 + w_2 \approx 1; I do not know the individual values of w_1 and w_2 with much certainty. x_1 appeared slightly more often than x_2, so I am a little more certain about the value of w_1.

[Figure: distribution of the estimates in the (w_1, w_2) plane, centered at (0.5, 0.5) and elongated along the line w_1 + w_2 = 1.]
Parameter uncertainty: Example 3

• Input history:

    x_1   x_2   y*
     0     1    0.5
     0     1    0.5
     0     1    0.5
     0     1    0.5
     1     1    1

    w_{ML} \sim N\left( E[w], \begin{pmatrix} \mathrm{var}(w_1) & \mathrm{cov}(w_1,w_2) \\ \mathrm{cov}(w_2,w_1) & \mathrm{var}(w_2) \end{pmatrix} \right)
           = N\left( w^*, \sigma^2 (X^T X)^{-1} \right)
           = N\left( \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}, \sigma^2 \begin{pmatrix} 1.25 & -0.25 \\ -0.25 & 0.25 \end{pmatrix} \right)

x_2 was mostly "on", so I am pretty certain about w_2, but I am very uncertain about w_1. Occasionally x_1 and x_2 were on together, so I have some reason to believe that w_1 + w_2 \approx 1.

[Figure: distribution of the estimates in the (w_1, w_2) plane, centered at (0.5, 0.5) and elongated along w_1.]
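The covariance matrices quoted in the three examples follow directly from \sigma^2 (X^T X)^{-1}. A quick check (a sketch in Python/NumPy, with \sigma^2 factored out):

    import numpy as np

    X1 = np.array([[1,0],[1,0],[1,0],[1,0],[0,1]], float)   # Example 1: x1 on most trials
    X2 = np.array([[1,1],[1,1],[1,1],[1,1],[1,0]], float)   # Example 2: x1 and x2 mostly on together
    X3 = np.array([[0,1],[0,1],[0,1],[0,1],[1,1]], float)   # Example 3: x2 on most trials

    for name, X in (("Example 1", X1), ("Example 2", X2), ("Example 3", X3)):
        cov = np.linalg.inv(X.T @ X)                         # parameter var-cov, up to sigma^2
        print(name)
        print(cov)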
                    Effect of uncertainty on learning rate
• When you observe an error in trial n, the amount that you should change w
should depend on how certain you are about w. The more certain you are, the
less you should be influenced by the error. The less certain you are, the more
you should “pay attention” to the error.

    w^{(n+1)} = w^{(n)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n)} \right)

where w^{(n)} and the Kalman gain k^{(n)} are m×1 vectors, and the term in parentheses is the scalar prediction error.




Rudolf E. Kalman (1960) A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering, 82 (Series D): 35-45.

Research Institute for Advanced Study
7212 Bellona Ave, Baltimore, MD
Example of the Kalman gain: running estimate of average

    x^{(i)} = 1
    y^{*(i)} = w^*;    y^{(i)} = y^{*(i)} + \epsilon,    \epsilon \sim N(0, \sigma^2)
    X^T = [\, 1 \; 1 \; \cdots \; 1 \,]    (n ones)

w^{(n)} is the online estimate of the mean of y:

    w^{(n)} = (X^T X)^{-1} X^T y = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}

    w^{(n-1)} = \frac{1}{n-1} \sum_{i=1}^{n-1} y^{(i)}

    w^{(n)} = \frac{1}{n} \left( \sum_{i=1}^{n-1} y^{(i)} + y^{(n)} \right)
            = \frac{n-1}{n} w^{(n-1)} + \frac{1}{n} y^{(n)}
            = \left( 1 - \frac{1}{n} \right) w^{(n-1)} + \frac{1}{n} y^{(n)}

    w^{(n)} = w^{(n-1)} + \frac{1}{n} \left( y^{(n)} - w^{(n-1)} \right)
              (past estimate)            (new measure)

As n increases, we trust our past estimate w^{(n-1)} a lot more than the new observation y^{(n)}. The Kalman gain (here 1/n) is a learning rate that decreases as the number of samples increases.
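A minimal sketch (Python/NumPy) of this recursion, using made-up observations of a constant: update a running mean with gain 1/n and check it against the batch average.

    import numpy as np

    rng = np.random.default_rng(2)
    y = 3.0 + 0.5 * rng.normal(size=200)        # noisy observations of a constant w* = 3.0

    w = 0.0                                     # initial estimate
    for n, y_n in enumerate(y, start=1):
        k = 1.0 / n                             # Kalman gain for the running-average case
        w = w + k * (y_n - w)                   # w(n) = w(n-1) + (1/n)(y(n) - w(n-1))

    print(w, y.mean())                          # the recursion reproduces the sample mean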
Example of the Kalman gain: running estimate of variance

\hat{\sigma}^2_{(n)} is the online estimate of the variance of y:

    \hat{\sigma}^2_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - E[y] \right)^2
                 \approx \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - w^{(n)} \right)^2
                 = \frac{1}{n} \left[ \sum_{i=1}^{n-1} \left( y^{(i)} - w^{(n)} \right)^2 + \left( y^{(n)} - w^{(n)} \right)^2 \right]
                 \approx \frac{1}{n} \left[ (n-1)\, \hat{\sigma}^2_{(n-1)} + \left( y^{(n)} - w^{(n)} \right)^2 \right]
                 = \hat{\sigma}^2_{(n-1)} - \frac{1}{n} \hat{\sigma}^2_{(n-1)} + \frac{1}{n} \left( y^{(n)} - w^{(n)} \right)^2

    \hat{\sigma}^2_{(n)} = \hat{\sigma}^2_{(n-1)} + \frac{1}{n} \left[ \left( y^{(n)} - w^{(n)} \right)^2 - \hat{\sigma}^2_{(n-1)} \right]
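The same idea in code (a sketch continuing the running-mean example, with made-up data): track the mean and the variance together, each with a 1/n gain.

    import numpy as np

    rng = np.random.default_rng(3)
    y = 3.0 + 0.5 * rng.normal(size=5000)       # noisy observations, true variance 0.25

    w, var = 0.0, 0.0
    for n, y_n in enumerate(y, start=1):
        w = w + (y_n - w) / n                   # running mean (previous slide)
        var = var + ((y_n - w)**2 - var) / n    # running variance with the same 1/n gain

    print(var, y.var())                         # close to the batch sample variance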
Some observations about variance of model parameters

    y^{(n)} = x^{(n)T} w^* + \epsilon^{(n)},    \epsilon \sim N(0, \sigma^2)

    w^{(n+1)} = w^{(n)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n)} \right)

    P^{(n)} = \mathrm{var}\left( w^{(n)} \right)
            = E\left[ \left( w^{(n)} - E[w^{(n)}] \right) \left( w^{(n)} - E[w^{(n)}] \right)^T \right]
            = E\left[ \left( w^{(n)} - w^* \right) \left( w^{(n)} - w^* \right)^T \right]    (taking E[w^{(n)}] = w^*)
            = E\left[ \tilde{w}^{(n)} \tilde{w}^{(n)T} \right],    where \tilde{w}^{(n)} = w^{(n)} - w^* is the parameter error

    \mathrm{tr}\left( P^{(n)} \right) = E\left[ \tilde{w}^{(n)T} \tilde{w}^{(n)} \right]
We note that P is simply the var-cov matrix of our model weights: it represents the uncertainty in our
estimates of the model parameters. We want to update the weights in such a way as to minimize the trace
of this matrix. The trace is the expected sum of squared errors between our estimates of w and the true parameters.
Trace of parameter var-cov matrix is the sum of squared parameter errors

    P = E\left[ \tilde{w} \tilde{w}^T \right]

    \tilde{w} = \begin{pmatrix} \tilde{w}_1 \\ \tilde{w}_2 \end{pmatrix} \sim N(0, P) = N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \mathrm{var}(\tilde{w}_1) & \mathrm{cov}(\tilde{w}_1, \tilde{w}_2) \\ \mathrm{cov}(\tilde{w}_2, \tilde{w}_1) & \mathrm{var}(\tilde{w}_2) \end{pmatrix} \right)

    \mathrm{var}(\tilde{w}_1) = \frac{1}{n} \sum_{i=1}^{n} \left( \tilde{w}_1^{(i)} - E[\tilde{w}_1] \right)^2 = \frac{1}{n} \sum_{i=1}^{n} \tilde{w}_1^{(i)2}

    \mathrm{tr}(P) = \mathrm{var}(\tilde{w}_1) + \mathrm{var}(\tilde{w}_2) = \frac{1}{n} \sum_{i=1}^{n} \left( \tilde{w}_1^{(i)2} + \tilde{w}_2^{(i)2} \right)

Our objective is to find the learning rate k (the Kalman gain) such that we minimize the sum of squared errors in our parameter estimates. This sum is the trace of the P matrix. Therefore, given observation y^{(n)}, we want to find k such that we minimize the variance of our estimate of w:

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)
Objective: adjust learning gain in order to minimize model uncertainty

Hypothesis about data observation in trial n:

    y^{(n)} = x^{(n)T} w^* + \epsilon^{(n)},    \epsilon \sim N(0, \sigma^2)

w^{(n|n-1)}: my estimate of w* before I see y in trial n, given that I have seen y up to trial n-1.

Error in trial n:

    y^{(n)} - x^{(n)T} w^{(n|n-1)}

My estimate after I see y in trial n:

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

A priori variance of the parameters:

    P^{(n|n-1)} = \mathrm{var}\left( w^{(n|n-1)} \right) = E\left[ \left( w^{(n|n-1)} - E[w^{(n|n-1)}] \right) \left( w^{(n|n-1)} - E[w^{(n|n-1)}] \right)^T \right]

A posteriori variance of the parameters:

    P^{(n|n)} = \mathrm{var}\left( w^{(n|n)} \right) = E\left[ \left( w^{(n|n)} - E[w^{(n|n)}] \right) \left( w^{(n|n)} - E[w^{(n|n)}] \right)^T \right]
Evolution of parameter uncertainty

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)
              = w^{(n|n-1)} + k^{(n)} \left( x^{(n)T} w^* + \epsilon^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

    w^* - w^{(n|n)} = w^* - w^{(n|n-1)} - k^{(n)} x^{(n)T} w^* - k^{(n)} \epsilon^{(n)} + k^{(n)} x^{(n)T} w^{(n|n-1)}
                    = \left( I - k^{(n)} x^{(n)T} \right) \left( w^* - w^{(n|n-1)} \right) - k^{(n)} \epsilon^{(n)}

    P^{(n|n)} = E\left[ \left( w^* - w^{(n|n)} \right) \left( w^* - w^{(n|n)} \right)^T \right]
              = E\left[ \left( \left( I - k^{(n)} x^{(n)T} \right) \left( w^* - w^{(n|n-1)} \right) - k^{(n)} \epsilon^{(n)} \right) \left( \left( I - k^{(n)} x^{(n)T} \right) \left( w^* - w^{(n|n-1)} \right) - k^{(n)} \epsilon^{(n)} \right)^T \right]
              = E\left[ \left( I - k^{(n)} x^{(n)T} \right) \left( w^* - w^{(n|n-1)} \right) \left( w^* - w^{(n|n-1)} \right)^T \left( I - k^{(n)} x^{(n)T} \right)^T \right] + E\left[ k^{(n)} \epsilon^{(n)} \epsilon^{(n)T} k^{(n)T} \right]

(the cross terms vanish because \epsilon^{(n)} is zero-mean and independent of the estimation error)

              = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)} \left( I - k^{(n)} x^{(n)T} \right)^T + E\left[ k^{(n)} \epsilon^{(n)} \epsilon^{(n)T} k^{(n)T} \right]

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)} \left( I - k^{(n)} x^{(n)T} \right)^T + k^{(n)} \sigma^2 k^{(n)T}
Find K to minimize trace of uncertainty

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)
              = w^{(n|n-1)} + k^{(n)} \left( x^{(n)T} w^* + \epsilon^{(n)} - x^{(n)T} w^{(n|n-1)} \right)
              = \left( I - k^{(n)} x^{(n)T} \right) w^{(n|n-1)} + k^{(n)} \epsilon^{(n)} + k^{(n)} x^{(n)T} w^*

    P^{(n|n-1)} = \mathrm{var}\left( w^{(n|n-1)} \right)

    P^{(n|n)} = \mathrm{var}\left( w^{(n|n)} \right)
              = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)} \left( I - k^{(n)} x^{(n)T} \right)^T + k^{(n)} \mathrm{var}\left( \epsilon^{(n)} \right) k^{(n)T}
              = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)} \left( I - k^{(n)} x^{(n)T} \right)^T + k^{(n)} \sigma^2 k^{(n)T}
Find K to minimize trace of uncertainty

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)} \left( I - k^{(n)} x^{(n)T} \right)^T + k^{(n)} \sigma^2 k^{(n)T}
              = P^{(n|n-1)} - P^{(n|n-1)} x^{(n)} k^{(n)T} - k^{(n)} x^{(n)T} P^{(n|n-1)} + k^{(n)} x^{(n)T} P^{(n|n-1)} x^{(n)} k^{(n)T} + k^{(n)} \sigma^2 k^{(n)T}

Taking the trace, and using \mathrm{tr}(A) = \mathrm{tr}(A^T) and P = P^T:

    \mathrm{tr}\left( P^{(n|n)} \right) = \mathrm{tr}\left( P^{(n|n-1)} \right) - \mathrm{tr}\left( P^{(n|n-1)} x^{(n)} k^{(n)T} \right) - \mathrm{tr}\left( k^{(n)} x^{(n)T} P^{(n|n-1)} \right) + \mathrm{tr}\left( k^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} \right)
                                        = \mathrm{tr}\left( P^{(n|n-1)} \right) - 2\, \mathrm{tr}\left( P^{(n|n-1)} x^{(n)} k^{(n)T} \right) + \mathrm{tr}\left( k^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} \right)

Because x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 is a scalar, and \mathrm{tr}(aB) = a\, \mathrm{tr}(B):

    \mathrm{tr}\left( k^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} \right) = \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) \mathrm{tr}\left( k^{(n)} k^{(n)T} \right) = \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} k^{(n)}
The Kalman gain

    \mathrm{tr}\left( P^{(n|n)} \right) = \mathrm{tr}\left( P^{(n|n-1)} \right) - 2\, \mathrm{tr}\left( P^{(n|n-1)} x^{(n)} k^{(n)T} \right) + \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} k^{(n)}
                                        = \mathrm{tr}\left( P^{(n|n-1)} \right) - 2\, k^{(n)T} P^{(n|n-1)} x^{(n)} + \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T} k^{(n)}

    \frac{d}{d k^{(n)}} \mathrm{tr}\left( P^{(n|n)} \right) = -2 P^{(n|n-1)} x^{(n)} + \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) 2 k^{(n)} = 0

    k^{(n)} = \frac{ P^{(n|n-1)} x^{(n)} }{ x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 }

If I have a lot of uncertainty about my model, P is large compared to sigma: I will learn a lot from the current error. If I am pretty certain about my model, P is small compared to sigma: I will tend to ignore the current error.
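As a sanity check on this result, the following sketch (Python/NumPy, with an arbitrary prior covariance P, input x, and noise \sigma^2) evaluates tr(P^{(n|n)}) as a function of the gain and confirms that the closed-form k^{(n)} above is not beaten by random perturbations:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.normal(size=(2, 2))
    P = A @ A.T + np.eye(2)                    # an arbitrary prior covariance (positive definite)
    x = rng.normal(size=(2, 1))                # input on this trial
    sigma2 = 0.5                               # measurement noise variance

    def trace_post(k):
        # tr(P(n|n)) = tr( (I - k x^T) P (I - k x^T)^T + sigma^2 k k^T ) for a 2x1 gain k
        M = np.eye(2) - k @ x.T
        return np.trace(M @ P @ M.T + sigma2 * (k @ k.T))

    k_star = P @ x / (x.T @ P @ x + sigma2)    # the closed-form Kalman gain
    best = trace_post(k_star)
    for _ in range(1000):                      # random perturbations never do better
        assert best <= trace_post(k_star + 0.1 * rng.normal(size=(2, 1))) + 1e-12
    print("optimal trace:", best)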
Update of model uncertainty

    P^{(n|n)} = P^{(n|n-1)} - P^{(n|n-1)} x^{(n)} k^{(n)T} - k^{(n)} x^{(n)T} P^{(n|n-1)} + k^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) k^{(n)T}

Substituting the Kalman gain  k^{(n)} = P^{(n|n-1)} x^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1}:

    P^{(n|n)} = P^{(n|n-1)}
              - P^{(n|n-1)} x^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1} x^{(n)T} P^{(n|n-1)}
              - P^{(n|n-1)} x^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1} x^{(n)T} P^{(n|n-1)}
              + P^{(n|n-1)} x^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right) \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1} x^{(n)T} P^{(n|n-1)}

              = P^{(n|n-1)} - P^{(n|n-1)} x^{(n)} \left( x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 \right)^{-1} x^{(n)T} P^{(n|n-1)}

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

Model uncertainty decreases with every data point that you observe.
[Graphical model: on each trial, hidden weights w* combine with the observed input x to generate the observed output y.]

Hidden variable: w*. Observed variables: x and y. In this model, we hypothesize that the hidden variables, i.e., the "true" weights, do not change from trial to trial:

    w^{*(n+1)} = w^{*(n)}
    y^{(n)} = x^{(n)T} w^* + \epsilon^{(n)},    \epsilon \sim N(0, \sigma^2)

A priori estimate of the mean and variance of the hidden variable, before I observe the first data point:

    w^{(1|0)},   P^{(1|0)}

Update of the estimate of the hidden variable after I observe the data point on trial n:

    k^{(n)} = \frac{ P^{(n|n-1)} x^{(n)} }{ x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 }

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

Forward projection of the estimate to the next trial:

    w^{(n+1|n)} = w^{(n|n)}
    P^{(n+1|n)} = P^{(n|n)}
[Graphical model: the hidden weights w* now change from trial to trial, and on each trial generate the observed output y given the input x.]

In this model, we hypothesize that the hidden variables change from trial to trial:

    w^{*(n+1)} = A w^{*(n)} + \epsilon_w^{(n)},    \epsilon_w \sim N(0, Q)
    y^{(n)} = x^{(n)T} w^* + \epsilon_y^{(n)},    \epsilon_y \sim N(0, \sigma^2)

A priori estimate of the mean and variance of the hidden variable, before I observe the first data point:

    w^{(1|0)},   P^{(1|0)}

Update of the estimate of the hidden variable after I observe the data point on trial n:

    k^{(n)} = \frac{ P^{(n|n-1)} x^{(n)} }{ x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 }

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

Forward projection of the estimate to the next trial:

    w^{(n+1|n)} = A w^{(n|n)}
    P^{(n+1|n)} = A P^{(n|n)} A^T + Q
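These recursions translate directly into code. Below is a minimal sketch (Python/NumPy) of one simulated experiment; the dimensions, A, Q, \sigma^2, prior, and input sequence are arbitrary choices for illustration, not values from the slides:

    import numpy as np

    rng = np.random.default_rng(5)
    m, n_trials = 2, 200
    A = 0.99 * np.eye(m)                          # state transition
    Q = 0.01 * np.eye(m)                          # state (process) noise covariance
    sigma2 = 0.25                                 # measurement noise variance

    # Simulate the generative model: w* drifts from trial to trial, y is a noisy scalar observation.
    w_true = np.array([1.0, -0.5])
    w_est = np.zeros(m)                           # w(1|0): prior mean
    P = np.eye(m)                                 # P(1|0): prior covariance

    for _ in range(n_trials):
        x = rng.normal(size=m)                    # input on this trial
        y = x @ w_true + np.sqrt(sigma2) * rng.normal()

        # Update after observing (x, y)
        k = P @ x / (x @ P @ x + sigma2)          # Kalman gain
        w_est = w_est + k * (y - x @ w_est)       # w(n|n)
        P = (np.eye(m) - np.outer(k, x)) @ P      # P(n|n)

        # Forward projection to the next trial
        w_est = A @ w_est                         # w(n+1|n)
        P = A @ P @ A.T + Q                       # P(n+1|n)

        # The true weights also drift according to the same state equation
        w_true = A @ w_true + rng.multivariate_normal(np.zeros(m), Q)

    print("final estimate:", np.round(w_est, 2), " true:", np.round(w_true, 2))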
    k^{(n)} = \frac{ P^{(n|n-1)} x^{(n)} }{ x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2 }

    (numerator: uncertainty about my model parameters; denominator: uncertainty about my measurement)

• The learning rate is proportional to the ratio between two uncertainties: my model vs. my measurement.
• After we observe an input x, the uncertainty associated with the weight of that input decreases:

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

• Because of the state update noise Q, uncertainty increases as we form the prior for the next trial:

    P^{(n+1|n)} = A P^{(n|n)} A^T + Q
2. Bayesian state estimation
3. Causal Inference
4. The influence of priors
5. Behaviors that are not Bayesian
Comparison of Kalman gain to LMS

    w^{*(n+1)} = w^{*(n)}
    y^{(n)} = x^{(n)T} w^* + \epsilon^{(n)},    \epsilon \sim N(0, \sigma^2)

    k^{(n)} = \frac{ P^{(n|n)} x^{(n)} }{ \sigma^2 }        (see derivation of this in the homework)

    w^{(n+1)} = w^{(n)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n)} \right)
    w^{(n+1)} = w^{(n)} + \frac{ P^{(n|n)} }{ \sigma^2 } \left( y^{(n)} - x^{(n)T} w^{(n)} \right) x^{(n)}

In the Kalman gain approach, the P matrix depends on the history of all previous and current inputs. In LMS, the learning rate is simply a constant \eta that does not depend on past history:

    w^{(n+1)} = w^{(n)} + \eta \left( y^{(n)} - x^{(n)T} w^{(n)} \right) x^{(n)}

With the Kalman gain, our estimate converges in a single pass over the data set. With LMS, we do not estimate the var-cov matrix P on each trial, but we will need multiple passes before our estimate converges.
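A rough comparison in code (a sketch; the data set, LMS step size, and number of passes are arbitrary): run the Kalman update once through the data and LMS for one or more passes, then compare the parameter error.

    import numpy as np

    rng = np.random.default_rng(6)
    n, m, sigma2 = 200, 2, 0.25
    w_true = np.array([1.0, -0.5])
    X = rng.normal(size=(n, m))
    y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

    # Kalman: one pass, gain computed from the parameter covariance P
    w_k, P = np.zeros(m), 10.0 * np.eye(m)
    for x_n, y_n in zip(X, y):
        k = P @ x_n / (x_n @ P @ x_n + sigma2)
        w_k += k * (y_n - x_n @ w_k)
        P = (np.eye(m) - np.outer(k, x_n)) @ P

    # LMS: constant learning rate, repeated passes over the same data
    w_lms, eta = np.zeros(m), 0.02
    for _ in range(1):                    # try 1 pass, then e.g. 20 passes
        for x_n, y_n in zip(X, y):
            w_lms += eta * (y_n - x_n @ w_lms) * x_n

    print("Kalman error:", np.linalg.norm(w_k - w_true))
    print("LMS error:   ", np.linalg.norm(w_lms - w_true))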
                   Effect of state and measurement noise on the Kalman gain

 w(n1)  aw(n)   wn)
  *         *       (
                                       a  0.99    w         
                                                         N 0, q 2         
     y ( n)  x( n)T w*   ( n)
                            y              x=1    y      
                                                        N 0,  2      
             5                                                                5
            4.5                                                           4.5



P
  n1 n    4
            3.5
                                                         P
                                                           n1 n        3.5
                                                                              4


             3                                                                3
            2.5                                                           2.5
             2                                                                2

                   2      4        6         8    10                              2   4     6      8         10



                                                                                          q 2  2, 2  1
                                                                          0.8
            0.8
                                                                          0.75

 k   (n)                       q 2  2, 2  1
                                                              k (n)
            0.75                                                          0.7

            0.7
                                                                          0.65            q 2  1,  2  1
                                                                          0.6
            0.65               q  1,   1
                                   2         2                            0.55
                                                                                          q 2  1,  2  2
                                                                          0.5
                    2     4            6     8    10                              2   4     6      8         10



High noise in the state-update model produces increased uncertainty in the model parameters, which in turn produces a high learning rate. High measurement noise also increases parameter uncertainty, but this increase is small compared to the effect of state noise; higher measurement noise leads to a lower learning rate.
         Effect of state transition auto-correlation on the Kalman gain

    w^{*(n+1)} = a\, w^{*(n)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, q^2),    q^2 = 1

    y^{(n)} = x^{(n)T} w^* + \varepsilon_y^{(n)},    x = 1,    \varepsilon_y \sim \mathcal{N}(0, \sigma^2),    \sigma^2 = 1
[Figure: prior variance P^{(n+1|n)} (top) and Kalman gain k^{(n)} (bottom) plotted over trials 1-10 for a = 0.99, a = 0.50, and a = 0.10.]

           Learning rate is higher in a state model that has high auto-
           correlations (larger a). That is, if the learner assumes that the
           world is changing slowly (a is close to 1), then the learner will
           have a large learning rate.
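A small sketch (not from the original notes) that iterates the scalar recursions above to show how the Kalman gain depends on q^2, \sigma^2, and a; the initial variance and the function name kalman_gain_sequence are illustrative assumptions.

```python
import numpy as np

def kalman_gain_sequence(a, q2, sigma2, p0=5.0, n_trials=10):
    """Scalar state w with x = 1: iterate the prior variance and Kalman gain."""
    p_prior, gains = p0, []
    for _ in range(n_trials):
        k = p_prior / (p_prior + sigma2)   # gain on this trial
        p_post = (1.0 - k) * p_prior       # posterior variance after the observation
        p_prior = a**2 * p_post + q2       # project forward: shrink by a^2, grow by q^2
        gains.append(k)
    return np.array(gains)

for a, q2, sigma2 in [(0.99, 2.0, 1.0), (0.99, 1.0, 1.0), (0.99, 1.0, 2.0), (0.50, 1.0, 1.0)]:
    k10 = kalman_gain_sequence(a, q2, sigma2)[-1]
    print(f"a={a}, q2={q2}, sigma2={sigma2}: gain after 10 trials ~ {k10:.3f}")
```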
                                         Kalman filter as a model of animal learning
Suppose that x represents inputs from the environment, a light and a tone: x = \begin{bmatrix} \text{light} \\ \text{tone} \end{bmatrix}.
Suppose that y represents rewards, like a food pellet.



Animal's model of the experimental setup:

[Graphical model: the hidden weights w^* evolve from trial to trial; on each trial the inputs x and the current weights generate the observed reward y.]

    w^{*(n+1)} = A w^{*(n)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y^{(n)} = x_1 w_1^* + x_2 w_2^* + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, \sigma^2)

Animal's expectation on trial n:

    \hat{y}^{(n)} = x^{(n)T} w^{(n|n-1)}

Animal's learning from trial n:

    k^{(n)} = \frac{P^{(n|n-1)} x^{(n)}}{x^{(n)T} P^{(n|n-1)} x^{(n)} + \sigma^2}

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

    w^{(n+1|n)} = A w^{(n|n)}

    P^{(n+1|n)} = A P^{(n|n)} A^T + Q
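A minimal sketch (not from the original notes) of the update equations above; the two-element light/tone input, the noise values, and the name kalman_step are illustrative assumptions.

```python
import numpy as np

def kalman_step(w, P, x, y, A, Q, sigma2):
    """One trial of the Kalman learner: predict, update, project forward."""
    y_hat = x @ w                                      # animal's expectation on this trial
    k = P @ x / (x @ P @ x + sigma2)                   # Kalman gain
    w_post = w + k * (y - y_hat)                       # learn from the prediction error
    P_post = (np.eye(len(w)) - np.outer(k, x)) @ P     # uncertainty after the observation
    return A @ w_post, A @ P_post @ A.T + Q            # prior mean and variance for next trial

# Illustrative constants: two stimuli (light, tone), stable weights, small state noise.
A, Q, sigma2 = np.eye(2), 0.01 * np.eye(2), 0.25
w, P = np.zeros(2), np.eye(2)
for _ in range(20):                                    # light alone is paired with reward
    w, P = kalman_step(w, P, np.array([1.0, 0.0]), 1.0, A, Q, sigma2)
print("weights after light-only training:", w)
```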
Various forms of classical conditioning in animal psychology

[Table of classical-conditioning paradigms, from Peter Dayan; some of the listed effects are not explained by LMS but are predicted by the Kalman filter.]
Sharing Paradigm

Train: {x1, x2} -> 1
Test: x1 -> ?, x2 -> ?
Result: x1 -> 0.5, x2 -> 0.5

[Figure: training sequence showing the target y* = 1 with x1 and x2 both present over 40 trials.]
[Figure: simulation of the sharing paradigm. Left columns, learning with the Kalman gain: predicted (yhat) and actual (y) output, uncertainties P11 and P22, weights w1 and w2, and gains k1 and k2. Right column, LMS: predicted and actual output and weights w1 and w2.]
Blocking

Kamin (1968) Attention-like processes in classical conditioning. In: Miami symposium on the prediction of behavior: aversive stimulation (ed. MR Jones), pp. 9-33. Univ. of Miami Press.

Kamin trained an animal to continuously press a lever to receive food. He then paired a light (conditioned stimulus) with a mild electric shock to the foot of the rat (unconditioned stimulus). In response to the shock, the animal would reduce its lever-press activity. Soon the animal learned that the light predicted the shock, and therefore reduced lever pressing in response to the light. He then paired the light with a tone when giving the electric shock. After this second stage of training, he observed that when the tone was given alone, the animal did not reduce its lever pressing. The animal had not learned anything about the tone.
Blocking Paradigm

Train: x1 -> 1, {x1, x2} -> 1
Test: x2 -> ?, x1 -> ?
Result: x2 -> 0, x1 -> 1

[Figure: training sequence showing the target y*, x1, and x2 over 40 trials.]
[Figure: simulation of the blocking paradigm. Left columns, learning with the Kalman gain: predicted and actual output, uncertainties P11 and P22, weights w1 and w2, and gains k1 and k2. Right column, LMS: predicted and actual output and weights w1 and w2.]
Backwards Blocking Paradigm

Train: {x1, x2} -> 1, x1 -> 1
Test: x2 -> ?
Result: x2 -> 0

[Figure: training sequence showing the target y*, x1, and x2 over 60 trials.]
[Figure: simulation of the backwards blocking paradigm. Left columns, learning with the Kalman gain: predicted and actual output, uncertainties P11 and P22, weights w1 and w2, and gains k1 and k2. Right column, LMS: predicted and actual output and weights w1 and w2.]
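As a rough, self-contained sketch (not from the original notes), the following runs the blocking and backwards-blocking schedules through the Kalman learner described above; the trial counts and noise values are illustrative assumptions.

```python
import numpy as np

def run_schedule(trials, A, Q, sigma2):
    """trials: list of (x, y) pairs presented in order; returns the final weights."""
    w, P = np.zeros(2), np.eye(2)
    for x, y in trials:
        k = P @ x / (x @ P @ x + sigma2)
        w = w + k * (y - x @ w)
        P = (np.eye(2) - np.outer(k, x)) @ P
        w, P = A @ w, A @ P @ A.T + Q
    return w

A, Q, sigma2 = np.eye(2), 0.001 * np.eye(2), 0.05
x1, x12 = np.array([1.0, 0.0]), np.array([1.0, 1.0])

blocking  = [(x1, 1.0)] * 20 + [(x12, 1.0)] * 20    # x1 -> 1, then {x1, x2} -> 1
backwards = [(x12, 1.0)] * 20 + [(x1, 1.0)] * 20    # {x1, x2} -> 1, then x1 -> 1

print("blocking,           final w:", run_schedule(blocking, A, Q, sigma2))
print("backwards blocking, final w:", run_schedule(backwards, A, Q, sigma2))
```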
Different output models

Suppose that x represents inputs from the environment, a light and a tone: x = \begin{bmatrix} \text{light} \\ \text{tone} \end{bmatrix}. Suppose that y represents a reward, like a food pellet.

Case 1: the animal assumes an additive model. If each stimulus predicts one reward, then when the two are present together they predict two rewards.

    w^{*(n+1)} = A w^{*(n)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y^{(n)} = x^T w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, \sigma^2)

Case 2: the animal assumes a weighted-average model. If each stimulus predicts one reward, then when the two are present together they still predict one reward, but with higher confidence.

    w^{*(n+1)} = A w^{*(n)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y = b_1 x_1 w_1^* + b_2 x_2 w_2^* + \varepsilon_y,    b_1 + b_2 = 1

The weights b1 and b2 should be set to the inverse of the variance (uncertainty) with which each stimulus x1 and x2 predicts the reward.
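A small illustrative sketch (not from the original notes) contrasting the two output models' predictions when both stimuli are present; the per-stimulus variances are made-up numbers, and the inverse-variance normalization of b1 and b2 is the rule stated above.

```python
import numpy as np

w = np.array([1.0, 1.0])            # each stimulus alone predicts one reward
x = np.array([1.0, 1.0])            # both stimuli present
var = np.array([0.2, 0.8])          # assumed uncertainty of each stimulus's prediction

additive = x @ w                             # Case 1: predictions add -> 2 rewards
b = (1.0 / var) / np.sum(1.0 / var)          # Case 2: inverse-variance weights, b1 + b2 = 1
weighted_average = np.sum(b * x * w)         # -> 1 reward, dominated by the more reliable cue

print(additive, weighted_average)            # 2.0 and 1.0
```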
General case of the Kalman filter

    w^{*(n)} = A w^{*(n-1)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)        (w^* is n x 1)

    y^{(n)} = H(x^{(n)})\, w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, R)        (y is m x 1)

A priori estimate of the mean and variance of the hidden variable before I observe the first data point:

    w^{(1|0)},    P^{(1|0)}

Update of the estimate of the hidden variable after I observe the data point (writing H for H(x^{(n)})):

    k^{(n)} = P^{(n|n-1)} H^T \left( H P^{(n|n-1)} H^T + R \right)^{-1}

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - H w^{(n|n-1)} \right)

    P^{(n|n)} = \left( I - k^{(n)} H \right) P^{(n|n-1)}

Forward projection of the estimate to the next trial:

    w^{(n+1|n)} = A w^{(n|n)}

    P^{(n+1|n)} = A P^{(n|n)} A^T + Q
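A compact sketch (not from the original notes) of the general-case equations; the matrix sizes, parameter values, and the name kalman_filter_step are illustrative assumptions.

```python
import numpy as np

def kalman_filter_step(w, P, y, H, A, Q, R):
    """General Kalman filter: measurement update followed by forward projection."""
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    w_post = w + K @ (y - H @ w)                 # update the hidden-state estimate
    P_post = (np.eye(len(w)) - K @ H) @ P        # update its uncertainty
    return A @ w_post, A @ P_post @ A.T + Q      # prior for the next trial

# Illustrative example: two hidden states, one observation per trial.
A = np.array([[0.99, 0.0], [0.0, 0.5]])
Q, R = 0.01 * np.eye(2), np.array([[0.04]])
H = np.array([[1.0, 1.0]])                       # y = w1 + w2 + noise
w, P = np.zeros(2), np.eye(2)
w, P = kalman_filter_step(w, P, np.array([1.0]), H, A, Q, R)
print(w, P)
```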
How to set the initial var-cov matrix

    w^{*(n)} = A w^{*(n-1)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y^{(n)} = H^{(n)} w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, R)

    P^{(1|0)} = ?

In the homework, we will show that in general:

    \left( P^{(n|n)} \right)^{-1} = \left( P^{(n|n-1)} \right)^{-1} + H^T R^{-1} H

Now if we have absolutely no prior information on w, then before we see the first data point P^{(1|0)} is infinite, and therefore its inverse is zero. After we see the first data point, we will use the above equation to update our estimate. The updated estimate becomes:

    \left( P^{(1|1)} \right)^{-1} = H^T R^{-1} H

    P^{(1|1)} = \left( H^T R^{-1} H \right)^{-1}

A reasonable and conservative estimate of the initial value of P is to set it to this value. That is, set:

    P^{(1|0)} = \left( H^T R^{-1} H \right)^{-1}
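A tiny sketch (not from the original notes) of this initialization for the two-sensor case used in the data-fusion example below; the values of H and R are assumptions.

```python
import numpy as np

H = np.array([[1.0], [1.0]])       # two sensors each observe the single hidden state
R = np.diag([1.0, 2.0])            # assumed sensor noise variances

# Conservative prior: P(1|0) = (H^T R^{-1} H)^{-1}
P10 = np.linalg.inv(H.T @ np.linalg.inv(R) @ H)
print(P10)                         # [[0.6667]] = (1/1 + 1/2)^{-1}
```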
                                             Data fusion

Suppose that we have two sensors that independently measure something. We would like
to combine their measures to form a better estimate. What should the weights be?
Suppose that we know that sensor 1 gives us measurement y1 and has Gaussian noise with variance \sigma_1^2.

Similarly, sensor 2 gives us measurement y2 and has Gaussian noise with variance \sigma_2^2.

A good idea is to weight each sensor inversely proportional to its noise:
                                                            1
                     1            1        1      1 
                 x
                 ˆ          y1         y2           
                    2            2
                                    2        2  2 
                     1                     1       2 
                     1            1         2 2 
                          y1         y2  1 2 
                    2            2
                                    2        2   2 
                     1                     1       2 

                       2
                        2
                                           1
                                            2
                              y                 y2
                             2 1
                      1
                       2
                           2     1
                                    2
                                            2
                                              2
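A minimal sketch of this inverse-variance weighting (not from the original notes); the sensor readings and variances are made-up numbers.

```python
import numpy as np

y = np.array([2.0, 5.0])          # readings from sensor 1 and sensor 2 (assumed)
var = np.array([1.0, 2.0])        # their noise variances (assumed)

w = (1.0 / var) / np.sum(1.0 / var)     # inverse-variance weights; they sum to 1
x_hat = np.sum(w * y)                   # fused estimate
fused_var = 1.0 / np.sum(1.0 / var)     # variance of the fused estimate
print(x_hat, fused_var)                 # 3.0 and 0.667
```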
                                 Data fusion via Kalman filter
                                         2
                                          2
                                                              1
                                                               2
                                 x
                                 ˆ               y                     y2
                                               2 1
                                        1
                                         2
                                             2     1
                                                      2
                                                              2
                                                                2

To see why this makes sense, let’s put forth a generative model that describes
our hypothesis about how the data that we are observing is generated:


Hidden variable: x^*. Observed variables: y1 and y2.

[Graphical model: on each trial, the hidden variable x^* generates both observations y1 and y2; x^* evolves from trial to trial.]



    x^{*(n+1)} = a\, x^{*(n)} + \varepsilon_x^{(n)},    \varepsilon_x \sim \mathcal{N}(0, q^2)

    y^{(n)} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} x^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, R),    R = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix},    H = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Priors (before the first observation):

    \hat{x}^{(1|0)} = 0,    P^{(1|0)} = \infty

Our first observation:

    y^{(1)} = \begin{bmatrix} y_1^{(1)} \\ y_2^{(1)} \end{bmatrix}

Variance of our posterior estimate:

    \left( P^{(1|1)} \right)^{-1} = \left( P^{(1|0)} \right)^{-1} + H^T R^{-1} H = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}

    P^{(1|1)} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
The Kalman gain (see homework for this):

    k^{(1)} = P^{(1|1)} H^T R^{-1} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix} = \begin{bmatrix} \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} & \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \end{bmatrix}

    \hat{x}^{(1|1)} = \hat{x}^{(1|0)} + k^{(1)} \left( y^{(1)} - H \hat{x}^{(1|0)} \right) = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, y_1^{(1)} + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, y_2^{(1)}
The real world: a single hidden variable x^*. What our sensors tell us: the two noisy observations y1 and y2.
Notice that after we make our first observation, the variance of our posterior is smaller than the variance of either sensor:

    P^{(1|1)} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} < \sigma_1^2    because    \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} < 1

    \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2} < \sigma_2^2    because    \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} < 1
Combining equally noisy sensors vs. combining sensors with unequal noise

    y_1 \sim \mathcal{N}\left( \bar{y}_1^{(1)}, \sigma_1^2 \right),    y_2 \sim \mathcal{N}\left( \bar{y}_2^{(1)}, \sigma_2^2 \right)

[Figure: probability densities of sensor 1, sensor 2, and the combined estimate. With equal sensor noise the combined density lies midway between the two sensor densities; with unequal noise it lies closer to the more reliable sensor. In both cases the combined density is narrower than either sensor's density.]
                 2       (1)   12           2 2 
                                         (2) 1  2
ˆ
x              N    2    y            y ,          
                 2  2 1     2     2 1
                               1   2             2
                                            1   2 
                                             2
                 1     2




                          Mean of the posterior, and its variance
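A numerical check (not from the original notes) that a single Kalman measurement update reproduces this posterior mean and variance; the variances and sensor readings are made-up numbers.

```python
import numpy as np

sigma1_sq, sigma2_sq = 1.0, 2.0
y = np.array([2.0, 5.0])                   # the two sensor readings (assumed)
H = np.array([[1.0], [1.0]])
R = np.diag([sigma1_sq, sigma2_sq])

# With an (effectively) infinite prior variance, the posterior variance is:
P11 = 1.0 / (1.0 / sigma1_sq + 1.0 / sigma2_sq)
k = P11 * H.T @ np.linalg.inv(R)           # k(1) = P(1|1) H^T R^{-1}
x_hat = (k @ y).item()                     # Kalman update starting from x_hat(1|0) = 0

# Direct inverse-variance weighting gives the same answer.
direct = (sigma2_sq * y[0] + sigma1_sq * y[1]) / (sigma1_sq + sigma2_sq)
print(x_hat, direct, P11)                  # 3.0, 3.0, 0.667
```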
                Puzzling results: Savings and memory despite “washout”




Gain = eye displacement divided by target displacement.

Result 1: After changes in gain, monkeys exhibit recall despite behavioral evidence for washout.

[Figure: saccade gain across the adaptation and washout blocks (Kojima et al., 2004).]


Kojima et al. (2004) Memory of learning facilitates saccade adaptation in the monkey. J Neurosci 24:7531.
    Puzzling results: Improvements in
   performance without error feedback



Result 2: Following changes in
gain and a long period of washout,
monkeys exhibit no recall.




Result 3: Following changes in
gain and a period of darkness,
monkeys exhibit a “jump” in
memory.



Kojima et al. (2004) J Neurosci 24:7531.
        The learner’s hypothesis about the structure of the world

1. The world has many hidden states. What I observe is a linear combination of
   these states.
2. The hidden states change from trial to trial. Some change slowly, others
   change fast.
3. The states that change fast have larger noise than states that change slow.



[Graphical model: the hidden states w^* evolve from trial to trial under A; on each trial the inputs x and the hidden states generate the observation y.]

    w^* = \begin{bmatrix} w_1^* \\ w_2^* \end{bmatrix},    w_1^*: slow system,    w_2^*: fast system

    A = \begin{bmatrix} 0.99 & 0 \\ 0 & 0.50 \end{bmatrix}

State transition equation:    w^{*(n)} = A w^{*(n-1)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)
Output equation:    y^{(n)} = x^{(n)T} w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, \sigma^2)
Simulations for savings

    w^* = \begin{bmatrix} w_1^* \\ w_2^* \end{bmatrix},    A = \begin{bmatrix} 0.99 & 0 \\ 0 & 0.50 \end{bmatrix},    Q = \begin{bmatrix} 0.00004 & 0 \\ 0 & 0.01 \end{bmatrix},    \sigma^2 = 0.04

    w^{*(n)} = A w^{*(n-1)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y^{(n)} = x^{(n)T} w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, \sigma^2)

The critical assumption is that in the fast system there is much more noise than in the slow system. This produces a larger learning rate in the fast system.

[Figure: simulation over 300 trials showing the target y* and inputs x1, x2; the actual (y) and predicted (yhat) output; the weights w1 (slow) and w2 (fast); and the Kalman gains k1 and k2.]
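A self-contained sketch (not from the original notes) of the two-state fast/slow Kalman learner with the parameter values listed above; the adaptation/washout/re-adaptation schedule is an illustrative assumption.

```python
import numpy as np

A = np.diag([0.99, 0.50])                  # slow and fast retention factors
Q = np.diag([0.00004, 0.01])               # much more state noise in the fast system
sigma2 = 0.04
x = np.array([1.0, 1.0])                   # the output is the sum of the two states

# Illustrative schedule: adapt to +1, wash out with -1, then re-adapt to +1.
targets = [1.0] * 100 + [-1.0] * 30 + [1.0] * 100

w, P, history = np.zeros(2), np.eye(2), []
for y in targets:
    k = P @ x / (x @ P @ x + sigma2)
    w = w + k * (y - x @ w)
    P = (np.eye(2) - np.outer(k, x)) @ P
    w, P = A @ w, A @ P @ A.T + Q
    history.append(w.copy())

history = np.array(history)
print("slow/fast weights at end of washout:", history[129])
print("output after 10 re-adaptation trials:", history[139] @ x)
```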
Simulations for spontaneous recovery despite zero error feedback

In the error-clamp period, estimates are still made, yet the weight-update equation does not see any error. Therefore the effect of the Kalman gain in the error-clamp period is zero. Nevertheless, the weights continue to change because of the state-update equations: the fast weights rapidly rebound to zero, while the slow weights slowly decline. The sum of these two changes produces a "spontaneous recovery" after washout.

    w^{(n|n)} = w^{(n|n-1)} + k^{(n)} \left( y^{(n)} - x^{(n)T} w^{(n|n-1)} \right)

    w^{(n+1|n)} = A w^{(n|n)}

[Figure: simulation over 300 trials with an error-clamp period at the end, showing the target y* and inputs x1, x2; the actual and predicted output; and the weights w1 (slow) and w2 (fast).]
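A brief sketch (not from the original notes) of how the error-clamp period can be simulated: during the clamp the error fed to the weight update is forced to zero, while the state-transition step keeps running; parameters and trial counts are illustrative.

```python
import numpy as np

A, Q, sigma2 = np.diag([0.99, 0.50]), np.diag([0.00004, 0.01]), 0.04
x = np.array([1.0, 1.0])

w, P = np.zeros(2), np.eye(2)
schedule = [("adapt", 1.0)] * 100 + [("adapt", -1.0)] * 30 + [("clamp", None)] * 50
for phase, y_star in schedule:
    k = P @ x / (x @ P @ x + sigma2)
    error = 0.0 if phase == "clamp" else (y_star - x @ w)   # clamp: no error signal
    w = w + k * error
    P = (np.eye(2) - np.outer(k, x)) @ P
    w, P = A @ w, A @ P @ A.T + Q                           # state update always runs
print("weights at end of clamp:", w, " net output:", x @ w)
```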
Changes in representation without error feedback

                                    Target visible during recovery    Target extinguished during recovery
Mean gain at start of recovery                   0.83                               0.86
Mean gain at end of recovery                     0.95                               0.87
% gain change                                   14.4%                               1.2%

Seeberger et al. (2002) Brain Research 956:374-379.
Massed vs. spaced training: effect of changing the inter-trial interval
Learning reaching in a force field

[Figure: discrimination performance (sec) over training for ITI = 8 min vs. ITI = 1 min. Han, J.S., Gallagher, M. & Holland, P., Hippocampus 8:138-46 (1998).]
Rats were trained on an operant conditional discrimination in which an ambiguous stimulus (X) indicated both the occasions on which responding in the presence of a second cue (A) would be reinforced and the occasions on which responding in the presence of a third cue (B) would not be reinforced (X --> A+, A-, X --> B-, B+). Both rats with lesions of the hippocampus and control rats learned this discrimination more rapidly when the training trials were widely spaced (intertrial interval of 8 min) than when they were massed (intertrial interval of 1 min). With spaced practice, lesioned and control rats learned this discrimination equally well. But when the training trials were massed, lesioned rats learned more rapidly than controls.
Massed vs. spaced training: effect of changing the inter-trial interval

[Figure: performance in a water maze for animals trained with 4 trials a day for 4 days vs. 16 trials in one day.]

Sisti, Glass, Shors (2007) Neurogenesis and the spacing effect: learning over time enhances memory and the survival of new neurons. Learning and Memory 14:368.
         The learner’s hypothesis about the structure of the world
1. The world has many hidden states. What I observe is a linear combination of
   these states.
2. The hidden states change from trial to trial. Some change slowly, others
   change fast.
3. The states that change fast have larger noise than states that change slow.
4. The state changes can occur more frequently than I can make observations.



[Graphical model: the hidden states w^* continue to be updated by A during the inter-trial interval; the inputs x and observation y are available only on the trials when an observation is made.]
    w^* = \begin{bmatrix} w_1^* \\ w_2^* \end{bmatrix},    A = \begin{bmatrix} 0.999 & 0 \\ 0 & 0.40 \end{bmatrix},    Q = \begin{bmatrix} 0.00008 & 0 \\ 0 & 0.1 \end{bmatrix},    \sigma^2 = 0.04

    w^{*(n)} = A w^{*(n-1)} + \varepsilon_w^{(n)},    \varepsilon_w \sim \mathcal{N}(0, Q)

    y^{(n)} = x^{(n)T} w^{*(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, \sigma^2)
[Figure: actual output y and prediction yhat plotted against state-update number for ITI = 2 (axis 0-300) and ITI = 20 (axis 0-3000).]
When there is an observation, the uncertainty about each hidden variable decreases in proportion to its Kalman gain:

    P^{(n|n)} = \left( I - k^{(n)} x^{(n)T} \right) P^{(n|n-1)}

When there are no observations, the uncertainty decreases in proportion to A squared, but increases in proportion to the state noise Q:

    P^{(n+1|n)} = A P^{(n|n)} A^T + Q = \begin{bmatrix} a_{11}^2 P_{11} & a_{11} a_{22} P_{12} \\ a_{11} a_{22} P_{12} & a_{22}^2 P_{22} \end{bmatrix} + Q

[Figure, ITI = 20: uncertainty of the slow state (P11) and uncertainty of the fast state (P22) plotted across state updates 1000-1100.]
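A small sketch (not from the original notes) of how the prior uncertainty evolves across an inter-trial interval with no observations, using the A and Q values listed above; the starting covariance is an illustrative value.

```python
import numpy as np

A = np.diag([0.999, 0.40])
Q = np.diag([0.00008, 0.1])

P = np.diag([0.0127, 0.11])        # illustrative posterior right after an observation
for _ in range(20):                # ITI = 20: twenty state transitions without feedback
    P = A @ P @ A.T + Q            # shrink by A squared, grow by Q
print(np.diag(P))                  # slow-state uncertainty keeps growing; fast state levels off
```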



 Beyond a minimum ITI, increased ITI continues to increase the uncertainty of the
 slow state but has little effect on the fast state uncertainty. The longer ITI increases
 the total learning by increasing the slow state’s sensitivity to error.
        Performance in spaced training depends largely on the slow state. Therefore,
         spaced training produces memories that decay little with passage of time.
[Figure: spaced vs. massed training. Panels show the output y and the predictions yhat_spaced and yhat_massed; the weights w1_spaced, w1_massed, w2_spaced, and w2_massed; the uncertainties P11, P22, and P12 for each condition; and the Kalman gains k1 and k2 for each condition, plotted against observation number.]
         Spaced training results in better retention in learning a second language
On Day 1, subjects learned to translate written Japanese words into English. They were given a Japanese word (written phonetically) and then given the English translation. This "study trial" was repeated twice. Afterwards, they were given the Japanese word and had to write the translation. If their translation was incorrect, the correct translation was given.
The ITI between word repetitions was either 2, 14, or 98 trials.
Performance during training was better when the ITI was short. However, retention was much better for words that were trained with the longer ITIs. (The retention test involved two groups, one at 1 day and the other at 7 days. Performance was slightly better for the 1-day group, but the results were averaged in this figure.)




[Figure: left, performance during training for ITI = 2, 14, and 98; right, performance at test (1 day or 1 week after training, averaged together) for ITI = 2, 14, and 98.]

Pavlik, P. I. and Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29, 559-586.
Application of the Kalman filter to problems in sensorimotor control

[Graphical model: motor commands u drive changes in the state of our body x; sensory measurements y are generated from x. DM Wolpert et al. (1995) Science 269:1880.]
When we move our arm in darkness, we may estimate the position of our hand based
on three sources of information:
• proprioceptive feedback.
• a forward model of how the motor commands have moved our arm.
• by combining our prediction from the forward model with actual proprioceptive
feedback.


Experimental procedures:
Subject holds a robotic arm in total darkness. The hand is briefly illuminated. An
arrow is displayed to left or right, showing which way to move the hand. In some
cases, the robot produces a constant force that assists or resists the movement. The
subject slowly moves the hand until a tone is sounded. They use the other hand to
move a mouse cursor to show where they think their hand is located.




                                                   DM Wolpert et al. (1995) Science 269:1880
[Figure: estimates of hand position in the assistive and resistive force conditions. DM Wolpert et al. (1995) Science 269:1880.]
The generative model, describing the actual dynamics of the limb

    x^{(n+1)} = A x^{(n)} + B u^{(n)} + \varepsilon_x^{(n)},    \varepsilon_x \sim \mathcal{N}(0, Q)

    y^{(n)} = C x^{(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, R)

(Motor commands u change the state of the body x; sensory measurements y are generated from x.)
The model for estimation of sensory state from sensory feedback

    \hat{x}^{(n+1)} = \hat{A} \hat{x}^{(n)} + \hat{B} u^{(n)} + \varepsilon_x^{(n)},    \varepsilon_x \sim \mathcal{N}(0, Q)

    \hat{y}^{(n)} = \hat{C} \hat{x}^{(n)} + \varepsilon_y^{(n)},    \varepsilon_y \sim \mathcal{N}(0, R)

    \hat{B} = 1.4 B

For whatever reason, the brain has an incorrect model of the arm: it overestimates the effect of motor commands on changes in limb position.
Initial conditions: the subject can see the hand and has no uncertainty regarding its position and velocity

$\hat{x}^{(0|0)} = x^{(0)}, \qquad P^{(0|0)} = 0$

Forward model of state change and feedback

$\hat{x}^{(1|0)} = \hat{A} \hat{x}^{(0|0)} + \hat{B} u^{(1)}$
$P^{(1|0)} = \hat{A} P^{(0|0)} \hat{A}^T + Q$
$\hat{y}^{(1)} = \hat{C} \hat{x}^{(1|0)}$

Actual observation

$y^{(1)} = C x^{(1)}$

Kalman gain

$k^{(1)} = P^{(1|0)} \hat{C}^T \left( \hat{C} P^{(1|0)} \hat{C}^T + R \right)^{-1}$

Estimate of state incorporates the prior and the observation

$\hat{x}^{(1|1)} = \hat{x}^{(1|0)} + k^{(1)} \left( y^{(1)} - \hat{y}^{(1)} \right)$
$P^{(1|1)} = \left( I - k^{(1)} \hat{C} \right) P^{(1|0)}$

Forward model to establish the prior and the uncertainty for the next state

$\hat{x}^{(2|1)} = \hat{A} \hat{x}^{(1|1)} + \hat{B} u^{(2)}$
$P^{(2|1)} = \hat{A} P^{(1|1)} \hat{A}^T + Q$
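Below is one way to put the whole recursion together in code. It is a sketch using the same hypothetical plant matrices as the earlier simulation, not the implementation from Wolpert et al.; the internal-model mismatch B̂ = 1.4B is taken from the slide, and the prediction uses the brain's model while the observations come from the true generative model.

import numpy as np

rng = np.random.default_rng(0)
dt = 0.01

# Hypothetical plant (same illustrative matrices as in the earlier sketch)
A = np.array([[1.0, dt], [0.0, 0.98]])
B = np.array([[0.0], [dt]])
C = np.array([[1.0, 0.0]])
Q = 1e-5 * np.eye(2)               # state noise covariance
R = np.array([[1e-4]])             # measurement noise covariance

# Brain's internal model: A and C assumed correct, but B_hat = 1.4 B
A_hat, B_hat, C_hat = A, 1.4 * B, C

# Initial conditions: the hand is visible, so the estimate equals the true
# state and there is no uncertainty: x_hat(0|0) = x(0), P(0|0) = 0
x_true = np.zeros((2, 1))
x_hat = x_true.copy()
P = np.zeros((2, 2))

u = np.array([[1.0]])              # illustrative constant motor command

for n in range(100):
    # Generative model: true state change and actual observation
    eps_x = rng.multivariate_normal(np.zeros(2), Q).reshape(2, 1)
    x_true = A @ x_true + B @ u + eps_x
    y = C @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=(1, 1))

    # Forward model: prior for the new state, its uncertainty, and the
    # predicted sensory feedback
    x_prior = A_hat @ x_hat + B_hat @ u            # x_hat(n+1|n)
    P_prior = A_hat @ P @ A_hat.T + Q              # P(n+1|n)
    y_hat = C_hat @ x_prior                        # y_hat(n+1)

    # Kalman gain
    k = P_prior @ C_hat.T @ np.linalg.inv(C_hat @ P_prior @ C_hat.T + R)

    # Estimate incorporates the prior and the observation
    x_hat = x_prior + k @ (y - y_hat)              # x_hat(n+1|n+1)
    P = (np.eye(2) - k @ C_hat) @ P_prior          # P(n+1|n+1)

# With B_hat > B the estimated hand position runs ahead of the true position;
# the difference is the bias in the perceived hand location.
bias = float(x_hat[0, 0] - x_true[0, 0])

Running this loop for movements of different durations and recording the final bias and variance of the estimate is one way to reproduce, qualitatively, the pattern summarized in the figure below.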
A single movement

[Figure: left column, a single movement: actual position x(t) and estimated position x̂ ± SD (cm) vs. time; the motor command u vs. time, with the time of the "beep" marked; and the Kalman gain vs. time (sec). Right: for movements of various lengths, the bias (cm) and the variance (cm^2) of the estimated hand position at the end of the movement, plotted against total movement time (sec).]

								