580.691 Learning Theory
Reza Shadmehr


Neural mechanisms of classification
Generalization in linear classification
Patient H.M.
H.M. was a 27-year-old assembly line worker who had suffered from untreatable and debilitating
temporal lobe seizures for many years. A surgeon removed the medial portion of the temporal
lobes bilaterally (only the right lobe’s removal is shown in the figure on the right).
H.M.’s seizures improved, but there was a devastating side effect: he could no longer
form long-term memories.




                                                                                    Kandel et al. Principles of Neural Science 2000 (62-1)
                                   R. Carter (1998) Mapping the Mind
Patient H.M.

• After recovering from surgery, he retained his vocabulary and language skills, his high IQ,
and his ability to recall facts about his life that preceded the surgery:

     • He could remember the job he had held, where he had lived, and events of his childhood.
     His memory of public and personal events extends only to when he was 16 years old
     (1942), 11 years before his operation. This is not typical of amnesic individuals,
     who generally remember facts and events up to near the date of their brain damage.

     • Normal immediate memory: he can retain a number for a short period of time, and he
     can carry on a conversation.

     • He could not recognize people he had talked to just the day before at the hospital.
     He does not know where he lives, who cares for him, or what he ate at his last meal.

     • He rarely complains. Something could be seriously wrong with him, but you
     would have to guess. At the nursing home, when H.M. is observed to be acting
     differently, the nurses question him by running through a list of possible complaints,
     such as toothache, headache, and stomachache, until they hit upon the correct one. He
     will not spontaneously say, “I have a toothache.”


                                                                     Corkin, Seminars in Neurology 4:249-259 1984.
Immediate memory is intact in amnesia
• Subjects with medial temporal lobe damage and normal individuals were read a
sequence of digits (for example, 5-7-4-1) and then asked immediately to repeat back the
sequence.

• Each time the subject was successful, the number of digits in the test sequence was
increased by one.

• Digit span: the longest sequence length that was successfully repeated back before the
subject failed twice at the same length.

• The amnesic patients and the control subjects both repeated back an average of 6.8
digits.




                                                                                     Cave and Squire
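The staircase procedure above can be sketched as follows. This is a minimal illustration, not the protocol used by Cave and Squire: the starting length (2 digits), the cap of 20 digits, and the simulated subject `capacity_subject` are hypothetical, and "failed twice at the same sequence" is read here as two failures at the same sequence length.

```python
import random
from typing import Callable, List, Optional


def digit_span(repeat_back: Callable[[List[int]], List[int]],
               start_len: int = 2, max_len: int = 20,
               rng: Optional[random.Random] = None) -> int:
    """Longest length repeated back correctly before two failures at one length.

    start_len and max_len are hypothetical choices, not values from the study.
    """
    rng = rng if rng is not None else random.Random(0)
    length, span, failures = start_len, 0, 0
    while length <= max_len:
        seq = [rng.randint(0, 9) for _ in range(length)]
        if repeat_back(seq) == seq:
            span = length      # success: the next sequence is one digit longer
            length += 1
            failures = 0
        else:
            failures += 1      # retry the same length with a new sequence
            if failures == 2:
                break          # second failure at this length ends the test
    return span


def capacity_subject(k: int) -> Callable[[List[int]], List[int]]:
    # Hypothetical simulated subject: repeats up to k digits, drops the last digit otherwise
    return lambda seq: list(seq) if len(seq) <= k else list(seq[:-1])


print(digit_span(capacity_subject(7)))
```

A subject who holds 7 digits succeeds at length 7, fails twice at length 8, and so receives a span of 7; the average of 6.8 digits reported above corresponds to subjects clustered around this value.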
 Delayed recall in H.M. became severely impaired within 1 minute




Delayed paired-comparison task.

Clicks, flashes, tones, or hues were presented; then, some seconds later, the same or another
cue was presented, and the subject was asked to determine whether the two stimuli were
the same or different.

Average performance of H.M.




                                                            Source: Brenda Milner
Mirror tracing task in H.M.
While viewing his hand in a mirror, H.M. tries to trace between the
two lines. The number of errors is the number of times the border was
crossed.


• He could learn the mirror-tracing task: performance
improved with practice and remained good on the next day,
despite no conscious recall of prior practice.

Lesions of the medial temporal lobe appear to affect forms of
learning and memory that require a conscious record; these are
called declarative memories.


                                                                  Kandel et al. Principles of Neural Science 2000 (62-2)
Memory systems of the brain

Non-declarative memory is expressed through performance rather than recollection.

Squire (2004) Neurobiology of Learning and Memory
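Review of online linear classification

Linear classification with linear encoding of feature space:

$$x = \begin{bmatrix} x_1 & \cdots & x_f \end{bmatrix}^T \quad\rightarrow\quad x \equiv \begin{bmatrix} 1 & x_1 & \cdots & x_f \end{bmatrix}^T, \qquad y \in \{0, 1\}$$

$$P\left(y^{(n)} = 1 \mid x^{(n)}\right) = \frac{1}{1 + \exp\left(-w^{(n)T} x^{(n)}\right)} \equiv q^{(n)}$$

$$w^{(n+1)} = w^{(n)} + \eta\,\frac{\left(y^{(n)} - q^{(n)}\right) x^{(n)}}{q^{(n)}\left(1 - q^{(n)}\right) x^{(n)T} x^{(n)}}$$

Linear classification with non-linear encoding of feature space:

$$g(x) = \begin{bmatrix} 1 & g_1(x) & \cdots & g_m(x) \end{bmatrix}^T$$

$$P\left(y^{(n)} = 1 \mid g(x^{(n)})\right) = \frac{1}{1 + \exp\left(-w^{(n)T} g(x^{(n)})\right)} \equiv q^{(n)}$$

$$w^{(n+1)} = w^{(n)} + \eta\,\frac{\left(y^{(n)} - q^{(n)}\right) g(x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right) g(x^{(n)})^T g(x^{(n)})}$$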
Knowlton et al. (1996) “A neo-striatal habit learning system in humans” Science 273:1399

Task: Individuals learned to predict which of two outcomes would occur on each trial, given
the particular cue that appeared.

[Table: cue pattern x, its probability p(x), and P(s = 1 | x)]
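Setting up the Knowlton et al. (1996) task in on-line learning

$$x = \begin{bmatrix} x_1 & \cdots & x_4 \end{bmatrix}^T, \qquad x_i \in \{0, 1\}$$

$$P(x_i = 1 \mid s = 1) = \alpha_i, \qquad P(x_i = 0 \mid s = 1) = 1 - \alpha_i, \qquad P(x_i = 1 \mid s = 0) = \beta_i$$

$$P(s = 1) = \pi, \qquad P(s = 1 \mid x) = \,?$$

Let’s begin with the simpler problem of observing only one cue. We want to know the probability of sunshine (s = 1), given that the one cue was observed.

$$p(x_i \mid s = 1) = \alpha_i^{x_i}(1 - \alpha_i)^{1 - x_i}, \qquad p(x_i \mid s = 0) = \beta_i^{x_i}(1 - \beta_i)^{1 - x_i}$$

$$P(s = 1 \mid x_i) = \frac{p(x_i \mid s = 1)\,P(s = 1)}{p(x_i)} = \frac{\alpha_i^{x_i}(1 - \alpha_i)^{1 - x_i}\,\pi}{p(x_i)}$$

$$P(s = 0 \mid x_i) = \frac{p(x_i \mid s = 0)\,P(s = 0)}{p(x_i)} = \frac{(1 - \pi)\,\beta_i^{x_i}(1 - \beta_i)^{1 - x_i}}{p(x_i)}$$

$$\frac{P(s = 1 \mid x_i)}{P(s = 0 \mid x_i)} = \frac{\pi\,\alpha_i^{x_i}(1 - \alpha_i)^{1 - x_i}}{(1 - \pi)\,\beta_i^{x_i}(1 - \beta_i)^{1 - x_i}}$$

$$\log\frac{P(s = 1 \mid x_i)}{P(s = 0 \mid x_i)} = \log\pi + x_i\log\alpha_i + (1 - x_i)\log(1 - \alpha_i) - \log(1 - \pi) - x_i\log\beta_i - (1 - x_i)\log(1 - \beta_i) = w_i x_i + c$$

where, collecting the terms in x_i and the constant terms,

$$w_i = \log\frac{\alpha_i(1 - \beta_i)}{\beta_i(1 - \alpha_i)}, \qquad c = \log\frac{\pi(1 - \alpha_i)}{(1 - \pi)(1 - \beta_i)}$$

$$\frac{P(s = 1 \mid x_i)}{P(s = 0 \mid x_i)} = \exp(w_i x_i + c)$$

$$P(s = 1 \mid x_i) = \exp(w_i x_i + c)\,P(s = 0 \mid x_i) = \exp(w_i x_i + c)\left(1 - P(s = 1 \mid x_i)\right)$$

$$P(s = 1 \mid x_i) = \frac{\exp(w_i x_i + c)}{1 + \exp(w_i x_i + c)} = \frac{1}{1 + \exp\left(-(w_i x_i + c)\right)}$$

Now consider two cues, assuming they are conditionally independent given the outcome:

$$p(x_i, x_j \mid s = 1) = p(x_i \mid s = 1)\,p(x_j \mid s = 1) = \alpha_i^{x_i}(1 - \alpha_i)^{1 - x_i}\,\alpha_j^{x_j}(1 - \alpha_j)^{1 - x_j}$$

$$p(x_i, x_j \mid s = 0) = \beta_i^{x_i}(1 - \beta_i)^{1 - x_i}\,\beta_j^{x_j}(1 - \beta_j)^{1 - x_j}$$

$$\frac{P(s = 1 \mid x_i, x_j)}{P(s = 0 \mid x_i, x_j)} = \frac{p(x_i, x_j \mid s = 1)\,P(s = 1)}{p(x_i, x_j \mid s = 0)\,P(s = 0)} = \frac{\pi\,\alpha_i^{x_i}(1 - \alpha_i)^{1 - x_i}\,\alpha_j^{x_j}(1 - \alpha_j)^{1 - x_j}}{(1 - \pi)\,\beta_i^{x_i}(1 - \beta_i)^{1 - x_i}\,\beta_j^{x_j}(1 - \beta_j)^{1 - x_j}}$$

$$\log\frac{P(s = 1 \mid x_i, x_j)}{P(s = 0 \mid x_i, x_j)} = w_i x_i + w_j x_j + c$$

with w_i and w_j defined as in the single-cue case, while c now collects the constant terms of both cues:

$$c = \log\frac{\pi}{1 - \pi} + \log\frac{1 - \alpha_i}{1 - \beta_i} + \log\frac{1 - \alpha_j}{1 - \beta_j}$$

$$P(s = 1 \mid x_i, x_j) = \frac{1}{1 + \exp\left(-(w_i x_i + w_j x_j + c)\right)}$$

Therefore, the weather forecasting task is linear classification in the feature space of the cards:

$$P(s = 1 \mid x) = \frac{1}{1 + \exp\left(-(w^T x + c)\right)}$$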
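To check numerically that the Bayes posterior over four cards is exactly a logistic function of the card vector, the sketch below computes P(s = 1 | x) both ways for all 16 card patterns, using α_i = P(x_i = 1 | s = 1), β_i = P(x_i = 1 | s = 0), and π = P(s = 1). The cue probabilities are hypothetical illustrative values, not those used by Knowlton et al.; w_i = log[α_i(1 − β_i)/(β_i(1 − α_i))], and the offset c collects log[π/(1 − π)] plus log[(1 − α_i)/(1 − β_i)] for every card.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def posterior_bayes(x: Sequence[int], alpha: Sequence[float],
                    beta: Sequence[float], prior: float) -> float:
    """P(s = 1 | x) by Bayes' rule, assuming cues are conditionally independent given s."""
    joint1, joint0 = prior, 1.0 - prior
    for xi, a, b in zip(x, alpha, beta):
        joint1 *= a ** xi * (1.0 - a) ** (1 - xi)  # p(x_i | s = 1)
        joint0 *= b ** xi * (1.0 - b) ** (1 - xi)  # p(x_i | s = 0)
    return joint1 / (joint1 + joint0)


def logistic_params(alpha: Sequence[float], beta: Sequence[float],
                    prior: float) -> Tuple[List[float], float]:
    """Weights w_i and offset c of the equivalent linear classifier."""
    w = [math.log(a * (1.0 - b) / (b * (1.0 - a))) for a, b in zip(alpha, beta)]
    c = math.log(prior / (1.0 - prior)) + sum(
        math.log((1.0 - a) / (1.0 - b)) for a, b in zip(alpha, beta))
    return w, c


def posterior_logistic(x: Sequence[int], w: Sequence[float], c: float) -> float:
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + c)))


# Hypothetical cue probabilities for four cards (illustrative only)
alpha = [0.8, 0.6, 0.4, 0.2]  # P(x_i = 1 | s = 1)
beta = [0.2, 0.4, 0.6, 0.8]   # P(x_i = 1 | s = 0)
prior = 0.5                   # P(s = 1)

w, c = logistic_params(alpha, beta, prior)
for x in itertools.product([0, 1], repeat=4):
    # The two routes to P(s = 1 | x) agree up to floating-point rounding
    assert abs(posterior_bayes(x, alpha, beta, prior) - posterior_logistic(x, w, c)) < 1e-12
```

The equivalence depends on the conditional-independence assumption: if the cards were correlated given the outcome, the log-odds would generally contain interaction terms and would no longer be linear in x.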
Parkinson’s disease (PD) patients were impaired in learning the classification task, while
amnesic patients were normal

PD-star represents the PD patients with the most severe symptoms. PD also involves damage
to the frontal lobe, so the authors tested frontal-lobe patients and found that they learned
the classification problem normally. When PD patients were tested on an additional 100
trials, their performance became comparable to that of control subjects. This was somewhat
puzzling.




Like PD patients, Huntington’s disease patients were impaired in learning the weather
prediction task (Knowlton et al., Dissociations within nondeclarative memory in Huntington’s
disease, Neuropsychology 10 (1996) 538–548).
After completing the task, subjects were given eight multiple-choice questions to
determine how well they remembered the testing situation. These questions asked,
for example, about the layout of the screen, the number of cards that could appear
together on the computer screen, the number of weather prediction trials presented,
and the appearance of the cues.


Medial temporal lobe structures damaged in amnesic patients appear to support
acquisition of “declarative” memory of the training episode. In contrast, basal
ganglia structures damaged in Parkinson’s disease appear to support acquisition of
internal models for classification.
Witt et al. (2002) Dissociation of Habit-Learning in Parkinson's and Cerebellar Disease. J. Cognitive Neurosci 14:493
Eldridge et al. (2002) Intact Implicit Habit Learning in Alzheimer's Disease. Behavioral Neurosci 116:735

Brief notes on Alzheimer’s disease: In early stages of the disease, there is neurodegeneration
in the medial temporal lobes, similar to the damage observed in amnesic patients. In later
stages, neuronal loss extends to the neocortex.

[Figure groups: control, cerebellar damage, Parkinson’s disease, Alzheimer’s disease]

In the post-experiment interview (explicit memory component), the recall of AD patients did
not differ from chance.
Poldrack et al. (2001) Interactive memory systems in the human brain. Nature 414:546
A “block” design: one group of subjects performed the feedback (FB) version of the task (and
the baseline task), while another group performed the paired-associate (PA) version (and the
baseline task). Classification ability at the end of training was similar for the two groups.

Between-subject contrast: PA vs. FB

The FB task requires that you first select the class, after which you receive an error signal
regarding your choice. In the PA task, there is no explicit error signal because no choices are made.


[Figure panels: Activity in caudate; Activity in hippocampus]

The plot shows activity (relative to baseline) in an event-related design during the feedback-
learning task. Initially, as the task is performed, there is increased activity in the hippocampus
and decreased activity in the caudate. With further training, caudate activity increases and
hippocampal activity declines. This suggests that these two memory systems in the brain may
compete.
Generalization properties of classifiers

[Figure: study items and test items (prototype, low distortion, high distortion, random);
percent correct for control and amnesic subjects]

Knowlton and Squire (1993) The learning of categories: parallel brain systems for item
memory and category knowledge. Science 262:1747.

Subjects studied 40 examples generated from a prototype and were told that all examples
belonged to the same category. Five minutes later, performance was measured on 84 new
examples generated from the same prototype. Subjects were asked, “Does this belong to the
same category?”
A generalization function for a linear classifier: system identification

$$P(y = 1 \mid x, w) = \frac{1}{1 + \exp\left(-w^T g(x)\right)}$$

$$q^{(n)} = P\left(y = 1 \mid x^{(n)}, w^{(n)}\right)$$

“Odds”:

$$o^{(n)}(x) = \frac{P(y = 1 \mid x, w^{(n)})}{P(y = 0 \mid x, w^{(n)})} = \frac{P(y = 1 \mid x, w^{(n)})}{1 - P(y = 1 \mid x, w^{(n)})} = \exp\left(w^{(n)T} g(x)\right)$$

Error experienced in trial n:

$$\tilde{y}^{(n)} = y^{(n)} - q^{(n)}$$

$$w^{(n+1)} = w^{(n)} + \eta\,\tilde{y}^{(n)}\,\frac{g(x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right) g(x^{(n)})^T g(x^{(n)})}$$

Generalization function:

$$b\left(x, x^{(n)}\right) = \frac{g(x^{(n)})^T g(x)}{g(x^{(n)})^T g(x^{(n)})}$$

$$o^{(n+1)}(x) = \exp\left(w^{(n+1)T} g(x)\right) = \exp\left(w^{(n)T} g(x) + \eta\,\tilde{y}^{(n)}\,\frac{b(x, x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right)}\right)$$

$$o^{(n+1)}(x) = o^{(n)}(x)\,\exp\left(\eta\,\tilde{y}^{(n)}\,\frac{b(x, x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right)}\right)$$
A generalization function for a linear classifier: system identification

“State” of the learner (log of the odds):

$$z^{(n)}(x) \equiv \log o^{(n)}(x) = \log\frac{P(y = 1 \mid x, w^{(n)})}{P(y = 0 \mid x, w^{(n)})}$$

$$\log o^{(n+1)}(x) = \log o^{(n)}(x) + \eta\,\tilde{y}^{(n)}\,\frac{b(x, x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right)}$$

State transition equation:

$$z^{(n+1)}(x) = z^{(n)}(x) + \eta\,\tilde{y}^{(n)}\,\frac{b(x, x^{(n)})}{q^{(n)}\left(1 - q^{(n)}\right)}$$

Here $\tilde{y}^{(n)}$ is the error in trial n, $x^{(n)}$ is the input where the error was experienced, and $b(x, x^{(n)})$ is the generalization function.
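The state transition equation implies that the learner’s log-odds at any input x can be predicted from the errors it experienced alone, without access to its weights; this is the system-identification view. The sketch below checks that claim for a hypothetical scalar input with encoding g(x) = [1, x, x^2]^T: it trains the weights with the normalized update, then reconstructs the log-odds at a catch-trial input from the error history and b(x, x^(n)). The encoding, learning rate, trial sequence, and catch-trial input are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
History = List[Tuple[float, float, float]]  # (x^(n), y~^(n), q^(n)) per trial


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def g(x: float) -> Vector:
    # Hypothetical non-linear encoding of a scalar input: g(x) = [1, x, x^2]^T
    return [1.0, x, x * x]


def b(x: float, xn: float) -> float:
    # Generalization function b(x, x^(n)) = g(x^(n))^T g(x) / (g(x^(n))^T g(x^(n)))
    gn = g(xn)
    return dot(gn, g(x)) / dot(gn, gn)


def log_odds(w: Sequence[float], x: float) -> float:
    # Learner state z(x) = log o(x) = w^T g(x)
    return dot(w, g(x))


def train(w: Sequence[float], trials: Sequence[Tuple[float, int]],
          eta: float) -> Tuple[Vector, History]:
    """Normalized online update; returns final weights and the error history."""
    w = list(w)
    history: History = []
    for xn, yn in trials:
        gn = g(xn)
        qn = sigmoid(dot(w, gn))
        y_tilde = yn - qn  # error experienced in trial n
        step = eta * y_tilde / (qn * (1.0 - qn) * dot(gn, gn))
        w = [wi + step * gi for wi, gi in zip(w, gn)]
        history.append((xn, y_tilde, qn))
    return w, history


def predict_from_errors(z0: float, x: float, history: History, eta: float) -> float:
    """Apply z^(n+1)(x) = z^(n)(x) + eta * y~^(n) * b(x, x^(n)) / (q^(n)(1 - q^(n)))
    trial by trial, using only the error history (no weights)."""
    z = z0
    for xn, y_tilde, qn in history:
        z += eta * y_tilde * b(x, xn) / (qn * (1.0 - qn))
    return z


# Hypothetical (x^(n), y^(n)) training pairs and a catch-trial input never trained on
w0 = [0.0, 0.0, 0.0]
w_final, errors = train(w0, [(0.5, 1), (1.0, 1), (-0.5, 0), (0.2, 1)], eta=0.2)
catch = 0.8
print(log_odds(w_final, catch), predict_from_errors(log_odds(w0, catch), catch, errors, eta=0.2))
```

The two printed values agree up to floating-point rounding. Exponentiating either one gives the odds o(x), which is the multiplicative form of the same update. Because b(x^(n), x^(n)) = 1, the change in log-odds is largest, relative to other inputs, at the input where the error was experienced, and how it spreads to other inputs is set entirely by the encoding g.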
[Figure panels: Early in training; After 300 trials; Catch trial (mean ± SD)]
Shadmehr, Brandt & Corkin, J Neurophysiol 1998
Smith and Shadmehr (2005) Intact ability to learn internal models of arm dynamics in Huntington’s disease but not cerebellar degeneration. J. Neurophysiology

[Figure panels: Cerebellar patients; Huntington’s disease patients. x-axis: training set (bin = 100 trials)]

				