					            Bayesian models for fMRI data

Klaas Enno Stephan
Laboratory for Social and Neural Systems Research
Institute for Empirical Research in Economics
University of Zurich

Functional Imaging Laboratory (FIL)
Wellcome Trust Centre for Neuroimaging
University College London



With many thanks for slides & images to:
FIL Methods group,
particularly Guillaume Flandin

[Portrait: The Reverend Thomas Bayes (1702-1761)]


                  Methods & models for fMRI data analysis
                           19 November 2008
   Why do I need to learn about Bayesian stats?

Because SPM is getting more and more Bayesian:
• Segmentation & spatial normalisation
• Posterior probability maps (PPMs)
   – 1st level: specific spatial priors
   – 2nd level: global spatial priors

• Dynamic Causal Modelling (DCM)
• Bayesian Model Selection (BMS)
• EEG: source reconstruction
[Figure: Bayesian applications in SPM: Bayesian segmentation and normalisation,
spatial priors on activation extent, posterior probability maps (PPMs),
Dynamic Causal Modelling]
[Figure: the standard SPM analysis pipeline: image time-series -> realignment ->
smoothing (kernel) -> general linear model (design matrix) -> parameter estimates ->
statistical parametric map (SPM), with normalisation to a template and statistical
inference via Gaussian field theory at p < 0.05]
       Problems of classical (frequentist) statistics
p-value: probability of getting data at least as extreme as those observed in the
effect’s absence. If small, reject the null hypothesis that there is no effect.
                            H0: θ = 0       p(y | H0): probability of observing
                                            the data y, given no effect (θ = 0).
Limitations:
 One can never accept the null hypothesis
 Given enough data, one can always demonstrate a significant effect
 Correction for multiple comparisons necessary

            Solution: infer posterior probability of the effect

                              p( | y )    Probability of the effect,
                                           given the observed data
                      Overview of topics

• Bayes' rule
• Bayesian update rules for Gaussian densities
• Bayesian analyses in SPM5
  – Segmentation & spatial normalisation
  – Posterior probability maps (PPMs)
     • 1st level: specific spatial priors
     • 2nd level: global spatial priors

  – Bayesian Model Selection (BMS)
Bayes in motion - an animation
                               Bayes’ rule
Given data y and parameters θ, the conditional probabilities are:

        p(θ | y) = p(y, θ) / p(y)        p(y | θ) = p(y, θ) / p(θ)

Eliminating p(y, θ) gives Bayes’ rule:

        p(θ | y) = p(y | θ) p(θ) / p(y)

        posterior = likelihood × prior / evidence
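As a quick sanity check, Bayes’ rule can be applied to a toy diagnostic-test example. The numbers below (sensitivity, specificity, prevalence) are illustrative, not from the lecture:

```python
# Bayes' rule on a toy example: a diagnostic test with 99% sensitivity,
# 95% specificity, and 1% prevalence of the condition.
def posterior(prior, likelihood, evidence):
    """p(theta | y) = p(y | theta) * p(theta) / p(y)"""
    return likelihood * prior / evidence

prior = 0.01                 # p(disease)
sens = 0.99                  # p(positive | disease)
spec = 0.95                  # p(negative | no disease)

# The evidence p(positive) marginalises over both hypotheses:
evidence = sens * prior + (1 - spec) * (1 - prior)
p_disease_given_pos = posterior(prior, sens, evidence)
print(round(p_disease_given_pos, 3))   # 0.167
```

Despite the accurate test, the posterior probability of disease is only about 17%, because the prior (prevalence) is so low.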
           Principles of Bayesian inference
 Formulation of a generative model

                             likelihood p(y|)
                            prior distribution p()


 Observation of data
                                  y


 Update of beliefs based upon observations, given a prior
  state of knowledge

                        p( | y )  p( y |  ) p( )
Posterior mean & variance of univariate Gaussians

 Likelihood & prior:

     p(y | θ) = N(y; θ, σ_e²)
     p(θ)     = N(θ; μ_p, σ_p²)

 Posterior:

     p(θ | y) = N(θ; μ, σ²)

     1/σ² = 1/σ_e² + 1/σ_p²

     μ = σ² ( y/σ_e² + μ_p/σ_p² )

 [Figure: prior, likelihood and posterior densities over θ; the posterior
 lies between the prior mean μ_p and the data y]

 Posterior mean =
 variance-weighted combination
 of prior mean and data mean
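The update rules above can be sketched in a few lines of Python (the input values are illustrative):

```python
# Posterior of a univariate Gaussian mean with a Gaussian prior,
# following the update rules above: precisions add, and the posterior
# mean is the variance-weighted combination of data and prior mean.
def gaussian_posterior(y, sigma_e, mu_p, sigma_p):
    """Return posterior mean and variance of p(theta | y)."""
    post_var = 1.0 / (1.0 / sigma_e**2 + 1.0 / sigma_p**2)
    post_mean = post_var * (y / sigma_e**2 + mu_p / sigma_p**2)
    return post_mean, post_var

mu, var = gaussian_posterior(y=2.0, sigma_e=1.0, mu_p=0.0, sigma_p=1.0)
print(mu, var)   # equal precisions -> posterior mean halfway: 1.0 0.5
```

With equal likelihood and prior precisions, the posterior mean lands exactly halfway between the data and the prior mean, and the posterior variance is half that of either source.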
Same thing – but expressed as precision weighting
 Likelihood & prior:

      p(y | θ) = N(y; θ, λ_e⁻¹)
      p(θ)     = N(θ; μ_p, λ_p⁻¹)

 Posterior:

      p(θ | y) = N(θ; μ, λ⁻¹)

      λ = λ_e + λ_p

      μ = (λ_e/λ) y + (λ_p/λ) μ_p

 [Figure: prior, likelihood and posterior densities over θ]

 Relative precision weighting
Same thing – but explicit hierarchical perspective
Likelihood & prior:

     y = θ^(1) + ε^(1)
     θ^(1) = μ^(2) + ε^(2)

     p(y | θ^(1)) = N(y; θ^(1), 1/λ^(1))
     p(θ^(1))     = N(θ^(1); μ^(2), 1/λ^(2))

Posterior:

     p(θ^(1) | y) = N(θ^(1); μ, 1/λ)

     λ = λ^(1) + λ^(2)

     μ = (λ^(1)/λ) y + (λ^(2)/λ) μ^(2)

[Figure: prior, likelihood and posterior densities over θ^(1)]

Relative precision weighting
                          Bayesian GLM: univariate case

 Normal densities                     Univariate linear model:   y = xθ + e

     p(θ)     = N(θ; μ_p, σ_p²)
     p(y | θ) = N(y; xθ, σ_e²)

 Posterior:

     p(θ | y) = N(θ; μ_θ|y, σ²_θ|y)

     1/σ²_θ|y = x²/σ_e² + 1/σ_p²

     μ_θ|y = σ²_θ|y ( x y/σ_e² + μ_p/σ_p² )

Relative precision weighting
                        Bayesian GLM: multivariate case

 Normal densities                     General Linear Model:   y = Xθ + e

     p(θ)     = N(θ; η_p, C_p)
     p(y | θ) = N(y; Xθ, C_e)

 Posterior:

     p(θ | y) = N(θ; η_θ|y, C_θ|y)

     C_θ|y = ( Xᵀ C_e⁻¹ X + C_p⁻¹ )⁻¹

     η_θ|y = C_θ|y ( Xᵀ C_e⁻¹ y + C_p⁻¹ η_p )

One step if C_e is known.
Otherwise iterative estimation with EM.
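The one-step multivariate posterior can be sketched with NumPy; the design matrix, covariances and prior mean below are illustrative:

```python
import numpy as np

# One-step multivariate Bayesian GLM posterior (noise covariance C_e known),
# following the slide's update equations.
def bayes_glm(X, y, Ce, Cp, eta_p):
    Ce_inv, Cp_inv = np.linalg.inv(Ce), np.linalg.inv(Cp)
    C_post = np.linalg.inv(X.T @ Ce_inv @ X + Cp_inv)        # posterior covariance
    eta_post = C_post @ (X.T @ Ce_inv @ y + Cp_inv @ eta_p)  # posterior mean
    return eta_post, C_post

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # intercept + slope regressors
y = np.array([1.0, 2.0, 3.0])                       # y = 1 + x, noise-free
Ce = np.eye(3)                                      # observation noise covariance
Cp = 10.0 * np.eye(2)                               # weak (broad) prior
eta_p = np.zeros(2)                                 # zero prior mean

eta_post, C_post = bayes_glm(X, y, Ce, Cp, eta_p)
print(eta_post)   # close to the least-squares solution [1, 1]
```

With a broad prior, the posterior mean is pulled only slightly towards zero; shrinking Cp towards zero would shrink the estimate towards the prior mean instead.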
                  An intuitive example

[Figure: prior, likelihood and posterior contours in the (θ1, θ2) plane;
the posterior lies between prior and likelihood]
                    Less intuitive

[Figure: prior, likelihood and posterior contours in the (θ1, θ2) plane]
                    Even less intuitive

[Figure: prior, likelihood and posterior contours in the (θ1, θ2) plane]
                  Bayesian (fixed effects) group analysis

Likelihood distributions from different subjects are independent
 one can use the posterior from one subject as the prior for the next:

    p(θ | y1)       ∝ p(y1 | θ) p(θ)
    p(θ | y1, y2)   ∝ p(y2 | θ) p(y1 | θ) p(θ)
                    = p(y2 | θ) p(θ | y1)
    ...
    p(θ | y1, ..., yN) ∝ p(yN | θ) p(θ | y1, ..., yN-1)

 “Today’s posterior is tomorrow’s prior”

Under Gaussian assumptions this is easy to compute:

    group posterior covariance, from the individual posterior covariances:

        C_θ|y1,...,yN = [ Σ_{i=1..N} (C_θ|yi)⁻¹ ]⁻¹

    group posterior mean, from the individual posterior covariances and means:

        η_θ|y1,...,yN = C_θ|y1,...,yN Σ_{i=1..N} (C_θ|yi)⁻¹ η_θ|yi
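The precision-weighted combination of individual posteriors can be sketched as follows, using hypothetical one-parameter posteriors for three subjects:

```python
import numpy as np

# Fixed-effects group posterior: combine individual (mean, covariance)
# posteriors by summing precisions, following the slide's formulas.
def group_posterior(means, covs):
    precisions = [np.linalg.inv(C) for C in covs]
    C_group = np.linalg.inv(sum(precisions))
    eta_group = C_group @ sum(P @ m for P, m in zip(precisions, means))
    return eta_group, C_group

# Three hypothetical subjects, each with a 1-parameter Gaussian posterior:
means = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
covs = [np.array([[1.0]])] * 3

eta, C = group_posterior(means, covs)
print(eta, C)   # equal precisions -> mean of means 2.0, variance 1/3
```

With equal individual precisions the group mean is simply the average of the subject means, and the group variance shrinks by a factor of N.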
               Bayesian analyses in SPM5

• Segmentation & spatial normalisation
• Posterior probability maps (PPMs)
  – 1st level: specific spatial priors
  – 2nd level: global spatial priors

• Dynamic Causal Modelling (DCM)
• Bayesian Model Selection (BMS)
• EEG: source reconstruction
 Spatial normalisation: Bayesian regularisation
                                  Deformations consist of a linear
                                  combination of smooth basis functions
                                        lowest frequencies of a 3D
                                         discrete cosine transform.

Find maximum a posteriori (MAP) estimates: simultaneously minimise
   – squared difference between template and source image
   – squared difference between parameters and their priors
                                          Deformation parameters θ

  MAP:    log p(θ | y) = log p(y | θ) + log p(θ) - log p(y)

   “Difference” between template         Squared distance between parameters and
          and source image                 their expected values (regularisation)
    Bayesian segmentation with empirical priors
• Goal: for each voxel, compute the
  probability that it belongs to a
  particular tissue type, given its
  intensity:

      p(tissue | intensity) ∝ p(intensity | tissue) ∙ p(tissue)
• Likelihood model:
  Intensities are modelled by a
  mixture of Gaussian distributions
  representing different tissue
  classes (e.g. GM, WM, CSF).
• Priors are obtained from tissue
  probability maps (segmented
  images of 151 subjects).

                                      Ashburner & Friston 2005, NeuroImage
                Unified segmentation & normalisation
• Circular relationship between segmentation & normalisation:
     – Knowing which tissue type a voxel belongs to helps normalisation.
     – Knowing where a voxel is (in standard space) helps segmentation.

• Build a joint generative model:
     – model how voxel intensities result from mixture of tissue type distributions
     – model how tissue types of one brain have to be spatially deformed to match
       those of another brain

• Using a priori knowledge about the parameters:
  adopt Bayesian approach and maximise the posterior probability


Ashburner & Friston 2005, NeuroImage
                    Bayesian fMRI analyses
General Linear Model:
                                y  X         with    ~ N (0, C )

What are the priors?
  • In “classical” SPM, no priors (= “flat” priors)
  • Full Bayes: priors are predefined on a principled or empirical basis
  • Empirical Bayes: priors are estimated from the data, assuming a
    hierarchical generative model  PPMs in SPM

               Parameters of one level = priors for distribution
               of parameters at lower level
               Parameters and hyperparameters at each
               level can be estimated using EM
           Posterior Probability Maps (PPMs)
Posterior distribution: probability of the effect given the data

                     p(θ | y)     mean: size of effect
                                  precision: variability

Posterior probability map: images of the probability (confidence) that
an activation exceeds some specified threshold γ, given the data y:

            p(θ > γ | y)

[Figure: posterior density p(θ | y) with the tail area above γ shaded]

Two thresholds:
• activation threshold γ: percentage of whole brain mean signal
  (physiologically relevant size of effect)
• probability α that voxels must exceed to be displayed (e.g. 95%)
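A minimal sketch of the per-voxel PPM decision, assuming a Gaussian posterior; the posterior mean, standard deviation and thresholds below are illustrative:

```python
import math

# Per-voxel PPM decision: compute p(theta > gamma | y) under a Gaussian
# posterior and compare it to the probability threshold alpha.
def exceedance_prob(post_mean, post_sd, gamma):
    """p(theta > gamma | y) = 1 - Phi((gamma - mean) / sd)."""
    z = (gamma - post_mean) / post_sd
    return 0.5 * math.erfc(z / math.sqrt(2))

gamma = 0.5    # activation threshold (effect size)
alpha = 0.95   # probability threshold for display

p = exceedance_prob(post_mean=1.2, post_sd=0.3, gamma=gamma)
print(p > alpha)   # True: this voxel would be displayed
```

A voxel enters the map only when both thresholds are met: the effect must be physiologically relevant (γ) and credibly present (α).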
                        PPMs vs. SPMs

PPMs:   p(θ | y) ∝ p(y | θ) p(θ)
        (posterior ∝ likelihood × prior)

SPMs:   null distribution of the t-statistic, t = f(y),
        with critical threshold u

[Figure: posterior density over θ (PPM) vs. null distribution of t (SPM)]

Bayesian test: p(θ > γ | y)      Classical t-test: p(t > u | θ = 0) = α
            2nd level PPMs with global priors

1st level (GLM):
                    y = X θ^(1) + ε^(1)        p(ε^(1)) = N(0, C_ε^(1))

2nd level (shrinkage prior):
                    θ^(1) = 0 + ε^(2)          p(ε^(2)) = N(0, C_ε^(2))

Basic idea: use the variance of θ over voxels
as prior variance of θ at any particular voxel.

2nd level:
the second-level mean (here 0) = average effect over voxels,
ε^(2) = voxel-to-voxel variation.

θ^(1) reflects regionally specific effects
 assume that it sums to zero over all voxels
 shrinkage prior at the second level
 variance of this prior is implicitly estimated
by estimating ε^(2)

[Figure: zero-centred prior density p(θ)]        In the absence of evidence
                                                 to the contrary, parameters
                                                 will shrink to zero.
                           Shrinkage Priors

[Figure: four panels comparing posterior shrinkage:
small & variable effect; large & variable effect;
small but clear effect; large & clear effect]
                     2nd level PPMs with global priors

     1st level (GLM):
                         y = Xθ + ε           p(ε) = N(0, C_ε)      voxel-specific

     2nd level (shrinkage prior):
                         θ = 0 + ε^(2)        p(ε^(2)) = N(0, C_θ)  global
                                                                    pooled estimate

     We are looking for the same                Once C_ε and C_θ are known, we can
      effect over multiple voxels               apply the usual rule for computing the
                                                posterior mean & covariance:
     Pooled estimation of C_θ over
      voxels
                                                    C_θ|y = ( Xᵀ C_ε⁻¹ X + C_θ⁻¹ )⁻¹

                                                    m_θ|y = C_θ|y Xᵀ C_ε⁻¹ y

Friston & Penny 2003, NeuroImage
         PPMs and multiple comparisons

No need to correct for multiple comparisons:
Thresholding a PPM at 95% confidence: in every voxel, the
posterior probability of an activation ≥ γ is ≥ 95%.
At most, 5% of the voxels identified could have activations less
than γ.
Independent of the search volume, thresholding a PPM thus
puts an upper bound on the false discovery rate.
                                 PPMs vs. SPMs

[Screenshots: the same PET dataset (contrast “rest”) analysed both ways.
Left: PPM at effect-size threshold 2.06, height threshold P = 0.95,
extent threshold k = 0 voxels.
Right: SPM{T39.0}, height threshold T = 5.50, extent threshold k = 0 voxels.]

PPMs: show activations greater than a given size.
SPMs: show voxels with non-zero activations.
               PPMs: pros and cons

Advantages:
• One can infer that a cause did not elicit a response
• Inference is independent of search volume
• SPMs conflate effect-size and effect-variability; PPMs do not

Disadvantages:
• Estimating priors over voxels is computationally demanding
• Practical benefits are yet to be established
• Thresholds other than zero require justification
                   1st level PPMs with local spatial priors
   • Neighbouring voxels are often not independent
   • Spatial dependencies vary across the brain
   • But spatial smoothing in SPM is uniform
   • Matched filter theorem: SNR is maximal when
     smoothing the data with a kernel which
     matches the smoothness of the true signal
   • Basic idea: estimate regional spatial
     dependencies from the data and use this as
     a prior in a PPM
      regionally specific smoothing
      markedly increased sensitivity

[Figure: contrast map and AR(1) map]

Penny et al. 2005, NeuroImage
                   The generative spatio-temporal model

    Y = XW + E

Priors:

    p(α) = Π_{k=1..K} p(α_k)        p(α_k) = Ga(α_k; q1, q2)

    p(β) = Π_{p=1..P} p(β_p)        p(β_p) = Ga(β_p; r1, r2)

    p(λ) = Π_{n=1..N} p(λ_n)        p(λ_n) = Ga(λ_n; u1, u2)

    p(W) = Π_{k=1..K} p(w_kᵀ)       p(w_kᵀ) = N(w_kᵀ; 0, α_k⁻¹ (SᵀS)⁻¹)

    p(A) = Π_{p=1..P} p(a_p)        p(a_p) = N(a_p; 0, β_p⁻¹ (SᵀS)⁻¹)

α = spatial precision of parameters
λ = observation noise precision
β = precision of AR coefficients

Penny et al. 2005, NeuroImage
                             The spatial prior

Prior for the k-th parameter:

    p(w_kᵀ) = N(w_kᵀ; 0, α_k⁻¹ (SᵀS)⁻¹)

    shrinkage prior (zero mean)
    α_k: spatial precision, determines the amount of smoothness
    S: spatial kernel matrix

Different choices possible for spatial kernel matrix S.
Currently used in SPM: Laplacian prior (same as in LORETA)
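To build intuition for what SᵀS encodes, here is a 1-D sketch (illustrative only; SPM's S operates on the 3-D voxel lattice): with S a first-difference operator, wᵀSᵀSw sums squared differences between neighbouring voxels, so smoother parameter maps receive higher prior density.

```python
import numpy as np

# 1-D illustration of the smoothness penalty in the spatial prior:
# S is a first-difference operator, so w' S'S w = sum of squared
# differences between neighbouring voxels.
def smoothness_penalty(w):
    n = len(w)
    S = np.zeros((n - 1, n))
    for i in range(n - 1):
        S[i, i], S[i, i + 1] = -1.0, 1.0   # difference of adjacent voxels
    return w @ (S.T @ S) @ w

smooth = np.array([1.0, 1.1, 1.2, 1.3])    # gently varying map
rough = np.array([1.0, -1.0, 1.0, -1.0])   # rapidly oscillating map
print(smoothness_penalty(smooth) < smoothness_penalty(rough))  # True
```

Under the prior N(w; 0, α⁻¹(SᵀS)⁻¹), the rough map gets a much larger penalty and hence much lower prior density; the spatial precision α scales how strongly smoothness is enforced.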
Example: application to event-related fMRI data

[Figure: contrast maps for familiar vs. non-familiar faces, obtained with
- smoothing
- global spatial prior
- Laplacian prior]
SPM5 graphical user interface
              Bayesian model selection (BMS)
Given competing hypotheses
on structure & functional
mechanisms of a system, which
model is the best?




Which model represents the
best balance between model
fit and model complexity?




For which model m does
p(y|m) become maximal?
                                           Pitt & Myung (2002), TICS
                        Bayesian model selection (BMS)

           Bayes’ rule:

               p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m)

           Model evidence:

               p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ

                accounts for both accuracy and complexity of the model
                allows for inference about structure (generalisability) of the model

           Various approximations, e.g.:       Model comparison via Bayes factor:
           - negative free energy
           - AIC                                   BF = p(y | m1) / p(y | m2)
           - BIC

Penny et al. (2004) NeuroImage
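Since model evidences are typically computed in log space (e.g. as negative free energies), the Bayes factor follows by exponentiating the difference of log evidences. The values below are made up for illustration:

```python
import math

# Bayes factor from log model evidences, computed in log space for
# numerical stability (raw evidences underflow for realistic data).
def bayes_factor(log_ev_1, log_ev_2):
    """BF = p(y|m1) / p(y|m2) = exp(log p(y|m1) - log p(y|m2))."""
    return math.exp(log_ev_1 - log_ev_2)

log_ev_m1, log_ev_m2 = -302.4, -305.4   # hypothetical log evidences
bf = bayes_factor(log_ev_m1, log_ev_m2)
print(round(bf, 2))   # exp(3) ~ 20.09
```

A log-evidence difference of 3 corresponds to a Bayes factor of about 20, i.e. the data favour m1 roughly 20:1 over m2.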
              Example: BMS of dynamic causal models

[Figure: four DCMs (M1-M4) of the attention-to-motion network
(stim → V1 → V5, plus PPC), differing in where attention enters]

 Modulation of backward or forward connection?
    BF = 2966  →  M2 better than M1

 Additional driving effect of attention on PPC?
    BF = 12  →  M3 better than M2

 Bilinear or nonlinear modulation of forward connection?
    BF = 23  →  M4 better than M3

Stephan et al. (2008) NeuroImage
Thank you

				