Classical (frequentist) inference

Klaas Enno Stephan
Branco Weiss Laboratory (BWL)
Institute for Empirical Research in Economics
University of Zurich

Functional Imaging Laboratory (FIL)
Wellcome Trust Centre for Neuroimaging
University College London



With many thanks for slides & images to:
FIL Methods group, especially Guillaume Flandin



                   Methods & models for fMRI data analysis
                             22 October 2008
Overview of SPM

[Figure: SPM processing pipeline. Image time-series → Realignment → Normalisation (to a template) → Smoothing (with a kernel) → General linear model (design matrix) → Parameter estimates → Statistical inference (Gaussian field theory, p < 0.05) → Statistical parametric map (SPM)]

Voxel-wise time series analysis

[Figure: single voxel time series (BOLD signal over time) → model specification → parameter estimation → hypothesis → statistic → SPM]

                            Overview
•   A recap of model specification and parameter estimation

•   Hypothesis testing

•   Contrasts and estimability
       • T-tests
       • F-tests

•   Design orthogonality

•   Design efficiency
Mass-univariate analysis: voxel-wise GLM

$y = X\beta + e$,   $e \sim N(0, \sigma^2 I)$

Dimensions: $y$ is N × 1, $X$ is N × p, $\beta$ is p × 1, $e$ is N × 1.

Model is specified by
1. Design matrix X
2. Assumptions about e

N: number of scans
p: number of regressors

          The design matrix embodies all available knowledge about
          experimentally controlled factors and potential confounds.
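Parameter estimation

$y = X\beta + e$

[Figure: the data y shown as the design matrix X times the parameters ($\beta_1$, $\beta_2$) plus the error e]

Objective: estimate parameters to minimize $\sum_{t=1}^{N} e_t^2$

Ordinary least squares estimation (OLS) (assuming i.i.d. error):

$\hat{\beta} = (X^T X)^{-1} X^T y$
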
OLS parameter estimation

The Ordinary Least Squares (OLS) estimators are:   $\hat{\beta} = (X^T X)^{-1} X^T y$

These estimators minimise $\sum_t e_t^2 = e^T e$. They are found by solving either

    $\partial \left( \sum_t e_t^2 \right) / \partial \hat{\beta} = 0$   or   $X^T e = 0$

Under i.i.d. assumptions, the OLS estimates correspond to ML estimates:

    $e \sim N(0, \sigma^2 I)$   ⇒   $y \sim N(X\beta, \sigma^2 I)$
    $\hat{\sigma}^2 = \hat{e}^T \hat{e} / (N - p)$
    $\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$

NB: precision of our estimates depends on design matrix!
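
The OLS quantities on this slide can be sketched numerically. The following is a minimal illustration with NumPy, not SPM code: the box-car design, the parameter values and the noise level are all made-up assumptions chosen only to have something to fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy design: N scans, p = 2 regressors (box-car + constant).
N = 40
boxcar = np.tile(np.r_[np.ones(5), np.zeros(5)], N // 10)
X = np.column_stack([boxcar, np.ones(N)])
beta_true = np.array([2.0, 10.0])           # illustrative values only
y = X @ beta_true + rng.normal(0.0, 1.0, N)

# OLS estimate: beta_hat = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residuals; at the OLS solution they satisfy X^T e = 0.
e_hat = y - X @ beta_hat

# Error variance estimate: e^T e / (N - p)
p = X.shape[1]
sigma2_hat = (e_hat @ e_hat) / (N - p)

# Estimated covariance of beta_hat: sigma^2 (X^T X)^{-1}
cov_beta_hat = sigma2_hat * XtX_inv

print("beta_hat:", beta_hat)
print("max |X^T e|:", np.abs(X.T @ e_hat).max())
```

The covariance depends on the data only through the scalar `sigma2_hat`; its structure comes entirely from `X`, which is the point of the NB above.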
Maximum likelihood (ML) estimation

probability density function ($\theta$ fixed!):   $y \mapsto f(y \mid \theta)$
likelihood function ($y$ fixed!):   $\theta \mapsto L(\theta \mid y)$,   $L(\theta \mid y) = f(y \mid \theta)$

ML estimator:   $\hat{\theta} = \arg\max_{\theta} L(\theta \mid y)$

For $\mathrm{cov}(e) = \sigma^2 I$, the ML estimator is equivalent to the OLS estimator:

    $\hat{\beta} = (X^T X)^{-1} X^T y$   (OLS)

For $\mathrm{cov}(e) = \sigma^2 V$, the ML estimator is equivalent to a weighted least squares (WLS) estimate:

    $\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$   (WLS)
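
As a rough check of the WLS case, the sketch below compares the closed-form weighted estimate with ordinary least squares applied to whitened data (W = V^{-1/2}). The AR(1)-like form of V, the design and all numbers are assumptions made for illustration; V is treated as known here.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 30
X = np.column_stack([rng.normal(size=N), np.ones(N)])

# Assumed (known) error correlation: V[i, j] = rho^|i - j|.
rho = 0.4
idx = np.arange(N)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# Correlated noise with covariance V: e = L z, where V = L L^T.
L = np.linalg.cholesky(V)
y = X @ np.array([1.5, 5.0]) + L @ rng.normal(size=N)

# Closed form: beta = (X^T V^-1 X)^-1 X^T V^-1 y
V_inv = np.linalg.inv(V)
beta_wls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# Whitening matrix W = V^(-1/2) from the eigendecomposition of V.
evals, evecs = np.linalg.eigh(V)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# OLS on the whitened model W y = W X beta + W e.
beta_whitened = np.linalg.lstsq(W @ X, W @ y, rcond=None)[0]

print(beta_wls, beta_whitened)
```

Both routes give the same estimate because W^T W = V^{-1}.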
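SPM: t-statistic based on ML estimates

Whitened model:   $Wy = WX\beta + We$,   $W = V^{-1/2}$,   $\sigma^2 V = \mathrm{Cov}(e)$,   $V = \sum_i \lambda_i Q_i$ (ReML estimates)

$\hat{\beta} = (WX)^+ W y$

$c = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$

$t = c^T \hat{\beta} / \widehat{\mathrm{std}}(c^T \hat{\beta})$

$\widehat{\mathrm{std}}(c^T \hat{\beta}) = \sqrt{ \hat{\sigma}^2 \, c^T (WX)^+ (WX)^{+T} c }$

$\hat{\sigma}^2 = \| Wy - WX\hat{\beta} \|^2 / \mathrm{tr}(R)$,   $R = I - WX (WX)^+$

For WX of full column rank:   $(WX)^+ = ((WX)^T WX)^{-1} (WX)^T$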
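
The formulas on this slide can be traced step by step in a few lines. This is only a sketch under strong assumptions: V is taken as known (an AR(1)-like matrix invented for the example), whereas SPM estimates it with ReML; the design and data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: box-car plus constant, N = 60 scans.
N = 60
boxcar = np.tile(np.r_[np.ones(6), np.zeros(6)], N // 12)
X = np.column_stack([boxcar, np.ones(N)])
c = np.array([1.0, 0.0])

# Assumed known error correlation V (SPM would estimate it via ReML).
rho = 0.3
idx = np.arange(N)
V = rho ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([1.0, 100.0]) + np.linalg.cholesky(V) @ rng.normal(size=N)

# W = V^(-1/2)
evals, evecs = np.linalg.eigh(V)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

# beta_hat = (WX)^+ W y
WX = W @ X
WX_pinv = np.linalg.pinv(WX)
beta_hat = WX_pinv @ (W @ y)

# R = I - WX (WX)^+ ; sigma2_hat = ||Wy - WX beta_hat||^2 / tr(R)
R = np.eye(N) - WX @ WX_pinv
resid = W @ y - WX @ beta_hat
sigma2_hat = (resid @ resid) / np.trace(R)

# t = c^T beta_hat / std(c^T beta_hat)
std_c = np.sqrt(sigma2_hat * (c @ WX_pinv @ WX_pinv.T @ c))
t_value = (c @ beta_hat) / std_c

print("tr(R) =", np.trace(R), " t =", t_value)
```

For a full-rank design, tr(R) equals N - p, so the variance estimate reduces to the familiar residual sum of squares divided by the residual degrees of freedom.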
Hypothesis testing

To test a hypothesis, we construct a “test statistic”.

• “Null hypothesis” H0 = “there is no effect” ⇒ $c^T \beta = 0$
  This is what we want to disprove.
  ⇒ The “alternative hypothesis” H1 represents the outcome of interest.

• The test statistic T
  The test statistic summarises the evidence about H0.
  Typically, the test statistic is small in magnitude when H0 is true and large when H0 is false.
  ⇒ We need to know the distribution of T under the null hypothesis.

[Figure: null distribution of T]
Hypothesis testing

• Type I Error α:
  Acceptable false positive rate α.
  Threshold $u_\alpha$ controls the false positive rate:

      $\alpha = p(T > u_\alpha \mid H_0)$

  [Figure: null distribution of T with threshold $u_\alpha$ and tail area α]

• Observation of test statistic t, a realisation of T:
  A p-value summarises evidence against H0.
  This is the probability of observing t, or a more extreme value, under the null hypothesis:

      $p(T \geq t \mid H_0)$

  [Figure: null distribution of T with observed value t and tail area p]

• The conclusion about the hypothesis:
  We reject H0 in favour of H1 if $t > u_\alpha$
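
The roles of the null distribution, the threshold and the p-value can be illustrated by simulation. The sketch below builds an empirical null distribution of a one-sample t-statistic by Monte Carlo rather than using the analytical t-distribution; the sample size, number of simulations and the "observed" data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def t_stat(samples):
    """One-sample t-statistic (mean / standard error) for each row."""
    samples = np.atleast_2d(samples)
    n = samples.shape[1]
    return samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

n, n_sim, alpha = 12, 20000, 0.05

# Null distribution of T: simulate data for which H0 (zero mean) is true.
null_T = t_stat(rng.normal(size=(n_sim, n)))

# Threshold u_alpha such that p(T > u_alpha | H0) = alpha (one-sided).
u_alpha = np.quantile(null_T, 1 - alpha)

# Observed statistic from hypothetical data that contain a true effect.
t_obs = t_stat(rng.normal(loc=1.0, size=n))[0]

# p-value: p(T >= t_obs | H0), estimated from the simulated null.
p_value = np.mean(null_T >= t_obs)
reject_H0 = bool(t_obs > u_alpha)

print(f"u_alpha = {u_alpha:.3f}, t_obs = {t_obs:.3f}, p = {p_value:.4f}, reject H0: {reject_H0}")
```

With 11 degrees of freedom the simulated threshold should land close to the tabulated one-sided 5% point of the t-distribution (about 1.80).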
           One cannot accept the null hypothesis
               (one can just fail to reject it)




Absence of evidence is not evidence of absence!
If we do not reject H0, then all we can say is that there is not enough evidence in the
data to reject H0. This does not mean that there is strong evidence to accept H0.

What does this mean for neuroimaging results based on classical statistics?
A failure to find an “activation” in a particular area does not mean we can conclude
that this area is not involved in the process of interest.
Contrasts

• We are usually not interested in the whole $\beta$ vector.

• A contrast selects a specific effect of interest:
  ⇒ a contrast c is a vector of length p
  ⇒ $c^T \beta$ is a linear combination of regression coefficients $\beta$

      $c^T = [1\ 0\ 0\ 0\ 0\ \ldots]$
      $c^T \beta = 1 \times \beta_1 + 0 \times \beta_2 + 0 \times \beta_3 + 0 \times \beta_4 + 0 \times \beta_5 + \ldots$

      $c^T = [0\ {-1}\ 1\ 0\ 0\ \ldots]$
      $c^T \beta = 0 \times \beta_1 + (-1) \times \beta_2 + 1 \times \beta_3 + 0 \times \beta_4 + 0 \times \beta_5 + \ldots$

• Under i.i.d. assumptions:

      $c^T \hat{\beta} \sim N(c^T \beta, \ \sigma^2 c^T (X^T X)^{-1} c)$

  NB: the precision of our estimates depends on design matrix and the chosen contrast!
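
To make the NB concrete, the sketch below evaluates the factor c^T (X^T X)^{-1} c (the variance of the contrast estimate in units of σ²) for two invented designs and two contrasts. The sinusoidal regressors are purely illustrative assumptions.

```python
import numpy as np

def contrast_variance_factor(X, c):
    """c^T (X^T X)^{-1} c: variance of c^T beta_hat in units of sigma^2."""
    return float(c @ np.linalg.inv(X.T @ X) @ c)

N = 100
t = np.arange(N)
x1 = np.sin(2 * np.pi * t / 20)
x2_orth = np.cos(2 * np.pi * t / 20)    # orthogonal to x1 over whole periods
x2_corr = x1 + 0.3 * x2_orth            # strongly correlated with x1

X_orth = np.column_stack([x1, x2_orth])
X_corr = np.column_stack([x1, x2_corr])

c_single = np.array([1.0, 0.0])         # effect of regressor 1
c_diff = np.array([-1.0, 1.0])          # regressor 2 minus regressor 1

for name, X in [("orthogonal", X_orth), ("correlated", X_corr)]:
    print(name,
          contrast_variance_factor(X, c_single),
          contrast_variance_factor(X, c_diff))
```

The same contrast is estimated much less precisely in the correlated design, and within one design different contrasts have different precision.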
Estimability of a contrast

• If X is not of full rank then different parameters can give identical predictions.

• The parameters are therefore ‘non-unique’, ‘non-identifiable’ or ‘non-estimable’.

• For such models, $X^T X$ is not invertible so we must resort to generalised inverses (SPM uses the Moore-Penrose pseudo-inverse).

• Example: one-way ANOVA (unpaired two-sample t-test). Design matrix X (rows: images; columns: parameters for Factor 1, Factor 2, Mean):

      1 0 1
      1 0 1
      1 0 1
      1 0 1
      0 1 1
      0 1 1
      0 1 1
      0 1 1

  Rank(X) = 2

  [Figure: parameter estimability (gray ⇒ parameter not uniquely specified)]

  ⇒ [1 0 0], [0 1 0], [0 0 1] are not estimable.
  ⇒ [1 0 1], [0 1 1], [1 -1 0], [0.5 0.5 1] are estimable.
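
The example can be checked numerically. One common criterion, used here as an assumption about how to test estimability (it is not taken from the slides), is that a contrast c is estimable exactly when it lies in the row space of X, i.e. when c^T X^+ X = c^T.

```python
import numpy as np

# Design matrix from the example: two groups of four images plus a mean column.
X = np.array([[1, 0, 1]] * 4 + [[0, 1, 1]] * 4, dtype=float)

def is_estimable(c, X, tol=1e-10):
    """True if c lies in the row space of X, i.e. c^T X^+ X == c^T."""
    c = np.asarray(c, dtype=float)
    row_space_projector = np.linalg.pinv(X) @ X
    return bool(np.allclose(c @ row_space_projector, c, atol=tol))

rank_X = np.linalg.matrix_rank(X)
print("Rank(X) =", rank_X)

contrasts = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
             [1, 0, 1], [0, 1, 1], [1, -1, 0], [0.5, 0.5, 1]]
for c in contrasts:
    print(c, "estimable" if is_estimable(c, X) else "not estimable")
```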
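t-contrasts – SPM{t}

$c^T$ = 1 0 0 0 0 0 0 0   (weights for $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \ldots$)

Question:   box-car amplitude > 0 ?   =   $\beta_1 = c^T \beta > 0$ ?

Null hypothesis:   H0: $c^T \beta = 0$

Test statistic:

    t = (contrast of estimated parameters) / sqrt(variance estimate)

    $t = \frac{c^T \hat{\beta}}{\widehat{\mathrm{std}}(c^T \hat{\beta})} = \frac{c^T \hat{\beta}}{\sqrt{\hat{\sigma}^2 \, c^T (X^T X)^{-1} c}} \sim t_{N-p}$

[Figure: null distribution $p(y \mid c^T \beta = 0)$]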
t-contrasts in SPM

For a given contrast c:

    beta_???? images:   $\hat{\beta} = (X^T X)^{-1} X^T y$
    ResMS image:        $\hat{\sigma}^2 = \hat{e}^T \hat{e} / (N - p)$
    con_???? image:     $c^T \hat{\beta}$
    spmT_???? image:    SPM{t}

t-contrast: a simple example

Passive word listening versus rest

$c^T$ = [ 1 0 ]        Q: activation during listening ?

Null hypothesis:   $\beta_1 = 0$

$t = c^T \hat{\beta} / \mathrm{Std}(c^T \hat{\beta})$

[Figure: design matrix X (two columns, scans along the vertical axis) and null distribution $p(y \mid c^T \beta = 0)$]

SPMresults: Height threshold T = 3.2057 {p<0.001}

[Table: SPM results listing set-level, cluster-level and voxel-level statistics, with p-values adjusted for search volume. Columns: p corrected, kE, p uncorrected (cluster-level); p FWE-corr, p FDR-corr, T, (Z), p uncorrected (voxel-level); coordinates in mm.]

Student's t-distribution

•   first described by William Sealy Gosset, a statistician at the Guinness brewery in Dublin
•   t-statistic is a signal-to-noise measure: t = effect / standard deviation

•   for small samples the t-distribution has heavier tails than the normal distribution, which it approaches as the degrees of freedom increase

•   t-contrasts are simply combinations of the betas
    ⇒ the t-statistic does not depend on the scaling of the regressors or on the scaling of the contrast

•   Unilateral test:   H0: $c^T \beta = 0$   vs.   H1: $c^T \beta > 0$

[Figure: probability density function of Student’s t distribution for ν = 1, 2, 5, 10 and ∞ degrees of freedom; horizontal axis from -5 to 5, vertical axis from 0 to 0.4]
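
The scaling property stated in the bullets above can be verified on simulated data. The sketch assumes OLS with i.i.d. errors, a contrast on a single regressor and a positive scale factor for the contrast; the helper `t_stat` and all numbers are illustrative.

```python
import numpy as np

def t_stat(X, y, c):
    """OLS t-statistic for the contrast c (i.i.d. errors, full-rank X)."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (X.shape[0] - X.shape[1])
    return (c @ beta_hat) / np.sqrt(sigma2_hat * (c @ np.linalg.inv(X.T @ X) @ c))

rng = np.random.default_rng(7)
N = 48
boxcar = np.tile(np.r_[np.ones(6), np.zeros(6)], N // 12)
X = np.column_stack([boxcar, np.ones(N)])
y = X @ np.array([1.0, 20.0]) + rng.normal(size=N)
c = np.array([1.0, 0.0])

t_ref = t_stat(X, y, c)

# Rescale the regressor of interest: beta_hat and its std shrink by the same factor.
X_scaled = X.copy()
X_scaled[:, 0] *= 100.0
t_regressor_scaled = t_stat(X_scaled, y, c)

# Rescale the contrast by a positive constant: numerator and denominator scale together.
t_contrast_scaled = t_stat(X, y, 5.0 * c)

print(t_ref, t_regressor_scaled, t_contrast_scaled)
```

The contrast estimate itself does change with either rescaling; only the ratio to its standard deviation stays the same.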
F-test: the extra-sum-of-squares principle

Model comparison: Full vs. reduced model
Null Hypothesis H0: True model is X0 (reduced model)

[Figure: full model (X0 + X1) with residual sum of squares RSS = $\sum \hat{e}_{full}^2$, or reduced model X0 with RSS0 = $\sum \hat{e}_{reduced}^2$ ?]

F-statistic: ratio of the additional variance explained by the full model (relative to X0) to the unexplained variance under the full model

    $F \propto \frac{RSS_0 - RSS}{RSS}$

    $F = \frac{ESS / \nu_1}{RSS / \nu_2} \sim F_{\nu_1, \nu_2}$,   with $ESS = RSS_0 - RSS$

    $\nu_1$ = rank(X) – rank(X0)
    $\nu_2$ = N – rank(X)
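
The extra-sum-of-squares computation can be written out directly. The following sketch fits a reduced and a full model to simulated data and forms F from the two residual sums of squares; the regressors and effect sizes are invented for the example.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares after an OLS fit of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    return float(resid @ resid)

rng = np.random.default_rng(5)
N = 80
X0 = np.ones((N, 1))                     # reduced model: constant only
X1 = rng.normal(size=(N, 3))             # hypothetical extra regressors
X = np.column_stack([X0, X1])            # full model [X0 X1]
y = X @ np.array([10.0, 0.8, 0.0, -0.5]) + rng.normal(size=N)

RSS0, RSS = rss(X0, y), rss(X, y)
ESS = RSS0 - RSS                         # extra sum of squares explained by X1

nu1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)
nu2 = N - np.linalg.matrix_rank(X)
F = (ESS / nu1) / (RSS / nu2)

print(f"F({nu1},{nu2}) = {F:.2f}")
```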
F-test: multidimensional contrasts – SPM{F}

Tests multiple linear hypotheses:

H0: True model is X0   ⇔   H0: $\beta_3 = \beta_4 = \ldots = \beta_9 = 0$   ⇔   test H0: $c^T \beta = 0$ ?

$c^T$ =
    0 0 1 0 0 0 0 0
    0 0 0 1 0 0 0 0
    0 0 0 0 1 0 0 0
    0 0 0 0 0 1 0 0
    0 0 0 0 0 0 1 0
    0 0 0 0 0 0 0 1

[Figure: full model [X0 X1 ($\beta_{3-9}$)] or reduced model X0 ? Resulting map: SPM{F6,322}]
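
A multidimensional contrast and the corresponding full-versus-reduced comparison give the same F value. The sketch below checks this on simulated data using the standard quadratic-form expression for a contrast F-statistic, which is an addition here and not a formula from the slides; the design and the contrast matrix `C` (rows playing the role of c^T) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 50, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
y = X @ np.array([5.0, 1.0, 0.0, -1.0]) + rng.normal(size=N)

# Each row is one linear hypothesis; together: beta_2 = beta_3 = beta_4 = 0.
C = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
nu1 = np.linalg.matrix_rank(C)
nu2 = N - p
sigma2_hat = resid @ resid / nu2

# F from the contrast: (C b)^T [C (X^T X)^-1 C^T]^-1 (C b) / (nu1 * sigma2_hat)
Cb = C @ beta_hat
F_contrast = Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb) / (nu1 * sigma2_hat)

# F from comparing the full model with the reduced model X0 (constant only).
X0 = X[:, :1]
resid0 = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
F_ess = ((resid0 @ resid0 - resid @ resid) / nu1) / (resid @ resid / nu2)

print(F_contrast, F_ess)
```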
         F-contrast in SPM

  beta_???? images:   $\hat{\beta} = (X^T X)^{-1} X^T y$
  ResMS image:        $\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{N - p}$
  ess_???? images:    $(RSS_0 - RSS)$
  spmF_???? images:   SPM{F}
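The parameter estimates and the residual mean square are computed in the same way at every voxel. A rough sketch of that mass-univariate step, assuming a full-rank design matrix with p columns and a toy data matrix with one column per voxel (the sizes and names are arbitrary, and this is not SPM's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, V = 50, 3, 5                       # scans, regressors, voxels (toy sizes)
X = np.column_stack([np.ones(N), rng.standard_normal((N, p - 1))])
Y = rng.standard_normal((N, V))          # one time series per column (voxel)

# Parameter estimates: one column of betas per voxel (cf. the beta images).
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y        # shape (p, V)

# Residual mean square per voxel (cf. the ResMS image).
E = Y - X @ beta_hat
res_ms = np.sum(E ** 2, axis=0) / (N - p)          # shape (V,)

print(beta_hat.shape, res_ms.shape)
```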
F-test example: movement related effects
                     To assess movement-related activation:
                     There is a lot of residual movement-related artifact in the
                     data (despite spatial realignment), which tends to be
                     concentrated near the boundaries of tissue types.
                     By including the realignment parameters in our design matrix,
                     we can “regress out” linear components of subject movement,
                     reducing the residual error and thereby improving our
                     statistics for the effects of interest.
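The effect of adding nuisance regressors can be illustrated with simulated data. In the sketch below the six "realignment parameters" are hypothetical random-walk drifts, the task is an on/off boxcar, and all amplitudes are invented; it only shows that modelling the movement lowers the residual variance estimate that enters the t-statistic for the task:

```python
import numpy as np

def fit(X, y):
    """OLS fit; returns estimates, residual variance and (X^T X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2, XtX_inv

def t_stat(c, beta, sigma2, XtX_inv):
    """t-statistic for the contrast c^T beta."""
    return (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

rng = np.random.default_rng(3)
N = 120
task = np.tile(np.r_[np.zeros(10), np.ones(10)], N // 20)      # on/off boxcar
motion = np.cumsum(0.5 * rng.standard_normal((N, 6)), axis=0)  # hypothetical movement traces
y = 0.5 * task + motion @ np.ones(6) + rng.standard_normal(N)

X_small = np.column_stack([task, np.ones(N)])                  # task + constant
X_big = np.column_stack([task, motion, np.ones(N)])            # plus movement regressors

b_s, sigma2_small, inv_s = fit(X_small, y)
b_b, sigma2_big, inv_b = fit(X_big, y)

c_small = np.eye(X_small.shape[1])[0]                          # contrast on the task regressor
c_big = np.eye(X_big.shape[1])[0]
print("residual variance:", sigma2_small, "->", sigma2_big)
print("task t-statistic: ", t_stat(c_small, b_s, sigma2_small, inv_s),
      "->", t_stat(c_big, b_b, sigma2_big, inv_b))
```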
                    Differential F-contrasts




Think of it as constructing 3 regressors from the 3 differences and complementing
this new design matrix such that the data can be fitted in exactly the same way (same
error, same fitted data).
                            F-test: a few remarks
• F-tests can be viewed as testing for the additional variance explained by a
  larger model with respect to a simpler (nested) model → model comparison.
• F tests a weighted sum of squares of one or several combinations of the
  regression coefficients β.
• In practice, partitioning of X into [X0 X1] is done by multidimensional contrasts.

• Hypotheses:
      Null hypothesis H0:            β1 = β2 = ... = βp = 0
      Alternative hypothesis H1:     At least one βk ≠ 0
      [Contrast matrix shown on the slide: one row per tested coefficient, beginning 1 0 0 0 / 0 1 0 0 / ...]



• F-tests are not directional:
  When testing a uni-dimensional contrast with an F-test, for example β1 – β2, the
  result will be the same as testing β2 – β1.
  The F-statistic is exactly the square of the corresponding t-statistic, so it tests
  for both positive and negative effects.
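The relation between the two statistics can be checked numerically. In this sketch (simulated data, illustrative names) the t-statistic for β1 – β2 is computed from the contrast, while F is obtained independently by comparing the full model with the reduced model implied by H0: β1 = β2, in which the two regressors share a single coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
X = np.column_stack([rng.standard_normal(N), rng.standard_normal(N), np.ones(N)])
y = X @ np.array([1.0, 0.4, 2.0]) + rng.standard_normal(N)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
rss = resid @ resid
sigma2 = rss / (N - X.shape[1])

def t_stat(c):
    """t-statistic for the uni-dimensional contrast c^T beta."""
    return (c @ beta) / np.sqrt(sigma2 * (c @ XtX_inv @ c))

c = np.array([1.0, -1.0, 0.0])           # beta1 - beta2

# Reduced model under H0 (beta1 = beta2): one shared regressor x1 + x2.
X0 = np.column_stack([X[:, 0] + X[:, 1], X[:, 2]])
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
rss0 = np.sum((y - X0 @ beta0) ** 2)
F = (rss0 - rss) / (rss / (N - X.shape[1]))   # nu1 = 1

print(t_stat(c), t_stat(-c), F)
```

Flipping the sign of the contrast flips the sign of t but leaves the reduced model, and hence F, unchanged.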
Example: a suboptimal model

  [Figure panels:]
    True signal and observed signal (--)
    Model (green, peak at 6 s) and true signal (blue, peak at 3 s)
    Fit (β1 = 0.2, mean = 0.11)
    Residual (still contains some signal)

  The test for the green regressor is not significant.
Example: a suboptimal model

  β1 = 0.22
  β2 = 0.11           Residual var. = 0.3

  Y = Xβ + e

  t-test:  p(Y | β1 = 0)  →  p-value = 0.1
  F-test:  p(Y | β1 = 0)  →  p-value = 0.2
A better model

  [Figure panels:]
    True signal + observed signal
    Model (green and red) and true signal (blue ---);
      the red regressor is the temporal derivative of the green regressor
    Total fit (blue) and partial fit (green & red); adjusted and fitted signal
    Residual (smaller variance)

  t-test of the green regressor: significant
  F-test: very significant
  t-test of the red regressor: very significant
A better model

  β1 = 0.22
  β2 = 2.15
  β3 = 0.11           Residual var. = 0.2

  Y = Xβ + e

  t-test:  p(Y | β1 = 0)  →  p-value = 0.07
  F-test:  p(Y | β1 = 0, β2 = 0)  →  p-value = 0.000001
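Why the temporal derivative helps can be seen in a noise-free toy example. The sketch below assumes a Gaussian bump as a stand-in for the response shape (a hypothetical choice, not the canonical HRF used in SPM): the "true" response peaks at 3 s, the modelled response peaks at 6 s, and adding the derivative of the modelled response lets the fit absorb part of the latency difference.

```python
import numpy as np

t = np.arange(0, 30, 0.5)                        # time in seconds

def bump(peak, width=2.0):
    """Hypothetical response shape: a Gaussian bump peaking at `peak` seconds."""
    return np.exp(-0.5 * ((t - peak) / width) ** 2)

signal = bump(3.0)                 # "true" response, peak at 3 s
model = bump(6.0)                  # modelled response, peak at 6 s
deriv = np.gradient(model, t)      # temporal derivative of the modelled response

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

const = np.ones_like(t)
rss_one = rss(np.column_stack([model, const]), signal)          # suboptimal model
rss_two = rss(np.column_stack([model, deriv, const]), signal)   # model + derivative

print("RSS without derivative:", rss_one)
print("RSS with derivative:   ", rss_two)
```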
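              Correlation among regressors

[Figure: y, x1, x2 and x2* (x2 orthogonalized with regard to x1) drawn as vectors]

Correlated regressors:
    $y = x_1 \beta_1 + x_2 \beta_2 + e$, with $\beta_1 = \beta_2 = 1$
    Correlated regressors = explained variance is shared between regressors.

After orthogonalization:
    $y = x_1 \beta_1^* + x_2^* \beta_2^* + e$, with $\beta_1^* > 1$; $\beta_2^* = 1$
    When x2 is orthogonalized with regard to x1, only the parameter
    estimate for x1 changes, not that for x2!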
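A short derivation makes this explicit. It assumes that orthogonalization means subtracting from x2 its projection onto x1; the scalar k is introduced here for illustration only:

```latex
\begin{align*}
x_2^* &= x_2 - k\,x_1, \qquad k = \frac{x_1^T x_2}{x_1^T x_1} \\
x_1 \beta_1 + x_2 \beta_2 &= x_1 \beta_1 + (x_2^* + k\,x_1)\,\beta_2 \\
  &= x_1 (\beta_1 + k\,\beta_2) + x_2^* \beta_2 \\
\Rightarrow \quad \beta_1^* &= \beta_1 + k\,\beta_2, \qquad \beta_2^* = \beta_2
\end{align*}
```

For example, with β1 = β2 = 1 and positively correlated regressors (k > 0), the coefficient of x1 becomes 1 + k > 1 while the coefficient of the orthogonalized regressor stays at 1. The fitted data are identical in both parameterisations.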
Design orthogonality

      • For each pair of columns of the design
        matrix, the orthogonality matrix depicts the
        magnitude of the cosine of the angle
        between them, with the range 0 to 1 mapped
        from white to black.

      • The cosine of the angle between two vectors
        a and b is obtained by:

            $\cos \alpha = \frac{a \cdot b}{\|a\| \, \|b\|}$

      • If both vectors have zero mean then the
        cosine of the angle between the vectors is
        the same as the correlation between the
        two variates.
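The equivalence between cosine and correlation for zero-mean vectors is easy to verify. A small sketch with two arbitrary example vectors (the numbers carry no meaning):

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
b = np.array([2.0, 1.0, 5.0, 2.0, 6.0])

# Remove the mean of each vector before taking the cosine.
a0, b0 = a - a.mean(), b - b.mean()
corr = float(np.corrcoef(a, b)[0, 1])

print(cosine(a0, b0), corr)   # identical up to rounding error
print(cosine(a, b))           # differs, because a and b do not have zero mean
```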
Correlated regressors

  [Figure panels:]
    True signal
    Model (green and red)
    Fit (blue: total fit)
    Residual
Correlated regressors

  β1 = 0.79
  β2 = 0.85            Residual var. = 0.3
  β3 = 0.06

  Y = Xβ + e

  t-test:  p(Y | β1 = 0)  →  p-value = 0.08
  t-test:  p(Y | β2 = 0)  →  p-value = 0.07
  F-test:  p(Y | β1 = 0, β2 = 0)  →  p-value = 0.002

  [Figure: design orthogonality matrix for regressors 1 and 2]
After orthogonalisation

  [Figure panels:]
    True signal
    Model (green and red): the red regressor has been orthogonalised with
      respect to the green one, i.e. everything that correlates with the
      green regressor has been removed from it
    Fit (does not change)
    Residuals (do not change)
After orthogonalisation

  β1 = 1.47   (0.79)
  β2 = 0.85   (0.85)       Residual var. = 0.3
  β3 = 0.06   (0.06)
  (values before orthogonalisation in parentheses)

  Y = Xβ + e

  t-test:  p(Y | β1 = 0)  →  p-value = 0.0003    does change
  t-test:  p(Y | β2 = 0)  →  p-value = 0.07      does not change
  F-test:  p(Y | β1 = 0, β2 = 0)  →  p-value = 0.002    does not change

  [Figure: design orthogonality matrix for regressors 1 and 2]
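The pattern of what changes and what does not can be reproduced on simulated data. The following is a sketch under assumed values (two correlated regressors plus a constant, invented effect sizes), with the second regressor orthogonalised with respect to the first; `glm_stats` is an illustrative helper, not an SPM function:

```python
import numpy as np

def glm_stats(X, y):
    """Estimates, t-statistic per regressor, and F for regressors 1 and 2 jointly."""
    N, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (N - p)
    t = beta / np.sqrt(sigma2 * np.diag(XtX_inv))
    cT = np.eye(p)[:2]                       # H0: beta1 = beta2 = 0
    cb = cT @ beta
    F = cb @ np.linalg.solve(cT @ XtX_inv @ cT.T, cb) / (2 * sigma2)
    return beta, t, F

rng = np.random.default_rng(6)
N = 100
x1 = rng.standard_normal(N)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(N)     # strongly correlated with x1
y = 0.8 * x1 + 0.8 * x2 + rng.standard_normal(N)
const = np.ones(N)

# Orthogonalise x2 with respect to x1 by removing its projection onto x1.
x2_orth = x2 - x1 * (x1 @ x2) / (x1 @ x1)

beta, t, F = glm_stats(np.column_stack([x1, x2, const]), y)
beta_o, t_o, F_o = glm_stats(np.column_stack([x1, x2_orth, const]), y)

print("beta:", beta, "->", beta_o)   # only the first estimate changes
print("t:   ", t, "->", t_o)         # only the first t-statistic changes
print("F:   ", F, "->", F_o)         # unchanged
```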
                             Design efficiency
• The aim is to minimize the standard error of a t-contrast
  (i.e. the denominator of a t-statistic):

      $T = \frac{c^T \hat{\beta}}{\sqrt{\mathrm{var}(c^T \hat{\beta})}}$

      $\mathrm{var}(c^T \hat{\beta}) = \hat{\sigma}^2 c^T (X^T X)^{-1} c$

• This is equivalent to maximizing the efficiency ε:

      $\varepsilon(\hat{\sigma}^2, c, X) = (\hat{\sigma}^2 c^T (X^T X)^{-1} c)^{-1}$

  where $\hat{\sigma}^2$ is the noise variance and $c^T (X^T X)^{-1} c$ is the design variance.

• If we assume that the noise variance is independent of the specific design:

      $\varepsilon(c, X) = (c^T (X^T X)^{-1} c)^{-1}$

  NB: efficiency depends on the design matrix and the chosen contrast!

• This is a relative measure: all we can say is that one design is more efficient than
  another (for a given contrast).
                         Design efficiency

      $\varepsilon(c, X) = (c^T (X^T X)^{-1} c)^{-1}$

• X^T X represents the covariance of the regressors in the design matrix
• efficiency decreases with increasing covariance
• but note that efficiency differs across contrasts

      $X^T X = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix}$

      cT = [1 0]     →  c^T (X^T X)^{-1} c = 5.26   →  ε = 0.19
      cT = [1 1]     →  c^T (X^T X)^{-1} c = 20     →  ε = 0.05
      cT = [1 -1]    →  c^T (X^T X)^{-1} c = 1.05   →  ε = 0.95
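The three contrasts can be worked through numerically. The sketch below assumes that the off-diagonal entries of X^T X are −0.9, since with that sign the design variance c^T (X^T X)^{-1} c reproduces the figures 5.26, 20 and 1.05 listed for the three contrasts; the efficiency as defined by the formula is the reciprocal of that term. The function name `efficiency` is illustrative:

```python
import numpy as np

def efficiency(c, XtX):
    """Efficiency (c^T (X^T X)^-1 c)^-1 of contrast c for a given X^T X."""
    c = np.asarray(c, dtype=float)
    return 1.0 / float(c @ np.linalg.inv(XtX) @ c)

# Assumption: negative off-diagonal entries (see the note above).
XtX = np.array([[1.0, -0.9],
                [-0.9, 1.0]])

for c in ([1, 0], [1, 1], [1, -1]):
    eff = efficiency(c, XtX)
    print(c, "design variance =", round(1.0 / eff, 2), "efficiency =", round(eff, 2))
```

With these regressors the difference contrast [1 -1] is estimated most efficiently and the sum contrast [1 1] least efficiently, which illustrates that efficiency is a property of the design and the contrast together.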
                                     Example: working memory

[Figure: stimulus and response regressors over time (s) for three designs A, B and C]

           Design    Correlation    Efficiency ([1 0])
           A         -.65           29
           B         +.33           40
           C         -.24           47


           •      A: Response follows each stimulus with (short) fixed delay.
           •      B: Jittering the delay between stimuli and responses.
           •      C: Requiring a response only for half of all trials (randomly chosen).
Thank you

								