# Classical Inference

```
Classical (frequentist) inference

Klaas Enno Stephan
Branco Weiss Laboratory (BWL)
Institute for Empirical Research in Economics
University of Zurich

Functional Imaging Laboratory (FIL)
Wellcome Trust Centre for Neuroimaging
University College London

With many thanks for slides & images to:
FIL Methods group, especially Guillaume Flandin

Methods & models for fMRI data analysis
22 October 2008
Overview of SPM
[Figure: overview of the SPM analysis pipeline. Labels: image time-series, realignment, normalisation (template), smoothing (kernel), general linear model (design matrix), parameter estimates, statistical parametric map (SPM), statistical inference (Gaussian field theory, p < 0.05).]
Voxel-wise time series analysis
[Figure: the BOLD signal time series of a single voxel is analysed through model specification, parameter estimation, hypothesis and statistic; the resulting statistics form the SPM.]
Overview
•   A recap of model specification and parameter estimation

•   Hypothesis testing

•   Contrasts and estimability
• T-tests
• F-tests

•   Design orthogonality

•   Design efficiency
Mass-univariate analysis: voxel-wise GLM
$y = X\beta + e$,   $e \sim N(0, \sigma^2 I)$

Dimensions: y is N × 1, X is N × p, β is p × 1, e is N × 1
N: number of scans
p: number of regressors

Model is specified by
1. Design matrix X

The design matrix embodies all available knowledge about
experimentally controlled factors and potential confounds.
Parameter estimation
$y = X\beta + e$

[Figure: the data y expressed as the design matrix X times the parameters (β1, β2) plus the error e]

Objective: estimate parameters to minimize $\sum_{t=1}^{N} e_t^2$

Ordinary least squares estimation (OLS) (assuming i.i.d. error):
$\hat{\beta} = (X^T X)^{-1} X^T y$
OLS parameter estimation
The Ordinary Least Squares (OLS) estimators are: $\hat{\beta} = (X^T X)^{-1} X^T y$

These estimators minimise $\sum_t e_t^2 = e^T e$. They are found by solving either
$\frac{\partial \sum_t e_t^2}{\partial \hat{\beta}} = 0$   or   $X^T e = 0$

Under i.i.d. assumptions, the OLS estimates correspond to ML estimates:
$e \sim N(0, \sigma^2 I)$,   $y \sim N(X\beta, \sigma^2 I)$
$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$
$\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{N - p}$

NB: precision of our estimates depends on design matrix!
Maximum likelihood (ML) estimation
probability density function (β fixed!): $y \mapsto f(y \mid \beta)$
likelihood function (y fixed!): $\beta \mapsto L(\beta \mid y)$, with $L(\beta \mid y) = f(y \mid \beta)$

ML estimator: $\hat{\beta} = \arg\max_{\beta} L(\beta \mid y)$

For $\mathrm{cov}(e) = \sigma^2 I$, the ML estimator is equivalent to the OLS estimator:
$\hat{\beta} = (X^T X)^{-1} X^T y$   (OLS)

For $\mathrm{cov}(e) = \sigma^2 V$, the ML estimator is equivalent to a weighted least squares (WLS) estimate:
$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$   (WLS)
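SPM: t-statistic based on ML estimates
$Wy = WX\beta + We$

$t = \frac{c^T\hat{\beta}}{\widehat{\mathrm{std}}(c^T\hat{\beta})}$,   $c = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T$

$\hat{\beta} = (WX)^+ Wy$,   $W = V^{-1/2}$
$\widehat{\mathrm{std}}(c^T\hat{\beta}) = \sqrt{\hat{\sigma}^2 c^T (WX)^+ (WX)^{+T} c}$
$\hat{\sigma}^2 = \frac{\| Wy - WX\hat{\beta} \|^2}{\mathrm{tr}(R)}$,   $R = I - WX(WX)^+$

$\sigma^2 V = \mathrm{Cov}(e)$,   $V = \sum_i \lambda_i Q_i$   (ReML estimates)

For brevity: $(WX)^+ = (X^T W X)^{-1} X^T$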
Hypothesis testing

To test a hypothesis, we construct a “test statistic”.

• “Null hypothesis” H0 = “there is no effect” ⇒ $c^T\beta = 0$
This is what we want to disprove.
⇒ The “alternative hypothesis” H1 represents the outcome of interest.

• The test statistic T
The test statistic summarises the evidence about H0.
Typically, the test statistic is small in magnitude when H0 is true and large when H0 is false.
⇒ We need to know the distribution of T under the null hypothesis.

[Figure: null distribution of T]
Hypothesis testing
• Type I Error α:
Acceptable false positive rate α.
Threshold $u_\alpha$ controls the false positive rate:
$\alpha = p(T > u_\alpha \mid H_0)$

• Observation of test statistic t, a realisation of T:
A p-value summarises evidence against H0.
This is the probability of observing t, or a more extreme value, under the null hypothesis:
$p(T \geq t \mid H_0)$

• The conclusion about the hypothesis:
We reject H0 in favour of H1 if $t > u_\alpha$

[Figures: null distribution of T, marking the threshold $u_\alpha$ (tail area α) and the observed value t (tail area p)]
One cannot accept the null hypothesis
(one can just fail to reject it)

Absence of evidence is not evidence of absence!
If we do not reject H0, then all we can say is that there is not enough evidence in the
data to reject H0. This does not mean that there is strong evidence to accept H0.

What does this mean for neuroimaging results based on classical statistics?
A failure to find an “activation” in a particular area does not mean we can conclude
that this area is not involved in the process of interest.
Contrasts
• We are usually not interested in the whole β vector.

• A contrast selects a specific effect of interest:
⇒ a contrast c is a vector of length p
⇒ $c^T\beta$ is a linear combination of regression coefficients β

$c^T = [1\ 0\ 0\ 0\ 0\ \dots]$
$c^T\beta = 1\times\beta_1 + 0\times\beta_2 + 0\times\beta_3 + 0\times\beta_4 + 0\times\beta_5 + \dots$

$c^T = [0\ {-1}\ 1\ 0\ 0\ \dots]$
$c^T\beta = 0\times\beta_1 + (-1)\times\beta_2 + 1\times\beta_3 + 0\times\beta_4 + 0\times\beta_5 + \dots$

• Under i.i.d. assumptions:
$c^T\hat{\beta} \sim N(c^T\beta, \sigma^2 c^T (X^T X)^{-1} c)$

NB: the precision of our estimates depends on design matrix and the chosen contrast!
Estimability of a contrast

• If X is not of full rank then different parameters can give identical predictions.

• The parameters are therefore ‘non-unique’, ‘non-identifiable’ or ‘non-estimable’.

• For such models, XTX is not invertible so we must resort to generalised inverses (SPM uses the Moore-Penrose pseudo-inverse).

• Example: One-way ANOVA (unpaired two-sample t-test)

Design matrix X (rows: images; columns: parameters Factor 1, Factor 2, Mean):
1   0   1
1   0   1
1   0   1
1   0   1
0   1   1
0   1   1
0   1   1
0   1   1
Rank(X) = 2

[Figure: parameter estimability (gray: β not uniquely specified)]

[1 0 0], [0 1 0], [0 0 1] are not estimable.
[1 0 1], [0 1 1], [1 -1 0], [0.5 0.5 1] are estimable.
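
Worked check (an added sketch, not part of the original slides; a and b are illustrative weights): a contrast is estimable when $c^T$ can be written as a linear combination of the rows of X. The example design has only two distinct rows, [1 0 1] and [0 1 1], so every estimable contrast has the form
$c^T = a\,[1\ 0\ 1] + b\,[0\ 1\ 1] = [a\ \ b\ \ a+b]$
[1 -1 0]: a = 1, b = -1, third entry a + b = 0, so it is estimable.
[0.5 0.5 1]: a = b = 0.5, third entry a + b = 1, so it is estimable.
[1 0 0]: a = 1, b = 0 would require a third entry of 1, not 0, so it is not estimable.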
t-contrasts – SPM{t}
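Question: box-car amplitude > 0 ?   ⇔   $\beta_1 = c^T\beta > 0$ ?
$c^T = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$   (weights on $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \dots$)

Null hypothesis: $H_0: c^T\beta = 0$

Test statistic: t = contrast of estimated parameters / variance estimate

$t = \frac{c^T\hat{\beta}}{\widehat{\mathrm{std}}(c^T\hat{\beta})} = \frac{c^T\hat{\beta}}{\sqrt{\hat{\sigma}^2 c^T (X^T X)^{-1} c}} \sim t_{N-p}$

p-value: $p(y \mid c^T\beta = 0)$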
t-contrasts in SPM
For a given contrast c:

beta_???? images: $\hat{\beta} = (X^T X)^{-1} X^T y$
ResMS image: $\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{N - p}$
con_???? image: $c^T\hat{\beta}$
spmT_???? image: SPM{t}
t-contrast: a simple example

Passive word listening versus rest

$c^T = [1\ 0]$            Q: activation during listening?

Null hypothesis: $\beta_1 = 0$

$t = \frac{c^T\hat{\beta}}{\mathrm{Std}(c^T\hat{\beta})}$,   p-value: $p(y \mid c^T\beta = 0)$

[Figure: design matrix X and SPM results table with set-level, cluster-level and voxel-level statistics (columns: p corrected, kE, p uncorrected, p FWE-corr, p FDR-corr, T, (Z), and x, y, z coordinates in mm). Height threshold T = 3.2057 {p<0.001}.]
Student's t-distribution
•   first described by William Sealy Gosset, a statistician at the Guinness brewery in Dublin
•   t-statistic is a signal-to-noise measure: t = effect / standard deviation

•   t-distribution is an approximation to the normal distribution for small samples

•   t-contrasts are simply combinations of the betas
⇒ the t-statistic does not depend on the scaling of the regressors or on the scaling of the contrast

•   Unilateral test: $H_0: c^T\beta = 0$   vs.   $H_1: c^T\beta > 0$

[Figure: probability density function of Student’s t distribution for ν = 1, 2, 5, 10 and ∞ degrees of freedom; x-axis from -5 to 5, y-axis from 0 to 0.4]
F-test: the extra-sum-of-squares principle
Model comparison: Full vs. reduced model
Null Hypothesis H0: True model is X0 (reduced model)

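[Figure: design matrices of the full model (X0 and X1) and of the reduced model (X0 only), with residual sums of squares $\mathrm{RSS} = \sum \hat{e}^2_{\mathrm{full}}$ and $\mathrm{RSS}_0 = \sum \hat{e}^2_{\mathrm{reduced}}$. Full model (X0 + X1)? Or reduced model?]

F-statistic: ratio of the extra variance explained by X1 (the drop in unexplained variance from the reduced to the full model) to the unexplained variance under the full model

$F = \frac{\mathrm{ESS}/\nu_1}{\mathrm{RSS}/\nu_2} \sim F_{\nu_1,\nu_2}$,   with $\mathrm{ESS} = \mathrm{RSS}_0 - \mathrm{RSS}$
$\nu_1 = \mathrm{rank}(X) - \mathrm{rank}(X_0)$
$\nu_2 = N - \mathrm{rank}(X)$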
F-test: multidimensional contrasts – SPM{F}
Tests multiple linear hypotheses:
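H0: True model is X0   ⇔   $H_0: \beta_3 = \beta_4 = \dots = \beta_9 = 0$   ⇔   test $H_0: c^T\beta = 0$ ?

[Figure: full model with design matrix [X0 X1 (β3-9)] versus reduced model X0. Full model? Reduced model?]

cT =
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1

$\mathrm{SPM}\{F_{6,322}\}$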
F-contrast in SPM

beta_???? images: $\hat{\beta} = (X^T X)^{-1} X^T y$
ResMS image: $\hat{\sigma}^2 = \frac{\hat{e}^T \hat{e}}{N - p}$

ess_???? images
spmF_???? images

F-test example: movement related effects
To assess movement-related activation:

There is a lot of residual movement-related artifact in the data (despite spatial realignment), which tends to be concentrated near the boundaries of tissue types.

By including the realignment parameters in our design matrix, we can “regress out” linear components of subject movement, reducing the residual error, and hence improve our statistics for the effects of interest.
Differential F-contrasts

Think of it as constructing 3 regressors from the 3 differences and complement
this new design matrix such that data can be fitted in the same exact way (same
error, same fitted data).
F-test: a few remarks
• F-tests can be viewed as testing for the additional variance explained by a larger model with respect to a simpler (nested) model ⇒ model comparison
• F tests a weighted sum of squares of one or several combinations of the regression coefficients β.
• In practice, partitioning of X into [X0 X1] is done by multidimensional contrasts.

• Hypotheses:
Null hypothesis H0: $\beta_1 = \beta_2 = \dots = \beta_p = 0$
Alternative hypothesis H1: At least one $\beta_k \neq 0$
[Figure: the corresponding contrast matrix cT]

• F-tests are not directional:
When testing a uni-dimensional contrast with an F-test, for example $\beta_1 - \beta_2$, the result will be the same as testing $\beta_2 - \beta_1$.
It will be exactly the square of the t-test, testing for both positive and negative effects.
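
Sketch of why (added for illustration, assuming i.i.d. errors and the OLS estimates used on the earlier slides): for a single contrast vector c,
$F = \frac{(c^T\hat{\beta})^2}{\hat{\sigma}^2 c^T (X^T X)^{-1} c} = \left( \frac{c^T\hat{\beta}}{\sqrt{\hat{\sigma}^2 c^T (X^T X)^{-1} c}} \right)^2 = t^2$
Replacing c by -c flips the sign of t but leaves both the numerator $(c^T\hat{\beta})^2$ and the denominator unchanged, so F is the same for $\beta_1 - \beta_2$ and $\beta_2 - \beta_1$.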
Example: a suboptimal model

True signal and observed signal (--)

Model (green, peak at 6 s)
True signal (blue, peak at 3 s)

Fitting ($\beta_1 = 0.2$, mean = 0.11)

Residual (still contains some signal)

⇒ Test for the green regressor not significant
Example: a suboptimal model

$\beta_1 = 0.22$
$\beta_2 = 0.11$           Residual Var. = 0.3

[Figure: Y = Xβ + e]

$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.1 (t-test)
$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.2 (F-test)
A better model

True signal + observed signal

Model (green and red)
and true signal (blue ---)
Red regressor : temporal derivative of
the green regressor

Total fit (blue)
and partial fit (green & red)

Residual (a smaller variance)

⇒ t-test of the green regressor significant
⇒ F-test very significant
⇒ t-test of the red regressor very significant
A better model

$\beta_1 = 0.22$
$\beta_2 = 2.15$
$\beta_3 = 0.11$

Residual Var. = 0.2

[Figure: Y = Xβ + e]

$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$ ⇒ p-value = 0.000001 (F-test)
Correlation among regressors
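[Figure: the data y in the space spanned by x1 and x2 (left), and by x1 and the orthogonalised regressor x2* (right)]

Left: $y = x_1\beta_1 + x_2\beta_2 + e$, with $\beta_1 = \beta_2 = 1$
Right: $y = x_1\beta_1^* + x_2^*\beta_2^* + e$, with $\beta_1^* > 1$; $\beta_2^* = 1$

Correlated regressors = explained variance is shared between regressors

When x2 is orthogonalized with regard to x1, only the parameter estimate for x1 changes, not that for x2!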
Design orthogonality

• For each pair of columns of the design
matrix, the orthogonality matrix depicts the
magnitude of the cosine of the angle
between them, with the range 0 to 1 mapped
from white to black.

• The cosine of the angle between two vectors
a and b is obtained by:

$\cos\alpha = \frac{a \cdot b}{\|a\|\,\|b\|}$

• If both vectors have zero mean then the
cosine of the angle between the vectors is
the same as the correlation between the
two variates.
Correlated regressors

True signal

Model (green and red)

Fit (blue : total fit)

Residual
Correlated regressors

$\beta_1 = 0.79$
$\beta_2 = 0.85$            Residual var. = 0.3
$\beta_3 = 0.06$

[Figure: Y = Xβ + e]

$p(Y \mid \beta_1 = 0)$ ⇒ p-value = 0.08 (t-test)
$p(Y \mid \beta_2 = 0)$ ⇒ p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$ ⇒ p-value = 0.002 (F-test)
After orthogonalisation

True signal

Model (green and red)
red regressor has been orthogonalised with respect to the green one
⇒ remove everything that correlates with the green regressor

Fit (does not change)

Residuals (do not change)
After orthogonalisation
$\beta_1 = 1.47$   (0.79)
$\beta_2 = 0.85$   (0.85)       Residual var. = 0.3
$\beta_3 = 0.06$   (0.06)

[Figure: Y = Xβ + e]

$p(Y \mid \beta_1 = 0)$: p-value = 0.0003 (t-test), does change
$p(Y \mid \beta_2 = 0)$: p-value = 0.07 (t-test), does not change
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$: p-value = 0.002 (F-test), does not change
Design efficiency
• The aim is to minimize the standard error of a t-contrast (i.e. the denominator of a t-statistic):
$T = \frac{c^T\hat{\beta}}{\sqrt{\mathrm{var}(c^T\hat{\beta})}}$,   $\mathrm{var}(c^T\hat{\beta}) = \hat{\sigma}^2 c^T (X^T X)^{-1} c$

• This is equivalent to maximizing the efficiency ε:
$\varepsilon(\hat{\sigma}^2, c, X) = (\hat{\sigma}^2 c^T (X^T X)^{-1} c)^{-1}$
($\hat{\sigma}^2$: noise variance; $c^T (X^T X)^{-1} c$: design variance)

• If we assume that the noise variance is independent of the specific design:
$\varepsilon(c, X) = (c^T (X^T X)^{-1} c)^{-1}$
NB: efficiency depends on design matrix and the chosen contrast!

• This is a relative measure: all we can say is that one design is more efficient than
another (for a given contrast).
Design efficiency
$\varepsilon(c, X) = (c^T (X^T X)^{-1} c)^{-1}$

• XTX is the covariance matrix of the regressors in the design matrix
• efficiency decreases with increasing covariance
• but note that efficiency differs across contrasts

$X^T X = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}$

$c^T = [1\ 0]$      → ε = 0.19
$c^T = [1\ 1]$      → ε = 0.95
$c^T = [1\ {-1}]$   → ε = 0.05
Example: working memory
[Figure: three designs (A, B, C), each showing stimulus and response regressors over time (s)]

A: Correlation = -.65, Efficiency ([1 0]) = 29
B: Correlation = +.33, Efficiency ([1 0]) = 40
C: Correlation = -.24, Efficiency ([1 0]) = 47

•      A: Response follows each stimulus with (short) fixed delay.
•      B: Jittering the delay between stimuli and responses.
•      C: Requiring a response only for half of all trials (randomly chosen).
Thank you

```
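
The estimation and testing steps in these slides can be tried out numerically. The following is a minimal sketch, not SPM code: it assumes NumPy, i.i.d. Gaussian errors, and a made-up single-voxel time series. The toy design, the helper names `ols`, `t_contrast` and `efficiency`, and all simulated numbers are hypothetical. It computes the OLS estimates, a t-contrast, the extra-sum-of-squares F-statistic for the same one-dimensional hypothesis, and the efficiency formula ε(c, X) = (cT (XTX)^-1 c)^-1 for an XTX with off-diagonal 0.9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: N scans, one box-car regressor plus a constant term.
N = 80
boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], N // 20)
X = np.column_stack([boxcar, np.ones(N)])       # design matrix, N x p
beta_true = np.array([1.0, 5.0])                # assumed "true" parameters
y = X @ beta_true + rng.normal(0.0, 1.0, N)     # i.i.d. Gaussian error


def ols(X, y):
    """OLS estimates, residual variance estimate and (X^T X)^-1 (pseudo-inverse)."""
    XtX_inv = np.linalg.pinv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    dof = X.shape[0] - np.linalg.matrix_rank(X)  # N - p for full-rank X
    sigma2_hat = resid @ resid / dof
    return beta_hat, sigma2_hat, XtX_inv


def t_contrast(c, beta_hat, sigma2_hat, XtX_inv):
    """t = c^T beta_hat / sqrt(sigma2_hat * c^T (X^T X)^-1 c)."""
    return (c @ beta_hat) / np.sqrt(sigma2_hat * (c @ XtX_inv @ c))


def efficiency(c, XtX):
    """epsilon(c, X) = (c^T (X^T X)^-1 c)^-1."""
    return 1.0 / (c @ np.linalg.inv(XtX) @ c)


beta_hat, sigma2_hat, XtX_inv = ols(X, y)
c = np.array([1.0, 0.0])                         # box-car amplitude > 0 ?
t = t_contrast(c, beta_hat, sigma2_hat, XtX_inv)

# F-test by the extra-sum-of-squares principle: reduced model X0 = constant only.
X0 = X[:, [1]]
rss_full = np.sum((y - X @ beta_hat) ** 2)
rss_reduced = np.sum((y - X0 @ ols(X0, y)[0]) ** 2)
nu1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)
nu2 = N - np.linalg.matrix_rank(X)
F = ((rss_reduced - rss_full) / nu1) / (rss_full / nu2)

# Efficiency for an X^T X with off-diagonal 0.9 and three contrasts.
XtX_example = np.array([[1.0, 0.9], [0.9, 1.0]])
eff = {
    "[1 0]": efficiency(np.array([1.0, 0.0]), XtX_example),
    "[1 1]": efficiency(np.array([1.0, 1.0]), XtX_example),
    "[1 -1]": efficiency(np.array([1.0, -1.0]), XtX_example),
}

print("beta_hat:", beta_hat)
print("t:", t, " F:", F, " t^2:", t ** 2)
print("efficiency:", eff)
```

For a one-dimensional contrast the printed F equals t squared, which is the "F-tests are not directional" remark in numerical form. With positively correlated regressors the sum contrast [1 1] comes out as the most efficient and the difference contrast [1 -1] as the least efficient; the design variance term cT (XTX)^-1 c is simply the reciprocal of each efficiency value.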