Bayesian models for fMRI data

Klaas Enno Stephan
Laboratory for Social and Neural Systems Research
Institute for Empirical Research in Economics, University of Zurich
Functional Imaging Laboratory (FIL)
Wellcome Trust Centre for Neuroimaging, University College London

With many thanks for slides & images to:
FIL Methods group, particularly Guillaume Flandin

The Reverend Thomas Bayes (1702-1761)

Methods & models for fMRI data analysis, 19 November 2008

Why do I need to learn about Bayesian stats?

Because SPM is getting more and more Bayesian:
• Segmentation & spatial normalisation
• Posterior probability maps (PPMs)
  – 1st level: specific spatial priors
  – 2nd level: global spatial priors
• Dynamic Causal Modelling (DCM)
• Bayesian Model Selection (BMS)
• EEG: source reconstruction

[Figure: Bayesian methods in SPM - Bayesian segmentation and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), Dynamic Causal Modelling]

[Figure: the SPM analysis pipeline - image time-series → realignment → smoothing (kernel) → general linear model (design matrix) → statistical inference (Gaussian field theory, p < 0.05) → statistical parametric map (SPM); normalisation to a template; parameter estimates]

Problems of classical (frequentist) statistics

p-value: the probability of getting the observed data in the effect's absence. If it is small, reject the null hypothesis that there is no effect.

  H₀: θ = 0

  p(y | H₀): probability of observing the data y, given no effect (θ = 0).
Limitations:
• One can never accept the null hypothesis.
• Given enough data, one can always demonstrate a significant effect.
• Correction for multiple comparisons is necessary.

Solution: infer the posterior probability of the effect,

  p(θ | y): probability of the effect, given the observed data.

Overview of topics
• Bayes' rule
• Bayesian update rules for Gaussian densities
• Bayesian analyses in SPM5
  – Segmentation & spatial normalisation
  – Posterior probability maps (PPMs)
    • 1st level: specific spatial priors
    • 2nd level: global spatial priors
  – Bayesian Model Selection (BMS)

Bayes in motion: an animation

Bayes' rule

Given data y and parameters θ, the joint probability can be factorised in two ways:

  p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y)

Eliminating p(y, θ) gives Bayes' rule:

  p(θ | y) = p(y | θ) p(θ) / p(y)

i.e. posterior = likelihood × prior / evidence.

Principles of Bayesian inference
• Formulation of a generative model: likelihood p(y | θ) and prior distribution p(θ)
• Observation of data y
• Update of beliefs based upon observations, given a prior state of knowledge:

  p(θ | y) ∝ p(y | θ) p(θ)

Posterior mean & variance of univariate Gaussians

Likelihood and prior:
  p(y | θ) = N(y; θ, σₑ²)
  p(θ) = N(θ; μₚ, σₚ²)

Posterior: p(θ | y) = N(θ; μ, σ²), with

  1/σ² = 1/σₑ² + 1/σₚ²
  μ = σ² (μₚ/σₚ² + y/σₑ²)

The posterior mean is a variance-weighted combination of the prior mean and the data.

Same thing, but expressed as precision weighting

Likelihood and prior:
  p(y | θ) = N(y; θ, 1/λₑ)
  p(θ) = N(θ; μₚ, 1/λₚ)

Posterior: p(θ | y) = N(θ; μ, 1/λ), with

  λ = λₑ + λₚ
  μ = (λₑ y + λₚ μₚ) / λ

i.e. relative precision weighting.

Same thing, but from an explicit hierarchical perspective

Likelihood and prior:
  p(y | θ⁽¹⁾) = N(y; θ⁽¹⁾, 1/λ⁽¹⁾)
  p(θ⁽¹⁾) = N(θ⁽¹⁾; θ⁽²⁾, 1/λ⁽²⁾)

Posterior: p(θ⁽¹⁾ | y) = N(θ⁽¹⁾; μ, 1/λ), with

  λ = λ⁽¹⁾ + λ⁽²⁾
  μ = (λ⁽¹⁾ y + λ⁽²⁾ θ⁽²⁾) / λ

Again: relative precision weighting.

Bayesian GLM: univariate case

Univariate linear model: y = θx + e

  p(θ) = N(θ; μₚ, σₚ²)
  p(y | θ) = N(y; θx, σₑ²)

Posterior: p(θ | y) = N(θ; μ, σ²), with

  1/σ² = x²/σₑ² + 1/σₚ²
  μ = σ² (μₚ/σₚ² + xy/σₑ²)

Bayesian GLM: multivariate case

General Linear Model: y = Xθ + e

  p(θ) = N(θ; η_p, C_p)
  p(y | θ) = N(y; Xθ, Cₑ)

Posterior: p(θ | y) = N(θ; η_θ|y, C_θ|y), with

  C_θ|y = (Xᵀ Cₑ⁻¹ X + C_p⁻¹)⁻¹
  η_θ|y = C_θ|y (Xᵀ Cₑ⁻¹ y + C_p⁻¹ η_p)

One step if Cₑ is known; otherwise iterative estimation with EM.

[Figure: three bivariate examples of prior, likelihood and posterior densities over θ₁ and θ₂ - an intuitive case, a less intuitive case, and an even less intuitive case]

Bayesian (fixed effects) group analysis

Likelihood distributions from different subjects are independent, so one can use the posterior from one subject as the prior for the next:

  p(θ | y₁) ∝ p(y₁ | θ) p(θ)
  p(θ | y₁, y₂) ∝ p(y₂ | θ) p(θ | y₁)
  ...
  p(θ | y₁, ..., y_N) ∝ p(y_N | θ) p(θ | y₁, ..., y_N₋₁)

"Today's posterior is tomorrow's prior."

Under Gaussian assumptions this is easy to compute; the group posterior covariance and mean follow from the individual posterior covariances and means:

  C_θ|y₁..y_N⁻¹ = Σᵢ C_θ|yᵢ⁻¹
  η_θ|y₁..y_N = C_θ|y₁..y_N Σᵢ C_θ|yᵢ⁻¹ η_θ|yᵢ

Bayesian analyses in SPM5
• Segmentation & spatial normalisation
• Posterior probability maps (PPMs)
  – 1st level: specific spatial priors
  – 2nd level: global spatial priors
• Dynamic Causal Modelling (DCM)
• Bayesian Model Selection (BMS)
• EEG: source reconstruction

Spatial normalisation: Bayesian regularisation

Deformations consist of a linear combination of smooth basis functions: the lowest frequencies of a 3D discrete cosine transform.
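The precision-weighted Gaussian update rules above, and the "today's posterior is tomorrow's prior" scheme for fixed-effects group analysis, can be sketched in a few lines of Python. All numbers are invented for illustration.

```python
def posterior(mu_p, lam_p, y, lam_e):
    """Precision-weighted Gaussian update:
    posterior precision = sum of precisions,
    posterior mean = precision-weighted average of prior mean and data."""
    lam = lam_e + lam_p
    mu = (lam_e * y + lam_p * mu_p) / lam
    return mu, lam

# Start from a weak prior and absorb one subject's data at a time:
# today's posterior is tomorrow's prior.
mu, lam = 0.0, 0.1                  # hypothetical prior: mean 0, low precision
subject_means = [1.2, 0.9, 1.4]     # made-up per-subject effect estimates
lam_e = 4.0                         # assumed known observation precision

for y in subject_means:
    mu, lam = posterior(mu, lam, y, lam_e)

print(f"group posterior mean = {mu:.3f}, precision = {lam:.1f}")
```

The sequential result is identical to processing all subjects at once, since precisions simply add - which is exactly what the group-analysis formulas above state.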
Find maximum a posteriori (MAP) estimates: simultaneously minimise
– the squared difference between template and source image
– the squared difference between parameters and their priors

MAP:

  log p(θ | y) = log p(y | θ) + log p(θ) − log p(y)

where log p(y | θ) measures the "difference" between template and source image, and log p(θ) the squared distance between the deformation parameters and their expected values (regularisation).

Bayesian segmentation with empirical priors
• Goal: for each voxel, compute the probability that it belongs to a particular tissue type, given its intensity:

  p(tissue | intensity) ∝ p(intensity | tissue) · p(tissue)

• Likelihood model: intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF).
• Priors are obtained from tissue probability maps (segmented images of 151 subjects).

Ashburner & Friston 2005, NeuroImage

Unified segmentation & normalisation
• Circular relationship between segmentation & normalisation:
  – Knowing which tissue type a voxel belongs to helps normalisation.
  – Knowing where a voxel is (in standard space) helps segmentation.
• Build a joint generative model:
  – model how voxel intensities result from a mixture of tissue type distributions
  – model how tissue types of one brain have to be spatially deformed to match those of another brain
• Using a priori knowledge about the parameters: adopt a Bayesian approach and maximise the posterior probability.

Ashburner & Friston 2005, NeuroImage

Bayesian fMRI analyses

General Linear Model:

  y = Xθ + ε with ε ~ N(0, Cε)

What are the priors?
• In "classical" SPM, no priors (= "flat" priors)
• Full Bayes: priors are predefined on a principled or empirical basis
• Empirical Bayes: priors are estimated from the data, assuming a hierarchical generative model

PPMs in SPM
• Parameters of one level = priors for the distribution of parameters at the lower level
• Parameters and hyperparameters at each level can be estimated using EM

Posterior Probability Maps (PPMs)

Posterior distribution: probability of the effect given the data, p(θ | y)
• mean: size of the effect
• precision: variability

Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold γ, given the data y:

  p(θ > γ | y)

Two thresholds:
• activation threshold γ: percentage of whole-brain mean signal (physiologically relevant size of effect)
• probability α that voxels must exceed to be displayed (e.g. 95%)

PPMs vs. SPMs

Bayesian test (PPM): p(θ > γ | y) ≥ α, based on the posterior p(θ | y) ∝ p(y | θ) p(θ).
Classical t-test (SPM): p(t > u | θ = 0), based on a statistic t = f(y) and a threshold u.

2nd level PPMs with global priors

1st level (GLM): y = Xθ⁽¹⁾ + ε⁽¹⁾, with p(ε⁽¹⁾) = N(0, C⁽¹⁾)
2nd level (shrinkage prior): θ⁽¹⁾ = 0 + ε⁽²⁾, with p(ε⁽²⁾) = N(0, C⁽²⁾)

Basic idea: use the variance of θ over voxels as the prior variance of θ at any particular voxel. At the 2nd level, 0 is the average effect over voxels and ε⁽²⁾ the voxel-to-voxel variation; θ⁽¹⁾ reflects regionally specific effects and is assumed to sum to zero over all voxels (shrinkage prior at the second level). In the absence of evidence to the contrary, parameters will shrink to zero. The variance of this prior is implicitly estimated by estimating λ⁽²⁾.

[Figure: shrinkage priors - small & variable effect, large & variable effect, small but clear effect, large & clear effect]

We are looking for the same effect over multiple voxels, so Cε (voxel-specific) and Cθ (a global estimate, pooled over voxels) are estimated. Once Cε and Cθ are known, we can apply the usual rule for computing the posterior mean & covariance:

  C_θ|y = (Xᵀ Cε⁻¹ X + Cθ⁻¹)⁻¹
  η_θ|y = C_θ|y Xᵀ Cε⁻¹ y

Friston & Penny 2003, NeuroImage

PPMs and multiple comparisons

No need to correct for multiple comparisons. Thresholding a PPM at 95% confidence means that in every displayed voxel the posterior probability of an activation exceeding γ is at least 95%; at most 5% of the identified voxels could have activations less than γ. Independent of the search volume, thresholding a PPM thus puts an upper bound on the false discovery rate.

[Figure: PPM vs. SPM for the same data set - PPM with height threshold P = 0.95 and effect size γ = 2.06 vs. SPM{T} with height threshold T = 5.50, extent threshold k = 0 voxels in both. PPMs show activations greater than a given size; SPMs show voxels with non-zero activations.]

PPMs: pros and cons

Advantages:
• One can infer that a cause did not elicit a response.
• Inference is independent of the search volume.
• SPMs conflate effect size and effect variability.

Disadvantages:
• Estimating priors over voxels is computationally demanding.
• Practical benefits are yet to be established.
• Thresholds other than zero require justification.

1st level PPMs with local spatial priors
• Neighbouring voxels are often not independent.
• Spatial dependencies vary across the brain.
• But spatial smoothing in SPM is uniform.
• Matched filter theorem: SNR is maximal when smoothing the data with a kernel that matches the smoothness of the true signal.
• Basic idea: estimate regional spatial dependencies from the data and use them as a prior in a PPM; regionally specific smoothing markedly increases sensitivity.

[Figure: contrast map and AR(1) map]

Penny et al. 2005, NeuroImage

The generative spatio-temporal model

  Y = XW + E

where α = spatial precision of the parameters, λ = observation noise precision, β = precision of the AR coefficients, with Gamma hyperpriors and spatial priors:

  p(αₖ) = Ga(αₖ; q₁, q₂)
  p(λₙ) = Ga(λₙ; u₁, u₂)
  p(βₚ) = Ga(βₚ; r₁, r₂)
  p(W) = Πₖ N(wₖᵀ; 0, αₖ⁻¹ (SᵀS)⁻¹)
  p(A) = Πₚ N(aₚᵀ; 0, βₚ⁻¹ (SᵀS)⁻¹)

Penny et al. 2005, NeuroImage

The spatial prior

Prior for the k-th parameter:

  p(wₖᵀ) = N(wₖᵀ; 0, αₖ⁻¹ (SᵀS)⁻¹)

where the spatial precision αₖ determines the amount of shrinkage and the spatial kernel matrix S determines the smoothness. Different choices are possible for S; currently used in SPM: Laplacian prior (same as in LORETA).

Example: application to event-related fMRI data

Contrast maps for familiar vs. non-familiar faces, obtained with
– smoothing
– global spatial prior
– Laplacian prior

[Figure: SPM5 graphical user interface]

Bayesian model selection (BMS)

Given competing hypotheses on the structure & functional mechanisms of a system, which model is the best? Which model represents the best balance between model fit and model complexity? For which model m does p(y | m) become maximal?

Pitt & Myung (2002), TICS

Bayes' rule:

  p(θ | y, m) = p(y | θ, m) p(θ | m) / p(y | m)

Model evidence:

  p(y | m) = ∫ p(y | θ, m) p(θ | m) dθ

• accounts for both accuracy and complexity of the model
• allows for inference about the structure (generalisability) of the model

Various approximations, e.g.:
– negative free energy
– AIC
– BIC

Model comparison via the Bayes factor:

  BF = p(y | m₁) / p(y | m₂)

Penny et al. (2004) NeuroImage

Example: BMS of dynamic causal models

Attention-to-motion models with regions V1, V5 and PPC, driven by input stim:
• M1 vs. M2: modulation of the backward or the forward connection by attention? M2 better than M1: BF = 2966
• M2 vs. M3: additional driving effect of attention on PPC? M3 better than M2: BF = 12
• M3 vs. M4: bilinear or nonlinear modulation of the forward connection? M4 better than M3: BF = 23

Stephan et al. (2008) NeuroImage

Thank you
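As a numerical footnote to the model-selection slides above: however the log evidence is approximated (negative free energy, AIC or BIC), the Bayes factor is just the exponentiated difference of the two approximate log evidences. A sketch with invented log-evidence values:

```python
import math

# Hypothetical approximate log evidences (e.g. negative free energies)
# for two competing DCMs; the numbers are invented for illustration.
log_ev = {"m1": -410.2, "m2": -402.2}

# Bayes factor comparing m2 against m1
log_bf = log_ev["m2"] - log_ev["m1"]
bf = math.exp(log_bf)

# A common convention: BF > 20 counts as strong evidence for m2.
print(f"BF(m2 vs m1) = {bf:.0f}")
```

Working in log space avoids overflow: evidences themselves are astronomically small numbers, but their log difference is perfectly well behaved.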
