# Approximate Bayesian Computation

Simon Tavaré, USC
IMA Lecture #5, September 18, 2003
## Outline

- Four examples on different time scales
- Stochastic computation
- Simulating posterior distributions
- MCMC without likelihood ratios
- Open problems
## Ex. 1. The Origin of Primates

Primate phylogeny.
## Ex. 2. Human Prehistory: mitochondrial Eve

- Mitochondrial DNA: 'Out of Africa'
  - Amerindians (Ryk Ward and collaborators)
  - 63 HVR I sequences, 360 bp long
- Microsatellites
  - Rosenberg et al. (2002), Science
  - 377 autosomal microsatellites, 1056 individuals, 52 populations
## Ex. 3. Genome-wide patterns of SNP variation

- Nucleotide diversity across the genome
- Looking for signs of selection
- LD across the genome
- LD mapping
## Ex. 4. Cancer genomics

Estimating the age of a tumor.

## Statistical aspects

- Highly dependent data
- Dependence caused by the tree or graph linking observations
- Explicit theory is hard to come by...
- ...so computational inference methods are essential
## Stochastic computation in evolutionary genetics

Many different approaches have been employed:

- Rejection
- Importance sampling
- Sequential importance sampling with resampling
- Markov chain Monte Carlo
- Metropolis-coupled MCMC
- Perfect sampling: coupling from the past
## Introduction to Bayesian computation

- Data D, prior π(·)
- Want to generate observations from the posterior distribution f(θ | D)
- Bayes' Theorem gives

f(θ | D) ∝ P(D | θ) π(θ)
## Rejection methods I

1. Generate θ from π(·)
2. Accept θ with probability P(D | θ); return to 1

Accepted observations have distribution f(θ | D).
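Rejection method I can be illustrated on a toy binomial model (my example, not from the lecture): D = 7 successes in n = 10 Bernoulli trials, with a uniform prior on θ, so the exact posterior is Beta(8, 4).

```python
import math
import random

# Toy model (illustrative, not the lecture's data): D = 7 heads in n = 10
# coin tosses, with a Uniform(0, 1) prior on the success probability theta.
D, n = 7, 10

def likelihood(theta):
    """P(D | theta): binomial probability of D heads in n tosses."""
    return math.comb(n, D) * theta**D * (1 - theta)**(n - D)

random.seed(1)
accepted = []
while len(accepted) < 5000:
    theta = random.random()                  # 1. generate theta from the prior
    if random.random() < likelihood(theta):  # 2. accept with probability P(D | theta)
        accepted.append(theta)

# Accepted draws follow the posterior Beta(D + 1, n - D + 1),
# whose mean is (D + 1) / (n + 2) = 8/12.
print(sum(accepted) / len(accepted))
```

Note that this version still requires evaluating P(D | θ) pointwise; the later variants drop that requirement.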
## Complex stochastic models

- A stochastic process often underlies the likelihood computation
- This process may be extremely complex, making explicit probability calculations difficult or impossible
- Thus P(D | θ) may be uncomputable (either quickly or theoretically)
## Rejection methods II

The process is often easy to simulate, so:

1. Generate θ from π(·)
2. Simulate D′ from the stochastic model with parameter θ
3. Accept θ if D′ = D; return to 1
Do we ever hit the target?
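The likelihood-free variant replaces the probability evaluation in step 2 with simulation. A sketch on a toy binomial model (my example): draw D′ ~ Binomial(n, θ) and keep θ only on an exact match; tracking the acceptance rate answers the question above.

```python
import random

# Toy binomial model (my example): observed D = 7 heads in n = 10 tosses.
D, n = 7, 10

def simulate(theta):
    """Draw D' from the model: number of heads in n tosses with bias theta."""
    return sum(random.random() < theta for _ in range(n))

random.seed(2)
accepted, tries = [], 0
while len(accepted) < 5000:
    tries += 1
    theta = random.random()        # 1. generate theta from the prior
    if simulate(theta) == D:       # 2.-3. accept only on an exact match D' = D
        accepted.append(theta)

# Accepted draws follow Beta(D + 1, n - D + 1), with no likelihood evaluation.
print(sum(accepted) / len(accepted), len(accepted) / tries)
```

For discrete, low-dimensional data the target is hit reasonably often; for high-dimensional data the exact-match probability collapses, which motivates the approximations that follow.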
## Approximate Bayesian Computation I

1. Generate θ from π(·)
2. Simulate D′ from the stochastic model with parameter θ
3. Calculate the distance ρ(D, D′) between D and D′
4. Accept θ if ρ ≤ ε; return to 1

- If ε → ∞, this generates from the prior
- If ε = 0, this generates from f(θ | D)
- The choice of ε reflects a tension between computability and accuracy
- The method is honest: you get observations from f(θ | ρ(D, D′) ≤ ε)
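A sketch of ABC rejection on a toy binomial model (my example), with ρ(D, D′) = |D′ − D|, showing the two limits: ε = 0 gives the exact posterior, while a large ε accepts everything and returns the prior.

```python
import random

# Toy binomial data (my example, not the lecture's): D = 7 heads in n = 10.
D, n = 7, 10

def simulate(theta):
    return sum(random.random() < theta for _ in range(n))

def abc_rejection(eps, n_accept=4000):
    """ABC rejection: accept theta when rho(D, D') = |D' - D| <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = random.random()                # generate theta from the prior
        if abs(simulate(theta) - D) <= eps:    # accept if rho <= eps
            accepted.append(theta)
    return sum(accepted) / len(accepted)

random.seed(3)
m_exact = abc_rejection(0)    # eps = 0: draws from f(theta | D), mean 8/12
m_prior = abc_rejection(10)   # eps so large everything is accepted: prior, mean 1/2
print(m_exact, m_prior)
```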
## Approximate Bayesian Computation II

- The limit ε = 0 reproduces the data precisely
- In many examples the data are too high-dimensional
- ...so reduce the dimension by using summary statistics
## Motivation: sufficiency

If S is sufficient for θ, then f(θ | D) = f(θ | S).

The inference method can then be considerably simplified, e.g. the rejection method via

f(θ | S) ∝ P(S | θ) π(θ)

This puts a premium on finding decent summary statistics.

- A systematic, implementable approach? (Le Cam (1963))
- Estimate the distance between f(θ | D) and f(θ | S) given a measure of how far from sufficient S is for θ
## Combine summaries and rejection

Tavaré et al. (1997), Fu and Li (1997), Weiss and von Haeseler (1998), Pritchard et al. (1999)

Choose statistics S = (S1, ..., Sp) to summarize D:

1. Generate θ from π(·)
2. Simulate D′, calculate s′
3. Accept θ if ρ(s′, s) ≤ ε; return to 1
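The summaries-plus-rejection scheme can be sketched on a toy model (my construction, not the lecture's coalescent example): the raw data D is a vector of n Bernoulli(θ) trials, for which an exact match on D is hopeless (probability on the order of 2^-n), but the count S = ΣDi is sufficient, so rejection on S with ε = 0 recovers the exact posterior.

```python
import random

random.seed(4)
n = 30
theta_true = 0.3                       # used only to create the toy data set
data = [random.random() < theta_true for _ in range(n)]
s_obs = sum(data)                      # summary statistic S = number of successes

def abc_summary(eps, n_accept=2000):
    """Rejection on the summary: accept theta when |s' - s| <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = random.random()                                 # from the prior
        s_sim = sum(random.random() < theta for _ in range(n))  # simulate D', compute s'
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted

post = abc_summary(eps=0)
# S is sufficient here, so eps = 0 on S gives the exact posterior
# Beta(s_obs + 1, n - s_obs + 1), with mean (s_obs + 1) / (n + 2).
print(sum(post) / len(post))
```

In the population-genetics applications below the summaries (e.g. number of SNPs, number of haplotypes) are not sufficient, which is where the quality of the approximation becomes an empirical question.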
## An example from population genetics

The coalescent is a useful tool for interpreting patterns of molecular variation.

Ingredients:

- Biological model for the relationships among n sampled sequences (tree or graph; Λ, T)
- Parameters of interest (mutation, migration, recombination, ... rates θ)
- Aim: generate observations from f(θ | D) or from f(θ, Λ, T | D)

Probability model: branching Markov process.
## Ward's mtDNA data

Yakima: n = 42, 31 SNPs.

Interested (for this talk) in inference about

- the scaled mutation parameter θ
- the TMRCA T of the sample (= the height of the tree)

T has prior mean 1.95.
## Inference in population genetics

- In the classical frequentist setting, most estimators are inconsistent
- Even in simple cases where we do have consistency, rates of convergence are of order log n
- Inference about T (an unobserved random variable) makes the Bayesian approach more sensible
## Results

Acceptance rate: 0.6%

| TMRCA T      |      |
|--------------|------|
| 1st quartile | 1.07 |
| mean         | 1.71 |
| median       | 1.49 |
| 3rd quartile | 2.11 |

Pros:

- Usually easy to code
- Generates independent observations (can use embarrassingly parallel computation)
- Can be used to estimate Bayes factors directly

Cons:

- For complex probability models, sampling from the prior does not make good use of accepted observations
- Q: is there a scheme that combines generation of perfect observations with use of the existing sample?
## ABC III: Generalization of rejection method

Beaumont, Zhang and Balding (2002), Genetics.

A classic in the frequentist world that uses implicit probability models to estimate likelihoods is Diggle and Gratton (1984), JRSSB.
## Reminder: standard MCMC

1. Now at θ
2. Propose a move to θ′ according to q(θ → θ′)
3. Calculate

   h = min{ 1, [P(D | θ′) π(θ′) q(θ′ → θ)] / [P(D | θ) π(θ) q(θ → θ′)] }

4. Accept θ′ with probability h, else stay at θ
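A minimal random-walk Metropolis sketch of these steps on a toy binomial model (my example: D = 7 heads in n = 10 tosses, uniform prior). With a symmetric proposal the q terms cancel in h.

```python
import math
import random

# Toy target (my example): posterior f(theta | D) for D = 7 heads in n = 10
# tosses under a Uniform(0, 1) prior, i.e. Beta(8, 4).
D, n = 7, 10

def likelihood(theta):
    return math.comb(n, D) * theta**D * (1 - theta)**(n - D)

random.seed(5)
theta, chain = 0.5, []
for _ in range(50000):
    prop = theta + random.uniform(-0.1, 0.1)   # symmetric q, so q cancels in h
    if 0 < prop < 1:                           # prior density is 1 on (0, 1), 0 outside
        h = min(1.0, likelihood(prop) / likelihood(theta))
        if random.random() < h:
            theta = prop                       # accept theta' with probability h...
    chain.append(theta)                        # ...else stay at theta

print(sum(chain) / len(chain))                 # close to the posterior mean 8/12
```

Every step evaluates the likelihood, which is exactly the bottleneck the likelihood-free scheme below the proof removes.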
## MCMC in the evolutionary genetics setting

- Small tweaks in the biology often translate into huge changes in the algorithm
- Long development time
- All the usual problems with convergence
- Almost all the effort goes into evaluation of the likelihood
## (Yet) another MCMC approach

1. Now at θ
2. Propose a move to θ′ according to q(θ → θ′)
3. Generate D′ using θ′
4. If D′ = D, go to the next step, else stay at θ
5. Calculate

   h = h(θ, θ′) = min{ 1, [π(θ′) q(θ′ → θ)] / [π(θ) q(θ → θ′)] }

6. Accept θ′ with probability h, else stay at θ
Proposition. The stationary distribution of the chain is indeed f(θ | D).

Proof. Denote the transition mechanism of the chain by r(θ → θ′), and choose θ′ ≠ θ satisfying

π(θ′) q(θ′ → θ) / [π(θ) q(θ → θ′)] ≤ 1,

so that h(θ, θ′) equals this ratio while h(θ′, θ) = 1. Then

f(θ | D) r(θ → θ′)
= f(θ | D) q(θ → θ′) P(D | θ′) h(θ, θ′)
= [P(D | θ) π(θ) / P(D)] q(θ → θ′) P(D | θ′) × π(θ′) q(θ′ → θ) / [π(θ) q(θ → θ′)]
= [P(D | θ′) π(θ′) / P(D)] × q(θ′ → θ) P(D | θ)
= f(θ′ | D) q(θ′ → θ) P(D | θ) h(θ′, θ)
= f(θ′ | D) r(θ′ → θ),

so detailed balance holds and f(θ | D) is stationary.
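A sketch of this likelihood-free chain on a toy binomial model (my example, not the lecture's data). With a flat prior and a symmetric proposal, h = 1 in step 5, so the exact match D′ = D in step 4 is the only accept/reject decision.

```python
import random

# Toy binomial model (my example): observed D = 7 heads in n = 10 tosses,
# Uniform(0, 1) prior, symmetric random-walk proposal.
D, n = 7, 10

def simulate(theta):
    return sum(random.random() < theta for _ in range(n))

random.seed(6)
theta, chain = 0.5, []
for _ in range(100000):
    prop = theta + random.uniform(-0.2, 0.2)
    # Steps 3-4: simulate D' under the proposal; move on only if D' = D.
    # (Proposals outside (0, 1) have prior density 0 and are rejected.)
    if 0 < prop < 1 and simulate(prop) == D:
        # Step 5: flat prior and symmetric q give h = 1, so always accept.
        theta = prop
    chain.append(theta)           # rejected moves repeat the current state

print(sum(chain) / len(chain))    # close to the posterior mean 8/12
```

No likelihood is ever evaluated, yet by the proposition the chain targets f(θ | D).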
## Practical version: ABC

Data D, summary statistics S. Replace step 4 by

4′. If ρ(D′, D) ≤ ε, go to the next step, otherwise stay at θ

or

4″. If ρ(S′, S) ≤ ε, go to the next step, otherwise stay at θ

for some suitable metric ρ and approximation level ε.

Observations now come from f(θ | ρ(D′, D) ≤ ε) or f(θ | ρ(S′, S) ≤ ε).
## Variations on ABC

- What if D is not discrete?
  - Use the previous method (binning)
  - Use a simulation approach to estimate the likelihood terms in the Hastings ratio
- If the underlying probability model is complex, simulating data will not often lead to acceptance, so we need updates for parts of the probability model
- These methods can often be started at stationarity, so no burn-in is needed
The same scheme applies to the joint state (θ, T):

1. Now at (θ, T)
2. Propose a move to (θ′, T′) according to q((θ, T) → (θ′, T′))
3. Simulate D′ using (θ′, T′)
4. If D′ = D, go to the next step, else stay at (θ, T) (or use ABC)
5. Calculate

   h = min{ 1, [π(θ′, T′) q((θ′, T′) → (θ, T))] / [π(θ, T) q((θ, T) → (θ′, T′))] }

6. Accept (θ′, T′) with probability h, else stay at (θ, T).
## Mitochondrial example revisited

H = number of haplotypes, V = number of SNPs.

- Yakima: n = 42, V = 31, H = 20
- Nuu Chah Nulth: n = 63, V = 26, H = 28

Interested in inference about

- the scaled mutation parameter θ
- the TMRCA T of the sample (= the height of the tree)

T has prior mean 1.95.

From MCMC methods:

- Yakima: E(T | D) = 0.72
- Nuu Chah Nulth: E(T | D) = 0.68
Yakima, S = V, ε = 0:

|                 | Rejection | Estimated likelihood | No likelihood |
|-----------------|-----------|----------------------|---------------|
| Acceptance rate | 0.6%      | 65%                  | 3.9%          |
| TMRCA T, 1st quartile | 1.07 | 1.05                | 1.09          |
| TMRCA T, mean   | 1.71      | 1.70                 | 1.72          |
| TMRCA T, median | 1.49      | 1.45                 | 1.49          |
| TMRCA T, 3rd quartile | 2.11 | 2.07                | 2.15          |
Yakima, S = (V, H), ε = 0:

|                 | Rejection | No likelihood |
|-----------------|-----------|---------------|
| Acceptance rate | 0.03%     | 0.46%         |
| TMRCA T, 1st quartile | 0.73 | 0.74         |
| TMRCA T, mean   | 1.01      | 1.03          |
| TMRCA T, median | 0.93      | 0.94          |
| TMRCA T, 3rd quartile | 1.22 | 1.24         |
Nuu Chah Nulth, S = (V, H), ε = 2:

|                 | Rejection | Estimated likelihood | No likelihood |
|-----------------|-----------|----------------------|---------------|
| Acceptance rate | 0.0008%   | 16.9%                | 0.2%          |
| TMRCA T, 1st quartile | 0.51 | 0.50                | 0.54          |
| TMRCA T, mean   | 0.69      | 0.67                 | 0.70          |
| TMRCA T, median | 0.64      | 0.63                 | 0.66          |
| TMRCA T, 3rd quartile | 0.81 | 0.80                | 0.81          |
## The effects of ε

Nuu Chah Nulth, S = (V, H), no-likelihood method:

|                 | ε = 2 | ε = 1 | ε = 0  |
|-----------------|-------|-------|--------|
| Acceptance rate | 0.2%  | 0.04% | 0.005% |
| TMRCA T, 1st quartile | 0.54 | 0.49 | 0.46  |
| TMRCA T, mean   | 0.70  | 0.64  | 0.59   |
| TMRCA T, median | 0.66  | 0.60  | 0.55   |
| TMRCA T, 3rd quartile | 0.81 | 0.74 | 0.69  |
## Summary

In the population genetics setting, there are many existing MCMC algorithms that explore the space of θ, Λ, T. These can be used immediately to address harder problems, by replacing 'peeling' with simulation:

- Complex mutation models (e.g. dependent sites in a sequence)
- RFLPs: population expansion, multiple cutters, mutation model
- Ascertainment (e.g. SNP panels)
## Current work

- Testing on complex problems (e.g. subdivision and multiple loci, population history)
- Mapping disease genes using the ancestral recombination graph
- Exploring connections with other auxiliary variable methods
- Exploring projection pursuit for combining summary statistics
## Acknowledgements

- (MCMC, statistical genetics) Duncan Thomas, John Molitor
- (MCMC) Paul Marjoram, Vincent Plagnol, Lada Markovtsova
- (Population genetics) Magnus Nordborg, Noah Rosenberg
- (Projection pursuit) Peter Calabrese

Paper: P. Marjoram, J. Molitor, V. Plagnol, S. Tavaré (2003), Markov chain Monte Carlo without likelihoods, Proc. Natl. Acad. Sci., in press.
