Approximate Bayesian Computation

Simon Tavaré, USC
IMA Lecture #5, September 18, 2003
                     Outline

• Four examples on different time scales
• Stochastic computation
• Simulating posterior distributions
• MCMC without likelihood ratios

• Open problems
Ex. 1. The Origin of Primates
Primate Phylogeny
Ex. 2. Human Prehistory: mitochondrial Eve
• Mitochondrial DNA: ‘Out of Africa’
     – Amerindians (Ryk Ward and collaborators)
     – 63 HVR I sequences, 360bp long
• Microsatellites

     – Rosenberg et al. (2002), Science
     – 377 autosomal microsatellites, 1056 individuals,
       52 populations
Ex. 3. Genome-wide patterns of SNP variation

• Nucleotide diversity across genome

• Looking for signs of selection
• LD across genome
• LD mapping
Ex. 4. Cancer genomics
Estimating the age of a tumor
              Statistical aspects

• Highly dependent data
• Dependence caused by tree or graph linking
  observations
• Explicit theory hard to come by . . .

• . . . computational inference methods essential
Stochastic computation in evolutionary genetics

• Many different approaches have been employed:

    – Rejection
    – Importance sampling
    – Sequential importance sampling with resampling
    – Markov chain Monte Carlo
    – Metropolis-coupled MCMC
    – Perfect sampling: coupling from the past
 Introduction to Bayesian computation

• Data D, prior π(·)
• Want to generate observations from posterior
  distribution f (θ | D)
• Bayes' Theorem gives

$$f(\theta \mid D) \propto P(D \mid \theta)\,\pi(\theta)$$
                Rejection methods I

  1. Generate θ from π(·)
  2. Accept θ with probability P(D | θ); return to 1

Accepted observations have distribution f(θ | D)
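A minimal sketch of this scheme in Python. The toy binomial model, the uniform prior, and the function names are illustrative assumptions, not part of the lecture:

```python
import random
from math import comb

def rejection_sample(n_accept, sample_prior, likelihood):
    """Rejection method I: draw theta from the prior and accept it with
    probability P(D | theta); accepted draws follow f(theta | D)."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior()
        if random.random() < likelihood(theta):   # accept w.p. P(D | theta)
            accepted.append(theta)
    return accepted

# Toy example (assumed): D = 7 successes in 10 Bernoulli(theta) trials,
# U(0, 1) prior. This binomial likelihood never exceeds ~0.267, so it is
# a valid acceptance probability.
n, k = 10, 7
draws = rejection_sample(
    1000,
    sample_prior=random.random,
    likelihood=lambda t: comb(n, k) * t**k * (1 - t)**(n - k),
)
```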
        Complex stochastic models

• A stochastic process often underlies the likelihood
  computation
• This process may be extremely complex, making
  explicit probability calculations difficult or impossible
• Thus P(D | θ) may be uncomputable, either in practice or in
  principle
                Rejection methods II

The process is often easy to simulate, so:

  1. Generate θ from π(·)
  2. Simulate D′ from the stochastic model with parameter θ
  3. Accept θ if D′ = D; return to 1

Do we ever hit the target?
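As a sketch (using the same illustrative binomial toy model as above; nothing here is from the lecture itself), step 2 replaces the likelihood evaluation with a forward simulation and step 3 demands an exact match:

```python
import random

def rejection_sample_match(n_accept, sample_prior, simulate_data, D):
    """Rejection method II: simulate D' under theta and keep theta only
    when D' equals the observed data D exactly."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior()
        if simulate_data(theta) == D:   # exact hit -- may be very rare
            accepted.append(theta)
    return accepted

# Toy example (assumed): the data are the number of successes in 10 trials.
simulate_data = lambda t: sum(random.random() < t for _ in range(10))
draws = rejection_sample_match(1000, random.random, simulate_data, D=7)
```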
  Approximate Bayesian Computation I

1. Generate θ from π(·)
2. Simulate D′ from the stochastic model with parameter θ
3. Calculate the distance ρ(D, D′) between D and D′
4. Accept θ if ρ ≤ ε; return to 1

• If ε → ∞, this generates from the prior
• If ε = 0, this generates from f(θ | D)


• The choice of ε reflects the tension between computability
  and accuracy
• The method is honest: you get observations from
  f(θ | ρ(D, D′) ≤ ε)
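A sketch of this ABC rejection scheme in Python; the metric, the tolerance, and the toy model are illustrative assumptions:

```python
import random

def abc_rejection(n_accept, sample_prior, simulate_data, D, rho, eps):
    """ABC rejection: accept theta when rho(D, D') <= eps. Accepted
    draws come from f(theta | rho(D, D') <= eps), which equals the
    posterior only in the limit eps = 0."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior()
        D_sim = simulate_data(theta)
        if rho(D, D_sim) <= eps:
            accepted.append(theta)
    return accepted

# Toy example (assumed): absolute difference of success counts as rho.
simulate_data = lambda t: sum(random.random() < t for _ in range(10))
draws = abc_rejection(1000, random.random, simulate_data,
                      D=7, rho=lambda a, b: abs(a - b), eps=1)
```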
 Approximate Bayesian Computation II

• The limit ε = 0 reproduces the data precisely
• In many examples the data are too high-dimensional
• . . . so reduce dimension by using summary statistics
               Motivation: sufficiency

If S is sufficient for θ, then f(θ | D) = f(θ | S)
The inference method can then be considerably simplified, e.g. a
rejection method via

$$f(\theta \mid S) \propto P(S \mid \theta)\,\pi(\theta)$$

This puts a premium on finding decent summary statistics

   • A systematic, implementable approach? (Le Cam (1963))
   • Estimate the distance between f(θ | D) and f(θ | S), given a
     measure of how far S is from sufficiency for θ
        Combine summaries and rejection

Tavaré et al. (1997), Fu and Li (1997), Weiss and von Haeseler
(1998), Pritchard et al. (1999)

Choose statistics S = (S_1, ..., S_p) to summarize D

   1. Generate θ from π(·)
   2. Simulate D′, calculate s′
   3. Accept θ if ρ(s′, s) ≤ ε; return to 1
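A sketch with summary statistics; the (mean, variance) summaries, the Gaussian toy model, and the Euclidean metric are assumptions for illustration only:

```python
import random
import statistics

def abc_summaries(n_accept, sample_prior, simulate_data,
                  summarize, s_obs, rho, eps):
    """ABC rejection on summaries S = (S_1, ..., S_p): simulate D',
    reduce it to s', and accept theta when rho(s', s_obs) <= eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior()
        s_sim = summarize(simulate_data(theta))
        if rho(s_sim, s_obs) <= eps:
            accepted.append(theta)
    return accepted

# Toy example (assumed): Gaussian data summarized by (mean, variance),
# summaries compared with Euclidean distance.
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
simulate_data = lambda t: [random.gauss(t, 1.0) for _ in range(50)]
summarize = lambda d: (statistics.mean(d), statistics.variance(d))
draws = abc_summaries(200, lambda: random.uniform(-2.0, 2.0),
                      simulate_data, summarize,
                      s_obs=(0.3, 1.1), rho=euclid, eps=0.2)
```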
     An example from population genetics

The coalescent is a useful tool for interpreting patterns of
molecular variation
Ingredients:

   • Biological model for relationships among n sampled
     sequences (tree or graph; Λ, T )
   • Parameters of interest (mutation, migration,
     recombination, . . . rates θ)
   • Aim: generate observations from f (θ|D) or from
     f (θ, Λ, T |D)
Probability model: branching Markov process
                 Ward’s mtDNA data

Yakima n = 42, 31 SNPs
Interested (for this talk) in inference about

   • the scaled mutation parameter θ

   • the TMRCA T of the sample (= the height of the tree)
   • T has prior mean 1.95
     Inference in population genetics

• In the classical frequentist setting, most estimators are
  inconsistent
• Even in simple cases where we do have consistency, rates
  of convergence are of order log n


• Inference about T (an unobserved random variable)
  makes Bayesian approach more sensible
       Results


Acceptance rate   0.6%

TMRCA T
1st quartile      1.07
mean              1.71
median            1.49
3rd quartile      2.11
    Advantages and disadvantages of ABC

Pros:




   • Usually easy to code
   • Generates independent observations (can use
     embarrassingly parallel computation)
   • Can be used to estimate Bayes factors directly
Cons:

   • For complex probability models, sampling from prior
     does not make good use of accepted observations
   • Q: is there a scheme that combines the generation of perfect
     observations with reuse of the existing sample?
ABC III: Generalization of rejection method

Beaumont, Zhang and Balding (2002), Genetics


A classic in the frequentist world that uses implicit
probability models to estimate likelihoods is
Diggle and Gratton (1984), JRSSB
        Reminder: standard MCMC

1. Now at θ
2. Propose a move to θ′ according to q(θ → θ′)
3. Calculate

$$h = \min\left(1,\; \frac{P(D \mid \theta')\,\pi(\theta')\,q(\theta' \to \theta)}{P(D \mid \theta)\,\pi(\theta)\,q(\theta \to \theta')}\right)$$

4. Accept θ′ with probability h, else return θ
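A generic Metropolis–Hastings step in Python, computed on the log scale for numerical stability; the toy target and the symmetric proposal are assumptions, not from the lecture:

```python
import math
import random

def metropolis_hastings(n_steps, theta0, log_target, propose, log_q):
    """Standard MCMC step with h = min(1, P(D|t')pi(t')q(t'->t) /
    (P(D|t)pi(t)q(t->t'))); log_target(t) = log P(D|t) + log pi(t)."""
    chain, theta = [], theta0
    for _ in range(n_steps):
        theta_new = propose(theta)
        log_h = (log_target(theta_new) - log_target(theta)
                 + log_q(theta_new, theta) - log_q(theta, theta_new))
        if math.log(random.random()) < log_h:
            theta = theta_new          # accept with probability h
        chain.append(theta)            # else return (stay at) theta
    return chain

# Toy example (assumed): standard normal target, symmetric random-walk
# proposal, so the q terms cancel and log_q can be a constant.
chain = metropolis_hastings(10_000, 0.0,
                            log_target=lambda t: -0.5 * t * t,
                            propose=lambda t: t + random.gauss(0.0, 1.0),
                            log_q=lambda a, b: 0.0)
```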
MCMC in evolutionary genetics setting




• Small tweaks to the biology often translate into huge
  changes in the algorithm
• Long development time
• All the usual problems with convergence
• Almost all the effort goes into evaluation of likelihood
A bad sampler
      (Yet) another MCMC approach

1. Now at θ
2. Propose a move to θ′ according to q(θ → θ′)
3. Generate D′ using θ′
4. If D′ = D, go to the next step, else return θ

5. Calculate

$$h = h(\theta, \theta') = \min\left(1,\; \frac{\pi(\theta')\,q(\theta' \to \theta)}{\pi(\theta)\,q(\theta \to \theta')}\right)$$

6. Accept θ′ with probability h, else return θ
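A sketch of this likelihood-free chain in Python. The binomial toy model, the uniform prior, and the random-walk proposal (treated as symmetric here, a simplifying assumption near the boundaries) are illustrative only:

```python
import math
import random

def mcmc_no_likelihood(n_steps, theta0, D, log_prior, propose, log_q,
                       simulate_data):
    """MCMC without likelihoods: a move to theta' is considered only if
    data simulated at theta' reproduce D exactly (step 4); the Hastings
    ratio then involves only the prior and the proposal (step 5)."""
    chain, theta = [], theta0
    for _ in range(n_steps):
        theta_new = propose(theta)
        if simulate_data(theta_new) == D:            # step 4: D' = D?
            log_h = (log_prior(theta_new) - log_prior(theta)
                     + log_q(theta_new, theta) - log_q(theta, theta_new))
            if math.log(random.random()) < log_h:
                theta = theta_new                    # step 6: accept w.p. h
        chain.append(theta)                          # else return theta
    return chain

# Toy example (assumed): 7 successes in 10 trials, U(0, 1) prior.
simulate = lambda t: sum(random.random() < t for _ in range(10))
propose = lambda t: min(max(t + random.uniform(-0.1, 0.1), 0.0), 1.0)
chain = mcmc_no_likelihood(20_000, 0.5, 7,
                           log_prior=lambda t: 0.0,   # U(0,1) prior
                           propose=propose,
                           log_q=lambda a, b: 0.0,    # treated as symmetric
                           simulate_data=simulate)
```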
Proposition. The stationary distribution of the chain is indeed
f(θ | D).

Proof. Denote the transition mechanism of the chain by r(θ → θ′),
and choose θ ≠ θ′ satisfying

$$\frac{\pi(\theta')\,q(\theta' \to \theta)}{\pi(\theta)\,q(\theta \to \theta')} \le 1,$$

so that h(θ, θ′) equals this ratio and h(θ′, θ) = 1. Then

$$\begin{aligned}
f(\theta \mid D)\,r(\theta \to \theta')
&= f(\theta \mid D)\,q(\theta \to \theta')\,P(D \mid \theta')\,h(\theta, \theta') \\
&= \frac{P(D \mid \theta)\,\pi(\theta)}{P(D)}\;q(\theta \to \theta')\,P(D \mid \theta')\;\frac{\pi(\theta')\,q(\theta' \to \theta)}{\pi(\theta)\,q(\theta \to \theta')} \\
&= \frac{P(D \mid \theta')\,\pi(\theta')}{P(D)}\;\{q(\theta' \to \theta)\,P(D \mid \theta)\} \\
&= f(\theta' \mid D)\,q(\theta' \to \theta)\,P(D \mid \theta)\,h(\theta', \theta) \\
&= f(\theta' \mid D)\,r(\theta' \to \theta),
\end{aligned}$$

so the chain satisfies detailed balance with respect to f(θ | D).
Practical version: ABC

Data D, summary statistics S

  4′. If ρ(D′, D) ≤ ε, go to the next step, otherwise return θ
  4″. If ρ(S′, S) ≤ ε, go to the next step, otherwise return θ

for some suitable metric ρ and approximation level ε


Observations now come from f(θ | ρ(D′, D) ≤ ε) or
f(θ | ρ(S′, S) ≤ ε)
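In the likelihood-free sketch above, this practical version changes only the matching test; rho, eps, summarize, and s_obs are again hypothetical names:

```python
# Step 4 in the sketch above:   if simulate_data(theta_new) == D:
# Step 4': accept the simulation if it is merely close to the data,
#     if rho(simulate_data(theta_new), D) <= eps:
# Step 4'': or compare summary statistics instead,
#     if rho(summarize(simulate_data(theta_new)), s_obs) <= eps:
# The chain then targets f(theta | rho(D', D) <= eps) or
# f(theta | rho(S', S) <= eps) rather than f(theta | D).
```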
             Variations on ABC

• What if D is not discrete?
     – Use previous method (binning)
     – Use simulation approach to estimate the
       likelihood terms in the Hastings ratio


• If the underlying probability model is complex,
  simulating data will not often lead to acceptance.
  Thus updates are needed for parts of the probability model as well


• These methods can often be started at stationarity,
  so no burn-in
1. Now at (θ, T)
2. Propose a move to (θ′, T′) according to
   q((θ, T) → (θ′, T′))
3. Simulate D′ using (θ′, T′)
4. If D′ = D, go to the next step, else return (θ, T) (or use
   ABC)
5. Calculate

$$h = \min\left(1,\; \frac{\pi(\theta', T')\,q((\theta', T') \to (\theta, T))}{\pi(\theta, T)\,q((\theta, T) \to (\theta', T'))}\right)$$

6. Accept (θ′, T′) with probability h, else return (θ, T).
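The same sketch extends to this augmented state; the changes below use the hypothetical names from the earlier code block:

```python
# Variant of mcmc_no_likelihood above for the augmented state (theta, T):
# replace the scalar state by the pair, e.g.
#     state_new = propose(state)              # proposes (theta', T') jointly
#     if simulate_data(state_new) == D: ...   # D' depends on theta' and T'
# and compute log_h from log_prior(state_new) - log_prior(state) plus the
# q terms. Accepted states give draws from f(theta, T | D) (or its ABC
# approximation when the exact-match test is relaxed).
```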
       Mitochondrial example revisited

H = number of haplotypes
V = number of SNPs

   • Yakima n = 42, V = 31, H = 20

   • Nuu Chah Nulth n = 63, V = 26, H = 28
Interested in inference about

   • the scaled mutation parameter θ
   • the TMRCA T of the sample (= the height of the tree)

   • T has prior mean 1.95

From MCMC methods:
     Yakima           E(T | D) = 0.72
     Nuu Chah Nulth   E(T | D) = 0.68
Yakima, S = V, ε = 0   Rejection   Estimated likelihood   No likelihood

Acceptance rate        0.6%        65%                    3.9%

TMRCA T
1st quartile           1.07        1.05                   1.09
mean                   1.71        1.70                   1.72
median                 1.49        1.45                   1.49
3rd quartile           2.11        2.07                   2.15
S = (V, H), ε = 0   Rejection   No likelihood

Acceptance rate     0.03%       0.46%

TMRCA T
1st quartile        0.73        0.74
mean                1.01        1.03
median              0.93        0.94
3rd quartile        1.22        1.24
NCN, S = (V, H), ε = 2   Rejection   Estimated likelihood   No likelihood

Acceptance rate          0.0008%     16.9%                  0.2%

TMRCA T
1st quartile             0.51        0.50                   0.54
mean                     0.69        0.67                   0.70
median                   0.64        0.63                   0.66
3rd quartile             0.81        0.80                   0.81
The effects of ε

NCN, S = (V, H)   ε = 2    ε = 1    ε = 0
Acceptance rate   0.2%     0.04%    0.005%
TMRCA T
1st quartile      0.54     0.49     0.46
mean              0.70     0.64     0.59
median            0.66     0.60     0.55
3rd quartile      0.81     0.74     0.69
                        Summary

In the population genetics setting, there are many existing
MCMC algorithms that explore the space of θ, Λ, T .
These can be used immediately to address harder problems,
by replacing ‘peeling’ by simulation

   • Complex mutation models (e.g. dependent sites in a
     sequence)
   • RFLPs: Population expansion, multiple cutters,
     mutation model
   • Ascertainment (e.g. SNP panels)
                 Current work

• Testing on complex problems (e.g. subdivision and
  multiple loci, population history)
• Mapping disease genes using ancestral recombination
  graph
• Exploring connection with other auxiliary variable
  methods
• Exploring projection pursuit for combining summary
  statistics
                Acknowledgements

   • (MCMC, statistical genetics) Duncan Thomas, John
     Molitor
   • (MCMC) Paul Marjoram, Vincent Plagnol, Lada
     Markovtsova
   • (Population genetics) Magnus Nordborg, Noah
     Rosenberg
   • (Projection pursuit) Peter Calabrese

Paper: P. Marjoram, J. Molitor, V. Plagnol, S. Tavaré (2003)
Markov chain Monte Carlo without likelihoods, Proc. Natl. Acad.
Sci., in press.

				