Docstoc

Testing methods _ comparing models using continuous and

Document Sample
Testing methods _ comparing models using continuous and Powered By Docstoc
					 Testing methods & comparing models using
continuous and discrete character simulation




                 Liam J. Revell
    Motivating example: phylogenetic
     error and comparative analyses
• Phylogenetic comparative analyses often (but not always)
  assume that the tree & data are known without error.
• (An alternative assumption is that the sampling distribution
  obtained from repeating the analysis on a sample of trees
  from their Bayesian posterior distribution will give a unbiased
  estimate of the parameter’s sampling distribution.)
• To test this, I simulated trees & data, and then mutated the
  trees via NNI.
  Nearest neighbor interchange:
                                  Reattach

  Dissolve internal branches




Felsenstein 2004
               Topological error: σ2




Simulation 2
      Topological error: model selection

                                    100
                                                                           OU
                                                                           lambda
                                    80                                     BM
               selected model (%)
                                    60
                                    40
                                    20
                                    0




                                          5   15 25 35 45 55 65 75 85 95
                                                      random NNIs

Simulation 1
    Motivating example: phylogenetic
     error and comparative analyses
• This result shows that topological error introduces non-
  random error (bias) into parameter estimation.
• It also suggests that averaging a posterior sample is also
  biased – and perhaps from some problems a global ML or
  MCC estimate might be better (although I have not simulated
  this).
• Finally, topological error can cause us to prefer more complex
  models of evolution.
                            Outline
• Discrete vs. continuous time simulation:
   – Example 1: Brownian motion simulation.
   – Example 2: Discrete character evolution.
• Alternative methods for quantitative character simulation.
• Visualizing the results of numerical simulations:
   – Projecting the phylogeny into phenotype space.
   – Mapping discrete characters on the tree.
• Uses of simulation in comparative biology.
   – Stochastic character mapping.
   – Phylogenetic Monte Carlo.
   – Testing comparative methods.
• Conclusions.
         Discrete time simulations
• Note that discrete & continuous time have nothing at all to do
  with whether the simulated traits are discrete or continuous.
• In discrete time, we simulate evolution as the result of
  changes additively accumulated over many individual
  timesteps.
• Timesteps may be (but are not necessarily – depending on
  our purposes) organismal generations.
       Example: Brownian motion
• In Brownian motion, changes in a
  phenotypic trait are drawn from a
  normal distribution, centered on
  0.0, with a variance proportional
  to the time elapsed x the rate.
• In a discrete time simulation of
  Brownian motion, every
  “generation” is simulated by a
  draw from a random normal with
  variance σ2dt (or σ2ti for the ith
  generation).
               Example: Brownian motion
    • We can simulate Brownian evolution for a single or multiple
      lineages.




n<-1; t<-100; sig2<-0.1
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
               Example: Brownian motion
    • We can simulate Brownian evolution for a single or multiple
      lineages.




n<-1; t<-100; sig2<-0.1
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
               Example: Brownian motion
    • We can simulate Brownian evolution for a single or multiple
      lineages.




n<-100; t<-100; sig2<-0.01
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
               Example: Brownian motion
    • We can simulate Brownian evolution for a single or multiple
      lineages.




n<-100; t<-100; sig2<-0.05
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
               Example: Brownian motion
    • We can simulate Brownian evolution for a single or multiple
      lineages.




n<-100; t<-100; sig2<-0.1
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
         Example: Brownian motion
• We can simulate Brownian evolution for a single or multiple
  lineages.
• The expected variance on the outcome is just:



                          (due to the additivity of variances)

• Does this mean that if we were just interested in the end
  product of Brownian evolution we could simulate
                                                                 ?

• Yes.
               Example: Brownian motion
    • Discrete time.




n<-100; t<-100; sig2<-0.1
time<-0:t
X<-rbind(rep(0,n),matrix(rnorm(n*t,sd=sqrt(sig2)),t,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
               Example: Brownian motion
    • One big step . . . . this is a continuous time simulation.




n<-100; t<-100; sig2<-0.1
time<-c(0,t)
X<-rbind(rep(0,n),matrix(rnorm(n,sd=sqrt(sig2*t)),1,n))
Y<-apply(X,2,cumsum)
plot(time,Y[,1],ylim=c(-10,10),xlab="time",ylab="phenotype",type="l")
apply(Y,2,lines,x=time)
text(0.95*t,9,bquote(paste(sigma^2,"=",.(sig2))))
         Example: Discrete character
 • We can also simulate changes in a discrete character through
   time as a discrete-time Markov process or Markov chain.
 • This is just a way of saying that the process is memoryless –
   i.e., its future state is only dependent on the present state of
   the chain, and not at all on what happened in the past.
 • When we simulate the evolution of a discrete character, we
   need a probability for change per step in the chain.
 • Imagine a character with initial state “blue” that changes to
   “yellow” (and back again) with a probability of 0.2.
t=   1    2    3    4    5    6    7    8    9    10   11   12
              Example: Discrete character
    • We can look at the average properties of discrete time
      Markov chains using simulation as well:




                           0        20        40        60    80   100


n<-100; t<-100; p<-0.1; x<-list(); x[[1]]<-rep(0,n)
for(i in 1+1:t){
          x[[i]]<-x[[i-1]]; z<-runif(n)<p
          x[[i]][intersect(which(x[[i-1]]==0),which(z))]<-1
          x[[i]][intersect(which(x[[i-1]]==1),which(z))]<-0
}
y<-sapply(x,mean)
barplot(rbind(y,1-y),col=c("blue","yellow"),xlab="time")
              Example: Discrete character
    • We can look at the average properties of discrete time
      Markov chains using simulation as well:




                           0        20        40        60    80   100


n<-100; t<-100; p<-0.01; x<-list(); x[[1]]<-rep(0,n)
for(i in 1+1:t){
          x[[i]]<-x[[i-1]]; z<-runif(n)<p
          x[[i]][intersect(which(x[[i-1]]==0),which(z))]<-1
          x[[i]][intersect(which(x[[i-1]]==1),which(z))]<-0
}
y<-sapply(x,mean)
barplot(rbind(1-y,y),col=c("blue","yellow"),xlab="time")
       Example: Discrete character
• For continuous time discrete character simulation, the
  probability distribution for x is found by matrix
  exponentiation of the instantaneous transition matrix Q, i.e.:



• Thus, to simulate the state at time t given the state at time 0,
  we first compute the right product (a vector) and assign the
  state randomly with the probabilities given by pt.

• We can do this and compare to our replicated discrete time
  simulations.
              Example: Discrete character
    • Here’s our discrete time simulation, replicated 1000 times:




                           0        20        40        60    80   100


n<-1000; t<-100; p<-0.01; x<-list(); x[[1]]<-rep(0,n)
for(i in 1+1:t){
          x[[i]]<-x[[i-1]]; z<-runif(n)<p
          x[[i]][intersect(which(x[[i-1]]==0),which(z))]<-1
          x[[i]][intersect(which(x[[i-1]]==1),which(z))]<-0
}
y<-sapply(x,mean)
barplot(rbind(1-y,y),col=c("blue","yellow"),xlab="time")
              Example: Discrete character
    • Here’s our continuous time probability function, overlain:




                           0        20        40        60   80   100


p0<-c(1,0)
Q<-matrix(c(-p,p,p,-p),2,2)
time<-matrix(0:t)
y<-apply(time,1,function(x,Q,p0) expm(Q*x)%*%p0,Q=Q,p0=p0)
plot(time,y[1,],ylim=c(0,1),type="l")
      Example: Discrete character
• What about the wait times to character change?
• Well, under a continuous time Markov process the waiting
  times are exponentially distributed with rate –qii.

• We can compare our discrete time simulation wait times to
  the expected distribution of waiting times in a continuous
  time Markov.
              Example: Discrete character
    • First, compute (first) waiting times from our simulation of
      earlier:




wait<-vector()
X<-sapply(x,rbind)
for(i in 1:nrow(X)){
          if(any(X[i,]==1)) wait[i]<-min(which(X[i,]==1))
          else wait[i]<-t
}
hist(wait,main=NULL,xlab="First waiting time (generations)")->temp
              Example: Discrete character
    • Now, compute and overlay the exponential densities:




z<-vector()
for(i in 2:(length(temp$breaks)))
          z[i-1]<-(pexp(temp$breaks[i],p)-pexp(temp$breaks[i-1],p))*n
lines(temp$mids,z,type="o")
   Methods for quantitative character
         simulation on a tree
• We have discussed simulating characters along single lineages.
• To simulate character data on a tree we have multiple choices.
• We can draw changes along edges of the tree from the root
  to any tip:


                              +0.36               t1 0.36


        0.00
                                        -0.29     t2 0.55

                    +0.84        0.84

                                        +0.33     t3 1.17
   Methods for quantitative character
         simulation on a tree
• An alternative approach utilizes the fact that shared history of
  related species creates covariances between tip species.
• If we simulated data with this covariance, then it will be as if
  we had simulated up the tree from root to tips.


                                                    t1



                                                    t2



                                                    t3
    Covariances between species?
• A useful way to think about the covariance between species
  under BM is just the (long run) covariance of the tip values of
  related species from many replicates of the evolutionary
  process.
• We can explore this concept by simulation (naturally).

                                                    t1



                                                    t2



                                                    t3
           Covariances between species?
   Covariance depends on the relative
                                        t1
   shared history of species.
                                 t1

                                             t2

                                 t2
           50%
                                                  t3
                                 t3



X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
           Covariances between species?
   Covariance depends on the relative
   shared history of species.
                                 t1



                                 t2
           50%


                                 t3



X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
           Covariances between species?
   Covariance depends on the relative
                                          t1
   shared history of species.
                                     t1

                                               t2

                                     t2
                90%
                                                    t3
                                     t3



tree$edge.length<-c(0.9,1,0.1,0.1)
X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
           Covariances between species?
   Covariance depends on the relative
   shared history of species.
                                     t1



                                     t2
                90%


                                     t3



tree$edge.length<-c(0.9,1,0.1,0.1)
X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
           Covariances between species?
   Covariance depends on the relative
                                          t1
   shared history of species.
                                     t1

                                               t2

                                     t2


   10%                                              t3
                                     t3



tree$edge.length<-c(0.1,1,0.9,0.9)
X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
           Covariances between species?
   Covariance depends on the relative
   shared history of species.
                                     t1



                                     t2


   10%
                                     t3



tree$edge.length<-c(0.1,1,0.9,0.9)
X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)
    Covariances between species?
• This gives an alternative means of simulating under BM: we
  can just draw a vector of correlated values from the
  multivariate distribution expected for the tips.
• In particular, for:
                      t1
         1.0


                   0.1
                      t2
                            and we simulate from:
         0.9
                   0.1
                      t3

What does this look like?
           Covariances between species?

                                  t1



                                  t2



                                  t3



X<-t(fastBM(tree,nsim=1000))
pairs(X,diag.panel=panel.hist)

X<-matrix(rnorm(3000),1000,3)%*%chol(C)
pairs(X,diag.panel=panel.hist)
                                 Visualization




tree<-pbtree(n=6,scale=1)
tree<-read.tree(text=write.tree(tree))
X<-t(fastBM(tree,nsim=1000))
for(i in length(tree$tip)+1:tree$Nnode) tree<-rotate(tree,i)
plotTree(tree,pts=F,fsize=1.5)
pairs(X,diag.panel=panel.hist)
                    Visualization
• We have many options for visualizing phenotypic evolution on
  the tree (simulated or reconstructed).
• One method is via so-called “traitgrams,” in which we project
  our tree into a phenotype/time space.
• We can use known or estimated ancestral phenotypic values.
• Traitgrams are plotted by drawing lines between ancestors
  and descendants that obtain their vertical dimension from
  known or estimated trait values; and their horizontal
  dimension from the tree.
                                Visualization
    • Traitgram for one trait with known ancestral values:




tree<-pbtree(n=20,scale=100)
x<-fastBM(tree,internal=TRUE)
phenogram(tree,x)
                                 Visualization
    • Traitgram for one trait with estimated ancestral values:




tree<-pbtree(n=20,scale=100)
x<-fastBM(tree,internal=TRUE)
phenogram(tree,x)
phenogram(tree,x[1:length(tree$tip)])
                                 3D traitgram
     • We can also plot a traitgram in two phenotypic dimensions,
       with time represented as a third dimension.




tree<-pbtree(n=40,scale=100)
X<-sim.corrs(tree,matrix(c(1,0.8,0.8,1),2,2))
fancyTree(tree,type="traitgram3d",X=X,control=list(box=FALSE))
movie3d(spin3d(axis=c(0,0,1),rpm=10),duration=10,dir="traitgram3d.movie",convert=FALSE)
3D traitgram
            “Phylomorphospace”
• Another technique we can use to visualize multivariate
  evolution is by projecting the tree into 2 or 3D morphospace
  (defined only by the trait axes).
• This has been called a “phylomorphospace” projection
  (Sidlauskas 2008).
                      “Phylomorphospace”
    • 2D phylomorpho-
      space for correlated
      characters.




phylomorphospace(tree,X)
                      “Phylomorphospace”
    • 3D phylomorpho-
      space.




phylomorphospace(tree,X)
X<-cbind(X,fastBM(tree))
phylomorphospace3d(tree,X)
movie3d(spin3d(axis=c(1,1,1),rpm=10),duration=10,dir="phylomorphospace.movie",convert=FALSE)
      Discrete character evolution
• We can simulate discrete character evolution too.
• One way to do this is to draw wait times from their expected
  distribution, and then change states with probabilities given
  by Q.
• We can not only record the outcome of this simulation, but
  also the character history on the tree.
             Discrete character evolution
    • Let’s try:




tree<-pbtree(n=100,scale=1)
tree<-sim.history(tree,Q=matrix(c(-1,1,1,-1),2,2),anc=1)
cols<-c("blue","red"); names(cols)<-c(1,2)
plotSimmap(tree,cols,pts=FALSE,ftype="off")
        Trait dependent evolution
• Finally, we can simulate the rate or process of evolution in a
  continuous trait as a function of the state for a discrete
  character.
• This works the same way as our previous Brownian motion
  simulations, except that in this case the instantaneous
  variance differs on different branches & branch segments.
                Trait dependent evolution
    • Let’s simulate this and then visualize it using a traitgram.
         – First a phylogeny with an encoded discrete character history:




tree<-sim.history(pbtree(n=30,scale=1),Q=matrix(c(-1,1,1,-1),2,2),anc=1)
cols<-c("blue","red"); names(cols)<-c(1,2)
plotSimmap(tree,cols,pts=F,ftype="i",fsize=0.75)
                Trait dependent evolution
    • Let’s simulate this and then visualize it using a traitgram.
         – Next, simulate a continuous character evolving with two rates & plot:




tree<-sim.history(pbtree(n=30,scale=1),Q=matrix(c(-1,1,1,-1),2,2),anc=1)
cols<-c("blue","red"); names(cols)<-c(1,2)
plotSimmap(tree,cols,pts=F,ftype="i",fsize=0.75)
x<-sim.rates(tree,c(1,10))
phenogram(tree,x,fsize=0.75,colors=cols)
Simulations in comparative biology
• Simulations have both considerable history & extensive
  present use in phylogenetic comparative biology.
• A comprehensive study of all the uses of simulation would be
  impossible. Three uses I will discuss today are as follows:
   – Sampling discrete character histories using stochastic character
     mapping.
   – Simulating null distributions: the “phylogenetic Monte Carlo.”
   – Testing the performance & robustness of phylogenetic methods.
              Stochastic mapping
• Stochastic character mapping is a procedure originally
  devised by Neilsen (2002; Huelsenbeck et al. 2003).
• In this procedure, we first sample (or estimate*) Q
  conditioned on an evolutionary process.
• We then compute the conditional probabilities of all states of
  our trait for all nodes of the tree.
• Next, we sample a character history that is consistent with
  these probabilities & our evolutionary process.
• (We do this by starting at the root node and then sampling up
  the tree conditioned on the parent state.)
• With the nodes set, we sample character history by drawing
  wait times from an exponential distribution.
                        Stochastic mapping




tree<-sim.history(pbtree(n=30,scale=2),Q=matrix(c(-1,1,1,-1),2,2),anc=1)
x<-tree$states
plotTree(tree,ftype=“off”,pts=FALSE)
                        Stochastic mapping




tree<-sim.history(pbtree(n=30,scale=2),Q=matrix(c(-1,1,1,-1),2,2),anc=1)
x<-tree$states
plotTree(tree,ftype=“off”,pts=FALSE)
mtrees<-make.simmap(tree,x,nsim=100)
plotSimmap(mtree,cols,pts=FALSE,ftype=“off”)
                        Stochastic mapping




tree<-sim.history(pbtree(n=30,scale=2),Q=matrix(c(-1,1,1,-1),2,2),anc=1)
x<-tree$states
plotTree(tree,ftype=“off”,pts=FALSE)
mtrees<-make.simmap(tree,x,nsim=100)
plotSimmap(mtrees,cols,pts=FALSE,ftype=“off”)
              Stochastic mapping
• What is stochastic mapping good for?
• For one thing it can be used to model the effects of a discrete
  character on the evolution of quantitative traits.
                        Stochastic mapping




x<-tree$states
mtrees<-make.simmap(tree,x,nsim=200)
y<-sim.rates(tree,c(1,10))
foo<-function(x,y){
          temp<-brownieREML(x,y)
          temp$sig2.multiple["2"]/temp$sig2.multiple["1"]
}
z<-sapply(mtrees,foo,y=y)
            200
                               Stochastic mapping




                                                              100
Frequency
            150




                                                              80
                                                  Frequency
            100




                                                              60
                                                              40
            50




                                                              20
                                                              0
            0




                  0.0   0.5    1.0    1.5   2.0                     0   5   10      15   20   25

                              σ2(1)                                         σ2(2)

x<-tree$states
mtrees<-make.simmap(tree,x,nsim=500)
foo<-function(x,y){
          temp<-brownieREML(x,y)
          c(temp$sig2.multiple["1"],temp$sig2.multiple["2"])
}
Z<-sapply(mtrees,foo,y=y)
        Phylogenetic Monte Carlo
• One of the first uses of simulation in phylogenetic
  comparative biology was in generating a null distribution for
  hypothesis testing.
• This general approach was originally devised for statistical
  problems in which phylogenetic autocorrelation meant that
  parametric statistical methods were inappropriate.
• For example, Garland et al. (1993) devised the phylogenetic
  ANOVA, in which a regular ANOVA was conducted – but then
  rather than compare the F-statistic to a parametric F
  distribution, the authors generated a null distribution via
  Brownian simulation on the tree.
                Phylogenetic Monte Carlo




X<-fastBM(tree,nsim=500)
LR<-vector()
for(i in 1:100){
          temp<-brownie.lite(tree,X[,i])
          LR[i]<--2*(temp$logL1-temp$logL.multiple)
}
temp<-hist(LR)
plot(temp$mids,temp$density,type="l"); Psum<-sum(2*(res$logL.multiple-res$logL1)>LR)

				
DOCUMENT INFO
Shared By:
Categories:
Tags:
Stats:
views:0
posted:6/26/2014
language:English
pages:61