Introduction to Biochemical Network Modelling
Darren Wilkinson (1, 2)
1 School of Mathematics & Statistics
2 Centre for Integrated Systems Biology of Ageing and Nutrition
Newcastle University, UK
SAMSI Undergraduate Workshop, 2nd–3rd March, 2007

Overview
Biological network modelling
Model calibration
Application projects: modelling and inference
(Bayesian inference)
Round-up

Computational Systems Biology (CSB)
Much of CSB is concerned with building models of complex biological pathways, then validating and analysing those models using a variety of methods, including time-course simulation.
Most CSB researchers work with continuous deterministic models (coupled ODE and DAE systems).
There is increasing evidence that much intra-cellular behaviour (including gene expression) is intrinsically stochastic, and that systems cannot be properly understood unless stochastic effects are incorporated into the models.
Stochastic models are harder to build, estimate, validate, analyse and simulate than deterministic models...

Modelling
Start with some kind of picture or diagram for a mechanism.
Turn it into a set of (pseudo-)biochemical reactions.
Specify the rate laws and rate parameters of the reactions.
Run some stochastic or deterministic computer simulator of the system dynamics.
Study the dynamics in a variety of ways to gain insight into the system.
Refine the model structure and/or parameters after comparing simulated dynamics with experimental observations.

Example: genetic auto-regulation
[Figure: diagram of genetic auto-regulation, with labels DNA, g, RNAP, r, P, P2, p and q]

Biochemical reactions
Simplified view:
g + P2 ⇌ g·P2    (repression)
g → g + r         (transcription)
r → r + P         (translation)
2P ⇌ P2           (dimerisation)
r → ∅             (mRNA degradation)
P → ∅             (protein degradation)
But these aren't as nice to look at as the picture...
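The reaction list above already determines how each reaction changes the molecule counts. Below is a minimal Python sketch (an illustration, not code from the talk) that stores reactant and product counts for each reaction and takes their difference. The species ordering, the name `gP2` for the complex g·P2, and the helper names are assumptions made for this example.

```python
# Species order (an assumption of this sketch); "gP2" stands for the bound complex g·P2.
SPECIES = ["gP2", "g", "r", "P", "P2"]

# Each reaction: (name, reactant counts, product counts). Species not listed have count 0.
REACTIONS = [
    ("repression", {"g": 1, "P2": 1}, {"gP2": 1}),
    ("reverse repression", {"gP2": 1}, {"g": 1, "P2": 1}),
    ("transcription", {"g": 1}, {"g": 1, "r": 1}),
    ("translation", {"r": 1}, {"r": 1, "P": 1}),
    ("dimerisation", {"P": 2}, {"P2": 1}),
    ("dissociation", {"P2": 1}, {"P": 2}),
    ("mRNA degradation", {"r": 1}, {}),
    ("protein degradation", {"P": 1}, {}),
]


def count_row(counts):
    """Expand a sparse {species: count} dict into a full row in SPECIES order."""
    return [counts.get(s, 0) for s in SPECIES]


# Reactant and product tables: one row per reaction, one column per species.
PRE = [count_row(reactants) for _, reactants, _ in REACTIONS]
POST = [count_row(products) for _, _, products in REACTIONS]

# Net change in each species caused by one firing of each reaction (products minus reactants).
NET = [[after - before for before, after in zip(pre, post)] for pre, post in zip(PRE, POST)]

if __name__ == "__main__":
    for (name, _, _), row in zip(REACTIONS, NET):
        print(f"{name:20s} {row}")
    # The gene is only ever bound or unbound, so no reaction changes g + g·P2.
    print("g + gP2 conserved:", all(row[0] + row[1] == 0 for row in NET))
```

Keeping the reactant table separate from the net change is deliberate: mass-action rate laws depend on the reactant counts, while only the net change is needed to update the state after a reaction fires.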

Petri net representation
A simple bipartite digraph representation of the reaction network, useful both for visualisation and for computational analysis.

Matrix representation of the Petri net

                       Reactants (Pre)            Products (Post)
Reaction               g·P2  g  r  P  P2         g·P2  g  r  P  P2
Repression               0   1  0  0   1           1   0  0  0   0
Reverse repression       1   0  0  0   0           0   1  0  0   1
Transcription            0   1  0  0   0           0   1  1  0   0
Translation              0   0  1  0   0           0   0  1  1   0
Dimerisation             0   0  0  2   0           0   0  0  0   1
Dissociation             0   0  0  0   1           0   0  0  2   0
mRNA degradation         0   0  1  0   0           0   0  0  0   0
Protein degradation      0   0  0  1   0           0   0  0  0   0

But we still need rate laws and reaction rates...

Mass-action stochastic kinetics
Stochastic molecular approach:
Statistical mechanics arguments lead to a Markov jump process in continuous time whose instantaneous reaction rates are directly proportional to the number of molecules of each reacting species.
Such dynamics can be simulated (exactly) on a computer using standard discrete-event simulation techniques.
The standard implementation of this strategy is known as the "Gillespie algorithm" (just discrete-event simulation), but there are several exact and approximate variants of this basic approach.

Lotka-Volterra system
Reactions:
X → 2X        (prey reproduction)
X + Y → 2Y    (prey-predator interaction)
Y → ∅         (predator death)
X: prey, Y: predator.
We can rewrite this using matrix notation for the corresponding Petri net.

Forming the matrix representation
The L-V system in tabular form:

        Rate law     LHS      RHS      Net effect
        h(·, c)      X  Y     X  Y     X   Y
R1      c1 x         1  0     2  0     1   0
R2      c2 x y       1  1     0  2    -1   1
R3      c3 y         0  1     0  0     0  -1

Call the 3 × 2 net-effect (or reaction) matrix $A$. Its transpose, $S = A^{\top}$, is the stoichiometry matrix of the system. Typically both are sparse. The SVD of $S$ (or $A$) is of interest for structural analysis of the system dynamics...

Petri net invariants
A P-invariant is a non-zero solution to $Ay = 0$ (i.e. $y$ is in the null-space of $A$).
P-invariants correspond to conservation laws in the network, and lead to rank-degeneracy of $A$.
A T-invariant is a non-zero, non-negative (integer-valued) solution to $Sx = 0$ (i.e. $x$ is in the null-space of $S$).
T-invariants correspond to sequences of reaction events that return the system to its original state.
The SVD of $S$ (or $A$) characterises the null-spaces of $S$ and $A$.
The Lotka-Volterra model is of full rank (so it has no P-invariants), and has one T-invariant, $x = (1, 1, 1)^{\top}$: one prey birth, one predation event and one predator death together leave the state unchanged.

The Gillespie algorithm
1. Initialise the system at $t = 0$ with rate constants $c_1, c_2, \ldots, c_v$ and initial numbers of molecules for each species, $x_1, x_2, \ldots, x_u$.
2. For each $i = 1, 2, \ldots, v$, calculate $h_i(x, c_i)$ based on the current state, $x$.
3. Calculate $h_0(x, c) \equiv \sum_{i=1}^{v} h_i(x, c_i)$, the combined reaction hazard.
4. Simulate the time to the next event, $t'$, as an $\mathrm{Exp}(h_0(x, c))$ random quantity, and put $t := t + t'$.
5. Simulate the reaction index, $j$, as a discrete random quantity with probabilities $h_i(x, c_i) / h_0(x, c)$, $i = 1, 2, \ldots, v$.
6. Update $x$ according to reaction $j$. That is, put $x := x + S^{(j)}$, where $S^{(j)}$ denotes the $j$th column of the stoichiometry matrix $S$.
7. Output $x$ and $t$.
8. If $t < T_{\max}$, return to step 2.

The continuous deterministic approximation
If the discreteness and stochasticity are ignored, then by considering the reaction fluxes it is straightforward to deduce the mass-action ordinary differential equation (ODE) system:
ODE model: $\frac{dX_t}{dt} = S\,h(X_t, c)$
Analytic solutions are rarely available, but good numerical solvers can generate time-course behaviour.
There are slight complications due to rank-degeneracy of $S$.
There are also spatial versions (reaction-diffusion kinetics, PDE models), which are computationally intensive (slow).

The Lotka-Volterra model
[Figure: deterministic ([Y1], [Y2]) and stochastic (Y1, Y2) solutions of the Lotka-Volterra model, each shown as a time course against Time and as a phase-space plot of Y2 against Y1]

Key differences
The deterministic solution is exactly periodic, with perfectly repeating oscillations carrying on indefinitely.
The stochastic solution oscillates, but in a random, unpredictable way (wandering from orbit to orbit in phase space).
The stochastic solution will end in disaster! Either prey or predator numbers will hit zero...
Either way, the predators will end up extinct, so the expected number of predators will tend to zero: qualitatively different from the deterministic solution.
So, in general, the deterministic solution does not provide reliable information about either the stochastic process or its average behaviour.

Simulated realisation of the auto-regulatory network
[Figure: simulated time courses of Rna, P and P2 for the auto-regulatory network, against Time from 0 to 5000]

Model calibration
In its most basic form, model calibration is concerned with "tuning" the parameters of a computer model in order to make the output obtained by running it consistent with experimental observations.
In practice, this is only one aspect of the problem: there will typically be a range of parameter values consistent with the observations, so the calibration exercise is part of a broader analysis, also concerning model validity, parameter identifiability and confounding.

Simple example: linear birth-death process
Birth-death reactions:
X → 2X    (rate λX)
X → ∅     (rate µX)
Deterministic solution: $X_t = X_0 \exp\{(\lambda - \mu)t\}$
This is a function of $(\lambda - \mu)$ only!
The stochastic solution is more interesting, and depends on both λ and µ...

Birth-death realisations
[Figure: simulated realisations of the birth-death process, X against t for 0 ≤ t ≤ 5]

Issues with the birth-death process
Stochastic variation: random distribution at each time point, correlations between time points, random time to extinction, etc.
Parameter identification: if a deterministic model is fitted, one can only ever identify (λ − µ), never λ and µ separately.
There is information about both λ and µ in the data...
Both λ and µ are needed for reliable stochastic simulation.
So you can't fit parameters using a deterministic model and then run a stochastic simulation...
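Both the Gillespie steps listed earlier and the point that λ and µ cannot be separated deterministically can be checked with a short simulation. This is a minimal Python sketch of the direct-method Gillespie algorithm, following steps 1 to 8; the initial count X0 = 50, the number of runs and the seed are arbitrary choices, and the names `gillespie`, `state_at` and `birth_death_hazards` are hypothetical. The loop also stops early when the combined hazard is zero, since no further reaction can occur (for the birth-death process this means extinction).

```python
import math
import random


def gillespie(x0, stoich, hazards, t_max, rng):
    """Exact stochastic simulation of a reaction network (Gillespie's direct method).

    x0      -- initial molecule counts, one entry per species
    stoich  -- stoichiometry matrix S as a list of rows (species x reactions)
    hazards -- function mapping the state x to the list [h_1(x), ..., h_v(x)]
    Returns the (t, x) pairs visited, starting with (0, x0).
    """
    t, x = 0.0, list(x0)                               # step 1: initialise
    path = [(t, tuple(x))]
    while t < t_max:                                   # step 8: loop until T_max
        h = hazards(x)                                 # step 2: reaction hazards
        h0 = sum(h)                                    # step 3: combined hazard
        if h0 <= 0.0:                                  # nothing can fire (e.g. extinction)
            break
        t += rng.expovariate(h0)                       # step 4: Exp(h0) waiting time
        j = rng.choices(range(len(h)), weights=h)[0]   # step 5: choose reaction j
        for i, row in enumerate(stoich):               # step 6: x := x + S^(j)
            x[i] += row[j]
        path.append((t, tuple(x)))                     # step 7: record output
    return path


def state_at(path, t):
    """State of the piecewise-constant path at time t (last event at or before t)."""
    x = path[0][1]
    for s, y in path:
        if s > t:
            break
        x = y
    return x


def birth_death_hazards(lam, mu):
    """Mass-action hazards for X -> 2X (lam * X) and X -> nothing (mu * X)."""
    return lambda x: [lam * x[0], mu * x[0]]


BIRTH_DEATH_S = [[1, -1]]  # one species (X), two reactions (birth, death)

if __name__ == "__main__":
    x0, t_end, runs = 50, 5.0, 100  # illustrative choices, not taken from the talk
    for lam, mu in [(0, 1), (3, 4), (7, 8), (10, 11)]:
        det = x0 * math.exp((lam - mu) * t_end)  # same value for every pair
        rng = random.Random(1)
        extinct = sum(
            state_at(gillespie([x0], BIRTH_DEATH_S, birth_death_hazards(lam, mu), t_end, rng), t_end)[0] == 0
            for _ in range(runs)
        )
        print(f"lambda={lam:2d} mu={mu:2d}  deterministic X(5)={det:.3f}  extinct by t=5: {extinct}/{runs}")
```

All four (λ, µ) pairs share λ − µ = −1, so the deterministic column is identical, while the fraction of runs extinct by t = 5 should differ between the pairs. That difference is exactly the information about λ and µ separately that a deterministic fit throws away.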

Birth-death realisations
[Figure: four panels of birth-death realisations, X against t for 0 ≤ t ≤ 5, with (λ, µ) = (0, 1), (3, 4), (7, 8) and (10, 11); every panel has λ − µ = −1]

Fully Bayesian inference
In principle, it is possible to carry out rigorous statistical inference for the parameters of the stochastic process model.
Fairly detailed experimental data are required, e.g. quantitative single-cell time-course data derived from live-cell imaging.
The standard procedure uses GFP labelling of key reporter proteins together with time-lapse confocal microscopy, but other approaches are also possible.
The statistical theory underlying the inference algorithms is fairly technical; the techniques are developed and illustrated in a sequence of papers. The main findings are summarised in:
Golightly, A. & Wilkinson, D. J. (2006) Bayesian sequential inference for stochastic kinetic biochemical network models, Journal of Computational Biology, 13(3):838–851.

"Likelihood-free" MCMC for Bayesian inference
It is possible to develop a generic framework for Bayesian inference for model parameters, applicable to both deterministic and stochastic models, using the ideas of "likelihood-free MCMC", which sacrifices some computational efficiency for a considerable reduction in implementation complexity.
It exploits forward simulation from the computer model.
Such an approach requires a very large number of simulation runs, and is therefore most easily applied to fast simulators (simple models).
For slow simulators (complex models), HPC facilities can be exploited in order to build a fast emulator of the slow simulator.

Mechanisms of ageing
Ageing is caused by the gradual accumulation of unrepaired molecular damage, leading to an increasing fraction of damaged cells and eventually to functional impairment of tissues and organs.
One major cause of molecular damage is highly reactive oxygen species (ROS).
Molecular damage may trigger cellular response programmes, so the ageing process may also be seen as governed by genetically determined pathways.
Many of the (random) damage and (imperfect) repair mechanisms important for understanding cellular ageing are intrinsically stochastic.

Network theory of ageing

Modelling large biological systems
BBSRC/MRC/DTI Grant (+ Unilever)
BASIS: Biology of Ageing e-Science Integration and Simulation (4/02–3/06); Kirkwood, Wilkinson, Boys, Gillespie, Proctor, Shanley
Modelling large complex systems with many interacting components
SBML model database (SBML encoded for discrete stochastic simulation)
Discrete stochastic simulation service (and results database)
Distributed computing infrastructure for routine use (web portal and web-service interface for GRID computing)

SBML: The Systems Biology Markup Language
SBML is an XML-based language for encoding and exchanging quantitative biochemical network models.
It encodes species, initial amounts, reactions, rate laws, etc.
The original specification (Level 1) was aimed mainly at continuous deterministic models.
The current specification (Level 2) is perfectly capable of encoding discrete stochastic models in an unambiguous way.
There are many tools for working with SBML models (model builders, deterministic and stochastic simulators, etc.).
There are issues with testing the correctness of stochastic simulators, and with correctly encoding discrete stochastic models using off-the-shelf model-building tools.

Computer model technology
BASIS features:
Service-oriented architecture (SOA)
Controls access to models, data and computational resources
Represents and encodes complex models using XML technology (SBML in this case)
Simulation engine that can handle a broad class of models without recompilation
Databases for models and simulation output
Web interface for human interaction
SOAP web-services API for programmatic access
Do we need a standard API for biological simulation services?

BASIS Software: www.basis.ncl.ac.uk
UK e-Science GRID Pilot Project
[Figure: software architecture used to implement the BASIS system. Components: web client and SOAP clients (WS-Security, SSL); Apache web server; Tomcat with Axis Java WS-Security web services; Python Spyce/PSP and CGI scripts; Python SOAP web-services interface (SSL-based); main Python BASIS API; Postgres (database), Condor (job scheduling), libSBML (SBML library), R (data analysis), GraphViz (network visualisation); C simulation code with the GSL scientific library; Debian GNU/Linux (sarge)]

Example: Chaperones and their role in ageing
C. J. Proctor, C. Soti, R. J. Boys, C. S. Gillespie, D. P. Shanley, D. J. Wilkinson, T. B. L. Kirkwood (2005) Modelling the actions of chaperones and their role in ageing, Mechanisms of Ageing and Development, 126(1):119–131.
Several versions of this model are in the BASIS public model repository, each with a unique ID; each can be copied, modified and simulated, e.g. urn:basis.ncl:model:518

Outline CaliBayes architecture

Distinctive features
Although the outline architecture appears similar to that of many iterative parameter-fitting algorithms, there are some fundamental differences.
This is not a hill-climbing algorithm, and it is not searching for the "best fit".
The calibration engine uses the information it receives from the statistical comparison service to randomly explore the posterior distribution for the parameters (the set of parameters consistent with the data and prior knowledge, weighted according to their probability).
This posterior distribution can be used for a range of analyses, including calibration.

An example posterior distribution
[Figure: example posterior distribution, shown as posterior density plots and a pairwise plot for the parameters v'd, v''d and v''dr]
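To make the contrast with hill-climbing concrete, here is a minimal Python sketch of a likelihood-free Metropolis-Hastings scheme of the kind the talk describes, calibrating the birth-death parameters λ and µ. It is not CaliBayes code, and every setting is an assumption made purely for illustration: synthetic data simulated with λ = 0.6 and µ = 1.0, five hypothetical observation times, Gaussian measurement error with known standard deviation 5, and independent N(0, 1) priors on log λ and log µ. Because each proposed path is drawn by forward simulation, the intractable path density π(x | θ) cancels from the acceptance probability, leaving only prior and measurement-error terms.

```python
import math
import random


def simulate_bd(lam, mu, x0, times, rng):
    """Forward-simulate the linear birth-death process exactly (Gillespie) and
    return the state X at each observation time in `times` (sorted ascending)."""
    t, x, out = 0.0, x0, []
    for t_obs in times:
        while x > 0:
            t_next = t + rng.expovariate((lam + mu) * x)
            if t_next > t_obs:
                break  # no event before t_obs; memorylessness lets us restart from t_obs
            t = t_next
            x += 1 if rng.random() < lam / (lam + mu) else -1
        t = t_obs
        out.append(x)
    return out


def log_obs_density(y, x, sigma):
    """log pi(y | x) for independent Gaussian measurement error, up to a constant."""
    return -sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / (2.0 * sigma ** 2)


def log_prior(theta):
    """Independent N(0, 1) priors on log(lambda) and log(mu): an arbitrary illustrative choice."""
    return -0.5 * sum(v * v for v in theta)


def likelihood_free_mcmc(y, times, x0, sigma, n_iters, step, rng):
    """Metropolis-Hastings over theta = (log lambda, log mu) and the latent path.
    Proposed paths come from forward simulation, so pi(x | theta) cancels in the
    acceptance ratio; the symmetric random-walk proposal cancels too."""
    theta = [0.0, 0.0]
    x = simulate_bd(math.exp(theta[0]), math.exp(theta[1]), x0, times, rng)
    log_target = log_prior(theta) + log_obs_density(y, x, sigma)
    samples = []
    for _ in range(n_iters):
        prop = [v + rng.gauss(0.0, step) for v in theta]
        x_prop = simulate_bd(math.exp(prop[0]), math.exp(prop[1]), x0, times, rng)
        log_target_prop = log_prior(prop) + log_obs_density(y, x_prop, sigma)
        if rng.random() < math.exp(min(0.0, log_target_prop - log_target)):
            theta, x, log_target = prop, x_prop, log_target_prop
        samples.append((math.exp(theta[0]), math.exp(theta[1])))
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    times = [0.5, 1.0, 1.5, 2.0, 2.5]   # hypothetical observation times
    x0, sigma = 50, 5.0                 # assumed initial count and error s.d.
    y = [xi + rng.gauss(0.0, sigma) for xi in simulate_bd(0.6, 1.0, x0, times, rng)]  # synthetic data
    draws = likelihood_free_mcmc(y, times, x0, sigma, n_iters=3000, step=0.1, rng=rng)[500:]
    print("posterior mean lambda:", sum(d[0] for d in draws) / len(draws))
    print("posterior mean mu:", sum(d[1] for d in draws) / len(draws))
```

The output is a collection of posterior draws (up to Monte Carlo error) rather than a single best-fit point; summaries such as posterior means, or a scatter plot of λ against µ, are computed from those draws. Acceptance rates can be low when the simulator's own noise is large relative to the measurement error, which is the loss of computational efficiency that likelihood-free MCMC trades for simplicity.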

Extensions
Bayesian inference naturally integrates data from multiple sources, which may be assimilated simultaneously or sequentially depending on the context.
The architecture requires slight modification for complex models, as then the simulator is replaced by an emulator, built off-line using HPC facilities.
The framework can also be adapted to tackle experimental design questions such as: given a limited budget, and our current state of knowledge, what are the best new experiments to carry out in order to learn most about the model parameters of greatest interest?
It is also possible to extend the framework to compare the evidence for competing models of the same process.

MCMC-based fully Bayesian inference for fast computer models
Before worrying about the issues associated with slow simulators, it is worth thinking about the issues involved in calibrating fast deterministic and stochastic simulators, based only on the ability to forward-simulate from the model.
In this case it is often possible to construct MCMC algorithms for fully Bayesian inference using the ideas of likelihood-free MCMC (Marjoram et al. 2003).
Here an MCMC scheme is developed that exploits forward simulation from the model, which causes the problematic likelihood terms to drop out of the M-H acceptance probabilities.

Future directions
In the presence of measurement error, the sequential likelihood-free scheme is effective, and is much simpler than a more efficient MCMC approach.
The likelihood-free approach is easier to tailor to non-standard models and data.
The essential problem is that of calibration of complex stochastic computer models.
It is worth connecting with the literature on deterministic computer models.
For slow stochastic models, there is considerable interest in developing fast emulators and embedding these into MCMC algorithms.

Building emulators for slow simulators
Use Gaussian process regression to build an emulator of a slow deterministic simulator.
Obtain runs on a carefully constructed set of design points (e.g. a Latin hypercube); it is easy to exploit parallel computing hardware here.
For a stochastic simulator, many approaches are possible:
(Mixtures of) Dirichlet processes (and related constructs) are potentially quite flexible.
One can also model the output parametrically (say, Gaussian), with the parameters modelled by (independent) Gaussian processes.
One will typically want more than one run per design point, in order to be able to estimate the distribution.

Why are Systems Biology models interesting examples of computer models?
Models
Diverse class of models: fast/slow, spatial/non-spatial, deterministic/stochastic, discrete/continuous time/states, even for modelling the same biological process!
Many parameters
Structural uncertainty
Genuine interest in the (posterior distribution of the) parameters, not just in prediction
Data
High-dimensional
Diverse: high-resolution time-course data, coarse population-averaged data, endpoint data, distributional data, individual-specific parameters/data, covariates
Multiple distinct sources of data for a given model

Interesting methodological problems
Calibration of fast and slow stochastic simulators, using individual, averaged and distributional data
Dealing with heterogeneity: cell–cell, tissue–tissue, or organism–organism
Emulation of slow stochastic simulators: good models and fitting procedures
Experimental design for stochastic computer models: trade-offs between repetition and space-filling, etc.
Utilising fast stochastic or deterministic approximate simulators for a slow stochastic simulator

Further information
Stochastic Modelling for Systems Biology
An accessible introduction to stochastic modelling of complex genetic and biochemical networks. Covers: biological modelling, biochemical reactions, Petri nets, SBML, stochastic processes, simulation algorithms (including Gillespie), case studies, MCMC, and Bayesian inference for network dynamics.
ISBN: 1-58488-540-8
Contact details...
email: d.j.wilkinson@ncl.ac.uk
www: http://www.staff.ncl.ac.uk/d.j.wilkinson/