Introduction to Sequential Monte Carlo Methods
Jochen Voss, University of Warwick (UK)

Outline: stochastic modelling · filtering · statistical tools · sequential methods

Part I: stochastic modelling

Many quantities in nature, economy, etc. can be described as random phenomena.
examples: weather, stock prices, election results, throwing dice

Randomness is used either because the system has a random component or to describe missing information.

Statistical quantities to describe randomness:
- probability: when the experiment is repeated very often,
      P \approx \frac{\text{\# events occurring}}{\text{\# trials}}
For numerical random quantities we also use:
- expectation/mean, the average value over many trials:
      \mu \approx \frac{1}{N} \sum_{i=1}^{N} X_i
- variance, the average squared distance from the mean over many trials:
      \sigma^2 \approx \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2

Probability distributions are used to completely describe a random quantity; they give the probability of every event.
example: the Gaussian (normal) distribution is described by
      P(X \in A) = \int_A \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx
[figure: density of the standard Gaussian distribution]
Probability distributions describe uncertainty or lack of information.

More complex examples of random objects are stochastic processes: quantities which depend both on randomness and on time.
examples:
- the series of results when throwing a die
- the temperature at a given spot as a function of time
- stock prices
complex example: the change in time of an oceanic velocity field can be described as a stochastic process.

Part II: filtering

Conditional probabilities are used when partial information is available about a random system: if it is already known that event B takes place, then the conditional probability that event A occurs is
      P(A \mid B) = \frac{P(A, B)}{P(B)}
- used to incorporate partial information into a model
- can be extended to the case P(B) = 0
- extremely useful
examples:
      P(A, B) = P(A \mid B) \, P(B)
      P(A, B, C) = P(A \mid B, C) \, P(B \mid C) \, P(C)

example: X is a random "signal" of which we only observe
      Y = X + \epsilon,
where \epsilon is a random perturbation. Given Y, the value of X is still uncertain; the remaining uncertainty is described by the conditional distribution of X given Y.

Filtering is the task of finding the conditional distribution of an (unobserved) signal, given an incomplete observation.
- filtering updates a model by incorporating information
- the probability distribution of the model before the observations are taken into account is called the prior distribution
- the conditional distribution which incorporates the observations is called the posterior distribution
- since the observations remove uncertainty, the posterior typically has smaller variance than the prior
complex example: can one find the posterior of the random velocity field, given the path of a floater?

Part III: statistical tools

Bayes' rule. One of the fundamental tools for computing conditional probabilities is Bayes' rule:
      P(A \mid B) = P(B \mid A) \, \frac{P(A)}{P(B)}
Typically used when A is the signal and B is the observation:
      P(\text{signal} \mid \text{observation}) \propto P(\text{observation} \mid \text{signal}) \, P(\text{signal})
- since the observation is known, P(observation) is a constant
- P(signal | observation) is often difficult to compute
- P(observation | signal) is often easy to compute

example:
- prior: P(X = 1) = 0.6, P(X = 2) = 0.2, P(X = 3) = 0.2
- Y is Gaussian, centred around X
- we observe Y = 2.05; using Bayes' rule we can compute the posterior of X:
      P(X = 1 \mid Y) \approx 0.22, \quad P(X = 2 \mid Y) \approx 0.67, \quad P(X = 3 \mid Y) \approx 0.11
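A small numerical check of this example (a sketch, not from the slides): the slides do not state the standard deviation of the observation noise, but a value of 0.5 reproduces the quoted posterior values, so the code below assumes sigma = 0.5.

```python
import math

# Prior distribution of the signal X (values from the slides).
prior = {1: 0.6, 2: 0.2, 3: 0.2}

# Observation model: Y is Gaussian, centred around X. The noise level is
# not stated on the slides; sigma = 0.5 is an assumption that reproduces
# the quoted posterior (0.22, 0.67, 0.11).
SIGMA = 0.5

def likelihood(y, x, sigma=SIGMA):
    """Density of Y at y given signal value x, i.e. N(x, sigma^2)."""
    return math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

y_obs = 2.05

# Bayes' rule: posterior(x) is proportional to likelihood(y | x) * prior(x).
unnormalised = {x: likelihood(y_obs, x) * p for x, p in prior.items()}
evidence = sum(unnormalised.values())   # P(observation), a constant
posterior = {x: u / evidence for x, u in unnormalised.items()}

for x, p in posterior.items():
    print(f"P(X={x} | Y={y_obs}) = {p:.2f}")
# prints 0.22, 0.67, 0.11, matching the slides
```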
Monte Carlo methods

Distributions can be approximated by "clouds" of particles:
      P(X \in A) \approx \frac{\text{\# particles in } A}{\text{\# particles}}
      E(X) \approx \frac{1}{N} \sum_{i=1}^{N} X^{(i)}
example: estimating π by sampling uniform points in the unit square: 766 of 1000 points fall inside the quarter circle, so
      \frac{\pi}{4} \approx \frac{766}{1000} \quad\Longrightarrow\quad \pi \approx 3.064

Importance sampling

Sometimes plain Monte Carlo methods don't work well. An extension is importance sampling:
- we are interested in the target distribution f
- we sample particles from the "wrong" distribution g (the proposal distribution)
- to compensate, we have to attach weights to the particles:
      w^{(i)} = \frac{f(X^{(i)})}{g(X^{(i)})}
- we use the weights in approximations:
      E(X) \approx \frac{1}{N} \sum_{i=1}^{N} w^{(i)} X^{(i)}

Part IV: sequential methods

filtering problem:
- (unobserved) signal: x_n \sim p(x_n \mid x_{n-1})
- observations: y_n \sim p(y_n \mid x_n)
- dependency structure: x_0 \to x_1 \to x_2 \to x_3 \to \cdots, with each observation y_n depending on x_n
- task: use the observations y_1, \ldots, y_N to find the posterior p(x_n \mid y_1, \ldots, y_n)

There are two commonly used setups:
- offline filtering: all the observations are available from the start
- online filtering: we want to compute the posterior after every observation
Sequential methods are most useful in the second case, i.e. for online filtering.

We use importance sampling to approximate the posterior:
- particles X_{n-1}^{(i)} with weights w_{n-1}^{(i)} are our approximation of p(x_{n-1} \mid y_1, \ldots, y_{n-1})
- prior: p(x_n \mid x_{n-1}, y_1, \ldots, y_{n-1})
- posterior: p(x_n \mid x_{n-1}, y_1, \ldots, y_{n-1}, y_n)
- updated weights: w_n^{(i)} = w_{n-1}^{(i)} \cdot p(y_n \mid x_n^{(i)})

technical trick (resampling): after a while most weights become very small, so
- throw away particles with very small weights
- replace particles with big weights by several particles with smaller weights
A sketch of the resulting algorithm is given below.
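To make the sequential update concrete, here is a minimal sketch of a bootstrap particle filter (sequential importance sampling with resampling). The slides do not specify a concrete model, so the random-walk signal, the Gaussian observation model, and the parameter values Q, R, N below are all illustrative assumptions, not part of the original material.

```python
import math
import random

# Hypothetical toy model (not from the slides):
#   signal:      x_n = x_{n-1} + N(0, Q^2)   (random walk)
#   observation: y_n = x_n     + N(0, R^2)
Q = 0.5    # signal noise std (assumed)
R = 1.0    # observation noise std (assumed)
N = 1000   # number of particles

def gauss_density(u, sigma):
    """Gaussian density N(0, sigma^2) evaluated at u."""
    return math.exp(-u * u / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def particle_filter_step(particles, y):
    """One online update: propagate, reweight with p(y_n | x_n), resample."""
    # 1. Propagate each particle through the signal dynamics (the prior).
    particles = [x + random.gauss(0.0, Q) for x in particles]
    # 2. Importance weights: w_n = w_{n-1} * p(y_n | x_n); after the
    #    resampling in step 3 the previous weights are uniform, so only
    #    the likelihood factor appears here.
    weights = [gauss_density(y - x, R) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Resampling, the "technical trick" from the slides: particles with
    #    very small weights tend to be discarded, particles with big
    #    weights are duplicated.
    return random.choices(particles, weights=weights, k=N)

# Usage example on simulated data.
random.seed(0)
true_x = 0.0
particles = [random.gauss(0.0, 1.0) for _ in range(N)]
for n in range(1, 11):
    true_x += random.gauss(0.0, Q)        # simulate the signal
    y = true_x + random.gauss(0.0, R)     # simulate the observation
    particles = particle_filter_step(particles, y)
    estimate = sum(particles) / N         # posterior mean estimate
    print(f"n={n:2d}  y={y:6.2f}  estimate={estimate:6.2f}  truth={true_x:6.2f}")
```

Resampling at every step, as in this sketch, is the simplest choice; in practice one often resamples only when the weights have become very uneven, since unnecessary resampling adds extra Monte Carlo noise.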