
Introduction to Sequential Monte Carlo Methods
Jochen Voss, University of Warwick (UK)




 stochastic modelling
 filtering
 statistical tools
 sequential methods
part I
stochastic modelling
Many quantities in nature, the economy, etc. can be described as random
phenomena.


examples:
    weather
    stock prices
    election results
    throwing dice


Randomness is used either
    because the system has a random component or
    to describe missing information.
Statistical quantities to describe randomness:

probability: P ≈ (# events occurring) / (# trials) when the experiment is
repeated very often.

For numerical random quantities we also use:

expectation/mean: average value over many trials: µ ≈ (1/N) Σ_{i=1}^N X_i .

variance: average squared distance from the mean over many trials:
σ² ≈ (1/N) Σ_{i=1}^N (X_i − µ)² .
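These empirical formulas are easy to try out; the sketch below (plain Python, not from the original slides) estimates the mean and variance of a fair die from repeated throws.

```python
import random

random.seed(0)
N = 100_000
samples = [random.randint(1, 6) for _ in range(N)]  # N throws of a fair die

# empirical mean and variance, as defined above
mu = sum(samples) / N
var = sum((x - mu) ** 2 for x in samples) / N
print(mu, var)  # close to the exact values 3.5 and 35/12
```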
Probability distributions are used to completely describe a random
quantity; they give the probability for every event.

example: The Gaussian normal distribution is described by

    P(X ∈ A) = ∫_A (1/√(2π)) e^(−x²/2) dx

[figure: bell-shaped density curve of the standard normal distribution on the
interval from −3 to 3]
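The probability of an event is the area under the density curve; a minimal numerical check (midpoint rule, standard library only, my own illustration) for the interval [−1, 1]:

```python
import math

def phi(x):
    # density of the standard normal distribution
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# P(X in [-1, 1]) via a midpoint-rule approximation of the integral
a, b, n = -1.0, 1.0, 100_000
h = (b - a) / n
p = sum(phi(a + (i + 0.5) * h) for i in range(n)) * h
print(round(p, 4))  # about 0.6827: the familiar "68%" rule
```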



Probability distributions describe uncertainty or lack of information.
More complex examples of random objects are stochastic processes:
quantities which depend both on randomness and on time.
examples:
    the series of results when throwing a die
    the temperature at a given spot as a function of time
    stock prices
complex example: the change in time of an oceanic velocity field can be
described as a stochastic process.
part II
filtering
conditional probabilities
Conditional probabilities are used when partial information is available
about a random system: if it is already known that event B occurs,
then the conditional probability that event A occurs is

    P(A|B) = P(A, B) / P(B)


    used to incorporate partial information into a model
    can be extended to the case P(B) = 0
    extremely useful

examples:
                       P(A, B) = P(A|B) P(B)
                 P(A, B, C ) = P(A|B, C ) P(B|C ) P(C )
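As a concrete check of the definition, here is a small two-dice example (my own illustration, not from the slides): with A = "the sum is 8" and B = "the first die shows 3", the conditional probability follows by counting equally likely outcomes.

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely results

A = {o for o in outcomes if o[0] + o[1] == 8}    # event A: the sum is 8
B = {o for o in outcomes if o[0] == 3}           # event B: first die shows 3

p_B = len(B) / len(outcomes)       # P(B) = 6/36
p_AB = len(A & B) / len(outcomes)  # P(A, B) = 1/36, only (3, 5)
cond = p_AB / p_B                  # P(A|B) = 1/6
print(cond)
```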
example:
    X is a random “signal”
    we only observe Y = X + ε, where ε is a random perturbation
    given Y , the value of X is still uncertain, described by the
    conditional distribution
[figure: scatter plot of observation Y against signal X]
filtering


Filtering is the task of finding the conditional distribution of an
(unobserved) signal, given an incomplete observation.

    filtering updates a model by incorporating information
    the probability distribution of the model before the observations are
    taken into account is called the prior distribution
    the conditional distribution which incorporates the observations is
    called the posterior distribution
    since the observations remove uncertainty, the posterior typically has
    smaller variance than the prior
complex example: can one find the posterior of the random velocity field,
given the path of a floater?
part III
statistical tools
Bayes’ Rule
One of the fundamental tools for computing conditional probabilities is
Bayes’ rule:
    P(A|B) = P(B|A) P(A) / P(B)

Typically used when A is the signal and B is the observation:

         P(signal|observation) ∝ P(observation|signal) P(signal).


    since the observation is known, P(observation) is a constant
    P(signal|observation) is often difficult to compute
    P(observation|signal) is often easy to compute
example:
prior: P(X = 1) = 0.6, P(X = 2) = 0.2, P(X = 3) = 0.2
Y is Gaussian, centered around X
Now we observe Y = 2.05. Using Bayes’ rule we can compute the
posterior of X .
posterior: P(X = 1) ≈ 0.22, P(X = 2) ≈ 0.67, P(X = 3) ≈ 0.11
[figure: prior and posterior bar charts, with the observation Y = 2.05 marked]
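The slide’s numbers can be reproduced in a few lines. The observation standard deviation is not stated on the slide; taking σ = 0.5 (an assumption) recovers the posterior values 0.22, 0.67, 0.11.

```python
import math

def gauss(y, mean, sigma):
    # Gaussian density, used as the likelihood p(y | x)
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

prior = {1: 0.6, 2: 0.2, 3: 0.2}
y, sigma = 2.05, 0.5  # sigma is assumed, not given on the slide

# Bayes' rule: posterior is proportional to likelihood * prior; normalize at the end
unnorm = {x: gauss(y, x, sigma) * p for x, p in prior.items()}
z = sum(unnorm.values())
posterior = {x: w / z for x, w in unnorm.items()}
print({x: round(p, 2) for x, p in posterior.items()})  # {1: 0.22, 2: 0.67, 3: 0.11}
```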
Monte-Carlo Methods


Distributions can be approximated by “clouds” of particles:

    P(X ∈ A) ≈ (# particles in A) / (# particles)

    E(X) ≈ (1/N) Σ_{i=1}^N X^(i)

[figure: 1000 uniform random points in the unit square, 766 of which fall
inside the quarter circle]

    π/4 ≈ 766/1000 =⇒ π ≈ 3.064
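The π example can be reproduced by throwing random points into the unit square and counting how many land in the quarter circle (a sketch; the count 766 on the slide came from one particular run of 1000 points).

```python
import random

random.seed(1)
N = 1000
# a point lies inside the quarter circle if x^2 + y^2 <= 1
inside = sum(1 for _ in range(N)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_estimate = 4 * inside / N  # P(inside quarter circle) = pi/4
print(pi_estimate)
```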
Importance Sampling
Sometimes plain Monte Carlo methods don’t work well. An extension is
importance sampling:
    we are interested in the target distribution f
    we sample particles from the “wrong” distribution g (the proposal
    distribution)
    to compensate, we have to add weights to the particles:

        w^(i) = f(X^(i)) / g(X^(i))

    we use the weights in approximations:

        E(X) ≈ (1/N) Σ_{i=1}^N w^(i) X^(i)
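A minimal sketch of importance sampling with my own illustrative choice of distributions: the target f is N(2, 1), the proposal g is N(0, 2), and the weighted average recovers the target mean E_f(X) = 2.

```python
import math
import random

random.seed(0)

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

N = 100_000
total = 0.0
for _ in range(N):
    x = random.gauss(0, 2)                         # sample from the proposal g
    w = normal_pdf(x, 2, 1) / normal_pdf(x, 0, 2)  # weight w = f(x) / g(x)
    total += w * x

estimate = total / N  # (1/N) * sum of w^(i) X^(i), approximates E_f(X) = 2
print(estimate)
```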
part IV
sequential methods
filtering problem:
     (unobserved) signal: xn ∼ p(xn |xn−1 )
     observations: yn ∼ p(yn |xn )


              x0 → x1 → x2 → x3 → ···
                    ↓    ↓    ↓
                    y1   y2   y3


task: use the observations y1 , . . . , yn to
find the posterior p(xn | y1 , . . . , yn )
There are two commonly used setups:
    offline filtering: all the observations are available from the start
    online filtering: we want to compute the posterior after every
    observation.
Sequential methods are most useful in the second case, i.e. for online
filtering.
We use importance sampling to approximate the posterior:
    Xn−1^(i) with weights wn−1^(i) is our approximation for
    p(xn−1 | y1 , . . . , yn−1 ).
    prior: p(xn | xn−1 , y1 , . . . , yn−1 )
    posterior: p(xn | xn−1 , y1 , . . . , yn−1 , yn )
    updated weights: wn^(i) = wn−1^(i) · p(yn | Xn^(i) )
technical trick (resampling): after a while most weights become very small
    throw away particles with very small weights
    replace particles with big weights by several particles with smaller
    weights
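Putting the pieces together, here is a minimal bootstrap particle filter for a toy model (random-walk signal, Gaussian observations; the model and all parameters are my own choices for illustration). It propagates particles with the prior dynamics, reweights them with the observation likelihood, and resamples when most weights become small.

```python
import math
import random

random.seed(0)

def normal_pdf(y, mean, sigma):
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# toy model: x_n = x_{n-1} + noise (signal), y_n = x_n + noise (observation)
T, N = 50, 500
xs, ys, x = [], [], 0.0
for _ in range(T):
    x += random.gauss(0, 0.5)
    xs.append(x)
    ys.append(x + random.gauss(0, 1.0))

particles = [0.0] * N
weights = [1.0 / N] * N
for y in ys:
    # move each particle using the signal dynamics p(x_n | x_{n-1})
    particles = [p + random.gauss(0, 0.5) for p in particles]
    # update the weights with the likelihood p(y_n | x_n), then normalize
    weights = [w * normal_pdf(y, p, 1.0) for w, p in zip(weights, particles)]
    z = sum(weights)
    weights = [w / z for w in weights]
    # resampling trick: when most weights are tiny, replace the particle cloud
    ess = 1.0 / sum(w * w for w in weights)  # effective sample size
    if ess < N / 2:
        particles = random.choices(particles, weights=weights, k=N)
        weights = [1.0 / N] * N

estimate = sum(w * p for w, p in zip(weights, particles))
print(estimate, xs[-1])  # posterior mean vs. true final signal value
```

The resampling threshold N/2 on the effective sample size is a common rule of thumb, not something prescribed by the slides.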

								