Carlo Simulation Of The by Heroes On Parade


									CPSC 405                                                                                     2007
                     Monte Carlo Simulation
1    Some History
The term “Monte Carlo Method” is very general. A Monte Carlo method is a stochastic tech-
nique, meaning that it is based on using random numbers and probability to investigate problems.
Monte Carlo methods are employed in a wide variety of fields including economics, finance,
physics, chemistry, engineering, and even the study of traffic flows. Each field using Monte
Carlo methods may apply them in different ways, but in essence they are using random numbers
to examine some problem and approximate its outcome. As such Monte Carlo methods give us
a way to model complex systems that are often extremely hard to investigate with other types of
    The term “Monte Carlo” was introduced by von Neumann and Ulam during World War II, as
a code word for the secret work at Los Alamos; it was suggested by the gambling casinos at the
city of Monte Carlo in Monaco. The Monte Carlo method was then applied to problems related
to the atomic bomb.
    What we shall describe below as the “basic Monte Carlo” algorithm has been known since
about 1850. A somewhat popular game at the time was to throw a needle onto a board ruled with
parallel straight lines and then calculate the value of π from observations of the number of inter-
sections between needle and lines. This algorithm was published by A. Hall in an article titled
“On an experimental determination of PI” in the Journal Messenger of Mathematics in 1872. For
a simulation of the experiment see

    The Monte Carlo method took off as a computational tool when computers became avail-
able and radical improvements over the “basic” algorithm were found, described below under
”Markov Chain Monte Carlo”.

2    Basic Monte Carlo Simulation
The basic idea of Monte Carlo is very simple. Since mean values of stochastic variables can be
expressed as the integral of a variable times the probability density function, we could reverse
the process by generating stochastic numbers and computing their mean by simple averaging. In
theory, this should then give you an approximation to the integral. Let’s make this more precise.
    Suppose we have an integral to evaluate of the form

                                      I=          h(x)f (x)dx                                  (1)

where D is some (usually high dimensional) domain with coordinates x and f (x) is a non-
negative function which satisfies
                                              f (x)dx = 1.                                     (2)

Eq. 2 together with the non-negativity conditions allows us to interpret f as a probability density
function on D. With this interpretation in mind, we see that Eq. 1 is just the formula for the
expectation value of h(x), where x are random variables on domain D with pdf P (x) = f (x).
   If we now could generate N random variables xi (these will generally be vector coordinates)
with pdf P (x) = f (x), we could approximate I in Eq. 1 by the experimentally observed average
                                  I =< h(x) >≈                  h(xk ).                         (3)
                                               N          k=1

Even better, we could also estimate the standard deviation of the approximation to I and get
some error estimates as well.
    The example we did earlier, computing the area of the unit circle, can be cast in this form.
D is the square x = [x, y] = [−1 1] × [−1 1]. f (x, y) = 1/4 inside D, zero elsewhere (just a
uniform distribution), and

                                  h(x, y) =     1 if x2 + y 2 < 1                               (4)
                                                0 otherwise.                                    (5)

In this case it is very easy to generate samples xk of pdf f (x) and the integral I should of course
evaluate to π/4.
    As long as we work in one or two dimensions, the Monte Carlo method is inefficient, and
better numerical approximations are available to compute the integral I. Such methods also
sample the space D by discrete samples xk , k = 1, . . . N , but usually not in a random way. For
such methods, in an M dimensional space D, the error in the approximation decreases with N
according to

                                           N −1/M ,                                             (6)

which is very very slow in high dimensions. The error in the Monte Carlo method on the other
hand decreases as

                                            N −1/2 ,                                            (7)

independent of the dimension of D!
   Many scientific problems can be cast into the form (1). Usually h(x) is a known and not too
complicated function, but f (x) can only be computed with great effort, as it involves extensive
simulation. The difficult part of Monte Carlo simulation lies in finding methods to actually
generate random variables x that are distributed with pdf P (x) = f (x).
   One way around this would be to recast (1) into a different form by defining
                                         h(x) = V h(x)f (x)                                    (8a)
                                         f (x) = 1/V, with                                     (8b)
                                         V =        dx.                                        (8c)

In terms of these new “hatted” functions our integral (1) looks like

                                          I=       ˆ   ˆ
                                                   h(x)f (x)dx,                                (9)

and, at least for hypercubical domains D it is easy to generate samples xk as f is just the uniform
distribution. Unfortunately this means that, if f was very small on large regions in D, we are
generating many samples x where it does not contribute much to the average (2). This is wasteful,
because our new h is expensive to evaluate since we have to compute f every time, which was
expensive. For large problems this is so bad that the method is too slow to be useful in practice.

3     Markov Chain Monte Carlo: The Metropolis Algorithm
In 1953 Metropolis and collaborators published a method to effectively sample P (x) = f (x).
This algorithm is now known as the Metropolis algorithm. Nicolas Metropolis was a mathemati-
cian who worked on the construction of the Los Alamos MANIAC computer used for hydrogen
bomb calculations. Edward Teller, the so-called father of the H bomb, is one of the co-authors
on the original paper (Journal of Chemical Physics, 1953) describing the algorithm. Please read
sections I and II of the paper. The rest is optional.
    The Metropolis algorithm computes a sequence of vectors xi , i = 1, . . . , N , that sample
P (x). This sequence is highly correlated, but if I then randomly shuffle all the xi , for large
enough N , they are random variables with pdf P (x). Some of the values xi can, and usually are,
the same.
    The algorithm appears deceptively simple:

    1. Pick any starting point for x1 .

    2. Pick a candidate point y at random from some (small) neighborhood around xi .

    3. Define pa = P (y)/P (xi )

    4. Generate a uniform random number r on [0 1].

    5. If r < pa set xi+1 = y, otherwise set xi+1 = xi , i.e., reject y.

    At this point you may ask: how do you pick “a candidate point y at random from some
(small) neighborhood around xi ”? This is where the art comes in. Some general conditions on
this “picking algorithm” are known. If they are satisfied you can prove the algorithm converges
for large N . The art lies in choosing the picking method such that it converges as fast as possi-
ble. Nobody knows how to do this best and techniques used are mainly based on experience in
applications to specific problems.
    One method that works, but is almost as bad as the “naive” Monte Carlo method described in
Section 2, is to just choose y totally at random. The problem is that in most applications P (x)
is very small almost everywhere, so that pa (the probability to accept a candidate) is very small.

So if we find a point with a largish value of P (x) we should generate a cluster of points around
there, but we immediately jump away from it and may never find this area in D, which is vast,
     What is usually done is to choose y probabilistically, with a pdf (which has nothing directly
to do with P (x)) that favors points close to the current point x. So the points xi tend to randomly
walk around the domain D, governed by some probabilities. Such a “random walk” is a Markov
chain, hence the name “Markov Chain Monte Carlo”. Other Markov Chain Monte Carlo algo-
rithms besides the Metropolis algorithm exist which have different strategies to sample P (x).
     If the Markov chain obeys certain rules (which we won’t go into) it is provable that the
Metropolis algorithm converges to a sampling of P (x). There are infinitely many ways to set up
this Markov chain and how you do this precisely is critically important in getting the algorithm
to converge in a reasonable amount of time. It is not uncommon to run a Monte Carlo algorithm
for months on a cluster of computers.
     The sample code provided in mcfun2.m illustrates the naive and Metropolis algorithms on
a trivial problem, for which we know the answer. The integral to compute is
                                                             ∞    ∞
                                          500, 000                                        2 +y 2 )
                                     I=                                   x2 y 2 e−50(x              dxdy.
                                             π          −∞        −∞

                                                                                                              2          2
Some elementary analysis shows that I = 1. We interpret 50 e−50(x +y ) as P (x) and h(x) =
10, 000x2 y 2 . Note that though the domain D is the entire plane, P (x) is very small everywhere
except around the origin. The example provided computes the integral with the naive or with
the Metropolis algorithm. The Markov chain implemented consists of jumping to another x in
a unit square around x. The function reports the numerical approximation to I as well as the α
confidence interval. A modest improvement can be observed in the Metropolis method which is
all we can hope for in two dimensions. In the figure you can see how the points are sampled with
the naive method and with the Metropolis algorithm.


      6                                       6

      4                                       4

      2                                       2                                                0.1

      0                                       0                                                 0

     −2                                      −2

     −4                                      −4

     −6                                      −6

          −6   −4   −2   0   2   4   6        −8   −6   −4   −2   0       2   4   6   8         −0.4   −0.3       −0.2       −0.1   0   0.1   0.2   0.3

                (a) Naive MC                            (b) Metropolis                        (c) A zoom into the central region

Figure 1: Computing an integral with the Monte Carlo method with sample code mcfun2.m.
The domain is −8 < x, y < 8. The naive algorithm samples over the entire region, which is
wasteful, whereas the Metropolis algorithm samples only where it matters.

4    Statistical Physics
Monte Carlo simulation is widely used in statistical physics. Suppose we have a model with a
large state space S. For example in molecular physics S would be the listing of the positions and
velocities of all the molecules in the system. To simulate an N molecule system then results in
a statespace of dimension 6N . The model of the system consists of a formula (or algorithm) to
compute the energy E(s) of the system in state s. In molecular applications E could be computed
by looking at all neigbouring molecules, evaluating their potential energy due to the interactions
of their electron clouds, and summing over all possible pairs. At zero temperature the physical
state of the system is then found by finding the state s0 that minimizes E(s).
    At non-zero temperature the state “fluctuates” due to thermal agitation. Suppose we now
measure some quantity F (s) which is expressed as some function of the state. For example, we
might be interested in the average intermolecular distance or something like that. A basic law of
statistical physics states that the measured value of F is the statistical average of F (s) over all
the states s with weight factor P (s) (which can be interpreted as a pdf) given by

                                      P (s) = Z −1 e−E(s)/kT ,

where k is the Bolzmann constant, T is the temperature in Kelvin, and the normalization constant
Z is chosen to make P (s) a proper pdf with integral over state space equal to 1:

                                       Z=         e−E(s)/kT ds.

So the observed value of F (s) is

                               < F >= Z −1            F (s)e−E(s)/kT ds,

which is precisely in the form suitable for a Monte Carlo simulation.
     An example is the Ising model which is used to study phase transitions. So-called spins sit
on the sites of a lattice in space. A spin s can take the value +1 or −1. These values could stand
for the presence or absence of an atom, or the orientation of a magnetic atom (up or down). The
energy of the model derives from the interaction between the spins. We take the energy per pair
of neighbors s and s as −Jss , where J > 0 is the spin-spin interaction. When the temperature T
is high in relation to J, the spins are disordered: they take more or less random values. However,
when the temperature T drops below a critical temperature, the spin system orders into a state of
broken symmetry and most of the spins have the same sign.
     If we label the points (usually arranged on a grid) at which the spin variables si sit i =
1, . . . , N , the magnetization M can be defined as
                                         M (s) =             si .

At zero temperature the energy is minimized when all the si have the same sign and M will have
some definite positive or negative value. At non-zero temperature the measured magnetization is
given by
                             < M >= Z −1         M (s)e−E(s)/kT ds,
which can be computed with a Monte Carlo simulation. Below a critical temperature Tc the
system becomes magnetized. For a visualization see


To top