cs668 lec10 MonteCarlo by n09C7b


									Lecture 10 Outline
 Monte Carlo methods
 History of methods
 Sequential random number generators
 Parallel random number generators
 Generating non-uniform random numbers
 Monte Carlo case studies
Monte Carlo Methods
   Monte Carlo is another name for statistical
    sampling methods of great importance to physics
    and computer science
   Applications of Monte Carlo Method
      Evaluating integrals of arbitrary functions of 6+
      Predicting future values of stocks

      Solving partial differential equations

      Sharpening satellite images

      Modeling cell populations

      Finding approximate solutions to NP-hard
An Interesting History
 • In 1738, Swiss physicist and mathematician Daniel Bernoulli published
 Hydrodynamica which laid the basis for the kinetic theory of gases: great
 numbers of molecules moving in all directions, that their impact on a
 surface causes the gas pressure that we feel, and that what we experience
 as heat is simply the kinetic energy of their motion.

 • In 1859, Scottish physicist James Clerk Maxwell formulated the
 distribution of molecular velocities, which gave the proportion of
 molecules having a certain velocity in a specific range. This was the
 first-ever statistical law in physics. Maxwell used a simple thought
 experiment: particles must move independent of any chosen coordinates,
 hence the only possible distribution of velocities must be normal in each

 • In 1864, Ludwig Boltzmann, a young student in Vienna, came across
 Maxwell’s paper and was so inspired by it that he spent much of his
 long, distinguished, and tortured life developing the subject further.
History of Monte Carlo Method
   Credit for inventing the Monte Carlo method is shared by Stanislaw Ulam,
    John von Neuman and Nicholas Metropolis.
   Ulam, a Polish born mathematician, worked for John von Neumann on the
    Manhattan Project. Ulam is known for designing the hydrogen bomb with
    Edward Teller in 1951. In a thought experiment he conceived of the MC
    method in 1946 while pondering the probabilities of winning a card game of
   Ulam, von Neuman, and Metropolis developed algorithms for computer
    implementations, as well as exploring means of transforming non-random
    problems into random forms that would facilitate their solution via statistical
    sampling. This work transformed statistical sampling from a mathematical
    curiosity to a formal methodology applicable to a wide variety of problems. It
    was Metropolis who named the new methodology after the casinos of Monte
    Carlo. Ulam and Metropolis published a paper called “The Monte Carlo
    Method” in Journal of the American Statistical Association, 44 (247), 335-
    341, in 1949.
    Solving Integration Problems via Statistical Sampling:
    Monte Carlo Approximation
   How to evaluate integral of f(x)?
       Integration Approximation
   Can approximate using another function g(x)
      Integration Approximation
   Can approximate by taking the average or expected value
       Integration Approximation
   Estimate the average by taking N samples
Monte Carlo Integration

    Im = Monte Carlo estimate
    N = number of samples
    x1, x2, …, xN are uniformly distributed random numbers between a
     and b
Monte Carlo Integration
       Monte Carlo Integration
   We have the definition of expected value and how to estimate it.

   Since the expected value can be expressed as an integral, the integral
    is also approximated by the sum.

   To simplify the integral, we can substitute g(x) = f(x)p(x).
   The variance describes how much the sampled values vary from
    each other.

   Variance proportional to 1/N
   Standard Deviation is just the square root of the variance
   Standard Deviation proportional to 1 / sqrt(N)

   Need 4X samples to halve the error
   Problem:
      Variance (noise) decreases slowly

      Using more samples only removes a small amount of noise
       Variance Reduction
   There are several ways to reduce the variance
      Importance Sampling

      Stratified Sampling

      Quasi-random Sampling

      Metropolis Random Mutations
       Importance Sampling
   Idea: use more samples in important regions of the function
   If function is high in small areas, use more samples there
Importance Sampling

   Want g/p to have low variance
   Choose a good function p similar to g:
       Stratified Sampling
   Partition S into smaller domains Si
   Evaluate integral as sum of integrals over Si
   Example: jittering for pixel sampling
   Often works much better than importance sampling in practice
Parallelism in Monte Carlo Methods

 Monte Carlo methods often amenable to
 Find an estimate about p times faster
 Reduce error of estimate by p1/2
Random versus Pseudo-random
   Virtually all computers have “random number”
   Their operation is deterministic
   Sequences are predictable
   More accurately called “pseudo-random number”
   In this chapter “random” is shorthand for “pseudo-
   “RNG” means “random number generator”
Properties of an Ideal RNG
   Uniformly distributed
   Uncorrelated
   Never cycles
   Satisfies any statistical test for randomness
   Reproducible
   Machine-independent
   Changing “seed” value changes sequence
   Easily split into independent subsequences
   Fast
   Limited memory requirements
No RNG Is Ideal
 Finite precision arithmetic  finite number
  of states  cycles
    Period = length of cycle

    If period > number of values needed,

     effectively acyclic
 Reproducible  correlations
 Often speed versus quality trade-offs
Linear Congruential RNGs
X i  (a  X i 1  c) mod M


                    Additive constant

Sequence depends on choice of seed, X0
Period of Linear Congruential RNG

 Maximum period is M
 For 32-bit integers maximum period is 232,
  or about 4 billion
 This is too small for modern computers
 Use a generator with at least 48 bits of
Producing Floating-Point Numbers

 Xi, a, c, and M are all integers
 Xis range in value from 0 to M-1
 To produce floating-point numbers in range
  [0, 1), divide Xi by M
Defects of Linear Congruential RNGs

 Least significant bits correlated
    Especially when M is a power of 2

 k-tuples of random numbers form a lattice
    Points tend to lie on hyperplanes

    Especially pronounced when k is large
Lagged Fibonacci RNGs
X i  X i  p  X i q

 p and q are lags, p > q
 * is any binary arithmetic operation
    Addition modulo M
    Subtraction modulo M
    Multiplication modulo M
    Bitwise exclusive or
Properties of Lagged Fibonacci RNGs

 Require p seed values
 Careful selection of seed values, p, and q
  can result in very long periods and good
 For example, suppose M has b bits
 Maximum period for additive lagged
  Fibonacci RNG is (2p -1)2b-1
Ideal Parallel RNGs
 All properties of sequential RNGs
 No correlations among numbers in different
 Scalability
 Locality
Parallel RNG Designs
 Manager-worker
 Leapfrog
 Sequence splitting
 Independent sequences
Manager-Worker Parallel RNG
 Manager process generates random
 Worker processes consume them
 If algorithm is synchronous, may achieve
  goal of consistency
 Not scalable
 Does not exhibit locality
Leapfrog Method

Process with rank 1 of 4 processes
Properties of Leapfrog Method
 Easy modify linear congruential RNG to
  support jumping by p
 Can allow parallel program to generate
  same tuples as sequential program
 Does not support dynamic creation of new
  random number streams
Sequence Splitting

Process with rank 1 of 4 processes
Properties of Sequence Splitting
 Forces each process to move ahead to its
  starting point
 Does not support goal of reproducibility
 May run into long-range correlation
 Can be modified to support dynamic
  creation of new sequences
Independent Sequences
 Run sequential RNG on each process
 Start each with different seed(s) or other
 Example: linear congruential RNGs with
  different additive constants
 Works well with lagged Fibonacci RNGs
 Supports goals of locality and scalability
    Statistical Simulation: Metropolis Algorithm
   Metropolis algorithm. [Metropolis, Rosenbluth, Rosenbluth, Teller,
    Teller 1953]
      Simulate behavior of a physical system according to principles of

        statistical mechanics.
      Globally biased toward "downhill" lower-energy steps, but

        occasionally makes "uphill" steps to break out of local minima.

   Gibbs-Boltzmann function. The probability of finding a physical
    system in a state with energy E is proportional to e -E / (kT), where T > 0
    is temperature and k is a constant.
      For any temperature T > 0, function is monotone decreasing

         function of energy E.
      System more likely to be in a lower energy state than higher one.

            T large: high and low energy states have roughly same

            T small: low energy states are much more probable
   Metropolis algorithm.
      Given a fixed temperature T, maintain current state S.

      Randomly perturb current state S to new state S'  N(S).

      If E(S')  E(S), update current state to S'
       Otherwise, update current state to S' with probability e - E / (kT),
       where E = E(S') - E(S) > 0.

   Convergence Theorem. Let fS(t) be fraction of first t steps in which
    simulation is in state S. Then, assuming some technical conditions,
    with probability 1:
                            lim f S (t)         eE(S ) /(kT) ,
                           t               Z

                           where Z          eE(S ) /(kT) .
                                         S  N (S )

   Intuition. Simulation spends roughly the right amount of time in each
    state, according to Gibbs-Boltzmann equation.
    Simulated Annealing
   Simulated annealing.
      T large  probability of accepting an uphill move is large.

      T small  uphill moves are almost never accepted.

      Idea: turn knob to control T.

      Cooling schedule: T = T(i) at iteration i.

   Physical analog.
      Take solid and raise it to high temperature, we do not expect it to

       maintain a nice crystal structure.
      Take a molten solid and freeze it very abruptly, we do not expect

       to get a perfect crystal either.
      Annealing: cool material gradually from high temperature,

       allowing it to reach equilibrium at succession of intermediate
       lower temperatures.
Other Distributions
 Analytical transformations
 Box-Muller Transformation
 Rejection method
Analytical Transformation
-probability density function f(x)
-cumulative distribution F(x)
In theory of probability, a quantile function of a distribution is the
inverse of its cumulative distribution function.
Exponential Distribution:
An exponential distribution arises naturally when modeling the time between independent
events that happen at a constant average rate and are memoryless. One of the few cases
where the quartile function is known analytically.

       F 1 (u)  m ln(1  u)
                                                           F ( x)  1  e  x / m

                                                                    1 x / m
                                                            f ( x)  e
Example 1:
 Produce four samples from an exponential
  distribution with mean 3
 Uniform sample: 0.540, 0.619, 0.452, 0.095
 Take natural log of each value and multiply
  by -3
 Exponential sample: 1.850, 1.440, 2.317,
Example 2:
   Simulation advances in time steps of 1 second
   Probability of an event happening is from an exponential
    distribution with mean 5 seconds
   What is probability that event will happen in next second?
   F(x=1/5) =1 - exp(-1/5)) = 0.181269247
   Use uniform random number to test for occurrence of
    event (if u < 0.181 then ‘event’ else ‘no event’)
Normal Distributions:
Box-Muller Transformation

 Cannot invert cumulative distribution
  function to produce formula yielding
  random numbers from normal (gaussian)
 Box-Muller transformation produces a pair
  of standard normal deviates g1 and g2 from
  a pair of normal deviates u1 and u2
Box-Muller Transformation
      v1  2u1 - 1       This is a consequence of
                         the fact that the chi-
      v2  2u2 - 1       square distribution with
      r  v 12 + v 22    two degrees of freedom
until r > 0 and r < 1    is an easily-generated
                         exponential random
f  sqrt (-2 ln r /r )   variable.
g1  f v1
g2  f v2
   Produce four samples from a normal
    distribution with mean 0 and standard
    deviation 1

    u1    u2      v1       v2       r        f      g1       g2
0.234    0.784   -0.532   0.568    0.605   1.290   -0.686   0.732

0.824    0.039   0.648    -0.921   1.269

0.430    0.176   -0.140   -0.648   0.439   1.935   -0.271   -1.254
Rejection Method
   Generate random variables from this
    probability density function
                 sin x,            if 0  x   / 4
f ( x) (4 x    8) /(8 2 ), if /4 x  2   / 4
                   0,                 otherwise
Example (cont.)
       1 /(2   / 4), if 0  x  2   / 4
h( x) 
             0,             otherwise
                  (2   / 4) /( 2 / 2)

  (2   / 4) /( 2 / 2)

            2 / 2, if 0  x  2   / 4
h ( x )  
            0,          otherwise
So  h(x)  f(x) for all x
Example (cont.)
xi      ui      uih(xi) f(xi) Outcome
0.860   0.975   0.689   0.681 Reject
1.518   0.357   0.252   0.448 Accept
0.357   0.920   0.650   0.349 Reject
1.306   0.272   0.192   0.523 Accept

Two samples from f(x) are 1.518 and 1.306
Case Studies (Topics Introduced)
 Temperature inside a 2-D plate (Random
 Two-dimensional Ising model
  (Metropolis algorithm)
 Room assignment problem (Simulated
 Parking garage (Monte Carlo time)
 Traffic circle (Simulating queues)
Temperature Inside a 2-D Plate

                       Random walk
Example of Random Walk
0  u  1  4u   {0,1,2,3}

  NP-Hard Assignment Problems

TSP:Find a tour of US cities that minimizes distance.
Physical Annealing
 Heat a solid until it melts
 Cool slowly to allow material to reach state
  of minimum energy
 Produces strong, defect-free crystal with
  regular structure
Simulated Annealing
 Makes analogy between physical annealing
  and solving combinatorial optimization
 Solution to problem = state of material
 Value of objective function = energy
  associated with state
 Optimal solution = minimum energy state
How Simulated Annealing Works
 Iterative algorithm, slowly lower T
 Randomly change solution to create
  alternate solution
 Compute , the change in value of objective
 If  < 0, then jump to alternate solution
 Otherwise, jump to alternate solution with
  probability e-/T
Performance of Simulated
   Rate of convergence depends on initial value of T
    and temperature change function
   Geometric temperature change functions typical;
    e.g., Ti+1 = 0.999 Ti
   Not guaranteed to find optimal solution
   Same algorithm using different random number
    streams can converge on different solutions
   Opportunity for parallelism

              Starting with higher
              initial temperature
              leads to more iterations
              before convergence
Parking Garage
 Parking garage has S stalls
 Car arrivals fit Poisson distribution with
  mean A: Exponentially distributed inter-
  arrival times
 Stay in garage fits a normal distribution
  with mean M and standard deviation M/S
Implementation Idea
     Times Spaces Are Available

      101.2      142.1     70.3      91.7   223.1

  Current Time           Car Count          Cars Rejected

     64.2                    15                  2
Summary (1/3)
 Applications of Monte Carlo methods
   Numerical integration

   Simulation

 Random number generators
   Linear congruential

   Lagged Fibonacci
Summary (2/3)
   Parallel random number generators
      Manager/worker

      Leapfrog

      Sequence splitting

      Independent sequences

   Non-uniform distributions
      Analytical transformations

      Box-Muller transformation

      (Rejection method)
Summary (3/3)
   Concepts revealed in case studies
     Monte Carlo time

     Random walk

     Metropolis algorithm

     Simulated annealing

     Modeling queues

To top