Stochastic Programming on a Grid

					                          Stochastic Programming on a Grid

                                     Jeff Linderoth, Stephen Wright

                                         University of Wisconsin-Madison


                                         ICCOPT II, August 2007




Jeff Linderoth, Stephen Wright (UW-Madison)   Stochastic Programming on a Grid   ICCOPT II, August 2007   1 / 50
   1    Stochastic Programming
          Introduction
          Formulation and Basic Algorithms: Two stages

   2    Grid Computing Tools

   3    Two-Stage Problems
         Parallel Algorithms
         Computational Results: Performance
         Computational Results: Solution Quality

   4    Multistage Problems
         Formulations
         Algorithms
         Implementation Challenges
         Computational Results


   In this talk we show how a computation-intensive optimization problem
   can be solved by “putting it all together:”
           Good algorithms, matched to the platform
           Raw power of Grids
           Programmability of the MW toolkit.




  Outline



           Stochastic Programming (SP)
           Formulation and Basic Algorithms for Two-Stage SP
           Using Condor and MW
           Asynchronous Trust-Region Algorithm
           Computational Results: Algorithm Performance
           Multistage SP: Formulations and Algorithms
           Multistage SP: Computational Results
   Collaborators: Alex Shapiro (Georgia Tech), Jierui Shen (Lehigh).




  Stochastic Programming


   Optimization of a model with uncertainty.
   Often formulated mathematically as

\[ \min_x \; f(x) \;\stackrel{\text{def}}{=}\; E_\xi\, g(x;\xi) \;=\; \int_\Omega g(x;\xi)\, p(\xi)\, d\xi, \]

   (p is the probability density function) subject to constraints on x ∈ R^n.
   Arises in planning-under-uncertainty applications, where each ξ represents
   a possible scenario (a possible way in which the model could evolve).
   The space Ω can contain finitely or infinitely many scenarios.
   g(x; ξ) could be the value function of some second-level optimization
   problem parametrized by x. (Recourse.)



  Example: Network Planning
   Adding capacity on a telecommunications network for private-line services.
   (Sen et al., 1994.)


   [Figure: network of nodes and links, highlighting node pair A-B and a route between A and B.]

   Add capacity to some links, to attempt to meet (uncertain) demand for
   traffic between nodes.
   Sample demand profile: Third node pair: Demand 0 (prob .855), Demand
   5.39 (prob .095), Demand 75.1 (prob .05).

           Data:
               - network topology: n = 89 links.
               - point-to-point pairs: i = 1, 2, . . . , m, with m = 86.
               - demands d_i for each pair i are random and independent, with 3 to 7
                 possible scenarios. Total about 10^70 scenarios!
           Decision variables: x_j , j = 1, 2, . . . , n: amount of capacity to add on
           link j. Total new capacity bounded by B.
           Objective: minimize the expected amount of unmet demand, summed
           over the m point-to-point pairs.
   We can’t hope to solve the problem by accounting for all the 10^70 possible
   scenarios exhaustively; the problem is much too large.
   Practical approach: Use sampling to select a subset of N scenarios,
   randomly. The sample average approximation (SAA) is large, but
   manageable.
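
   A minimal sketch of the two ideas on this slide, not the authors' code. It first bounds the scenario count (86 independent pairs with 3 to 7 outcomes each), then samples one pair's demand from the profile quoted above and compares the sample average of unmet demand with the exact expectation. The capacity value 10.0, the seed and the function names are assumptions made for illustration.

```python
import math
import random

# With 86 independent node pairs and 3 to 7 outcomes each, the scenario count
# lies between 3**86 and 7**86; log10 gives the order of magnitude.
N_PAIRS = 86
log10_min = N_PAIRS * math.log10(3)
log10_max = N_PAIRS * math.log10(7)

# Demand profile of the third node pair, as quoted on the slide.
PROFILE = [(0.0, 0.855), (5.39, 0.095), (75.1, 0.05)]

def sample_demands(n_samples, rng):
    """Draw n_samples independent demands from the discrete profile."""
    values = [d for d, _ in PROFILE]
    weights = [p for _, p in PROFILE]
    return rng.choices(values, weights=weights, k=n_samples)

def mean_unmet(capacity, demands):
    """Sample average of the unmet demand max(d - capacity, 0)."""
    return sum(max(d - capacity, 0.0) for d in demands) / len(demands)

def exact_unmet(capacity):
    """Exact expectation of the unmet demand under PROFILE."""
    return sum(p * max(d - capacity, 0.0) for d, p in PROFILE)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
demands = sample_demands(10_000, rng)
print(f"scenario count between 10^{log10_min:.1f} and 10^{log10_max:.1f}")
print(f"sampled: {mean_unmet(10.0, demands):.3f}, exact: {exact_unmet(10.0):.3f}")
```

   The bounds bracket the figure of about 10^70 quoted above, which is why enumeration is hopeless while a sample of a few thousand scenarios is cheap to evaluate.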

  2-stage stochastic LP with recourse


\[ \min_x \; Q(x) = c^T x + E_P\, Q(x;\omega) \quad \text{subj. to } Ax = b,\; x \ge 0, \]

   where P is a probability measure on the space (Ω, F), and

\[ Q(x;\omega) = \min_y \; q(\omega)^T y \quad \text{subject to } Wy = h(\omega) - T(\omega)x,\; y \ge 0. \]

   x = first-stage vars, y = second-stage vars.
   Sampled approximation: Sample N points ω_j , j = 1, 2, . . . , N from P, and
   solve

\[ \min_x \; Q(x) = c^T x + N^{-1} \sum_{j=1}^{N} Q(x;\omega_j) \quad \text{subj. to } Ax = b,\; x \ge 0. \]


   Each Q(x; ωj ) is convex, piecewise-linear in x.

   [Figure: graph of Q(x) against x, a convex piecewise-linear function with subgradients marked.]

   Compute subgradients of Q by
      finding dual solutions π_j of the second-stage LPs for j = 1, 2, . . . , N
      (concurrently!)
\[ Q(x;\omega_j): \quad \min_{y_j} \; q(\omega_j)^T y_j \quad \text{subj. to } W y_j = h(\omega_j) - T(\omega_j)x,\; y_j \ge 0; \]
      summing:
\[ c - N^{-1} \sum_{j=1}^{N} T(\omega_j)^T \pi_j. \]
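
   For readers who want the missing step, the following is a short derivation of why the dual solutions give subgradients. It is a standard LP duality argument rather than something stated on the slide, and it assumes each second-stage LP is feasible with a finite optimal value.

```latex
% Dual of the second-stage LP for scenario j:
Q(x;\omega_j) = \max_{\pi} \left\{ \pi^T \bigl( h(\omega_j) - T(\omega_j) x \bigr)
                \;:\; W^T \pi \le q(\omega_j) \right\}.
% If \pi_j is optimal at x, it remains dual feasible at any other point x', so
Q(x';\omega_j) \;\ge\; \pi_j^T \bigl( h(\omega_j) - T(\omega_j) x' \bigr)
              \;=\; Q(x;\omega_j) - \bigl( T(\omega_j)^T \pi_j \bigr)^T (x' - x).
% Hence -T(\omega_j)^T \pi_j is a subgradient of Q(\cdot;\omega_j) at x.
% Averaging over j = 1, ..., N and adding c gives the expression above.
```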
  “bundle” methods
   Build up a lower bounding, piecewise linear approximation to Q(x), based
   on function values Q(x^ℓ) and subgradients g^ℓ at iterates x^ℓ.
   Model function M_k(x) after k iterates is

\[ M_k(x) = \sup_{\ell = 0,1,\dots,k} \left\{ Q(x^\ell) + (g^\ell)^T (x - x^\ell) \right\}. \]

   Choose next iterate as

\[ x^{k+1} = \arg\min_x \; M_k(x) \quad \text{subj. to } Ax = b,\; x \ge 0, \]

   which can be formulated as:

\[ \min_{x,\theta} \; \theta \quad \text{subject to } Ax = b,\; x \ge 0,\; \theta \ge Q(x^\ell) + (g^\ell)^T (x - x^\ell),\; \ell = 0, 1, \dots, k. \]

   (Each constraint is called a cut.)
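
   The cut-building loop can be sketched on a one-dimensional toy problem. Everything specific here is an assumption for illustration: the scalar capacity x in [LO, HI], the three scenarios, the unit cost, and the recourse function max(d − x, 0). The master problem is solved by checking endpoints and cut intersections, which is valid only because x is a scalar; a real implementation would solve the LP stated above.

```python
# Toy problem (an assumption for illustration): x is a scalar capacity in [LO, HI],
# Q(x) = COST * x + sum_j p_j * max(d_j - x, 0).
SCENARIOS = [(1.0, 0.3), (4.0, 0.4), (9.0, 0.3)]  # (demand d_j, probability p_j)
COST, LO, HI = 0.5, 0.0, 10.0

def evaluate(x):
    """Return Q(x) and one subgradient g of Q at x."""
    value = COST * x + sum(p * max(d - x, 0.0) for d, p in SCENARIOS)
    # The second-stage dual is 1 when demand exceeds capacity, else 0.
    g = COST - sum(p for d, p in SCENARIOS if d > x)
    return value, g

def minimize_model(cuts):
    """Minimize M(x) = max over cuts of (a + g * x) on [LO, HI].

    A convex piecewise-linear function attains its minimum at an endpoint
    or where two cuts cross, so checking those points is enough in 1-D.
    """
    candidates = [LO, HI]
    for i, (a1, g1) in enumerate(cuts):
        for a2, g2 in cuts[i + 1:]:
            if g1 != g2:
                x = (a2 - a1) / (g1 - g2)
                if LO <= x <= HI:
                    candidates.append(x)

    def model(x):
        return max(a + g * x for a, g in cuts)

    best = min(candidates, key=model)
    return best, model(best)

def cutting_plane(x0, tol=1e-9, max_iter=50):
    """Add one cut per iterate until the model is tight at its minimizer."""
    x, cuts = x0, []
    for _ in range(max_iter):
        value, g = evaluate(x)
        cuts.append((value - g * x, g))  # cut: theta >= a + g * x
        x_next, model_value = minimize_model(cuts)
        q_next, _ = evaluate(x_next)
        if q_next - model_value <= tol:  # lower bound meets true value
            return x_next, q_next, len(cuts)
        x = x_next
    return x, evaluate(x)[0], len(cuts)

x_star, q_star, n_cuts = cutting_plane(LO)
print(x_star, q_star, n_cuts)
```

   On this toy instance the minimizer is the kink at x = 4 with Q(4) = 3.5. The early iterates jump between the two ends of the interval, which is the kind of behaviour the trust region is meant to damp.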
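   Example: After first two iterations 0, 1:
   [Figure: Q(x) and the model M_1(x) plotted against x, with iterates x^0 and x^1 marked.]

   x^2 is the minimizer of M_1; add new subgradient to obtain M_2; take
   minimizer to obtain x^3:
   [Figure: Q(x) and the model M_2(x) plotted against x, with iterates x^2 and x^3 marked.]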
  enhancements



           trust-region (allows steady progress, exploits good starting point);
           algorithm that allows deletion of old cuts;
           group the second-stage problems Q(x; ω_j) into T “chunks” N_t,
           t = 1, 2, . . . , T, with

                                         {1, 2, . . . , N} = ∪_{t=1,2,...,T} N_t,

           and assign each N_t to a worker processor;
           multiple cuts at each x (each chunk can return its own subgradients);
           asynchronous variant is preferred for our target parallel platform.
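
   A minimal sketch of the chunking idea, under stated assumptions: scenarios are indexed from 0 rather than 1, chunks are contiguous and nearly equal in size, and the recourse function is the toy max(d − x, 0). The demand list is a made-up sample, not data from the talk. Each chunk returns its own partial value and slope, and averaging them recovers the single aggregated cut.

```python
# Hypothetical toy recourse: Q(x; d) = max(d - x, 0), whose dual multiplier is
# 1 when d > x and 0 otherwise, so each scenario contributes -1 or 0 to the slope.
def make_chunks(n_scenarios, n_chunks):
    """Split indices 0..N-1 into n_chunks contiguous, nearly equal chunks."""
    base, extra = divmod(n_scenarios, n_chunks)
    chunks, start = [], 0
    for t in range(n_chunks):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def chunk_cut(x, demands, chunk):
    """Work done by one worker: partial sums of value and subgradient at x."""
    value = sum(max(demands[j] - x, 0.0) for j in chunk)
    slope = -sum(1.0 for j in chunk if demands[j] > x)
    return value, slope

# Made-up sample of N = 10 demands, for illustration only.
demands = [0.0, 5.39, 75.1, 0.0, 5.39, 0.0, 75.1, 0.0, 0.0, 5.39]
chunks = make_chunks(len(demands), 3)
cuts = [chunk_cut(4.0, demands, chunk) for chunk in chunks]

# Averaging the chunk contributions recovers the single aggregated cut.
n = len(demands)
total_value = sum(v for v, _ in cuts) / n
total_slope = sum(s for _, s in cuts) / n
print([len(c) for c in chunks], total_value, total_slope)
```

   Keeping the per-chunk cuts separate in the master, instead of averaging them, is what the "multiple cuts" item refers to: the model gets more information per evaluation at the price of more rows.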




  trust-region (TR)
   Choose next iterate as

\[ x^{k+} = \arg\min_x \; M_k(x) \quad \text{subj. to } Ax = b,\; x \ge 0,\; \|x - x^k\|_\infty \le \Delta_k, \]

   where Δ_k is the trust-region radius.
           Trivial to modify the LP subproblem: just add the bounds

\[ -\Delta_k e \le x - x^k \le \Delta_k e. \]

           If candidate point x^{k+} is “significantly better” (achieves some
           fraction of the decrease predicted by the model) then set
           x^{k+1} ← x^{k+}. Possibly delete cuts, increase the trust region.
           Otherwise, set x^{k+1} ← x^k and add subgradient information from x^{k+}
           to improve the model. Possibly delete uninteresting cuts, decrease
           trust region.
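
   The acceptance test can be sketched as a small function. The fraction xi, the growth and shrink factors and the radius cap are assumed values, not taken from the talk, and this sketch always resizes the radius, whereas the slide says only that the radius is possibly changed.

```python
def trust_region_step(q_current, q_candidate, model_candidate, radius,
                      xi=1e-4, grow=2.0, shrink=0.5, max_radius=1e3):
    """Decide whether to move to the candidate and how to resize the radius.

    q_current        -- Q at the current iterate x^k
    q_candidate      -- Q at the candidate x^{k+}
    model_candidate  -- model value M_k(x^{k+}), a lower bound on q_candidate
    Returns (accepted, new_radius).
    """
    predicted = q_current - model_candidate  # decrease promised by the model
    actual = q_current - q_candidate         # decrease actually achieved
    if predicted > 0 and actual >= xi * predicted:
        return True, min(grow * radius, max_radius)
    return False, shrink * radius

# The candidate achieves half of the predicted decrease: accept and enlarge.
print(trust_region_step(10.0, 9.0, 8.0, 1.0))
# The candidate is worse than the current point: reject and shrink.
print(trust_region_step(10.0, 10.5, 8.0, 1.0))
```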
  TR properties




   Denoting the solution set by S,
           Can delete “irrelevant” cuts liberally, between major iterations;
           dist(x^k, S) → 0;
   The algorithm may still be too synchronous: requires complete evaluation
   of Q(x) at a candidate iterate x before proceeding.




  Condor



   We’ve heard enough about Condor by now. But to recap:
           Condor pools consist of user workstations, nodes from multiprocessor
           systems and clusters.
           Handles scheduling, matching of user requirements to machine
           characteristics.
           Checkpointing and migration.
           Flocking and Glide-in mechanisms allow jobs to execute across
           multiple pools.




  a challenging environment...


   The Condor environment is powerful and inexpensive, but challenging to
   algorithm designers and implementers.
           dynamic/opportunistic: size and composition of worker pool changes
           unpredictably during computation
           heterogeneous: many types of machines, various operating systems,
           different licenses on different machines.
           latency unpredictable, generally slow: workers can be next to each
           other in a rack, or separated by 6000 miles.
   Problems that are large and compute-intensive — and algorithms that are
   asynchronous — are best suited to this platform.




  TR may not be asynchronous enough!


   The TR approach still synchronizes on the function evaluation at each
   candidate point x^{k+}.
   If there are T chunks of second-stage scenarios, can’t use more than T
   processors. For many problems of interest, we cannot make T very large
   (beyond roughly 10 to 100) without making the work-per-chunk too small and
   creating too much contention at the master.
   May wait for a long time for the last chunk to be evaluated, if its host is
   suspended or disappears.
   An asynchronous trust-region (ATR) algorithm increases parallelism and
   throughput.




  ATR



           Maintain an incumbent x^I: the best point found so far (smallest
           value of Q).
           Maintain a basket B of 3 to 20 other x points (possible new
           incumbents) for which the second-stage LPs are currently being
           solved.
           When space becomes available in B, generate a new candidate point
           by solving a TR subproblem around the current incumbent:
           ‖x − x^I‖_∞ ≤ ∆. (x^I becomes the parent incumbent of the new
           point.)




  ATR (continued)



           When evaluation of a point x ∈ B is completed, accept it as the new
           incumbent if
                  it is better than the current incumbent x^I, and
                  Q(x) gives a significant decrease over its parent incumbent.
           Populate B initially by solving TR subproblems around early
           incumbents, using partial subgradient information. (Synchronicity
           parameter σ.)
           Strategies for cut deletion and adjustment of trust region are adapted
           from the strategies for the synchronous TR algorithm.
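
   The two acceptance conditions differ from the synchronous test because the incumbent may have changed while a basket point was being evaluated. The sketch below illustrates that bookkeeping. The `Candidate` record, the scalar x, and the use of a fraction xi of the predicted decrease as the meaning of "significant" are all assumptions; the slides do not spell out the exact test.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    x: float             # candidate point (scalar here for brevity)
    parent_value: float  # Q at the incumbent the TR subproblem was centered on
    model_value: float   # model value predicted when the point was generated

def accept(candidate, q_candidate, q_incumbent, xi=1e-4):
    """Both ATR conditions: the point beats the current incumbent, and it
    achieves a fraction xi of the decrease predicted relative to its parent."""
    predicted = candidate.parent_value - candidate.model_value
    achieved = candidate.parent_value - q_candidate
    return q_candidate < q_incumbent and achieved >= xi * predicted

# Generated when the incumbent had value 10.0; the model predicted 8.0.
point = Candidate(x=1.5, parent_value=10.0, model_value=8.0)

# Meanwhile another basket point became incumbent with value 9.2.
print(accept(point, q_candidate=9.5, q_incumbent=9.2))  # improves on parent only
print(accept(point, q_candidate=9.0, q_incumbent=9.2))  # improves on both
```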




  SSN: computational results




           first stage: 89 variables, 1 constraint;
           second stage: 706 variables, 175 constraints, 2284 nonzeros.
   Study the effect of asynchronicity, parallelism on large sampled instances.
   We report results for N = 10^4 and N = 10^5 scenarios, with synchronicity
   parameter σ = 0.7.




    run   iter   |B|   chunks   cuts/iter   av. procs   par. efficiency   wall clock
    ATR    186    -      25        50          19            .94             51
    ATR    148    -      25       100          15            .89             55
    ATR    144    -      50       100          18            .82             47
    ATR     79    -      50       200          18            .70             31
    ATR    104    3      25        50           9            .88             61
    ATR     69    3      25       100           8            .93             47
    ATR     67    3      50       100           9            .86             43
    ATR     61    3      50       200           6            .90             54
    ATR    245    6      25        50          14            .93             91
    ATR    197    6      25       100          12            .87             97
    ATR    164    6      50       100          13            .81             81
    ATR    135    6      50       200          12            .71             80
      SSN, N = 10,000. 1.75M × 7.06M.
   (Results obtained 8/7/2007.)



    run   iter   |B|   chunks   cuts/iter   av. procs   par. efficiency   wall clock (min)
    ATR    107    -     100       100          75            .17               427
    ATR     84    -     100       200          73            .22               275
    ATR    123    3     100       100          33            .93               199
    ATR    108    3     100       200          32            .77               216
       SSN, N = 100,000. 17.5M × 70.6M.
   (Results obtained 8/8/2007.)




  storm: computational results


   Cargo flight scheduling problem (Mulvey and Ruszczyński).
           first stage: 121 variables;
           second stage: 1259 variables.
   For a 250,000-scenario sampled approximation, the LP has size

                                      132,000,185 × 314,750,121

   Started from a solution for a 3000-scenario approximation, whose quality is
   very good. TR takes a single step and terminates; ATR doesn’t take any
   steps, it just verifies the quality of the starting point.
   (For a chunk of 2000 scenarios, task size is about 150 seconds.)



    run   iter   |B|   chunks   cuts/iter   av. procs   par. efficiency   wall clock (min)
     TR     17    -     125       125         106            .55               146
    ATR     25    3     125       125         106            .90               116
        storm, N = 250,000. 132M × 315M.




  storm with 10^7 scenarios
   LP has size approximately 5.5 × 10^9 rows and 1.3 × 10^10 columns.
   Used machines at Wisconsin, NCSA (Illinois), New Mexico, Argonne, Italy,
   Columbia. 800 machines requested, 556 actually used during the run
   (average of 433 at any one time). Performed in 2001.

   [Figure: number of workers (axis 0 to 600) versus elapsed time in seconds (axis 0 to 140000) during the run.]
    run   iter   |B|   chunks   cuts/iter   av. procs   par. efficiency   wall clock (hrs)
    ATR     39    4     1024      1024         433           .67               31.9
          storm, N = 10^7. Columns: 1.3 × 10^10.
     Solved 4 × 10^8 second-stage linear programs during the run
     (3472 per second). Average task 774 seconds.
     Total computation time 9014 hours (more than one year).
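
   A quick arithmetic check, using only the figures quoted on this slide, shows how they fit together. The variable names are mine, and the interpretation of "total computation time" as summed worker time is an assumption.

```python
# Figures quoted on the slide.
lps_solved = 4e8                  # second-stage LPs solved
lps_per_second = 3472             # reported throughput
cpu_hours = 9014                  # total computation time
avg_workers = 433                 # average number of workers
wall_clock_hours_reported = 31.9  # wall clock time from the table

# Wall clock implied by the throughput, in hours.
wall_clock_hours = lps_solved / lps_per_second / 3600
# Total computation time expressed in days.
cpu_days = cpu_hours / 24
# Share of available worker time that was spent computing (assumed definition).
busy_fraction = cpu_hours / (avg_workers * wall_clock_hours_reported)

print(f"{wall_clock_hours:.1f} h, {cpu_days:.0f} days, {busy_fraction:.2f}")
```

   The throughput implies a run of about 32 hours, in line with the reported wall clock, and 9014 hours is a little over 375 days. The busy fraction computed this way lands close to the reported efficiency of .67, though the talk's exact definition of efficiency may differ.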




  solution quality



           Can we get useful estimates for the optimal objective values of the
           true problem from the sampled problem?
           Can we get confidence intervals on these estimates?
           How do the solutions of the sampled approximation relate to those of
           the real problem?
   Using ATR, along with relevant theory (some recent), we have performed
   computational and statistical studies of these issues for some difficult
   problems from the literature.




  lower bound for Z^*
   It’s well known that
\[ E\, Z_N \le Z^*. \]
   This is true for any unbiased estimator. In particular, we can use certain
   variance reduction techniques (e.g. Latin hypercube) to select the sample
   {ω_1, ω_2, . . . , ω_N}.
   Generate M batches, each a sampled approximation of size N of the
   form {ω_1^{(i)}, ω_2^{(i)}, . . . , ω_N^{(i)}}, i = 1, 2, . . . , M, and solve the M SAAs to
   obtain optimal values
\[ Z_N^{(1)},\; Z_N^{(2)},\; \dots,\; Z_N^{(M)}. \]
   Then estimate E Z_N by
\[ L_M = M^{-1} \sum_{i=1}^{M} Z_N^{(i)}. \]

   Use sample variance, central limit theorem to get a confidence interval.
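
   The batching procedure can be sketched on a toy single-variable problem: minimize COST·x + E max(d − x, 0) over x ≥ 0, with d drawn from the three-point demand profile used earlier in the talk. The cost, batch sizes, seed and the normal quantile 1.96 are assumptions; for small M a Student-t quantile would be a natural alternative. This is a sketch of the estimator, not the authors' implementation.

```python
import math
import random
import statistics

PROFILE = [(0.0, 0.855), (5.39, 0.095), (75.1, 0.05)]  # (demand, probability)
COST = 0.1  # hypothetical unit cost of capacity

def saa_optimal_value(demands):
    """Optimal value Z_N of min_x COST*x + mean(max(d - x, 0)), x >= 0.

    The SAA objective is convex piecewise linear with kinks at the sampled
    demands, so it suffices to compare the objective at those points and at 0.
    """
    n = len(demands)

    def objective(x):
        return COST * x + sum(max(d - x, 0.0) for d in demands) / n

    return min(objective(x) for x in set(demands) | {0.0})

def lower_bound(n_batches, batch_size, rng, z=1.96):
    """Return (L_M, half-width) from M independent SAA batches."""
    values, weights = zip(*PROFILE)
    z_values = [
        saa_optimal_value(rng.choices(values, weights=weights, k=batch_size))
        for _ in range(n_batches)
    ]
    mean = statistics.fmean(z_values)
    half = z * statistics.stdev(z_values) / math.sqrt(n_batches)
    return mean, half

L_M, half_width = lower_bound(n_batches=30, batch_size=500, rng=random.Random(1))
print(f"L_M = {L_M:.3f} +/- {half_width:.3f}")
```

   For this toy problem the true optimum is at x = 5.39 with Z^* = 4.0245, so the printed interval can be compared against a known answer.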
  upper bound for Z^*
   Given any feasible point x̂, we have
\[ Q(\hat{x}) \ge Q(x^*). \]
   Choose an x̂ that appears to be nearly optimal, e.g. min of some Q_N.
   Choose T i.i.d. samples, each of size N̄ (using MC or LH):
\[ \{ \bar\omega_1^{(i)}, \bar\omega_2^{(i)}, \dots, \bar\omega_{\bar N}^{(i)} \}, \quad i = 1, 2, \dots, T. \]
   Defining
\[ Q_{\bar N}^{(i)}(\hat{x}) = c^T \hat{x} + \bar N^{-1} \sum_{j=1}^{\bar N} Q(\hat{x}; \bar\omega_j^{(i)}), \]
   we get an unbiased estimator:
\[ U_{\bar N, T} = T^{-1} \sum_{i=1}^{T} Q_{\bar N}^{(i)}(\hat{x}). \]

   Again, use sample variance to get confidence interval.
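
   The upper-bound estimator needs no optimization, only evaluation at the fixed point x̂. A minimal sketch on a toy problem follows, with recourse max(d − x̂, 0) and the three-point demand profile quoted earlier. The cost, the choice x̂ = 5.39, the batch sizes and the seed are assumptions for illustration.

```python
import random
import statistics

PROFILE = [(0.0, 0.855), (5.39, 0.095), (75.1, 0.05)]  # (demand, probability)
COST = 0.1    # hypothetical unit cost of capacity
X_HAT = 5.39  # candidate solution to be assessed

def batch_objective(x_hat, demands):
    """Objective at x_hat averaged over one batch of sampled demands."""
    return COST * x_hat + sum(max(d - x_hat, 0.0) for d in demands) / len(demands)

def upper_bound(x_hat, n_batches, batch_size, rng):
    """Average of the T batch objectives; unbiased for the objective at x_hat."""
    values, weights = zip(*PROFILE)
    batch_values = [
        batch_objective(x_hat, rng.choices(values, weights=weights, k=batch_size))
        for _ in range(n_batches)
    ]
    return statistics.fmean(batch_values), batch_values

U, batch_values = upper_bound(X_HAT, n_batches=20, batch_size=5000, rng=random.Random(2))
exact = COST * X_HAT + sum(p * max(d - X_HAT, 0.0) for d, p in PROFILE)
print(f"U = {U:.3f}, exact objective at x_hat = {exact:.3f}")
```

   The spread of `batch_values` is what feeds the sample variance in the confidence interval.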
  results on bounds: SSN



   Results of Mak, Morton, Wood (1999).
                                   lower                 upper
      batch/sample               30 × 1000            1 × 100000
        estimate                    9.22                  9.98
     95% confidence                 ±0.21                ±0.11
   Using different techniques, Mak et al. generate an approximate solution x̂
   using N = 2000, and obtain
           upper bound 10.06 ± 0.12 (95% confidence interval);
           with 95% likelihood, the optimal Z^* is within 0.77 of this value.




  solution estimates for SSN (95% confidence intervals)
   Monte Carlo:
                                   N              Lower                   Upper
                                   50        4.11 ± 1.23            12.88 ± 0.12
                                  100        7.66 ± 1.31            11.31 ± 0.12
                                  500        8.54 ± 0.34            10.42 ± 0.12
                                  1000       9.31 ± 0.23            10.20 ± 0.06
                                  5000       9.98 ± 0.21            10.01 ± 0.09

   Latin Hypercube:

                                   N               Lower                  Upper
                                  50         10.10 ± 0.81           11.39 ± 0.02
                                  100         8.90 ± 0.36           10.52 ± 0.03
                                  500         9.87 ± 0.22           10.05 ± 0.02
                                 1000         9.83 ± 0.29            9.97 ± 0.03
                                 5000         9.84 ± 0.10            9.90 ± 0.03

   [Figure: SSN Monte Carlo. Lower Bound and Upper Bound values (axis 4 to 14) versus N (axis 10 to 10000).]

   [Figure: SSN Latin Hypercube. Lower Bound and Upper Bound values (axis 4 to 14) versus N (axis 10 to 10000).]

  Multistage Decision Making


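   [Figure: timeline showing random vectors ξ_1, ξ_2, ξ_3, . . . , ξ_T and decisions x_1, x_2, . . . , x_{T−1}, x_T.]
   Random vectors ξ_1 ∈ R^{n_1}, ξ_2 ∈ R^{n_2}, . . . , ξ_T ∈ R^{n_T}.
   Make sequence of decisions x_1 ∈ X_1, x_2 ∈ X_2, . . . , x_T ∈ X_T.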
           Risk Neutral: We always aim to optimize the expected value of our
           current decision xt
           Linear: Assume Xt are polyhedra
           Discrete: Assume ξt are drawn from a discrete distribution.



  Scenario Tree




           N: Set of nodes in the tree
           ρ(n): Unique predecessor of node n in the tree
           S(n): Set of successor nodes of n
           q_n: Probability that the sequence of events leading to node n occurs
           x_n: Decision taken at node n
   [Figure: scenario tree with labels x_0, ξ̂_1, ξ̂_2, x_{ρ(n)} and x_n.]
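
   The notation above maps naturally onto a small tree data structure. The sketch below is one possible representation, not the one used in the talk: the `Node` class, its field names and the example branching probabilities are all assumptions. It computes q_n as the product of conditional probabilities along the path from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    cond_prob: float = 1.0  # probability of this node given its predecessor
    # rho(n); excluded from repr/eq to avoid recursing through the tree
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)  # S(n)

    def add_child(self, name, cond_prob):
        child = Node(name, cond_prob, parent=self)
        self.children.append(child)
        return child

    def path_prob(self):
        """q_n: product of conditional probabilities from the root to this node."""
        prob, node = 1.0, self
        while node is not None:
            prob *= node.cond_prob
            node = node.parent
        return prob

def leaves(node):
    """All nodes without successors below (and including) the given node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

# Hypothetical three-stage tree.
root = Node("0")
up = root.add_child("up", 0.6)
down = root.add_child("down", 0.4)
uu = up.add_child("up-up", 0.5)
ud = up.add_child("up-down", 0.5)
dd = down.add_child("down-down", 1.0)

print({leaf.name: leaf.path_prob() for leaf in leaves(root)})
```

   Because each node has a unique predecessor, the leaf probabilities sum to one, and each leaf corresponds to exactly one scenario.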




  Multistage Stochastic Programming


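   Deterministic Equivalent

   $$z_{SP} = \min \Big\{ \sum_{n \in N} q_n c_n^T x_n \;\Big|\; T_n x_{\rho(n)} + W_n x_n = h_n \ \ \forall n \in N \Big\}$$

   Value Function of node n

   $$Q_n(x_{\rho(n)}) \stackrel{\mathrm{def}}{=} \min_{x_n} \Big\{ c_n^T x_n + \sum_{m \in S(n)} \hat{q}_{mn} Q_m(x_n) \;\Big|\; W_n x_n = h_n - T_n x_{\rho(n)} \Big\}$$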


           $\hat{q}_{mn}$: conditional probability of node m given node n, for $m \in S(n)$




  Nested Decomposition


           0: Root node of the scenario tree
           x0 : Initial state of the system

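   Recursive Formulation
   $$z_{SP} = Q_0(x_0)$$
           Cost to go: $G_n(x) \stackrel{\mathrm{def}}{=} \sum_{m \in S(n)} \hat{q}_{mn} Q_m(x)$
           $M_n^k(x)$: Lower bound on $G_n(x)$ in iteration k

   $$Q_n(x_{\rho(n)}) \ge \min_{x_n} \Big\{ c_n^T x_n + M_n^k(x_n) \;\Big|\; W_n x_n = h_n - T_n x_{\rho(n)} \Big\} \qquad (MLP_n)$$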
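
   One common way to represent a lower bound such as $M_n^k$ is as the pointwise maximum of affine cuts. The slides do not spell out the form of the cuts, so the sketch below assumes each cut is a pair (alpha, beta) asserting G(x) ≥ alpha + betaᵀx; the `CutModel` class and the toy function G(x) = x² are hypothetical and only illustrate how such a model under-estimates a convex cost-to-go.

```python
from typing import List, Sequence, Tuple

# Assumed cut form: (alpha, beta) meaning G(x) >= alpha + beta . x
Cut = Tuple[float, Tuple[float, ...]]


class CutModel:
    """Piecewise-linear lower bound built from affine cuts (illustrative)."""

    def __init__(self, floor: float = float("-inf")):
        self.floor = floor          # value returned when no cut is active
        self.cuts: List[Cut] = []

    def add_cut(self, alpha: float, beta: Sequence[float]) -> None:
        self.cuts.append((float(alpha), tuple(float(b) for b in beta)))

    def value(self, x: Sequence[float]) -> float:
        """Return the maximum over all stored cuts of alpha + beta . x."""
        best = self.floor
        for alpha, beta in self.cuts:
            best = max(best, alpha + sum(b * xi for b, xi in zip(beta, x)))
        return best


# Toy convex cost-to-go G(x) = x^2 in one dimension.
# The tangent at point p gives the cut alpha = -p^2, beta = 2p.
model = CutModel()
for p in (-1.0, 0.0, 2.0):
    model.add_cut(-p * p, [2.0 * p])

print(model.value([1.5]))  # never exceeds the true value G(1.5) = 2.25
```

   Each added cut can only raise the model, so the bound tightens monotonically from one iteration to the next.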




  Action Pictures




           [Figure: scenario tree starting from $x_0$]
  MW Implementation



           MWTask—Work
                  Collection of nodes (going the same direction) from the same stage
                  The xρ(n) from these nodes
           MWTask—Result
                  (Forward): xn
                  (Backwards): Cut(s) for Gρ(n)
           act_on_completed_task() is responsible for updating node state
           and deciding which nodes to evaluate next
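
   The division of labor above can be sketched as follows. MW itself is a C++ toolkit; this Python analogue is only an illustration, and `TaskWork`, `TaskResult`, `Master`, and the scheduling rule (forward results release the successors, backward results flag the predecessor for re-evaluation) are assumptions for the example rather than the MW API or the authors' policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskWork:
    """Work part of a task: same-stage nodes moving in the same direction."""
    stage: int
    direction: str                      # "forward" or "backward"
    nodes: List[str]
    x_pred: Dict[str, List[float]]      # x_rho(n) for each node n in the task


@dataclass
class TaskResult:
    """Result part of a task."""
    x: Dict[str, List[float]] = field(default_factory=dict)   # forward: x_n
    # backward: cuts for G_rho(n), keyed by the predecessor node
    cuts: Dict[str, List[Tuple[float, List[float]]]] = field(default_factory=dict)


class Master:
    """Hypothetical master-side bookkeeping."""

    def __init__(self, succ: Dict[str, List[str]]):
        self.succ = succ                # S(n) for every node n
        self.x: Dict[str, List[float]] = {}
        self.cuts: Dict[str, List[Tuple[float, List[float]]]] = {}

    def act_on_completed_task(self, work: TaskWork, result: TaskResult) -> List[str]:
        """Update node state and return the nodes to evaluate next."""
        if work.direction == "forward":
            self.x.update(result.x)
            return [m for n in work.nodes for m in self.succ.get(n, [])]
        for parent, new_cuts in result.cuts.items():
            self.cuts.setdefault(parent, []).extend(new_cuts)
        return sorted(result.cuts)


master = Master({"root": ["a", "b"], "a": [], "b": []})
fwd = TaskWork(1, "forward", ["root"], {"root": [0.0]})
next_forward = master.act_on_completed_task(fwd, TaskResult(x={"root": [1.0, 2.0]}))

bwd = TaskWork(2, "backward", ["a", "b"], {"a": [1.0, 2.0], "b": [1.0, 2.0]})
next_backward = master.act_on_completed_task(
    bwd, TaskResult(cuts={"root": [(3.0, [0.5, 0.5])]})
)
print(next_forward, next_backward)
```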




  Synchronicity is Bad!




           [Figure: All processors waiting for this node to finish!]


  MW Implementation—Asynchronous
   Don’t wait for all children ($S(n)$) to report before starting an evaluation of $M_n^k$.
   Management of cuts is important!
           We may require lots of memory to store the cuts
                  Ex.: 27,000 nodes in period T−1, each node contains 20 cuts,
                  $x_n \in \mathbb{R}^{100}$ ⇒ ≥ 400MB just to store cuts
           Grid: Since we don’t have guarantees about worker processors, we
           cannot store cuts on the workers.
           Grid: All cuts must be stored on the master processor
                  Leads to memory overload of master
                  Leads to increased “service time” of the master for worker requests.
                  (contention)
           Grid: ⇒ We must do what we can to compress and reduce the
           number of cuts
                  Don’t record duplicates
                  Aggregate nodes
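
   The memory figure in the example above can be checked with a quick calculation, and the "don't record duplicates" rule can be sketched as a small cut pool. Both pieces are illustrative assumptions: the estimate assumes dense storage of 8-byte floating-point coefficients and ignores intercepts and bookkeeping, and the `CutPool` class with its rounding tolerance is hypothetical, not the authors' implementation.

```python
def cut_storage_bytes(num_nodes, cuts_per_node, dim, bytes_per_float=8):
    """Bytes needed to store dense cut coefficient vectors (assumed 8-byte floats)."""
    return num_nodes * cuts_per_node * dim * bytes_per_float


# 27,000 nodes, 20 cuts each, x_n in R^100
estimate = cut_storage_bytes(27_000, 20, 100)
print(estimate / 1e6, "MB")


class CutPool:
    """Stores cuts (alpha, beta) and skips near-duplicates (illustrative)."""

    def __init__(self, digits=9):
        self.digits = digits        # coefficients are compared after rounding
        self._seen = set()
        self.cuts = []

    def add(self, alpha, beta):
        """Return True if the cut was stored, False if it was a duplicate."""
        key = (round(alpha, self.digits),) + tuple(
            round(b, self.digits) for b in beta
        )
        if key in self._seen:
            return False
        self._seen.add(key)
        self.cuts.append((alpha, tuple(beta)))
        return True


pool = CutPool()
first = pool.add(1.0, [0.5, 0.5])
repeat = pool.add(1.0 + 1e-12, [0.5, 0.5])   # same cut up to round-off
other = pool.add(2.0, [0.5, 0.5])
print(first, repeat, other, len(pool.cuts))
```

   Under these assumptions the coefficients alone take about 432 MB, consistent with the "≥ 400MB" figure quoted in the example.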
  Cut Management—Aggregation




                                                                  Form the deterministic
                                                                  equivalent of a group of nodes,
                                                                  and treat this as one larger
                                                                  “supernode”
                                                                  Node subproblems get larger
                                                                  Fewer cuts




  MWAND




           MW Asynchronous Nested Decomposition
                  Our “Monster Solver”
                  Magic WAND? :-)
           Uses the COIN Osi Interface to build MLPn
           Uses the COIN Clp (simplex) solver to solve MLPn
           Does not use COIN-Smi to manipulate the stochastic program




  sutil: A Stochastic Programming Utility Library

                                                                          Czyzyk, Linderoth, and Shen
           Reads SMPS files
           Creates (implicitly) sampled scenario trees
           Creates deterministic equivalents
           Aggregates nodes
           Passes stochastic information between processors



           sutil is available at COR@L
                  Computational Optimization Research @ Lehigh:
                  http://coral.ie.lehigh.edu
                  http://coral.ie.lehigh.edu/sutil/



  Computational Environment—Multistage



           Right now, we are using a “baby grid”
           All Linux machines
           Helpful Hint: Don’t unleash your code onto a big grid unless you are
           reasonably sure it is working well.

                Location                               Processor                          Number
                Wisconsin               Vary in speed by factor of roughly 20               785
                 NCSA                            Intel Xeon 3.2 GHz                        1280
             Argonne/U of C                      Intel Xeon 2.4 GHz                         288




  A small Multistage SSN



           Set of stages $T$
           Set $J$ of links
           Sets $I_t$ of demands
           Random demand $d_t(\xi) \in \mathbb{R}^{|I_t|}$
           Budget each period
           Install capacity on links each period to minimize the total
           expected unserved demand
           [Figure: small network with nodes A, B, C, D, E, F]




  Some (Limited) Computational Results




             T = 5
                    Aggregate last three periods together
                    Clustering/Tasking: Set by hand for each instance
             α1 = 0.8, α2 = 0.1
             K: Realizations/Period
             N: Number of scenarios
             DE: Size of deterministic equivalent

                    K        N           DE Size
                    30     0.81M      18M × 31M
                    50     6.25M     140M × 236M
                    60     12.9M     290M × 488M
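
   The scenario counts in the table are consistent with N = K^(T−1), that is, K independent realizations in each of the T − 1 periods after the first. The slides do not state this formula, so treat it as an assumption; the helper name `num_scenarios` is hypothetical.

```python
def num_scenarios(K, T):
    """Scenario count assuming K realizations in each of the T-1 later periods."""
    return K ** (T - 1)


counts = {K: num_scenarios(K, 5) for K in (30, 50, 60)}
for K, n in counts.items():
    print(K, n / 1e6, "M")
```

   For K = 60 this gives 12.96 million scenarios, which the table reports as 12.9M.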




  Computational Results



           It: Number of iterations (times $MLP_0$ was solved)
           E: Parallel utilization
           $$E = \frac{\text{Time machines solving } MLP_n}{\text{Time machines available}}$$

                   K      It      Avg Workers           Wall Time           CPU Time         E
                   30     9            62                2:34:21            6:15:15:10      67%
                   50     7            75               1:12:49:27          85:20:24:15     77%
                   60     11          162               3:16:51:00         431:12:15:37     73%
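
   The time columns can be converted to seconds and the utilization ratio evaluated with a couple of small helpers. This sketch assumes the times are written as [days:]hours:minutes:seconds, which the slides do not state explicitly; `parse_duration` and `utilization` are hypothetical names, and the busy/available figures in the last line are invented purely to show the ratio.

```python
def parse_duration(text):
    """Convert '[D:]H:M:S' to seconds (assumed format of the time columns)."""
    parts = [int(p) for p in text.split(":")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected H:M:S or D:H:M:S, got {text!r}")
    if len(parts) == 3:
        parts = [0] + parts
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def utilization(busy_seconds, available_seconds):
    """E = time machines spend solving MLP_n / time machines are available."""
    return busy_seconds / available_seconds


wall_30 = parse_duration("2:34:21")      # wall time for K = 30
cpu_30 = parse_duration("6:15:15:10")    # CPU time for K = 30
print(wall_30, cpu_30)

# Made-up numbers, only to illustrate the definition of E.
example_E = utilization(busy_seconds=30.0, available_seconds=40.0)
print(example_E)
```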




  Workers in Solving ssn5-60




  Conclusions


   Grid computing has made it possible to solve many interesting instances of
   stochastic optimization that were intractable for uniprocessor computing.
   It has allowed computational experimentation on issues that were previously
   investigated theoretically, e.g.
           quality of solutions,
           sharpness of solutions,
           convergence of sampled solution to true solution as sample size grows.

   We need to pull together good algorithms, smart implementations, the MW
   infrastructure, and Condor grids.



