Risk-averse Stochastic Optimization:
Probabilistically-constrained Models +
Algorithms for Black-box Distributions

Chaitanya Swamy
University of Waterloo
Two-Stage Recourse Model
Given: A probability distribution over inputs.
Stage I: Make some advance decisions – plan ahead or hedge against uncertainty.
Observe the actual input scenario.
Stage II: Take recourse. Can augment the earlier solution, paying a recourse cost.

Choose stage I decisions to minimize
  (stage I cost) + (expected stage II recourse cost).
2-Stage Stochastic Facility Location
[Figure: facility-location instance; legend: facility, stage I facility, stage II facility, client set D]
A distribution over clients gives the set of clients to serve.
Stage I: Open some facilities in advance; pay cost f_i for facility i.
  Stage I cost = ∑_{i opened} f_i.
The actual scenario A = {clients to serve} materializes.
Stage II: Can open more facilities to serve the clients in A; pay cost f_i^A to open facility i. Assign clients in A to facilities.
  Stage II cost = ∑_{i opened in scenario A} f_i^A + (cost of serving clients in A).
Want to decide which facilities to open in stage I.
Goal: Minimize Total Cost =
  (stage I cost) + E_{A∼D}[stage II cost for A].


How is the probability distribution specified?
• A short (polynomial-size) list of possible scenarios
• Independent probabilities that each client exists
• A black box that can be sampled – the black-box setting
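
As a concrete picture of black-box access, here is a minimal Python sketch of a sampling oracle; the class and method names are illustrative, not from the talk. An algorithm in the black-box setting may call sample() polynomially many times but never sees the probabilities p_A themselves.

```python
import random
from abc import ABC, abstractmethod

class ScenarioOracle(ABC):
    """Black-box distribution: the only allowed operation is drawing a sample."""

    @abstractmethod
    def sample(self) -> frozenset:
        """Return one scenario (a set of clients/elements) drawn from the distribution."""

class IndependentClients(ScenarioOracle):
    """Example distribution: each client exists independently with its own probability."""

    def __init__(self, activation_prob):
        self.activation_prob = activation_prob  # dict: client -> probability

    def sample(self) -> frozenset:
        return frozenset(c for c, p in self.activation_prob.items()
                         if random.random() < p)
```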
Risk-averse stochastic optimization
• The E[·] measure does not adequately model the “risk” associated with stage-I decisions.
• Same E[·] value ≠ same “risk involved”: given two solutions with the same E[·] cost, prefer the solution with the more “assured” or “reliable” second-stage component (costs). E.g., portfolio investment.
• Want to capture this notion of risk-averseness, where one seeks to avoid disaster scenarios.
Modeling risk-aversion: attempt 1 – the Budget model
Choose stage I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to
  (stage II cost of scenario A) ≤ B for every scenario A.
Gupta-Ravi-Sinha considered stochastic Steiner tree in this budget model, in the polynomial-scenario setting.
The budget model provides the greatest degree of risk-aversion, BUT:
– limited modeling power: cannot get any approximation guarantees in the black-box setting with bounded sample size
– overly conservative: protects every scenario regardless of its probability
Closely-related model: the Robust model
Choose stage I decisions to minimize
  (stage I cost) + (maximum stage II recourse cost)
• Dhamdhere et al. considered this model, again in the polynomial-scenario setting.
• “Guessing” B = max (stage II cost) “reduces” the robust problem to the budget problem.
• Modeling issues: it is not clear how to even specify exponentially many scenarios.
  – Feige et al.: scenarios specified by a cardinality constraint; seems rather stylized for stochastic optimization
  – Will consider a distribution-based robust model: scenario-collection = support of the distribution
• Same drawbacks as in the budget model – no guarantees possible in the black-box setting.
Modeling risk-aversion: attempt 2
Recall the budget model:
Choose stage I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to
  (stage II cost of scenario A) ≤ B for every scenario A.
• For the budget model, one can prove approximation results if one is allowed to violate the budget constraints with a small probability.
• Can turn this solution concept around and incorporate it into the model, arriving at the following new model.
Modeling risk-aversion: attempt 2 – the Risk-averse budget model
Choose stage I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to
  Pr_A[(stage II cost of scenario A) > B] ≤ ρ
ρ is part of the input – allows trading off risk-averseness vs. conservatism.
• Called a probabilistically-constrained or chance-constrained program.
• The chance constraint is called a Value-at-Risk (VaR) constraint in the finance literature: popular for risk-optimization in finance.
• Related robust model: minimize (stage I cost) + (1−ρ)-quantile of (stage II recourse cost).
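
The (1−ρ)-quantile in the related robust model has a simple empirical analogue. Below is a minimal sketch, assuming i.i.d. sampled stage-II costs; the function name is illustrative and sampling-error bounds are not addressed here.

```python
import math

def empirical_value_at_risk(stage2_costs, rho):
    """Empirical (1 - rho)-quantile of stage-II cost: the smallest sampled
    budget B such that Pr[cost > B] <= rho under the empirical measure.

    stage2_costs: list of i.i.d. sampled stage-II costs
    rho:          allowed violation probability, in (0, 1)
    """
    costs = sorted(stage2_costs)
    k = math.ceil((1.0 - rho) * len(costs))  # rank of the (1 - rho)-quantile
    return costs[max(k, 1) - 1]
```

For example, with ρ = 0.2 and 1000 samples this returns the 800th-smallest sampled cost.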
Approximation Algorithm
Hard to solve the problem exactly: even special cases are #P-hard.
Settle for approximate solutions: give a polytime algorithm that always finds near-optimal solutions.
A is an α-approximation algorithm if
• A runs in polynomial time, and
• A(I) ≤ α·OPT(I) on all instances I.
α is called the approximation ratio of A.
Our Results
• Obtain approximation algorithms for various risk-averse budgeted (and robust) problems in the black-box setting: facility location, set cover, vertex cover, multicut on trees, min cut.
• Give a fully polynomial approximation scheme for solving the LP-relaxations of a large class of risk-averse problems ⇒ can use existing algorithms for the deterministic or 2-stage version of a problem to get an approximation algorithm for its risk-averse version.
• First approximation results for chance-constrained programs and black-box distributions (Kleinberg-Rabani-Tardos consider chance-constrained versions of bin-packing and knapsack, but for specialized product distributions).
Related Work
• Gupta et al. gave a constant-approximation for stochastic Steiner tree in the poly-scenario budget model.
• Dhamdhere et al. and Feige et al. gave approximation algorithms for various problems in the robust model with poly-scenario and cardinality scenario-collections.
• So-Zhang-Ye consider another risk measure called conditional VaR; they give an approximation scheme for solving the LP-relaxations of problems in the black-box setting.
  – Can use our techniques to solve a generalization of their model, where one has probabilistic budget constraints.
• Lots of work in the standard 2-stage model: Dye et al., Ravi-Sinha, Immorlica et al., Gupta et al., Shmoys-S, S-Shmoys, …
Risk-averse Set Cover (RASC)
Universe U = {e_1, …, e_n}, subsets S_1, S_2, …, S_m ⊆ U; set S has weight w_S.
Deterministic problem (DSC): Pick a minimum-weight collection of sets that covers every element.
Risk-averse budgeted version: The target set of elements to be covered is given by a probability distribution.
  – choose some sets initially, paying w_S for set S
  – the subset A ⊆ U to be covered is revealed
  – can pick additional sets, paying w_S^A for set S.
Minimize (w-cost of sets picked in stage I) +
  E_{A⊆U}[w^A-cost of new sets picked for scenario A]
subject to Pr_{A⊆U}[w^A-cost for scenario A > B] ≤ ρ.
Fractional risk-averse set cover
Fractional risk-averse problem: can buy sets fractionally in stage I and in each scenario A, so as to cover the elements in A to an extent of 1.

It is not clear how to solve even the fractional problem in the polynomial-scenario setting.
Why? The set of feasible solutions
  {(x, {y_A}_A) : (x, y_A) covers A for each scenario A, Pr_A[∑_S w_S^A y_{A,S} > B] ≤ ρ}
is NOT a convex set.
How to get an LP-relaxation?
An LP for fractional RASC
For simplicity, consider w_S^A = W_S for every scenario A.
x_S: indicates if set S is picked in stage I
r_A: indicates if the budget constraint is NOT met for A
{y_{A,S}}: decisions in scenario A when the budget constraint is met for A
{z_{A,S}}: decisions in scenario A when the budget constraint is not met for A

Minimize ∑_S w_S x_S + ∑_{A⊆U} p_A ∑_S W_S (y_{A,S} + z_{A,S})
subject to
  ∑_A p_A r_A ≤ ρ                                         (coupling constraint)
  ∑_S W_S y_{A,S} ≤ B                                     for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1           for each A, e∈A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1     for each A, e∈A
  x_S, y_{A,S}, z_{A,S} ≥ 0                               for each S, A.
• Exponential number of variables and exponential number of constraints.
• The scenarios are no longer separable: a first-stage solution x alone is not enough to specify an LP solution; one must also specify the r_A's – so what does it mean to solve the LP?
  – Contrast with the standard 2-stage model, or with the fractional risk-averse problem.
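
To make the LP concrete, here is a sketch that builds it for an explicitly given (polynomial-size) scenario list using the PuLP modeling library; in the black-box setting the probabilities p_A would be replaced by sampled frequencies q_A. The data layout (dicts keyed by set name) is an illustrative assumption.

```python
import pulp

def build_rasc_lp(set_system, w, W, scenarios, B, rho):
    """LP for fractional RASC with an explicit scenario list (a sketch).

    set_system: dict S -> frozenset of elements covered by S
    w, W:       dicts S -> stage-I weight w_S / stage-II weight W_S
    scenarios:  list of (p_A, frozenset A)
    """
    prob = pulp.LpProblem("fractional_RASC", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", set_system, lowBound=0)
    r = {i: pulp.LpVariable(f"r_{i}", lowBound=0) for i in range(len(scenarios))}
    y = {i: pulp.LpVariable.dicts(f"y_{i}", set_system, lowBound=0)
         for i in range(len(scenarios))}
    z = {i: pulp.LpVariable.dicts(f"z_{i}", set_system, lowBound=0)
         for i in range(len(scenarios))}

    # Objective: stage-I cost + expected stage-II cost.
    prob += (pulp.lpSum(w[S] * x[S] for S in set_system)
             + pulp.lpSum(pA * W[S] * (y[i][S] + z[i][S])
                          for i, (pA, A) in enumerate(scenarios)
                          for S in set_system))
    # Coupling constraint: probability mass of budget violation <= rho.
    prob += pulp.lpSum(pA * r[i] for i, (pA, _) in enumerate(scenarios)) <= rho

    for i, (pA, A) in enumerate(scenarios):
        # Budget constraint on the y-part of scenario i.
        prob += pulp.lpSum(W[S] * y[i][S] for S in set_system) <= B
        for e in A:
            covering = [S for S in set_system if e in set_system[S]]
            prob += pulp.lpSum(x[S] + y[i][S] for S in covering) + r[i] >= 1
            prob += pulp.lpSum(x[S] + y[i][S] + z[i][S] for S in covering) >= 1
    return prob
```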
Theorem 1: For any ε, κ > 0, in time poly(input size, 1/ε, 1/κ, 1/ρ), one can compute a first-stage solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost ≤ (1+ε)·OPT, where ∑_A p_A r_A ≤ ρ(1+κ).
The dependence on 1/(κρ) is unavoidable in the black-box setting.

Theorem 2 (rounding theorem): Given a solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost C with ∑_A p_A r_A = P, one can round x to
• a solution x' for fractional RASC s.t. w·x' + E_A[opt. fractional cost of A] ≤ 2C and
  Pr_A[opt. fractional cost of A > 2B] ≤ 2P
  [can now use any LP-based “local” approximation algorithm for 2-stage SC to round x'];
• a solution (X, {Y_A}_A) for (integer) RASC s.t. w·X + E_A[W·Y_A] ≤ 4αC and Pr_A[W·Y_A > 4αB] ≤ 2P,
  using any LP-based α-approximation algorithm for DSC.
Rounding the LP
Given a solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost C with ∑_A p_A r_A = P.
LP constraints:
  ∑_S W_S y_{A,S} ≤ B                                     for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1           for each A, e∈A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1     for each A, e∈A
For every A, either
  r_A ≥ 0.5    OR    ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} ≥ 0.5 for each e∈A.
“Threshold rounding”: if r_A ≥ 0.5, set r'_A = 1, else r'_A = 0; set x' = 2x.
Let f_A(x') = opt. fractional cost of scenario A given the stage-I solution x'. Then
  f_A(x') ≤ 2·W·(y_A + z_A)  ⇒  w·x' + E_A[f_A(x')] ≤ 2C.
In scenario A, if r_A ≤ 0.5, then (x', 2y_A) covers A ⇒ f_A(x') ≤ 2B.
So  Pr_A[f_A(x') > 2B] ≤ Pr_A[r_A > 0.5] ≤ 2·∑_A p_A r_A = 2P  (by Markov's inequality).
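
A minimal sketch of this threshold-rounding step, assuming the per-scenario LP values are explicitly available (as in the polynomial-scenario setting); the data layout is illustrative:

```python
def threshold_round(x, scenario_solutions):
    """Threshold rounding for fractional RASC (a sketch).

    x:                   dict S -> stage-I LP value x_S
    scenario_solutions:  list of (p_A, r_A) pairs from the LP solution
    Returns x' = 2x (capped at 1), the 0/1 values r'_A, and the total
    probability mass of scenarios whose budget constraint is given up.
    """
    x_prime = {S: min(1.0, 2.0 * v) for S, v in x.items()}
    r_prime = [1 if rA >= 0.5 else 0 for _, rA in scenario_solutions]
    # Markov's inequality: Pr[r_A >= 0.5] <= 2 * sum_A p_A r_A = 2P.
    violated_mass = sum(pA for (pA, _), rp in zip(scenario_solutions, r_prime)
                        if rp == 1)
    return x_prime, r_prime, violated_mass
```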
Rounding (contd.)
Rounding x' to an integer solution to RASC: can use an α-approximation algorithm for the 2-stage stochastic problem that is
(i) LP-based, (ii) “local”, i.e., gives per-scenario cost guarantees,
[(iii) can be implemented given only a first-stage solution]
to obtain an integer solution (X, {Y_A}_A) of cost ≤ 2αC with
  Pr_A[cost of A > 2αB] ≤ 2P.
• Set cover, vertex cover, multicut on trees: Shmoys-S gave such a 2β-approximation algorithm using an LP-based β-approximation algorithm for the deterministic problem ⇒ get ratios of 4·log n, 8, 8 respectively.
• Min s-t cut: can use the O(log n)-approximation algorithm of Dhamdhere et al. for stochastic min s-t cut, which is local.
• Also facility location: not set cover, but a very similar rounding; get an 11-approximation using a variant of the Shmoys-S algorithm for 2-stage FL.
Solving the fractional-RASC LP: Sample Average Approximation
Sample Average Approximation (SAA) method:
  – Sample N times from the distribution.
  – Estimate p_A by q_A = frequency of occurrence of scenario A = n_A/N.
  – Construct the sample-average LP, where p_A is replaced by q_A.
How large should N be?

Wanted result: With polynomially-bounded N,
  x is an optimal solution to the sample-average problem ⇒
  x is a near-optimal solution to the true problem, with a small blow-up of ρ.
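
The sampling step is straightforward; a sketch reusing the ScenarioOracle interface sketched earlier (frozensets are hashable, so identical samples aggregate):

```python
from collections import Counter

def sample_average_distribution(oracle, N):
    """Draw N i.i.d. scenarios from the black box and return the empirical
    distribution q_A = n_A / N used to build the sample-average LP."""
    counts = Counter(oracle.sample() for _ in range(N))
    return {A: n_A / N for A, n_A in counts.items()}
```

The sample-average LP is then the LP above with p_A replaced by q_A, over the sampled scenarios only.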
Solving the fractional-RASC LP
Minimize ∑_S w_S x_S + ∑_{A⊆U} p_A ∑_S W_S (y_{A,S} + z_{A,S})
subject to
  ∑_A p_A r_A ≤ ρ                                     (*)  [multiplier Δ ≥ 0]
  ∑_S W_S y_{A,S} ≤ B                                 for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1       for each A, e∈A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1 for each A, e∈A
  x_S, y_{A,S}, z_{A,S} ≥ 0                           for each S, A.

1) Lagrangify the coupling constraint (*) to get a separable problem.
Solving the fractional-RASC LP
Max_{Δ≥0} [ −Δρ + min ( ∑_S w_S x_S + ∑_{A⊆U} p_A (Δ·r_A + ∑_S W_S (y_{A,S} + z_{A,S})) ) ]
subject to
  ∑_S W_S y_{A,S} ≤ B                                 for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1       for each A, e∈A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1 for each A, e∈A
  x_S, y_{A,S}, z_{A,S} ≥ 0                           for each S, A.
The inner minimum is OPT(Δ) = min_x h(Δ; x), where h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x).

After Lagrangification, the inner minimization problem becomes a separable 2-stage problem.
Solving the fractional-RASC LP
Max_{Δ≥0} [ −Δρ + min_x ( h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x) ) ]
2) Argue that for each fixed Δ, one can efficiently compute a “near-optimal” solution to the inner minimization problem.
3) Use this to search for the “right” value of the Lagrange multiplier Δ:
the search is complicated because (i) we only have approximate solutions for each Δ, and (ii) we cannot actually compute ∑_A p_A r_A but have to estimate it.
Problems with 2): Cannot compute a “good” optimal solution; the 2-stage problem does not fall into the solvable class of Shmoys-S or Charikar-Chekuri-Pál – their arguments do not directly apply.
Crucial insight: For the search in 3) to work, it suffices to prove the weak guarantee: can compute x s.t. h(Δ; x) ≤ (1+σ)·OPT(Δ) + η·Δ.
This guarantee is weak enough that sample average approximation can be shown to work, using the approximate-subgradient proof technique (S-Shmoys).
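
To illustrate step 3, here is a deliberately simplified bisection sketch for the multiplier; solve_inner and estimate_violation are assumed helpers standing in for step 2 and for sampling-based estimation, and the actual search must additionally cope with the approximation and estimation errors noted above.

```python
def search_multiplier(solve_inner, estimate_violation, rho, kappa,
                      delta_max, tol=1e-3):
    """Simplified bisection search for the Lagrange multiplier Delta (a sketch).

    solve_inner(delta):           near-optimal x for the inner minimization
    estimate_violation(x, delta): sampled estimate of sum_A p_A r_A at x
    """
    lo, hi = 0.0, delta_max
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        x = solve_inner(mid)
        if estimate_violation(x, mid) > rho * (1.0 + kappa):
            lo = mid   # violation too likely: penalize the r_A's more heavily
        else:
            hi = mid   # feasible up to the kappa blow-up: try a smaller Delta
    return solve_inner(hi)
```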
2) Near-optimal solution for fixed Δ
Use sample average approximation:
replace  min_{x∈P} ( h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x) )   (P_Δ)
with     min_{x∈P} ( h'(Δ; x) = w·x + ∑_{A⊆U} q_A g_A(Δ; x) )  (SA-P_Δ)
where q_A = frequency of occurrence of scenario A in N samples.

Want to show: With polynomially-bounded N,
  (*) if x solves (SA-P_Δ) then h(Δ; x) ≤ (1+σ)·OPT(Δ) + η·Δ.

h(Δ; ·) and h'(Δ; ·) can take very different values; BUT one can prove (*) by showing that their “slopes” are “close” to each other.
(Approximate) Subgradients and closeness-in-subgradients
“Slope” ⇒ subgradient.
For a (convex) function g: ℝ^m → ℝ:
  d ∈ ℝ^m is a subgradient of g(·) at u if ∀v: g(v) − g(u) ≥ d·(v − u).
  d is an (ε,η)-subgradient of g at u if ∀v: g(v) − g(u) ≥ d·(v − u) − ε·g(v) − ε·g(u) − η.

Closeness-in-subgradients: At “most” points u in P, ∃ a vector d'_u such that
(#) d'_u is a subgradient of g'(·) at u, AND an (ε,η)-subgradient of g(·) at u.
Lemma (S-Shmoys): For any convex functions g(·), g'(·), if (#) holds then
  x solves min_{x∈P} g'(x) ⇒ x is a near-optimal solution to min_{x∈P} g(x).
[(#) holds with high probability for h(Δ; ·) and h'(Δ; ·) (for suitable ε, η).]
Closeness-in-subgradients (contd.)
[Figure: an ellipsoid cut at a feasible point u ∈ P using d_u; the retained half contains {x : g(x) ≤ g(u)}.]
Intuition:
• The minimizer of a convex function is determined by its subgradients.
• The ellipsoid-based algorithm of SS04 for convex minimization only uses (ε-)subgradients: it uses an (ε-)subgradient to cut the ellipsoid at a feasible point u in P.
• (#) ⇒ can run the SS04 algorithm on both min_{x∈P} g(x) and min_{x∈P} g'(x) using the same vector d'_u to cut the ellipsoid at u ∈ P
  ⇒ the algorithm returns an x that is near-optimal for both problems.
Closeness-in-subgradients of h(Δ; ·) and h'(Δ; ·)
True problem:            min_{x∈P} ( h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x) )   (P_Δ)
Sample average problem:  min_{x∈P} ( h'(Δ; x) = w·x + ∑_{A⊆U} q_A g_A(Δ; x) )  (SA-P_Δ)

To show: At “most” points u in P, ∃ a vector d'_u such that
d'_u is a subgradient of h'(Δ; ·) at u, AND an (ε, ηΔ)-subgradient of h(Δ; ·) at u.

Fix u ∈ P. Let λ = max_S W_S/w_S.
• A subgradient of h(Δ; ·) at u is d_u = (d_{u,S}) with d_{u,S} = w_S − ∑_A p_A z_{A,S} = w_S − E[z_{A,S}], where z_{A,S} is a quantity derived from an optimal dual solution to g_A(Δ; u).
• A subgradient of h'(Δ; ·) at u is d'_u = (d'_{u,S}) with d'_{u,S} = w_S − ∑_A q_A z_{A,S} = w_S − E'[z_{A,S}].
• The structure of the dual implies that z_{A,S} ≤ W_S + Δ for all S
⇒ using poly(λ/(ε·η)) samples, can ensure that |d'_{u,S} − d_{u,S}| ≤ ε·w_S + ηΔ/2m for all S whp,
which suffices to show that d'_u is an (ε, ηΔ)-subgradient of h(Δ; ·) at u whp.
A union bound shows that this holds for “most” points in P.
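
Since the components of d'_u are sample means, they are simple to compute once the per-scenario duals are available; a sketch, where dual_z (extracting the z_{A,S} from an optimal dual solution of scenario A's subproblem) is an assumed helper:

```python
def estimate_subgradient(u, oracle, dual_z, w, N):
    """Sampled estimate d'_u of the subgradient of h(Delta; .) at u (a sketch):
    d'_{u,S} = w_S - (1/N) * sum over the N samples of z_{A,S}.

    dual_z(A, u): assumed helper returning {S: z_{A,S}} for scenario A at u.
    """
    totals = {S: 0.0 for S in w}
    for _ in range(N):
        A = oracle.sample()
        zA = dual_z(A, u)
        for S in w:
            totals[S] += zA.get(S, 0.0)
    return {S: w[S] - totals[S] / N for S in w}
```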
Summary and Extensions
• Although the LP-relaxation of the (fractional) problem is non-separable and has exponential size, one can still compute near-optimal LP first-stage decisions: we present an FPTAS.
  – LP first-stage decisions are sufficient to round and obtain a near-optimal solution to the fractional problem, which can be further rounded using various known approximation algorithms.
  – Many applications (set cover, vertex cover, facility location, min s-t cut, multicut on trees): obtain the first approximation algorithms for the chance-constrained + black-box model.
• Get the same results for (i) non-uniform budgets; (ii) risk-averse robust problems; (iii) simultaneous budget constraints, e.g., Pr[facility cost > B_F or service cost > B_S or total cost > B] ≤ ρ.
• (iv) The B = 0 problem: an interesting one-stage problem; choose initial decisions so as to satisfy “most” scenarios.
Open Questions
• Approximation results for other problems in the risk-averse models.
• Models and algorithms for multi-stage risk-averse stochastic optimization (in the black-box setting).
• Risk-averse stochastic scheduling.
• Other combinations of multiple probabilistic budget constraints.

Thank You.