Risk-averse Stochastic Optimization:
Probabilistically-constrained Models and Algorithms for Black-box Distributions
Chaitanya Swamy, University of Waterloo

Two-Stage Recourse Model
Given: a probability distribution over inputs.
Stage I: Make some advance decisions – plan ahead or hedge against uncertainty.
Observe the actual input scenario.
Stage II: Take recourse. Can augment the earlier solution, paying a recourse cost.
Choose stage-I decisions to minimize (stage I cost) + (expected stage II recourse cost).

2-Stage Stochastic Facility Location
A distribution over client sets gives the set of clients to serve.
Stage I: Open some facilities in advance; pay cost f_i for facility i.
  Stage I cost = ∑_{i opened} f_i.
The actual scenario A = {clients to serve} materializes.
Stage II: Can open more facilities to serve the clients in A; pay cost f_i^A to open facility i. Assign the clients in A to facilities.
  Stage II cost = ∑_{i opened in scenario A} f_i^A + (cost of serving the clients in A).
Want to decide which facilities to open in stage I.
Goal: minimize total cost = (stage I cost) + E_A[stage II cost for scenario A].

How is the probability distribution specified?
• A short (polynomial-size) list of possible scenarios
• Independent probabilities that each client exists
• A black box that can be sampled – the black-box setting (illustrated in the sketch below)

Risk-averse stochastic optimization
• The E[·] measure does not adequately model the "risk" associated with stage-I decisions.
• Same E[·] value ≠ same "risk involved": given two solutions with the same E[·] cost, prefer the solution whose second-stage cost is more "assured" or "reliable". E.g., portfolio investment.
• Want to capture this notion of risk-averseness, where one seeks to avoid disaster scenarios.

Modeling risk-aversion: attempt 1 – the budget model
Choose stage-I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to (stage II cost of scenario A) ≤ B for every scenario A.
Gupta-Ravi-Sinha considered stochastic Steiner tree in this budget model in the polynomial-scenario setting.
The budget model provides the greatest degree of risk-aversion, BUT
– limited modeling power: cannot get any approximation guarantees in the black-box setting with bounded sample size;
– overly conservative: protects every scenario regardless of its probability.

Closely-related model: the robust model
Choose stage-I decisions to minimize
  (stage I cost) + (maximum stage II recourse cost)
• Dhamdhere et al. considered this model, again in the polynomial-scenario setting.
• "Guessing" B = max. (stage II cost) "reduces" the robust problem to the budget problem.
• Modeling issues: it is not clear how to even specify exponentially many scenarios.
  – Feige et al.: scenarios specified by a cardinality constraint; seems rather stylized for stochastic optimization.
  – We will consider the distribution-based robust model: scenario collection = support of the distribution.
• Same drawbacks as in the budget model – no guarantees possible in the black-box setting.
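To make the black-box access model concrete: the only operation available is drawing sample scenarios, so quantities such as the expected recourse cost can only be estimated by Monte-Carlo sampling, while a constraint that must hold for every scenario cannot be certified from finitely many samples. The sketch below is a toy illustration for the facility-location example; the instance data, the greedy recourse rule, and all function names are made up for illustration and are not part of the talk.

```python
import random
import statistics

# Toy instance: facilities and clients on a line; all data is made up.
facilities = {0: 0.0, 1: 5.0, 2: 10.0}          # facility -> location
f_stage1 = {i: 3.0 for i in facilities}          # stage-I opening cost f_i
f_stage2 = {i: 6.0 for i in facilities}          # inflated stage-II cost f_i^A
clients = {c: float(c) for c in range(11)}       # client -> location

def sample_scenario(rng, p=0.3):
    """Black-box access: all we can do is draw a scenario (here, each client
    shows up independently with probability p, but any distribution could sit
    behind this interface)."""
    return [c for c in clients if rng.random() < p]

def recourse_cost(opened, scenario):
    """A crude stage-II heuristic: serve each client from the nearest stage-I
    facility, or open its nearest facility at the inflated cost if cheaper.
    (The true stage-II problem is itself a facility-location instance; this
    placeholder just makes the sketch runnable.)"""
    cost, extra = 0.0, set()
    for c in scenario:
        use_open = min((abs(clients[c] - facilities[i]) for i in opened),
                       default=float("inf"))
        i_new = min(facilities, key=lambda i: abs(clients[c] - facilities[i]))
        open_new = abs(clients[c] - facilities[i_new]) + (0.0 if i_new in extra else f_stage2[i_new])
        if open_new < use_open:
            extra.add(i_new)
            cost += open_new
        else:
            cost += use_open
    return cost

def estimate_objective(opened, n_samples=20000, seed=0):
    """(stage-I cost) + Monte-Carlo estimate of E_A[stage-II cost].
    An expectation can be estimated this way; a constraint of the form
    'stage-II cost <= B for EVERY scenario' cannot be certified from
    finitely many samples."""
    rng = random.Random(seed)
    recourse = [recourse_cost(opened, sample_scenario(rng)) for _ in range(n_samples)]
    return sum(f_stage1[i] for i in opened) + statistics.mean(recourse)

print(estimate_objective(opened={1}))
```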
Modeling risk-aversion: attempt 2
Recall the budget model: choose stage-I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to (stage II cost of scenario A) ≤ B for every scenario A.
• For the budget model, one can prove approximation results if one is allowed to violate the budget constraints with a small probability.
• Can turn this solution concept around and incorporate it into the model, arriving at the following new model.

Modeling risk-aversion: attempt 2 – the risk-averse budget model
Choose stage-I decisions to minimize
  (stage I cost) + (expected stage II recourse cost)
subject to Pr_A[(stage II cost of scenario A) > B] ≤ ρ.
ρ is part of the input – it trades off risk-averseness against conservatism.
• Called a probabilistically-constrained or chance-constrained program.
• The chance constraint is called a Value-at-Risk (VaR) constraint in the finance literature: popular for risk optimization in finance.
• Related robust model: minimize (stage I cost) + (1−ρ)-quantile of the (stage II recourse cost).
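Since only black-box samples are available, both the chance constraint Pr_A[stage-II cost of A > B] ≤ ρ and the (1−ρ)-quantile of the related robust model can only be estimated empirically. A minimal sketch of that estimation, assuming a toy stand-in cost sampler (none of the names below come from the talk):

```python
import random

def sample_cost(rng):
    """Stand-in for: draw a scenario A from the black box, compute its
    stage-II cost.  Toy cost: 1.5 per active element out of 20."""
    return sum(rng.random() < 0.3 for _ in range(20)) * 1.5

def estimate_chance_constraint(B, rho, n_samples=50_000, seed=0):
    rng = random.Random(seed)
    costs = sorted(sample_cost(rng) for _ in range(n_samples))
    # Empirical Pr[stage-II cost > B]; Chernoff bounds make this accurate to
    # within an additive eps once n_samples >> 1/eps^2.
    p_hat = sum(c > B for c in costs) / n_samples
    # Empirical (1 - rho)-quantile = Value-at-Risk used by the robust variant.
    var_hat = costs[min(len(costs) - 1, int((1 - rho) * len(costs)))]
    return p_hat, var_hat

p_hat, var_hat = estimate_chance_constraint(B=12.0, rho=0.1)
print(f"estimated Pr[stage-II cost > B] = {p_hat:.3f}, estimated VaR_(1-rho) = {var_hat:.2f}")
```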
Approximation Algorithms
It is hard to solve the problem exactly; even special cases are #P-hard.
Settle for approximate solutions: give a polynomial-time algorithm that always finds near-optimal solutions.
A is an α-approximation algorithm if
• A runs in polynomial time;
• A(I) ≤ α·OPT(I) on all instances I.
α is called the approximation ratio of A.

Our Results
• Obtain approximation algorithms for various risk-averse budgeted (and robust) problems in the black-box setting: facility location, set cover, vertex cover, multicut on trees, min cut.
• Give a fully polynomial approximation scheme for solving the LP-relaxations of a large class of risk-averse problems ⇒ can use existing algorithms for the deterministic or 2-stage version of a problem to get an approximation algorithm for its risk-averse version.
• First approximation results for chance-constrained programs with black-box distributions. (Kleinberg-Rabani-Tardos consider chance-constrained versions of bin-packing and knapsack, but for specialized product distributions.)

Related Work
• Gupta et al.: gave a constant-factor approximation for stochastic Steiner tree in the poly-scenario budget model.
• Dhamdhere et al., Feige et al.: gave approximation algorithms for various problems in the robust model with polynomial-scenario and cardinality-defined scenario collections.
• So-Zhang-Ye: consider another risk measure called conditional VaR; give an approximation scheme for solving the LP-relaxations of such problems in the black-box setting.
  – Can use our techniques to solve a generalization of their model, where one has probabilistic budget constraints.
• Lots of work in the standard 2-stage model: Dye et al., Ravi-Sinha, Immorlica et al., Gupta et al., Shmoys-Swamy, Swamy-Shmoys, ...

Risk-averse Set Cover (RASC)
Universe U = {e_1, …, e_n}, subsets S_1, S_2, …, S_m ⊆ U; set S has weight w_S.
Deterministic problem (DSC): pick a minimum-weight collection of sets that covers every element.
Risk-averse budgeted version: the target set of elements to be covered is given by a probability distribution.
– choose some sets initially, paying w_S for set S;
– the subset A ⊆ U to be covered is revealed;
– can pick additional sets, paying w_S^A for set S.
Minimize (w-cost of sets picked in stage I) + E_{A⊆U}[w^A-cost of new sets picked for scenario A]
subject to Pr_{A⊆U}[w^A-cost for scenario A > B] ≤ ρ.

Fractional risk-averse set cover
Fractional risk-averse problem: can buy sets fractionally, in stage I and in each scenario A, so as to cover each element of A to an extent of 1.
It is not clear how to solve even the fractional problem in the polynomial-scenario setting. Why? The set of feasible solutions
  {(x, {y_A}_A) : (x, y_A) covers A for each scenario A, Pr_A[∑_S w_S^A y_{A,S} > B] ≤ ρ}
is NOT a convex set. How do we get an LP-relaxation?

An LP for fractional RASC
For simplicity, assume w_S^A = W_S for every scenario A.
x_S: indicates if set S is picked in stage I
r_A: indicates if the budget constraint is NOT met for scenario A
{y_{A,S}}: decisions in scenario A when the budget constraint is met for A
{z_{A,S}}: decisions in scenario A when the budget constraint is not met for A

Minimize  ∑_S w_S x_S + ∑_{A⊆U} p_A ∑_S W_S (y_{A,S} + z_{A,S})
subject to
  ∑_A p_A r_A ≤ ρ                                          (coupling constraint)
  ∑_S W_S y_{A,S} ≤ B                                       for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1             for each A, e ∈ A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1       for each A, e ∈ A
  x_S, y_{A,S}, z_{A,S} ≥ 0                                  for each S, A.

• Exponential number of variables and exponential number of constraints.
• The scenarios are no longer separable: a first-stage solution x alone is not enough to specify an LP solution – one also needs to specify the r_A's. So what does "solving the LP" even mean?
  – Contrast with the standard 2-stage model, or with the fractional risk-averse problem itself.
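For concreteness, here is how the sample-average version of this LP (p_A replaced by empirical scenario frequencies, as in the SAA method discussed later) might be written down for a toy instance. The modelling library (PuLP), the instance, and the sampler are all choices made purely for illustration; the talk's actual algorithm does not solve this LP directly but goes through the Lagrangian described below.

```python
import random
import pulp

rng = random.Random(0)
elements = list(range(6))
sets = {"S1": {0, 1, 2}, "S2": {2, 3}, "S3": {3, 4, 5}, "S4": {0, 5}}
w = {S: 1.0 for S in sets}        # stage-I weights w_S
W = {S: 2.0 for S in sets}        # stage-II weights W_S (uniform over scenarios)
B, rho = 2.0, 0.2                 # budget B and risk bound rho

def sample_scenario():
    # black-box stand-in: each element must be covered independently w.p. 1/2
    return frozenset(e for e in elements if rng.random() < 0.5)

N = 200
samples = [sample_scenario() for _ in range(N)]

lp = pulp.LpProblem("fractional_RASC_sample_average", pulp.LpMinimize)
x = {S: pulp.LpVariable(f"x_{S}", lowBound=0) for S in sets}
y = {(j, S): pulp.LpVariable(f"y_{j}_{S}", lowBound=0) for j in range(N) for S in sets}
z = {(j, S): pulp.LpVariable(f"z_{j}_{S}", lowBound=0) for j in range(N) for S in sets}
r = {j: pulp.LpVariable(f"r_{j}", lowBound=0, upBound=1) for j in range(N)}

# objective: sum_S w_S x_S + (1/N) sum_j sum_S W_S (y_{j,S} + z_{j,S})
lp += (pulp.lpSum(w[S] * x[S] for S in sets)
       + pulp.lpSum(W[S] * (y[j, S] + z[j, S]) for j in range(N) for S in sets) * (1.0 / N))
# coupling constraint: (1/N) sum_j r_j <= rho
lp += pulp.lpSum(r[j] for j in range(N)) <= rho * N
for j, A in enumerate(samples):
    lp += pulp.lpSum(W[S] * y[j, S] for S in sets) <= B          # per-scenario budget
    for e in A:
        cover = [S for S in sets if e in sets[S]]
        lp += pulp.lpSum(x[S] + y[j, S] for S in cover) + r[j] >= 1
        lp += pulp.lpSum(x[S] + y[j, S] + z[j, S] for S in cover) >= 1

lp.solve(pulp.PULP_CBC_CMD(msg=False))
print("first-stage x:", {S: round(x[S].value(), 3) for S in sets},
      " empirical sum q_A r_A =", round(sum(r[j].value() for j in range(N)) / N, 3))
```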
Theorem 1: For any ε, κ > 0, in time poly(input size, 1/(ε·κ·ρ)) one can compute a first-stage solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost ≤ (1+ε)·OPT with ∑_A p_A r_A ≤ ρ(1+κ).
The dependence on 1/(κρ) is unavoidable in the black-box setting.

Theorem 2 (rounding theorem): Given a solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost C with ∑_A p_A r_A = P, one can round x to
• a solution x' to fractional RASC such that w·x' + E_A[opt. fractional cost of A] ≤ 2C and Pr_A[opt. fractional cost of A > 2B] ≤ 2P
  [can now use any LP-based "local" approximation algorithm for 2-stage set cover to round x'];
• a solution (X, {Y_A}_A) to (integer) RASC such that w·X + E_A[W·Y_A] ≤ 4αC and Pr_A[W·Y_A > 4αB] ≤ 2P, using any LP-based α-approximation algorithm for DSC.

Rounding the LP
Given a solution x that extends to an LP solution (x, {(y_A, z_A, r_A)}_A) of cost C with ∑_A p_A r_A = P.
LP constraints:
  ∑_S W_S y_{A,S} ≤ B                                       for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1             for each A, e ∈ A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1       for each A, e ∈ A
For every A, either r_A ≥ 0.5, OR ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} ≥ 0.5 for each e ∈ A.
"Threshold rounding": if r_A ≥ 0.5 set r'_A = 1, else set r'_A = 0; set x' = 2x.
Let f_A(x') = the optimal fractional cost of scenario A given the stage-I solution x'.
  f_A(x') ≤ W·(y_A + z_A)  ⇒  w·x' + E_A[f_A(x')] ≤ 2C.
In scenario A, if r_A ≤ 0.5 then (x', 2y_A) covers A  ⇒  f_A(x') ≤ W·(2y_A) ≤ 2B.
So Pr_A[f_A(x') > 2B] ≤ Pr_A[r_A ≥ 0.5] ≤ 2·∑_A p_A r_A = 2P (by Markov's inequality).

Rounding (contd.)
Rounding x' to an integer solution to RASC: can use an α-approximation algorithm for the 2-stage stochastic problem that is (i) LP-based, (ii) "local", i.e., gives per-scenario cost guarantees, [and (iii) can be implemented given only a first-stage solution] to obtain an integer solution (X, {Y_A}_A) of cost ≤ 2αC with Pr_A[cost of A > 2αB] ≤ 2P.
• Set cover, vertex cover, multicut on trees: Shmoys-Swamy give such a 2β-approximation algorithm using an LP-based β-approximation algorithm for the deterministic problem ⇒ ratios of 4 log n, 8, 8 respectively.
• Min s-t cut: can use the O(log n)-approximation algorithm of Dhamdhere et al. for stochastic min s-t cut, which is local.
• Also facility location: not a set-cover problem, but a very similar rounding; get an 11-approximation using a variant of the Shmoys-Swamy algorithm for 2-stage facility location.

Solving the fractional-RASC LP: Sample Average Approximation
Sample Average Approximation (SAA) method:
– Sample N times from the distribution.
– Estimate p_A by q_A = frequency of occurrence of scenario A = n_A/N.
– Construct the sample-average LP, where p_A is replaced by q_A.
How large does N need to be?
Wanted result: with polynomially bounded N, x is an optimal solution to the sample-average problem ⇒ x is a near-optimal solution to the true problem, with a small blow-up of ρ.

Solving the fractional-RASC LP
Minimize  ∑_S w_S x_S + ∑_{A⊆U} p_A ∑_S W_S (y_{A,S} + z_{A,S})
subject to
  ∑_A p_A r_A ≤ ρ                                           (*)  [Lagrange multiplier Δ ≥ 0]
  ∑_S W_S y_{A,S} ≤ B                                        for each A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} y_{A,S} + r_A ≥ 1              for each A, e ∈ A
  ∑_{S: e∈S} x_S + ∑_{S: e∈S} (y_{A,S} + z_{A,S}) ≥ 1        for each A, e ∈ A
  x_S, y_{A,S}, z_{A,S} ≥ 0                                   for each S, A.
1) Lagrangify the coupling constraint (*) to get a separable problem:
  max_{Δ ≥ 0} [ −Δρ + min ( ∑_S w_S x_S + ∑_{A⊆U} p_A (Δ·r_A + ∑_S W_S (y_{A,S} + z_{A,S})) ) ]
subject to the remaining (per-scenario) constraints; let OPT(Δ) denote the inner minimum.
After Lagrangification, the inner minimization becomes a separable 2-stage problem:
  h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x),   and we want  max_{Δ ≥ 0} [ −Δρ + min_x h(Δ; x) ].
2) Argue that for each fixed Δ, one can efficiently compute a "near-optimal" solution to the inner minimization problem.
3) Use this to search for the "right" value of the Lagrange multiplier Δ. The search is complicated because (i) we only have approximate solutions for each Δ, and (ii) we cannot actually compute ∑_A p_A r_A but have to estimate it.
Problems with 2): cannot compute a "good" optimal solution; the 2-stage problem does not fall into the solvable class of Shmoys-Swamy or Charikar-Chekuri-Pal – their arguments do not directly apply.
Crucial insight: for the search in 3) to work, it suffices to prove the weak guarantee that one can compute x with h(Δ; x) ≈ (1+s)·OPT(Δ) + ηΔ. This is weak enough that sample average approximation can be shown to work, via the approximate-subgradient proof technique (Swamy-Shmoys).

2) Near-optimal solution for a fixed Δ
Use sample average approximation: replace
  min_{x∈P} ( h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x) )       (P_Δ)
with
  min_{x∈P} ( h'(Δ; x) = w·x + ∑_{A⊆U} q_A g_A(Δ; x) )      (SA-P_Δ)
where q_A = frequency of occurrence of scenario A in N samples.
Want to show: with polynomially bounded N,
  (*) if x solves (SA-P_Δ), then h(Δ; x) ≈ (1+s)·OPT(Δ) + ηΔ.
h(Δ; ·) and h'(Δ; ·) can take very different values; BUT (*) can be proved by showing that their "slopes" are "close" to each other.
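To illustrate the separability that Lagrangification buys: once Δ is fixed and a first-stage x is given, each sampled scenario's subproblem g_A(Δ; x) is a small, independent LP, and h'(Δ; x) is just their average plus w·x. The sketch below shows this on a made-up toy instance (PuLP and all instance data are illustrative choices, not code from the paper); it only evaluates h'(Δ; x) at one point rather than minimizing over x.

```python
import random
import pulp

# Same toy universe and sets as the earlier LP sketch (made-up data).
sets = {"S1": {0, 1, 2}, "S2": {2, 3}, "S3": {3, 4, 5}, "S4": {0, 5}}
w = {S: 1.0 for S in sets}
W = {S: 2.0 for S in sets}
B = 2.0

def g_scenario(A, x, delta):
    """g_A(Delta; x): optimal recourse cost of scenario A, given first-stage x
    and Lagrange multiplier Delta on the coupling constraint."""
    lp = pulp.LpProblem("g_A", pulp.LpMinimize)
    y = {S: pulp.LpVariable(f"y_{S}", lowBound=0) for S in sets}
    z = {S: pulp.LpVariable(f"z_{S}", lowBound=0) for S in sets}
    rA = pulp.LpVariable("r_A", lowBound=0, upBound=1)
    lp += delta * rA + pulp.lpSum(W[S] * (y[S] + z[S]) for S in sets)
    lp += pulp.lpSum(W[S] * y[S] for S in sets) <= B
    for e in A:
        cover = [S for S in sets if e in sets[S]]
        residual = max(0.0, 1.0 - sum(x[S] for S in cover))  # coverage x leaves undone
        lp += pulp.lpSum(y[S] for S in cover) + rA >= residual
        lp += pulp.lpSum(y[S] + z[S] for S in cover) >= residual
    lp.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(lp.objective)

def h_sample_average(x, delta, samples):
    """h'(Delta; x) = w.x + (1/N) sum_j g_{A_j}(Delta; x): each sampled scenario
    is solved independently -- the separability bought by the Lagrangian."""
    return (sum(w[S] * x[S] for S in sets)
            + sum(g_scenario(A, x, delta) for A in samples) / len(samples))

rng = random.Random(0)
samples = [frozenset(e for e in range(6) if rng.random() < 0.5) for _ in range(20)]
print(h_sample_average(x={S: 0.25 for S in sets}, delta=5.0, samples=samples))
```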
(Approximate) Subgradients and closeness in subgradients
"Slope" ≈ subgradient.
For a convex function g: ℝ^m → ℝ, d ∈ ℝ^m is a subgradient of g(·) at u if ∀v, g(v) − g(u) ≥ d·(v−u).
d is an (ε,η)-subgradient of g at u if ∀v, g(v) − g(u) ≥ d·(v−u) − ε·g(v) − ε·g(u) − η.
Closeness in subgradients: at "most" points u in P, ∃ a vector d'_u such that
  (#) d'_u is a subgradient of g'(·) at u, AND an (ε,η)-subgradient of g(·) at u.
Lemma (Swamy-Shmoys): For any convex functions g(·), g'(·), if (#) holds then
  x solves min_{x∈P} g'(x)  ⇒  x is a near-optimal solution to min_{x∈P} g(x).
[(#) holds with high probability for h(Δ; ·) and h'(Δ; ·), for suitable ε, η.]
Intuition:
• The minimizer of a convex function is determined by its subgradients.
• The ellipsoid-based algorithm of SS04 for convex minimization only uses (ε-)subgradients: at a feasible point u ∈ P it uses an (ε-)subgradient d_u to cut the ellipsoid, keeping the half-space that contains {x : g(x) ≤ g(u)}.
• (#) ⇒ can run the SS04 algorithm on both min_{x∈P} g(x) and min_{x∈P} g'(x), using the same vector d'_u to cut the ellipsoid at each u ∈ P ⇒ the algorithm returns an x that is near-optimal for both problems.

Closeness in subgradients of h(Δ; ·) and h'(Δ; ·)
True problem:            min_{x∈P} ( h(Δ; x) = w·x + ∑_{A⊆U} p_A g_A(Δ; x) )     (P_Δ)
Sample-average problem:  min_{x∈P} ( h'(Δ; x) = w·x + ∑_{A⊆U} q_A g_A(Δ; x) )    (SA-P_Δ)
To show: at "most" points u in P, ∃ a vector d'_u that is a subgradient of h'(Δ; ·) at u AND an (ε, ηΔ)-subgradient of h(Δ; ·) at u.
Fix u ∈ P. Let λ = max_S W_S/w_S.
• A subgradient of h(Δ; ·) at u is d_u = (d_{u,S}) with d_{u,S} = w_S − ∑_A p_A z_{A,S} = w_S − E[z_{A,S}], where z_{A,S} is a quantity derived from an optimal dual solution to g_A(Δ; u).
• A subgradient of h'(Δ; ·) at u is d'_u = (d'_{u,S}) with d'_{u,S} = w_S − ∑_A q_A z_{A,S} = w_S − E'[z_{A,S}].
• The structure of the dual implies that z_{A,S} ≤ W_S + Δ for all S ⇒ using poly(λ/εη) samples one can ensure that |d'_{u,S} − d_{u,S}| ≤ ε·w_S + ηΔ/2m for all S w.h.p., which suffices to show that d'_u is an (ε, ηΔ)-subgradient of h(Δ; ·) at u w.h.p.
A union bound shows that this holds for "most" points u in P.
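A toy numerical illustration of the (ε,η)-subgradient definition and of why a sample-average slope stays "close": the function below is a simple one-dimensional convex expectation, not the h(Δ; ·) above, and every name and number in it is made up for the example.

```python
import random

rng = random.Random(1)

# True (finite, known) distribution, so the true function g can be evaluated exactly;
# the "algorithm" below only sees samples from it.
scenarios = [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(50)]  # (c_A, d_A)
probs = [1 / len(scenarios)] * len(scenarios)

def g(u):                       # true convex function: E_A[max(c_A*u, d_A)]
    return sum(p * max(c * u, d) for p, (c, d) in zip(probs, scenarios))

def scen_subgrad(u, c, d):      # a subgradient of max(c*u, d) at u
    return c if c * u >= d else 0.0

def true_subgrad(u):            # exact subgradient of g at u
    return sum(p * scen_subgrad(u, c, d) for p, (c, d) in zip(probs, scenarios))

def sampled_subgrad(u, n):      # subgradient of the sample-average function g' at u
    samp = rng.choices(scenarios, weights=probs, k=n)
    return sum(scen_subgrad(u, c, d) for c, d in samp) / n

u, eps, eta = 1.0, 0.1, 0.05
d_hat = sampled_subgrad(u, n=5000)
# Check the (eps, eta)-subgradient inequality for d_hat against the TRUE g
# on a grid of points v in [0, 3].
ok = all(g(v) - g(u) >= d_hat * (v - u) - eps * g(v) - eps * g(u) - eta
         for v in [i / 10 for i in range(0, 31)])
print(f"true slope {true_subgrad(u):.3f}, sampled slope {d_hat:.3f}, "
      f"(eps,eta)-subgradient on [0,3]: {ok}")
```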
Summary and Extensions
• Although the LP-relaxation of the (fractional) problem is non-separable and has exponential size, one can still compute near-optimal LP first-stage decisions: we present an FPTAS.
  – The LP first-stage decisions suffice to round and obtain a near-optimal solution to the fractional problem, which can be further rounded using various known approximation algorithms.
  – Many applications – set cover, vertex cover, facility location, min s-t cut, multicut on trees: we obtain the first approximation algorithms for the chance-constrained + black-box model.
• Get the same results for (i) non-uniform budgets; (ii) risk-averse robust problems; (iii) simultaneous budget constraints, e.g., Pr[facility cost > B_F or service cost > B_S or total cost > B] ≤ ρ.
• (iv) The B = 0 problem: an interesting one-stage problem – choose initial decisions so as to satisfy "most" scenarios.

Open Questions
• Approximation results for other problems in the risk-averse models.
• Models and algorithms for multi-stage risk-averse stochastic optimization (in the black-box setting).
• Risk-averse stochastic scheduling.
• Other combinations of multiple probabilistic budget constraints.

Thank You.