Discounting the Future in Systems Theory
Luca de Alfaro (UC Santa Cruz), Tom Henzinger (UC Berkeley), Rupak Majumdar (UC Los Angeles)
Chess Review, May 11, 2005, Berkeley, CA

A Graph Model of a System
A system is modeled as a graph over states a, b, c, with observations on the states.

Property ◇c ("eventually c")
• ∃◇c … some trace has the property ◇c
• ∀◇c … all traces have the property ◇c

Richer Models
• FAIRNESS: ω-automaton
• ADVERSARIAL CONCURRENCY: game graph
• PROBABILITIES: Markov decision process
Combining these yields parity games and stochastic games.

Concurrent Game
At each state, player "left" and player "right" choose moves simultaneously; the joint move (e.g., (1,2)) determines the successor. Concurrent games are used
• for modeling open systems [Abramsky; Alur, Kupferman, Vardi; …]
• for strategy synthesis ("control") [Ramadge, Wonham; Pnueli, Rosner]

Property ◇c on games
• ⟨⟨left⟩⟩◇c … player "left" has a strategy to enforce ◇c
• With randomized moves (e.g., Pr(1) = Pr(2) = 0.5): ⟨⟨left⟩⟩◇c … player "left" has a randomized strategy to enforce ◇c

Qualitative Models
• Trace: sequence of observations
• Property p: assigns a reward to each trace (boolean rewards)
• Model m: generates a set of traces ((game) graph)
• Value(p,m): defined from the rewards of the generated traces, via ∃ or ∀ (for games, ∃∀)

Stochastic Game
States a, b, c; in states a and b each player picks a move in {1, 2}, and the joint move determines a probability distribution over successor states:

state a      right 1           right 2
left 1       a: 0.6, b: 0.4    a: 0.5, b: 0.5
left 2       a: 0.1, b: 0.9    a: 0.2, b: 0.8

state b      right 1           right 2
left 1       c: 1.0            c: 1.0
left 2       a: 0.7, b: 0.3    b: 1.0

Property ◇c: with which probability can player "left" enforce ◇c?
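The contrast between "some trace reaches c" and "all traces reach c" can be sketched as two fixpoint computations. A minimal sketch in Python; the slide's exact edges are not recoverable, so the transition relation below is a hypothetical version of the three-state graph (a loops or moves to b, b moves back to a or on to c, c is absorbing):

```python
# Hypothetical successor relation for the slide's three-state graph.
succ = {"a": {"a", "b"}, "b": {"a", "c"}, "c": {"c"}}

def exists_eventually(target):
    """States with SOME trace reaching target: iterate existential pre
    (a state joins once at least one successor is already winning)."""
    win = set(target)
    while True:
        new = win | {q for q, ss in succ.items() if ss & win}
        if new == win:
            return win
        win = new

def forall_eventually(target):
    """States from which ALL traces eventually reach target: iterate
    universal pre (a state joins once every successor is winning)."""
    win = set(target)
    while True:
        new = win | {q for q, ss in succ.items() if ss <= win}
        if new == win:
            return win
        win = new
```

On this example the two properties differ: a and b satisfy ∃◇c, but only c itself satisfies ∀◇c, because a trace may loop between a and b forever.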
Semi-Quantitative Models
• Trace: sequence of observations
• Property p: assigns a reward to each trace (boolean rewards)
• Model m: generates a set of traces ((game) graph)
• Value(p,m): defined from the rewards of the generated traces, in [0,1] ⊆ ℝ, via sup or inf (for games, sup inf)

A Systems Theory
A systems theory for a class of models consists of:
• a class of properties p over traces (here: ω-regular properties)
• an algorithm for computing Value(p,m) over models m (for GRAPHS: the μ-calculus)
• a distance between models w.r.t. property values (for GRAPHS: bisimilarity)

Transition Graph
• Q … states
• δ: Q → 2^Q … transition relation

Graph Regions
• Regions: [Q → 𝔹]
• ∃pre(R) … the states with some successor in R, for R ⊆ Q
• ∀pre(R) … the states with all successors in R

Graph Property Values: Reachability
Given R ⊆ Q, find the states from which some trace leads to R:
∃◇R = (μX)(R ∨ ∃pre(X))
computed as the increasing chain R, R ∪ ∃pre(R), R ∪ ∃pre(R) ∪ ∃pre²(R), …

Concurrent Game
• Q … states
• Ml, Mr … moves of the two players
• δ: Q × Ml × Mr → Q … transition function

Game Regions
• Regions: [Q → 𝔹]
• q ∈ lpre(R) iff (∃l ∈ Ml)(∀r ∈ Mr) δ(q,l,r) ∈ R, for R ⊆ Q; rpre is symmetric

Game Property Values: Reachability
Given R ⊆ Q, find the states from which player "left" has a strategy to force the game to R:
⟨⟨left⟩⟩◇R = (μX)(R ∨ lpre(X))
computed as the chain R, R ∪ lpre(R), R ∪ lpre(R) ∪ lpre²(R), …

An Open Systems Theory
• Class of winning conditions p over traces: ω-regular properties
• Algorithm for computing Value(p,m) over models m (for GAME GRAPHS): the (lpre, rpre) fixpoint calculus
• Distance between models w.r.t. property values: alternating bisimilarity [Alur, Henzinger, Kupferman, Vardi]
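The lpre operator and the reachability fixpoint μX.(R ∨ lpre(X)) can be sketched directly. The transition table below is a hypothetical three-state concurrent game (the slide's exact arrows are not recoverable):

```python
# delta[state][(l, r)] = successor under joint move (l, r); this is a
# hypothetical deterministic concurrent game, not the slide's exact one.
delta = {
    "a": {(1, 1): "a", (1, 2): "b", (2, 1): "b", (2, 2): "b"},
    "b": {(1, 1): "c", (1, 2): "c", (2, 1): "a", (2, 2): "b"},
    "c": {(1, 1): "c", (1, 2): "c", (2, 1): "c", (2, 2): "c"},
}

def lpre(region):
    """{q | exists left move l such that for all right moves r,
       delta(q, l, r) lies in region}."""
    return {q for q in delta
            if any(all(delta[q][(l, r)] in region for r in (1, 2))
                   for l in (1, 2))}

def left_reach(target):
    """mu X.(target or lpre(X)): states where player 'left' has a
    strategy to force the game into target."""
    win = set(target)
    while True:
        new = win | lpre(win)
        if new == win:
            return win
        win = new
```

Here the chain is {c}, then {b, c} (in b, left's move 1 reaches c whatever right plays), then {a, b, c} (in a, left's move 2 forces b).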
An Open Systems Theory
Every deterministic fixpoint formula f computes Value(p,m), where p is the linear interpretation [Vardi] of f; e.g., (μX)(R ∨ lpre(X)) computes ⟨⟨left⟩⟩◇R.
Two states agree on the values of all fixpoint formulas iff they are alternating bisimilar [Alur, Henzinger, Kupferman, Vardi].

Stochastic Game
• Q … states
• Ml, Mr … moves of the two players
• δ: Q × Ml × Mr → Dist(Q) … probabilistic transition function

Quantitative Game Regions
• Quantitative regions: [Q → [0,1]] (instead of [Q → 𝔹])
• lpre(R)(q) = (sup l ∈ Ml)(inf r ∈ Mr) R(δ(q,l,r)); rpre is symmetric
Here R(δ(q,l,r)) is the expected value of R under the successor distribution; sup and inf take the place of ∃ and ∀.

Probability with which player "left" can enforce ◇c: (μX)(c ∨ lpre(X)), with ∨ = pointwise max.
On the stochastic game above, the iteration over the values of (a, b, c) proceeds
(0, 0, 1) → (0, 1, 1) → (0.8, 1, 1) → (0.96, 1, 1) → … → (1, 1, 1).

In the limit, the deterministic fixpoint formulas work for all ω-regular properties [de Alfaro, Majumdar].

A Probabilistic Systems Theory
• Class of properties p over traces: ω-regular properties
• Algorithm for computing Value(p,m) over models m (for MARKOV DECISION PROCESSES): the quantitative fixpoint calculus
• Distance between models w.r.t. property values: quantitative bisimilarity [Desharnais, Gupta, Jagadeesan, Panangaden]
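The slide's iteration can be reproduced by value iteration, using the transition matrices reconstructed from the stochastic-game slide (the iterates 0, 0.8, 0.96, … confirm the reconstruction). In this particular game the slide's iterates are achieved by pure moves, so quantitative lpre reduces to max over left's rows of min over right's columns of the expected successor value, with no need to solve a full matrix game:

```python
# delta[state][(l, r)] = distribution over successor states,
# reconstructed from the slide; c is the (absorbing) target.
delta = {
    "a": {(1, 1): {"a": 0.6, "b": 0.4}, (1, 2): {"a": 0.5, "b": 0.5},
          (2, 1): {"a": 0.1, "b": 0.9}, (2, 2): {"a": 0.2, "b": 0.8}},
    "b": {(1, 1): {"c": 1.0}, (1, 2): {"c": 1.0},
          (2, 1): {"a": 0.7, "b": 0.3}, (2, 2): {"b": 1.0}},
}

def lpre(x):
    """sup over left's moves, inf over right's, of E[x(successor)]."""
    return {q: max(min(sum(p * x[s] for s, p in delta[q][(l, r)].items())
                       for r in (1, 2))
                   for l in (1, 2))
            for q in delta}

def reach_value(eps=1e-9):
    """Least fixpoint of X = c or lpre(X), 'or' taken as pointwise max."""
    x = {"a": 0.0, "b": 0.0, "c": 1.0}
    while True:
        pre = lpre(x)
        y = {"a": pre["a"], "b": pre["b"], "c": 1.0}
        if max(abs(y[q] - x[q]) for q in x) < eps:
            return y
        x = y
```

Starting from (0, 0, 1), state a's value follows a ↦ 0.2a + 0.8 (left plays row 2, right answers with column 2), giving the slide's 0, 0.8, 0.96, … and converging to 1.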
A Probabilistic Systems Theory
• Class of properties p over traces: quantitative ω-regular properties, e.g., the max expected value of satisfying ◇R
• For MARKOV DECISION PROCESSES, every deterministic fixpoint formula f computes the expected Value(p,m) over models m, where p is the linear interpretation of f; e.g., (μX)(R ∨ ∃pre(X))

Qualitative Bisimilarity
• e: Q² → {0,1} … equivalence relation
• F … function on equivalences:
  F(e)(q,q') = 0 if q and q' disagree on observations,
  and otherwise requires every successor of q to be matched by some successor of q' with e = 1, and vice versa.
• Qualitative bisimilarity is the greatest fixpoint of F.

Quantitative Bisimilarity
• d: Q² → [0,1] … pseudo-metric ("distance")
• F … function on pseudo-metrics:
  F(d)(q,q') = 1 if q and q' disagree on observations,
  and otherwise the max of sup_l inf_r d(δ(q,l,r), δ(q',l,r)) and sup_r inf_l d(δ(q,l,r), δ(q',l,r)).
• Quantitative bisimilarity is the greatest fixpoint of F.
This is the natural generalization of bisimilarity from binary relations to pseudo-metrics.

A Probabilistic Systems Theory
Two states agree on the values of all quantitative fixpoint formulas iff their quantitative bisimilarity distance is 0.

Great, BUT …
1. The theory is too precise: even the smallest change in the probability of a transition can cause an arbitrarily large change in the value of a property.
2. The theory is not computational: we cannot bound the rate of convergence of quantitative fixpoint formulas.

Solution: Discounting
Economics: a dollar today is better than a dollar tomorrow.
• Value of $1 today: 1
• Tomorrow: α, for a discount factor 0 < α < 1
• The day after tomorrow: α², etc.
Engineering: a bug today is worse than a bug tomorrow.

Discounted Reachability
Reward(◇α c) = α^k if c is first true after k transitions, and 0 if c is never true.
The reward is proportional to how quickly c is satisfied.

Discounted Property ◇α c
On the graph a → b → c, the trace reward is α² from a, α from b, and 1 at c.
Discounted fixpoint calculus: pre(f) becomes α · pre(f).

Fully Quantitative Models
• Trace: sequence of observations
• Property p: assigns a reward to each trace (real rewards)
• Model m: generates a set of traces ((game) graph)
• Value(p,m): defined from the rewards of the generated traces, in [0,1] ⊆ ℝ, via sup or inf (for games, sup inf)

Discounted Bisimilarity
Same functional as quantitative bisimilarity, but the successor term is discounted:
F(d)(q,q') = 1 if q and q' disagree on observations,
and otherwise α · max of sup_l inf_r d(δ(q,l,r), δ(q',l,r)) and sup_r inf_l d(δ(q,l,r), δ(q',l,r)).
Discounted bisimilarity is the greatest fixpoint of F.

A Discounted Systems Theory
• Class of rewards p over traces: discounted ω-regular properties, e.g., the max expected reward ◇α R achievable by the left player
• Algorithm for computing Value(p,m) over STOCHASTIC GAMES: the discounted fixpoint calculus; every discounted deterministic fixpoint formula f computes Value(p,m), where p is the linear discounted interpretation of f, e.g., (μX)(R ∨ α·lpre(X))
• Distance between models w.r.t. property values: discounted bisimilarity; the difference between two states in the values of discounted fixpoint formulas is bounded by their discounted bisimilarity distance.
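The discounted fixpoint μX.(c ∨ α·lpre(X)) can be iterated on the same stochastic game reconstructed from the earlier slides (pure moves again suffice, so lpre is max over rows of min over columns). A sketch that also records the successive iteration differences, which shrink at least geometrically in α:

```python
# Transition matrices reconstructed from the stochastic-game slide.
delta = {
    "a": {(1, 1): {"a": 0.6, "b": 0.4}, (1, 2): {"a": 0.5, "b": 0.5},
          (2, 1): {"a": 0.1, "b": 0.9}, (2, 2): {"a": 0.2, "b": 0.8}},
    "b": {(1, 1): {"c": 1.0}, (1, 2): {"c": 1.0},
          (2, 1): {"a": 0.7, "b": 0.3}, (2, 2): {"b": 1.0}},
}

def lpre(x):
    """sup over left's moves, inf over right's, of E[x(successor)]."""
    return {q: max(min(sum(p * x[s] for s, p in delta[q][(l, r)].items())
                       for r in (1, 2))
                   for l in (1, 2))
            for q in delta}

def discounted_reach(alpha, eps=1e-12):
    """Iterate mu X.(c or alpha * lpre(X)); the reward for reaching c
    after k steps is alpha**k.  Returns the value and the per-step
    iteration differences."""
    x = {"a": 0.0, "b": 0.0, "c": 1.0}
    diffs = []
    while True:
        pre = lpre(x)
        y = {"a": alpha * pre["a"], "b": alpha * pre["b"], "c": 1.0}
        diffs.append(max(abs(y[q] - x[q]) for q in x))
        if diffs[-1] < eps:
            return y, diffs
        x = y
```

With α = 0.9 the value at a solves v = 0.9 · (0.2v + 0.8 · 0.9), i.e. v = 0.648/0.82 ≈ 0.790, strictly below the classical (undiscounted) value 1: the reward reflects how quickly c is reached, not merely whether it is reached.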
Discounting is Robust
• Continuity over traces: every discounted fixpoint formula defines a reward function on traces that is continuous in the Cantor metric.
• Continuity over models: if transition probabilities are perturbed by ε, then discounted bisimilarity distances change by at most f(ε).
Discounting is robust against effects at infinity, and against numerical perturbations.

Discounting is Computational
The iterative evaluation of an α-discounted fixpoint formula converges geometrically in α, so we can compute values to any desired precision.

Discounting is Approximation
If the discount factor tends to 1, we recover the classical theory:
• lim α→1 of the α-discounted interpretation of a fixpoint formula f = the classical interpretation of f
• lim α→1 of α-discounted bisimilarity = classical (alternating; quantitative) bisimilarity

Further Work
• Exact computation of discounted values of temporal formulas over finite-state systems [de Alfaro, Faella, Henzinger, Majumdar, Stoelinga].
• Discounting real-time systems: continuous discounting of time delay rather than discrete discounting of the number of steps [Prabhu].

Conclusions
• Discounting provides a continuous and computational approximation theory of discrete and probabilistic processes.
• Discounting captures an important engineering intuition.
"In the long run, we're all dead." (J.M. Keynes)
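The approximation claim can be illustrated numerically: as α → 1, the discounted reachability value on the stochastic game reconstructed from the earlier slides (classical value at a: 1) increases toward the classical value. A minimal sketch:

```python
# Transition matrices reconstructed from the stochastic-game slide.
delta = {
    "a": {(1, 1): {"a": 0.6, "b": 0.4}, (1, 2): {"a": 0.5, "b": 0.5},
          (2, 1): {"a": 0.1, "b": 0.9}, (2, 2): {"a": 0.2, "b": 0.8}},
    "b": {(1, 1): {"c": 1.0}, (1, 2): {"c": 1.0},
          (2, 1): {"a": 0.7, "b": 0.3}, (2, 2): {"b": 1.0}},
}

def lpre(x):
    """sup over left's moves, inf over right's, of E[x(successor)]."""
    return {q: max(min(sum(p * x[s] for s, p in delta[q][(l, r)].items())
                       for r in (1, 2))
                   for l in (1, 2))
            for q in delta}

def discounted_value_a(alpha, eps=1e-12):
    """Value at state a of mu X.(c or alpha * lpre(X))."""
    x = {"a": 0.0, "b": 0.0, "c": 1.0}
    while True:
        pre = lpre(x)
        y = {"a": alpha * pre["a"], "b": alpha * pre["b"], "c": 1.0}
        if max(abs(y[q] - x[q]) for q in x) < eps:
            return y["a"]
        x = y

vals = [discounted_value_a(a) for a in (0.9, 0.99, 0.999)]
```

The sequence is strictly increasing (roughly 0.79, 0.98, 0.998 for the three discount factors) and approaches the classical value 1 in the limit.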
