Discounting the Future in Systems Theory

Luca de Alfaro, UC Santa Cruz
Tom Henzinger, UC Berkeley
Rupak Majumdar, UC Los Angeles

Chess Review
May 11, 2005
Berkeley, CA
A Graph Model of a System

[diagram: a transition graph with states a → b → c]

Property ◊c ("eventually c")

[diagram: the same graph, states a → b → c]

∃◊c   … some trace has the property ◊c
∀◊c   … all traces have the property ◊c
Richer Models

graph, enriched with:
  FAIRNESS:                ω-automaton
  ADVERSARIAL CONCURRENCY: game graph
  PROBABILITIES:           Markov decision process

[diagram: combining these enrichments leads to parity games and stochastic games]
Concurrent Game

[diagram: states a, b, c; each edge labeled by a pair of moves
 (player "left", player "right"), e.g. a --(1,1),(2,2)--> a,
 a --(1,2),(2,1)--> b, b --(1,1),(1,2),(2,2)--> c, b --(2,1)--> a]

- for modeling open systems [Abramsky, Alur, Kupferman, Vardi, …]
- for strategy synthesis ("control") [Ramadge, Wonham; Pnueli, Rosner]
                    Property c
  1,1
  2,2
                                    1,1
              1,2                   1,2
              2,1                   2,2
   a                      b                       c
              2,1



hhleftii c    … player "left" has a strategy to enforce c
                                 Property c
               1,1
               2,2
                                                 1,1
                           1,2                   1,2
                           2,1                   2,2
Pr(1): 0.5      a                      b                       c
Pr(2): 0.5                 2,1



             hhleftii c    … player "left" has a strategy to enforce c
             left c      … player “left" has a randomized strategy to
                                  enforce c
Qualitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace
              boolean rewards (B)

Model m:      generates a set of traces
              (game) graph

Value(p,m):   defined from the rewards of the generated traces
              ∃ or ∀ (∃∀)
Stochastic Game

[diagram: states a, b, c; c is the target]

Transition probabilities at state a:
              right 1           right 2
  left 1      a: 0.6, b: 0.4    a: 0.5, b: 0.5
  left 2      a: 0.1, b: 0.9    a: 0.2, b: 0.8

Transition probabilities at state b:
              right 1           right 2
  left 1      c: 1.0            c: 1.0
  left 2      a: 0.7, b: 0.3    b: 1.0
                             Property c
   Probability with which player "left" can enforce c ?




       a                               b                     c


   right                           right
left
             1        2         left
                                             1        2
           a: 0.6   a: 0.5                 a: 0.0   a: 0.0
   1       b: 0.4   b: 0.5         1       c: 1.0   c: 1.0
           a: 0.1   a: 0.2                 a: 0.7   a: 0.0
   2       b: 0.9   b: 0.8         2       b: 0.3   b: 1.0
Semi-Quantitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace
              boolean rewards (B)

Model m:      generates a set of traces
              (game) graph

Value(p,m):   defined from the rewards of the generated traces
              [0,1] ⊆ ℝ; sup or inf (sup inf)
A Systems Theory

    Class of properties p over traces:
    ω-regular properties

GRAPHS:

    Algorithm for computing          Distance between models
    Value(p,m) over models m:        w.r.t. property values:
    μ-calculus                       bisimilarity
Transition Graph

Q             states
δ: Q → 2^Q    transition relation

Graph Regions

Q             states
δ: Q → 2^Q    transition relation
[Q → B]       regions
∃pre, ∀pre:   region transformers

  q ∈ ∃pre(R)   iff   some successor of q is in R
  q ∈ ∀pre(R)   iff   all successors of q are in R

[diagram: ∃pre(R) and ∀pre(R) for a region R ⊆ Q]
Graph Property Values: Reachability

◊R = (μX) (R ∨ ∃pre(X))

Given R ⊆ Q, find the states from which some trace leads to R.

[diagram: the iteration R, then R ∪ ∃pre(R), then R ∪ ∃pre(R) ∪ ∃pre²(R), …]
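As a concrete sketch, the fixpoint iteration above can be run on an explicit graph. The state names and transition relation below are illustrative assumptions, not taken from the slides' figure.

```python
def epre(delta, X):
    """∃pre(X): states with at least one successor in X."""
    return {q for q, succs in delta.items() if succs & X}

def eventually(delta, R):
    """Least fixpoint of X = R ∪ ∃pre(X): states with some trace reaching R."""
    X = set(R)
    while True:
        nxt = X | epre(delta, X)
        if nxt == X:
            return X
        X = nxt

# Illustrative chain a -> b -> c, with c absorbing.
delta = {"a": {"a", "b"}, "b": {"c"}, "c": {"c"}}
print(eventually(delta, {"c"}))   # all three states can reach c
```

On a finite graph the iteration stabilizes after at most |Q| rounds, since each round can only add states.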
Concurrent Game

Q                       states
Σl, Σr                  moves of both players
δ: Q × Σl × Σr → Q      transition function

Game Regions

Q                       states
Σl, Σr                  moves of both players
δ: Q × Σl × Σr → Q      transition function
[Q → B]                 regions
lpre, rpre:             region transformers

  q ∈ lpre(R)   iff   (∃ l ∈ Σl) (∀ r ∈ Σr)  δ(q,l,r) ∈ R

[diagram: lpre(R) for a region R ⊆ Q]
Game Property Values: Reachability

⟨⟨left⟩⟩◊R = (μX) (R ∨ lpre(X))

Given R ⊆ Q, find the states from which player "left" has a strategy to force
the game to R.

[diagram: the iteration R, then R ∪ lpre(R), then R ∪ lpre(R) ∪ lpre²(R), …]
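For a concurrent game with a deterministic transition function, lpre and the reachability fixpoint can be sketched as follows. The game encoded below is a guess at the slides' example: matching pennies at state a, so "left" cannot force ◊c from a with a deterministic strategy.

```python
def lpre(delta, moves_l, moves_r, X):
    """lpre(X): states where 'left' has a move such that, for every move of
    'right', the successor is in X."""
    states = {q for (q, l, r) in delta}
    return {q for q in states
            if any(all(delta[(q, l, r)] in X for r in moves_r)
                   for l in moves_l)}

def left_eventually(delta, moves_l, moves_r, R):
    """Least fixpoint of X = R ∪ lpre(X)."""
    X = set(R)
    while True:
        nxt = X | lpre(delta, moves_l, moves_r, X)
        if nxt == X:
            return X
        X = nxt

moves_l = moves_r = [1, 2]
delta = {}
for l in moves_l:
    for r in moves_r:
        delta[("a", l, r)] = "a" if l == r else "b"    # matching pennies at a
        delta[("b", l, r)] = "a" if (l, r) == (2, 1) else "c"
        delta[("c", l, r)] = "c"                       # c is absorbing
print(left_eventually(delta, moves_l, moves_r, {"c"}))  # b and c, but not a
```

From b, left move 1 reaches c against every right move; from a, right can always mirror left's move, which is exactly why randomized strategies are needed there.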
An Open Systems Theory

    Class of winning conditions p over traces:
    ω-regular properties, e.g. ⟨⟨left⟩⟩◊R

GAME GRAPHS:

    Algorithm for computing            Distance between models
    Value(p,m) over models m:          w.r.t. property values:
    (lpre, rpre) fixpoint calculus,    alternating bisimilarity
    e.g. (μX) (R ∨ lpre(X))            [Alur, H, Kupferman, Vardi]

Every deterministic fixpoint formula f computes Value(p,m), where p is the
linear interpretation [Vardi] of f.
An Open Systems Theory

    Two states agree on the values of all fixpoint formulas iff they are
    alternating bisimilar [Alur, H, Kupferman, Vardi].

GAME GRAPHS:

    Algorithm for computing           Distance between models
    Value(p,m) over models m:         w.r.t. property values:
    (lpre, rpre) fixpoint calculus    alternating bisimilarity
Stochastic Game

Q                           states
Σl, Σr                      moves of both players
δ: Q × Σl × Σr → Dist(Q)    probabilistic transition function

Quantitative Game Regions

Q                           states
Σl, Σr                      moves of both players
δ: Q × Σl × Σr → Dist(Q)    probabilistic transition function
[Q → [0,1]]                 quantitative regions   (B generalizes to [0,1])

lpre, rpre:                 quantitative region transformers

  lpre(R)(q)  =  (sup l ∈ Σl) (inf r ∈ Σr)  R(δ(q,l,r))
                 (sup generalizes ∃, inf generalizes ∀; R is applied to a
                  distribution by taking its expected value)
Probability with which player "left" can enforce ◊c:

(μX) (c ∨ lpre(X))        ∨ = pointwise max

[diagram: the stochastic game with states a, b, c and the transition
 probabilities given earlier; successive iterates of the fixpoint assign to
 (a, b, c) the values (0, 0, 1), then (0, 1, 1), (0.8, 1, 1), (0.96, 1, 1), …]


In the limit, the deterministic fixpoint formulas work for all ω-regular
properties [de Alfaro, Majumdar].
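A sketch of this value iteration on the slides' stochastic game, in Python. Restricting lpre to pure moves is an assumption that happens to suffice for this example; in general lpre is the value of a matrix game over randomized moves.

```python
# delta[(state, left_move, right_move)] = {successor: probability}
delta = {
    ("a", 1, 1): {"a": 0.6, "b": 0.4}, ("a", 1, 2): {"a": 0.5, "b": 0.5},
    ("a", 2, 1): {"a": 0.1, "b": 0.9}, ("a", 2, 2): {"a": 0.2, "b": 0.8},
    ("b", 1, 1): {"c": 1.0},           ("b", 1, 2): {"c": 1.0},
    ("b", 2, 1): {"a": 0.7, "b": 0.3}, ("b", 2, 2): {"b": 1.0},
    ("c", 1, 1): {"c": 1.0},           ("c", 1, 2): {"c": 1.0},
    ("c", 2, 1): {"c": 1.0},           ("c", 2, 2): {"c": 1.0},
}

def lpre(V):
    """Quantitative lpre over pure moves: sup over left moves of the inf over
    right moves of the expected value of V."""
    return {q: max(min(sum(p * V[s] for s, p in delta[(q, l, r)].items())
                       for r in (1, 2))
                   for l in (1, 2))
            for q in ("a", "b", "c")}

c = {"a": 0.0, "b": 0.0, "c": 1.0}    # characteristic function of the target
V = dict(c)
iterates = []
for _ in range(100):                  # (muX)(c v lpre(X)), v = pointwise max
    V = {q: max(c[q], p) for q, p in lpre(V).items()}
    iterates.append(V["a"])
# the iterates at state a begin 0, 0.8, 0.96, ... and approach 1 in the limit
print(iterates[:3], V)
```

The error at state a shrinks by a constant factor per round here, but without discounting no such rate is guaranteed in general, which is the complaint raised later in the talk.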
A Probabilistic Systems Theory

    Class of properties p over traces:
    quantitative ω-regular properties, e.g. the max expected value of
    satisfying ◊R

MARKOV DECISION PROCESSES:

    Algorithm for computing            Distance between models
    Value(p,m) over models m:          w.r.t. property values:
    quantitative fixpoint calculus,    quantitative bisimilarity
    e.g. (μX) (R ∨ ∃pre(X))            [Desharnais, Gupta, Jagadeesan,
                                        Panangaden]

Every deterministic fixpoint formula f computes expected Value(p,m), where p
is the linear interpretation of f.
Qualitative Bisimilarity

e: Q² → {0,1}   … equivalence relation
F               … function on equivalences

  F(e)(q,q') = 0     if q and q' disagree on observations
             = min { e(r,r') | r ∈ ∃pre(q) ∧ r' ∈ ∃pre(q') }   else

Qualitative bisimilarity   … greatest fixpoint of F
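The slide compresses the definition. As a sketch, here is the standard greatest-fixpoint computation of qualitative bisimilarity on a transition graph: start by relating all observation-equivalent pairs, then refine until stable. The example graph and observations are illustrative assumptions.

```python
def bisimilarity(delta, obs):
    """Greatest fixpoint: relate all pairs agreeing on observations, then
    remove pairs whose successors cannot be matched, until stable."""
    states = list(delta)
    e = {(p, q): obs[p] == obs[q] for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for p in states:
            for q in states:
                if not e[(p, q)]:
                    continue
                matched = (all(any(e[(r, s)] for s in delta[q]) for r in delta[p])
                           and all(any(e[(r, s)] for r in delta[p]) for s in delta[q]))
                if not matched:
                    e[(p, q)] = False
                    changed = True
    return e

# Illustrative: a and a2 behave identically; b is observably different.
delta = {"a": {"b"}, "a2": {"b"}, "b": {"b"}}
obs = {"a": "x", "a2": "x", "b": "y"}
e = bisimilarity(delta, obs)
print(e[("a", "a2")], e[("a", "b")])   # True False
```

This boolean fixpoint is exactly the shape that the next slide generalizes: replace {0,1} by [0,1] and min/match by sup/inf to obtain a pseudo-metric.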
Quantitative Bisimilarity

d: Q² → [0,1]   … pseudo-metric ("distance")
F               … function on pseudo-metrics

  F(d)(q,q') = 1     if q and q' disagree on observations
             ≈ max of  sup_l inf_r d(δ(q,l,r), δ(q',l,r)),
                       sup_r inf_l d(δ(q,l,r), δ(q',l,r))   else

Quantitative bisimilarity   … greatest fixpoint of F

    Natural generalization of bisimilarity from binary relations to
    pseudo-metrics.
A Probabilistic Systems Theory

    Two states agree on the values of all quantitative fixpoint formulas iff
    their quantitative bisimilarity distance is 0.

MARKOV DECISION PROCESSES:

    Algorithm for computing           Distance between models
    Value(p,m) over models m:         w.r.t. property values:
    quantitative fixpoint calculus    quantitative bisimilarity
Great, BUT …

1. The theory is too precise.
   Even the smallest change in the probability of a transition can cause an
   arbitrarily large change in the value of a property.

2. The theory is not computational.
   We cannot bound the rate of convergence of quantitative fixpoint formulas.
Solution: Discounting

Economics:
     A dollar today is better than a dollar tomorrow.
     Value of $1 today:       1
     Tomorrow:                α    for discount factor 0 < α < 1
     Day after tomorrow:      α²
     etc.

Engineering:
     A bug today is worse than a bug tomorrow.
Discounted Reachability

Reward(α◊c) = α^k   if c is first true after k transitions
            = 0     if c is never true

The reward is proportional to how quickly c is satisfied.
Discounted Property α◊c

[diagram: chain a → b → c; the value of α◊c is α² at a, α at b, 1 at c]

Discounted fixpoint calculus:  pre(f)  becomes  α · pre(f)
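A sketch of the discounted fixpoint (μX)(c ∨ α·∃pre(X)) on the chain from the slide; the numeric value of the discount factor is an illustrative assumption.

```python
ALPHA = 0.9                                     # illustrative discount factor
delta = {"a": {"b"}, "b": {"c"}, "c": {"c"}}    # the chain a -> b -> c
c = {"a": 0.0, "b": 0.0, "c": 1.0}

V = dict(c)
for _ in range(50):   # least fixpoint of X = c v ALPHA * ∃pre(X)
    V = {q: max(c[q], ALPHA * max(V[s] for s in delta[q])) for q in delta}

print(V)   # value ALPHA**2 at a, ALPHA at b, 1 at c: reward α^k after k steps
```

Each step away from c multiplies the value by α, which is the α^k reward of discounted reachability.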
Fully Quantitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace
              real rewards ([0,1] ⊆ ℝ)

Model m:      generates a set of traces
              (game) graph

Value(p,m):   defined from the rewards of the generated traces
              sup or inf (sup inf)
Discounted Bisimilarity

d: Q² → [0,1]   … pseudo-metric ("distance")
F               … function on pseudo-metrics

  F(d)(q,q') = 1     if q and q' disagree on observations
             ≈ α · max of  sup_l inf_r d(δ(q,l,r), δ(q',l,r)),
                           sup_r inf_l d(δ(q,l,r), δ(q',l,r))   else

Discounted bisimilarity   … greatest fixpoint of F
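A sketch of this metric for the special case of deterministic transitions with a single move per player, so no lifting of d to distributions is needed; the states, observations, and discount factor below are illustrative assumptions.

```python
ALPHA = 0.8                                   # illustrative discount factor
succ = {"a": "b", "b": "c", "c": "c"}         # deterministic successor map
obs  = {"a": "", "b": "", "c": "c"}           # observations

states = list(succ)
d = {(p, q): 0.0 for p in states for q in states}   # iterate from distance 0
for _ in range(100):
    d = {(p, q): (1.0 if obs[p] != obs[q]
                  else ALPHA * d[(succ[p], succ[q])])
         for p in states for q in states}

# b and c disagree on observations, so d(b,c) = 1; hence d(a,b) = ALPHA * 1.
print(d[("a", "b")], d[("b", "c")])   # 0.8 1.0
```

The factor α in front of the recursive step makes each iteration a contraction, so the iterates converge geometrically to the fixpoint; this is exactly what the undiscounted metric lacks.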
A Discounted Systems Theory

    Class of expected rewards p over traces:
    discounted ω-regular properties, e.g. the max expected reward α◊R
    achievable by the left player

STOCHASTIC GAMES:

    Algorithm for computing          Distance between models
    Value(p,m) over models m:        w.r.t. property values:
    discounted fixpoint calculus,    discounted bisimilarity
    e.g. (μX) (R ∨ α · lpre(X))

Every discounted deterministic fixpoint formula f computes Value(p,m),
where p is the linear interpretation of f.
A Discounted Systems Theory

    The difference between two states in the values of discounted fixpoint
    formulas is bounded by their discounted bisimilarity distance.

STOCHASTIC GAMES:

    Algorithm for computing          Distance between models
    Value(p,m) over models m:        w.r.t. property values:
    discounted fixpoint calculus     discounted bisimilarity
Discounting is Robust

Continuity over Traces:
     Every discounted fixpoint formula defines a reward function on traces
     that is continuous in the Cantor metric.

Continuity over Models:
     If transition probabilities are perturbed by ε, then discounted
     bisimilarity distances change by at most f(ε).

Discounting is robust against effects at infinity, and against numerical
perturbations.
Discounting is Computational

The iterative evaluation of an α-discounted fixpoint formula converges
geometrically in α.

(So we can compute to any desired precision.)
Discounting is Approximation

If the discount factor tends toward 1, then we recover the classical theory:

• lim_{α→1} of the α-discounted interpretation of a fixpoint formula f
    = classical interpretation of f
• lim_{α→1} of α-discounted bisimilarity
    = classical (alternating; quantitative) bisimilarity
Further Work

•   Exact computation of discounted values of temporal formulas over
    finite-state systems [de Alfaro, Faella, H, Majumdar, Stoelinga].
•   Discounting real-time systems: continuous discounting of time delay
    rather than discrete discounting of the number of steps [Prabhu].
Conclusions

•   Discounting provides a continuous and computational approximation theory
    of discrete and probabilistic processes.
•   Discounting captures an important engineering intuition:
    "In the long run, we're all dead." (J.M. Keynes)

				