Discounting the Future in Systems Theory

Luca de Alfaro, UC Santa Cruz
Tom Henzinger, UC Berkeley
Rupak Majumdar, UC Los Angeles

Chess Review
May 11, 2005
Berkeley, CA
A Graph Model of a System

[graph: states a → b → c]

Property ◊c ("eventually c")

∃◊c … some trace has the property ◊c
∀◊c … all traces have the property ◊c
Richer Models

FAIRNESS:       ω-automaton, parity game graph
CONCURRENCY:    game graph
PROBABILITIES:  Markov decision process
All together:   stochastic game
Concurrent Game

[game graph: states a, b, c; edges labeled with joint moves
(left, right) ∈ {1,2} × {1,2}, e.g. (1,1), (1,2), (2,1), (2,2)]

player "left"
player "right"

- for modeling open systems [Abramsky, Alur, Kupferman, Vardi, …]
- for strategy synthesis ("control") [Ramadge, Wonham; Pnueli, Rosner]
Property c
1,1
2,2
1,1
1,2                   1,2
2,1                   2,2
a                      b                       c
2,1

hhleftii c    … player "left" has a strategy to enforce c
Property c
1,1
2,2
1,1
1,2                   1,2
2,1                   2,2
Pr(1): 0.5      a                      b                       c
Pr(2): 0.5                 2,1

hhleftii c    … player "left" has a strategy to enforce c
left c      … player “left" has a randomized strategy to
enforce c
Qualitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace (boolean rewards)

Model m:      generates a set of traces ((game) graph)

Value(p,m):   defined from the rewards of the generated traces
              B    ∃ or ∀ (∃∀)
Stochastic Game

States a, b, c; the transition probabilities, by state and joint move
(left, right):

At state a:     right 1          right 2
  left 1        a: 0.6  b: 0.4   a: 0.5  b: 0.5
  left 2        a: 0.1  b: 0.9   a: 0.2  b: 0.8

At state b:     right 1          right 2
  left 1        a: 0.0  c: 1.0   a: 0.0  c: 1.0
  left 2        a: 0.7  b: 0.3   a: 0.0  b: 1.0

Property ◊c
With what probability can player "left" enforce ◊c?
Semi-Quantitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace (boolean rewards)

Model m:      generates a set of traces ((game) graph)

Value(p,m):   defined from the rewards of the generated traces
              [0,1] ⊆ R    sup or inf (sup inf)
A Systems Theory

GRAPHS

Class of properties p over traces:
ω-regular properties

Algorithm for computing Value(p,m) over models m:
μ-calculus

Distance between models w.r.t. property values:
bisimilarity
Transition Graph

Q              states
δ: Q → 2^Q     transition relation

Graph Regions

Q              states
δ: Q → 2^Q     transition relation
ℛ = [Q → B]    regions (R ⊆ Q)

∃pre, ∀pre: ℛ → ℛ
∃pre(R) … states with some successor in R
∀pre(R) … states with all successors in R
Graph Property Values: Reachability
 R
Given RµQ, find the states from which some trace leads to R.

R
Graph Property Values: Reachability
 R = (m X) (R Ç 9pre(X))
Given RµQ, find the states from which some trace leads to R.

R
R[
R
pre(R)
...           R[
pre(R) [
pre2(R)
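The fixpoint iteration above is easy to run directly. Below is a minimal sketch in Python: the slides' three-state graph a → b → c is encoded as a successor map (the dict encoding is my own), and ∃◊R is computed as the least fixpoint of X ↦ R ∪ ∃pre(X) by iteration from the empty set.

```python
# Least-fixpoint reachability  ∃◊R = (μX)(R ∨ ∃pre(X))  on a finite graph.
# Graph from the slides: a → b → c, with c looping on itself.
succ = {"a": {"b"}, "b": {"c"}, "c": {"c"}}   # δ: Q → 2^Q

def exists_pre(X):
    """∃pre(X): states with at least one successor in X."""
    return {q for q, rs in succ.items() if rs & X}

def reach(R):
    """Iterate X ↦ R ∪ ∃pre(X) from ∅ until the fixpoint is hit."""
    X = set()
    while True:
        X_next = R | exists_pre(X)
        if X_next == X:
            return X
        X = X_next
```

The iteration terminates because each step only adds states and Q is finite; from target {c} it produces {c}, then {b, c}, then {a, b, c}.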
Concurrent Game

Q                    states
Λl, Λr               moves of both players
δ: Q × Λl × Λr → Q   transition function

Game Regions

Q                    states
Λl, Λr               moves of both players
δ: Q × Λl × Λr → Q   transition function
ℛ = [Q → B]          regions (R ⊆ Q)

lpre, rpre: ℛ → ℛ
q ∈ lpre(R)   iff   (∃l ∈ Λl) (∀r ∈ Λr)  δ(q,l,r) ∈ R
Game Property Values: Reachability

⟨⟨left⟩⟩◊R = (μX) (R ∨ lpre(X))

Given R ⊆ Q, find the states from which player "left" has a strategy to
force the game to R.

Iteration:  R,  R ∪ lpre(R),  R ∪ lpre(R) ∪ lpre²(R),  …
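The controllable-predecessor operator only differs from ∃pre in its ∃∀ quantifier pattern, which a short sketch makes concrete. The transition table below is hypothetical (the slides only draw the game, they do not list δ): at a and b, left move 1 forces progress toward c regardless of right's move.

```python
# Game reachability  ⟨⟨left⟩⟩◊R = (μX)(R ∨ lpre(X))  on a concurrent game.
# Hypothetical deterministic transition function δ: Q × Λl × Λr → Q.
delta = {
    ("a", 1, 1): "b", ("a", 1, 2): "b", ("a", 2, 1): "a", ("a", 2, 2): "a",
    ("b", 1, 1): "c", ("b", 1, 2): "c", ("b", 2, 1): "a", ("b", 2, 2): "b",
    ("c", 1, 1): "c", ("c", 1, 2): "c", ("c", 2, 1): "c", ("c", 2, 2): "c",
}
states, moves = {"a", "b", "c"}, (1, 2)

def lpre(X):
    """q ∈ lpre(X) iff some left move puts the successor in X for EVERY right move."""
    return {q for q in states
            if any(all(delta[(q, l, r)] in X for r in moves) for l in moves)}

def left_reach(R):
    """Least fixpoint of X ↦ R ∪ lpre(X)."""
    X = set()
    while (nxt := R | lpre(X)) != X:
        X = nxt
    return X
```

Note the any/all nesting mirrors the (∃l)(∀r) in the definition; swapping them would compute rpre for the right player instead.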
An Open Systems Theory

GAME GRAPHS

Class of winning conditions p over traces:
ω-regular properties

Algorithm for computing Value(p,m) over models m:
(lpre, rpre) fixpoint calculus

Distance between models w.r.t. property values:
alternating bisimilarity [Alur, H, Kupferman, Vardi]
An Open Systems Theory

GAME GRAPHS

Class of winning conditions p over traces:
ω-regular properties, e.g. ⟨⟨left⟩⟩◊R

Algorithm for computing Value(p,m) over models m:
(lpre, rpre) fixpoint calculus, e.g. (μX) (R ∨ lpre(X))

Every deterministic fixpoint formula f computes Value(p,m),
where p is the linear interpretation [Vardi] of f.
An Open Systems Theory

GAME GRAPHS

Two states agree on the values of all fixpoint formulas
iff they are alternating bisimilar [Alur, H, Kupferman, Vardi].

Algorithm for computing Value(p,m) over models m:
(lpre, rpre) fixpoint calculus

Distance between models w.r.t. property values:
alternating bisimilarity
Stochastic Game

Q                          states
Λl, Λr                     moves of both players
δ: Q × Λl × Λr → Dist(Q)   probabilistic transition function

Quantitative Game Regions

Q                          states
Λl, Λr                     moves of both players
δ: Q × Λl × Λr → Dist(Q)   probabilistic transition function
ℛ = [Q → [0,1]]            quantitative regions (B becomes [0,1])

lpre, rpre: ℛ → ℛ
lpre(R)(q) = (sup l ∈ Λl) (inf r ∈ Λr)  R(δ(q,l,r))
(∃ becomes sup, ∀ becomes inf; R is applied to a distribution by
taking the expected value)
Probability with which player "left" can enforce ◊c:

(μX) (c ∨ lpre(X))          ∨ = pointwise max

[same stochastic game: states a, b, c with the transition tables above]

Iterated values at (a, b, c):
(0, 0, 1) → (0, 1, 1) → (0.8, 1, 1) → (0.96, 1, 1) → … → (1, 1, 1)

In the limit, the deterministic fixpoint formulas work for all
ω-regular properties [de Alfaro, Majumdar].
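The value iteration on these slides can be reproduced directly. The sketch below encodes the slides' stochastic game as nested dicts (the encoding itself is my own) and iterates X ↦ c ∨ lpre(X), where lpre takes sup over left moves and inf over right moves of the one-step expected value.

```python
# Value iteration for  (μX)(c ∨ lpre(X))  on the slides' stochastic game.
# delta[state][(l, r)] = probability distribution over successor states.
delta = {
    "a": {(1, 1): {"a": 0.6, "b": 0.4}, (1, 2): {"a": 0.5, "b": 0.5},
          (2, 1): {"a": 0.1, "b": 0.9}, (2, 2): {"a": 0.2, "b": 0.8}},
    "b": {(1, 1): {"a": 0.0, "c": 1.0}, (1, 2): {"a": 0.0, "c": 1.0},
          (2, 1): {"a": 0.7, "b": 0.3}, (2, 2): {"a": 0.0, "b": 1.0}},
    "c": {(1, 1): {"c": 1.0}},  # target state, absorbing
}

def lpre(v):
    """Quantitative lpre: sup over left moves, inf over right moves,
    of the expected value of v after one transition."""
    out = {}
    for q, moves in delta.items():
        lefts = {l for (l, _) in moves}
        rights = {r for (_, r) in moves}
        out[q] = max(
            min(sum(p * v[s] for s, p in moves[(l, r)].items())
                for r in rights if (l, r) in moves)
            for l in lefts
        )
    return out

target = {"a": 0.0, "b": 0.0, "c": 1.0}   # indicator of the region c
v, history = dict(target), []
for _ in range(50):
    pv = lpre(v)
    v = {q: max(target[q], pv[q]) for q in v}  # ∨ = pointwise max
    history.append(v["a"])
```

The recorded values at state a reproduce the slide sequence 0, 0.8, 0.96, … and tend to 1, matching the limit (1, 1, 1).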
A Probabilistic Systems Theory

MARKOV DECISION PROCESSES

Class of properties p over traces:
ω-regular properties

Algorithm for computing Value(p,m) over models m:
quantitative fixpoint calculus

Distance between models w.r.t. property values:
quantitative bisimilarity
A Probabilistic Systems Theory

MARKOV DECISION PROCESSES

Class of properties p over traces:
quantitative ω-regular properties, e.g. the max expected value
of satisfying ◊R

Algorithm for computing Value(p,m) over models m:
quantitative fixpoint calculus, e.g. (μX) (R ∨ ∃pre(X))

Every deterministic fixpoint formula f computes expected
Value(p,m), where p is the linear interpretation of f.
Qualitative Bisimilarity

e: Q² → {0,1}                   … equivalence relation
F                               … function on equivalences
F(e)(q,q') = 0    if q and q' disagree on observations
           = min { e(r,r') | r ∈ δ(q) ∧ r' ∈ δ(q') }    else

Qualitative bisimilarity        … greatest fixpoint of F
Quantitative Bisimilarity

d: Q² → [0,1]                   … pseudo-metric ("distance")
F                               … function on pseudo-metrics
F(d)(q,q') = 1    if q and q' disagree on observations
           = max of  sup_l inf_r d(δ(q,l,r), δ(q',l,r)),
                     sup_r inf_l d(δ(q,l,r), δ(q',l,r))    else

Quantitative bisimilarity       … greatest fixpoint of F

Natural generalization of bisimilarity from
binary relations to pseudo-metrics.
A Probabilistic Systems Theory

MARKOV DECISION PROCESSES

Two states agree on the values of all quantitative fixpoint
formulas iff their quantitative bisimilarity distance is 0.

Algorithm for computing Value(p,m) over models m:
quantitative fixpoint calculus

Distance between models w.r.t. property values:
quantitative bisimilarity
Great, BUT …

1. The theory is too precise.
   Even the smallest change in the probability of a
   transition can cause an arbitrarily large change
   in the value of a property.

2. The theory is not computational.
   We cannot bound the rate of convergence for
   quantitative fixpoint formulas.
Solution: Discounting

Economics:
A dollar today is better than a dollar tomorrow.
Value of $1 today:       1
Tomorrow:                α     for discount factor 0 < α < 1
Day after tomorrow:      α²
etc.

Engineering:
A bug today is worse than a bug tomorrow.
Discounted Reachability

Reward(◊_α c) = α^k   if c is first true after k transitions
              = 0     if c is never true

The reward is proportional to how quickly c is satisfied.
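As a tiny sketch of this reward on concrete traces (the function name is my own), a trace that hits c after k steps earns α^k, and one that never hits c earns 0:

```python
# Discounted reachability reward: α^k for the first occurrence of the
# target observation, 0 if the target never occurs on the trace.
def discounted_reward(trace, target, alpha):
    for k, obs in enumerate(trace):
        if obs == target:
            return alpha ** k
    return 0.0

# Reaching c sooner earns a strictly higher reward:
r_fast = discounted_reward(["a", "c"], "c", 0.9)        # α^1
r_slow = discounted_reward(["a", "b", "c"], "c", 0.9)   # α^2
```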
Discounted Property ◊_α c

[graph a → b → c: discounted value α² at a, α at b, 1 at c]

Discounted fixpoint calculus: pre(f) becomes α · pre(f)
Fully Quantitative Models

Trace:        sequence of observations

Property p:   assigns a reward to each trace (real rewards)

Model m:      generates a set of traces ((game) graph)

Value(p,m):   defined from the rewards of the generated traces
              [0,1] ⊆ R    sup or inf (sup inf)
Discounted Bisimilarity

d: Q² → [0,1]                   … pseudo-metric ("distance")
F                               … function on pseudo-metrics
F(d)(q,q') = 1    if q and q' disagree on observations
           = α · max of  sup_l inf_r d(δ(q,l,r), δ(q',l,r)),
                         sup_r inf_l d(δ(q,l,r), δ(q',l,r))    else

Discounted bisimilarity         … greatest fixpoint of F
A Discounted Systems Theory

STOCHASTIC GAMES

Class of winning rewards p over traces:
discounted ω-regular properties

Algorithm for computing Value(p,m) over models m:
discounted fixpoint calculus

Distance between models w.r.t. property values:
discounted bisimilarity
A Discounted Systems Theory

STOCHASTIC GAMES

Class of expected rewards p over traces:
discounted ω-regular properties, e.g. the max expected reward ◊_α R
achievable by the left player

Algorithm for computing Value(p,m) over models m:
discounted fixpoint calculus, e.g. (μX) (R ∨ α · lpre(X))

Every discounted deterministic fixpoint formula f computes
Value(p,m), where p is the linear interpretation of f.
A Discounted Systems Theory

STOCHASTIC GAMES

The difference between two states in the values
of discounted fixpoint formulas is bounded by
their discounted bisimilarity distance.

Algorithm for computing Value(p,m) over models m:
discounted fixpoint calculus

Distance between models w.r.t. property values:
discounted bisimilarity
Discounting is Robust

Continuity over Traces:
Every discounted fixpoint formula defines a reward
function on traces that is continuous in the Cantor metric.

Continuity over Models:
If transition probabilities are perturbed by ε, then
discounted bisimilarity distances change by at most f(ε).

Discounting is robust against effects at infinity,
and against numerical perturbations.
Discounting is Computational

The iterative evaluation of an α-discounted
fixpoint formula converges geometrically in α.

(So we can compute to any desired precision.)
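The geometric convergence is easy to see numerically: the discounted update X ↦ R ∨ α·pre(X) is a contraction with modulus α, so successive iterates draw closer by at least a factor α per step. The sketch below illustrates this on a hypothetical two-state Markov chain of my own (not from the slides): from state a, the target c is reached with probability 0.3 per step, otherwise the chain stays in a.

```python
# Discounted value iteration  (μX)(c ∨ α · pre(X))  on a two-state chain:
# from a, go to the absorbing target c w.p. 0.3, stay in a w.p. 0.7.
alpha = 0.9
target = {"a": 0.0, "c": 1.0}
p_stay, p_hit = 0.7, 0.3

def update(v):
    """One discounted update; pre at a is the one-step expected value."""
    pre_a = p_stay * v["a"] + p_hit * v["c"]
    return {"a": max(target["a"], alpha * pre_a), "c": 1.0}

v, diffs = dict(target), []
for _ in range(60):
    nv = update(v)
    diffs.append(abs(nv["a"] - v["a"]))   # per-step change at state a
    v = nv
# diffs shrinks geometrically (each step at most α times the previous),
# so any desired precision is reached in a predictable number of steps.
```

Here the fixpoint at a satisfies x = α(0.7x + 0.3), i.e. x = 0.27/0.37, and the per-step changes shrink by the exact factor 0.63 ≤ α.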
Discounting is Approximation

If the discount factor tends to 1,
then we recover the classical theory:
• lim_{α→1} α-discounted interpretation of fixpoint formula f
  = classical interpretation of f
• lim_{α→1} α-discounted bisimilarity
  = classical (alternating; quantitative) bisimilarity
Further Work

•  Exact computation of discounted values of temporal formulas
   over finite-state systems [de Alfaro, Faella, H, Majumdar, Stoelinga].
•  Discounting real-time systems: continuous discounting of the time
   delay rather than discrete discounting of the number of steps [Prabhu].
Conclusions

•  Discounting provides a continuous and computational approximation
   theory of discrete and probabilistic processes.
•  Discounting captures an important engineering intuition:
   "In the long run, we're all dead." (J.M. Keynes)
