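\begin{align}
V^\pi(s) &= E\Bigl\{ \sum_{t=1}^{\infty} \gamma^{t-1} r_t \Bigm| s_0 = s \Bigr\} \tag{1} \\
         &= \sum_a \pi(s,a) \Bigl[ R(s,a) + \gamma \sum_{s'} P(s,s',a)\, V^\pi(s') \Bigr] \tag{2}
\end{align}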
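As a rough illustration of equations (1) and (2), the sketch below evaluates a fixed policy on a small MDP by solving the linear system that (2) defines. The 2-state, 2-action MDP, the policy, and the helper names `policy_model` and `evaluate_policy` are all hypothetical, chosen only for this example; the array `P[s, s2, a]` is indexed in the same order as $P(s, s', a)$.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers are made up for illustration).
# P[s, s2, a] is the probability of moving from s to s2 under action a.
P = np.zeros((2, 2, 2))
P[:, :, 0] = [[0.9, 0.1], [0.2, 0.8]]
P[:, :, 1] = [[0.3, 0.7], [0.6, 0.4]]
R = np.array([[1.0, 0.0], [0.0, 2.0]])   # R[s, a]
pi = np.array([[0.5, 0.5], [0.1, 0.9]])  # pi[s, a], each row sums to 1
gamma = 0.9


def policy_model(P, R, pi):
    """Average the dynamics and rewards over the policy's action choices."""
    # P_pi[s, s2] = sum_a pi(s, a) P(s, s2, a)
    P_pi = np.einsum("sa,sta->st", pi, P)
    # r_pi[s] = sum_a pi(s, a) R(s, a)
    r_pi = (pi * R).sum(axis=1)
    return P_pi, r_pi


def evaluate_policy(P, R, pi, gamma):
    """Solve V = r_pi + gamma * P_pi V, which is equation (2) in matrix form."""
    P_pi, r_pi = policy_model(P, R, pi)
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)


V = evaluate_policy(P, R, pi, gamma)
print("V^pi =", V)
```

Solving the system directly works for small state spaces and $\gamma < 1$; for larger problems an iterative method would be the more usual choice.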



\begin{align}
d^\pi(s') &= \lim_{t \to \infty} Pr\{ s_t = s' \mid s_0, \pi \} \qquad \text{(does not depend on } s_0\text{)} \tag{3} \\
          &= \sum_s d^\pi(s) \sum_a \pi(s,a)\, P(s,s',a) \tag{4}
\end{align}
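The fixed-point property in (4) can be checked numerically. The sketch below assumes a hypothetical 2-state chain `P_pi` (the dynamics already averaged over the policy, values made up for illustration) and a hypothetical helper `stationary_distribution`; it solves (4) together with the normalisation $\sum_s d^\pi(s) = 1$ and compares the result with the long-run state probabilities of (3).

```python
import numpy as np


def stationary_distribution(P_pi):
    """Solve d = d P_pi with sum(d) = 1 as one least-squares system."""
    n = P_pi.shape[0]
    # Rows 0..n-1: (P_pi^T - I) d = 0.  Last row: sum(d) = 1.
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


# Hypothetical policy-averaged transition matrix: P_pi[s, s2].
P_pi = np.array([[0.6, 0.4], [0.56, 0.44]])
d = stationary_distribution(P_pi)

# Row s0 of P_pi^t is the state distribution after t steps from s0.
P_inf = np.linalg.matrix_power(P_pi, 50)
print("d^pi          =", d)
print("from state 0  =", P_inf[0])
print("from state 1  =", P_inf[1])
```

Both rows of the matrix power approach the same vector, which is the sense in which $d^\pi$ does not depend on $s_0$. This relies on the chain being ergodic, which is an assumption of the example.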



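\begin{align}
\rho^\pi &= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r_t \qquad \text{(does not depend on } s_0\text{)} \tag{5} \\
         &= \sum_s d^\pi(s) \sum_a \pi(s,a)\, R(s,a) \tag{6}
\end{align}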
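To make the link between (5) and (6) concrete, the following sketch compares a simulated long-run average reward with the closed form $\sum_s d^\pi(s) \sum_a \pi(s,a) R(s,a)$. The MDP, the policy, the horizon `T`, the seeds, and both function names are hypothetical choices for this illustration; the simulated value is a random estimate and only approximates the limit.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers made up for illustration).
P = np.zeros((2, 2, 2))                  # P[s, s2, a]
P[:, :, 0] = [[0.9, 0.1], [0.2, 0.8]]
P[:, :, 1] = [[0.3, 0.7], [0.6, 0.4]]
R = np.array([[1.0, 0.0], [0.0, 2.0]])   # R[s, a]
pi = np.array([[0.5, 0.5], [0.1, 0.9]])  # pi[s, a]


def average_reward_closed_form(P, R, pi):
    """Equation (6): weight the expected one-step reward by d^pi."""
    P_pi = np.einsum("sa,sta->st", pi, P)
    r_pi = (pi * R).sum(axis=1)
    # For an ergodic chain, any row of a high power of P_pi approximates d^pi.
    d = np.linalg.matrix_power(P_pi, 100)[0]
    return d @ r_pi


def average_reward_simulated(P, R, pi, T, s0, seed=0):
    """Equation (5) with a finite T: the mean reward along one sampled path."""
    rng = np.random.default_rng(seed)
    s, total = s0, 0.0
    for _ in range(T):
        a = rng.choice(pi.shape[1], p=pi[s])
        total += R[s, a]
        s = rng.choice(P.shape[1], p=P[s, :, a])
    return total / T


rho = average_reward_closed_form(P, R, pi)
rho_from_0 = average_reward_simulated(P, R, pi, T=20000, s0=0, seed=0)
rho_from_1 = average_reward_simulated(P, R, pi, T=20000, s0=1, seed=1)
print("closed form:", rho)
print("simulated from s0=0:", rho_from_0)
print("simulated from s0=1:", rho_from_1)
```

The two simulated averages should land close to the closed-form value whichever start state is used, up to sampling noise that shrinks as `T` grows.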

In trying to form an overall discounted performance measure for $\pi$, can we use
$J(\pi) = \sum_s d^\pi(s) V^\pi(s)$? It turns out that we then end up with no effect of the
discounting:

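\begin{align}
J(\pi) &= \sum_s d^\pi(s)\, V^\pi(s) \tag{7} \\
       &= \sum_s d^\pi(s) \sum_a \pi(s,a) \Bigl[ R(s,a) + \gamma \sum_{s'} P(s,s',a)\, V^\pi(s') \Bigr] \tag{8} \\
       &= \rho^\pi + \gamma \sum_s d^\pi(s) \sum_a \pi(s,a) \sum_{s'} P(s,s',a)\, V^\pi(s') \tag{9} \\
       &= \rho^\pi + \gamma \sum_{s'} V^\pi(s') \sum_s d^\pi(s) \sum_a \pi(s,a)\, P(s,s',a) \tag{10} \\
       &= \rho^\pi + \gamma \sum_{s'} V^\pi(s')\, d^\pi(s') \tag{11} \\
       &= \rho^\pi + \gamma J(\pi) \tag{12} \\
       &= \rho^\pi + \gamma \rho^\pi + \gamma^2 J(\pi) \tag{13} \\
       &= \rho^\pi + \gamma \rho^\pi + \gamma^2 \rho^\pi + \gamma^3 \rho^\pi + \cdots \tag{14} \\
       &= \frac{1}{1-\gamma}\, \rho^\pi \tag{15}
\end{align}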
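which is simply $\rho^\pi$ scaled by the constant $1/(1-\gamma)$: for a fixed $\gamma$, $J(\pi)$ ranks policies exactly as $\rho^\pi$ does, so the discounting has no effect.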
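The identity $J(\pi) = \rho^\pi / (1-\gamma)$ is easy to check numerically. The sketch below assumes a hypothetical policy-averaged chain `P_pi` and reward vector `r_pi` (values made up for illustration), computes $V^\pi$ and $d^\pi$ independently, and compares $\sum_s d^\pi(s) V^\pi(s)$ with $\rho^\pi / (1-\gamma)$ for several discount factors. The helper names are likewise assumptions of this example.

```python
import numpy as np

# Hypothetical policy-averaged model: P_pi[s, s2] and r_pi[s].
P_pi = np.array([[0.6, 0.4], [0.56, 0.44]])
r_pi = np.array([0.5, 1.8])


def stationary_distribution(P_pi):
    """Left eigenvector of P_pi for the eigenvalue closest to 1, normalised to sum to 1."""
    w, v = np.linalg.eig(P_pi.T)
    d = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return d / d.sum()


def weighted_value(P_pi, r_pi, gamma):
    """J(pi) = sum_s d(s) V(s), with V obtained from the Bellman equation."""
    n = len(r_pi)
    V = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
    return stationary_distribution(P_pi) @ V


rho = stationary_distribution(P_pi) @ r_pi
for gamma in (0.0, 0.5, 0.9, 0.99):
    J = weighted_value(P_pi, r_pi, gamma)
    print(f"gamma={gamma}: J={J:.6f}, rho/(1-gamma)={rho / (1 - gamma):.6f}")
```

The two printed columns should agree for every `gamma`, so changing the discount factor only rescales $J(\pi)$.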




Posted: 4/22/2013