
Finding Approximate POMDP Solutions through Belief Compression

Based on slides by Nicholas Roy, MIT
Reliable Navigation

Conventional trajectories may not be robust to localisation error.

[Figure: a trajectory showing the estimated robot position, the robot position distribution, the true robot position, and the goal position]
Perception and Control

[Diagram: control algorithms connect perception to control, both grounded in the world state]
Perception and Control

[Diagram: two extremes. Assumed full observability: a probabilistic perception model produces P(x) and the controller acts on argmax P(x). Exact POMDP planning: the controller acts on the full distribution P(x). Both are grounded in the world state.]
Perception and Control

[Diagram: the middle ground. A probabilistic perception model produces P(x); the controller acts on a compressed P(x).]
Main Insight

Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.

[Diagram: probabilistic perception model -> P(x) -> low-dimensional P(x) -> control, grounded in the world state]
Belief Space Structure

The controller may occasionally be globally uncertain, but it usually is not: the beliefs encountered in practice occupy a small, structured part of the belief space.
Coastal Navigation

Represent beliefs using the maximum-likelihood state and the belief entropy:

$$\tilde{b} = \left\langle \arg\max_{s} b(s),\; H(b) \right\rangle$$

Discretise into a low-dimensional belief-space MDP.
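For concreteness, a minimal sketch of this statistic in Python (the function name is hypothetical); it reduces a discrete belief vector to the maximum-likelihood state plus the entropy:

    import numpy as np

    def coastal_statistic(b, eps=1e-12):
        """Summarise a discrete belief b (length-|S| probability vector)
        as the coastal pair: maximum-likelihood state and entropy."""
        ml_state = int(np.argmax(b))                    # argmax_s b(s)
        entropy = float(-np.sum(b * np.log(b + eps)))   # H(b), in nats
        return ml_state, entropy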
Coastal Navigation
A Hard Navigation Problem

[Bar chart: average distance to goal (m), on a 0-9 m scale, for the Maximum Likelihood and AMDP heuristics]
Dimensionality Reduction

Principal Components Analysis

[Diagram: the matrix of original beliefs is factored into a small set of characteristic beliefs and their weights]
Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$ with $m \ll n$.

[Plot: a collection of beliefs drawn from a 200-state problem; probability of being in each state vs. state]
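As an illustration, a minimal linear-PCA sketch, assuming `B` is an (n_beliefs x n_states) matrix of sampled beliefs; the function and return names are hypothetical:

    import numpy as np

    def pca_compress(B, m):
        """Compress beliefs with ordinary (linear) PCA via the SVD."""
        mu = B.mean(axis=0)                              # mean belief
        _, _, Vt = np.linalg.svd(B - mu, full_matrices=False)
        bases = Vt[:m]                                   # m characteristic beliefs
        W = (B - mu) @ bases.T                           # each row is a b-tilde in R^m
        B_hat = mu + W @ bases                           # linear reconstruction
        return bases, W, B_hat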
Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$ with $m \ll n$.

[Plot: the m = 9 PCA representation of one sample distribution; probability of being in each state vs. state]
Principal Components Analysis

Many real-world POMDP distributions are characterised by large regions of low probability.

Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).
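A minimal sketch of this criterion, assuming the exponential-family (Poisson-style) loss behind E-PCA: beliefs are reconstructed through an exponential link, b-hat = exp(U w), so errors in low-probability regions are penalised on a log scale. The alternating gradient step is only illustrative; the original work uses Newton steps.

    import numpy as np

    def epca_loss(U, W, B):
        """E-PCA loss under an exponential link. B: (n_beliefs x n_states),
        U: (n_states x m) bases, W: (m x n_beliefs) weights."""
        eta = U @ W                                    # natural parameters
        return float(np.sum(np.exp(eta) - B.T * eta)) # minimised when exp(eta) = b

    def epca_step(U, W, B, lr=1e-3):
        """One alternating gradient step on the bases and weights."""
        resid = np.exp(U @ W) - B.T                    # d(loss)/d(eta)
        return U - lr * resid @ W.T, W - lr * U.T @ resid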
Example E-PCA

[Plot: E-PCA reconstructions of a sample belief using 1, 2, 3, and 4 bases; probability of being in each state vs. state]
Example Reduction
Finding Dimensionality

E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
Planning

[Diagram: the original POMDP belief space is mapped by E-PCA to a low-dimensional belief space $\tilde{B}$, which is then discretised into a discrete belief-space MDP]
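A minimal sketch of the discretisation step, assuming `W` holds the E-PCA weights (one row per sampled belief) and that a regular grid over the low-dimensional space is used; the names and resolution are hypothetical:

    import numpy as np

    def discretise(W, resolution=0.1):
        """Snap low-dimensional beliefs to regular grid cells; the distinct
        cells become the states of a discrete belief-space MDP."""
        cells = np.round(W / resolution).astype(int)
        ids = {}                                  # grid cell -> MDP state id
        labels = []                               # MDP state id of each belief
        for c in map(tuple, cells):
            labels.append(ids.setdefault(c, len(ids)))
        return ids, labels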
Model Parameters

Reward function: back-project the low-dimensional belief to the high-dimensional belief $p(s)$, then compute the expected reward of the belief:

$$R(\tilde{b}) = E_b[R(s)] = \sum_{s \in S} p(s)\, R(s)$$
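A minimal sketch, assuming `decompress` back-projects a low-dimensional belief to a full distribution over S (e.g. exp(U @ b_tilde), renormalised, under E-PCA) and `R` is the length-|S| vector of state rewards:

    import numpy as np

    def belief_reward(b_tilde, decompress, R):
        p = decompress(b_tilde)      # back-project to the high-dimensional belief p(s)
        return float(np.dot(p, R))   # R(b-tilde) = sum_s p(s) R(s)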
Model Parameters

1. For each belief $\tilde{b}_i$ and action $a$:
2. Recover the full belief $b_i$.
3. Propagate it according to the action.
4. Propagate it according to each observation.
5. Recover $\tilde{b}_j$.
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to the probability of the observation.

[Diagram: $\tilde{b}_i \rightarrow \tilde{b}_j$ in the low-dimensional space corresponds to $b_i \rightarrow b_j$ in the full-dimensional space]

$$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z_b|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\, p(s_l \mid s_m, a)\, b_i(s_m)$$
Robot Navigation Example

[Figure: the initial distribution over robot positions, the true (hidden) robot position, and the goal position]
Robot Navigation Example

[Figure: the true robot position and the goal position]
Policy Comparison

[Bar chart: average distance to goal (m), on a 0-9 m scale, for Maximum Likelihood, AMDP, and E-PCA with 6 bases]
People Finding
People Finding as a POMDP

[Figure: the robot is fully observable; the position of the person is unknown. Legend: robot position, true person position]
Finding and Tracking People

[Figure: robot position and true person position during a search]
People Finding as a POMDP

Factored belief space:
   2 dimensions: fully-observable robot position
   6 dimensions: distribution over person positions

A regular grid over this space gives ≈ 10^16 states (e.g., roughly 100 cells along each of the 8 dimensions: 100^8 = 10^16).
Variable Resolution

Use a non-regular grid built from sampled beliefs.

[Diagram: sampled grid beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$ with transitions such as $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$]

Compute the model parameters using nearest-neighbour assignment.
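A minimal sketch of the nearest-neighbour assignment, assuming `grid` is an (n_cells x m) array of sampled grid beliefs:

    import numpy as np

    def nearest_grid_belief(b_tilde, grid):
        """Index of the sampled grid belief closest to b_tilde."""
        return int(np.argmin(np.linalg.norm(grid - b_tilde, axis=1)))

In the transition sketch earlier, step 5 would then snap a propagated belief to the nearest sampled grid belief, instead of rounding to a regular cell.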
Refining the Grid

[Diagram: a grid belief $b$ with value $V(\tilde{b}_1)$ and a candidate refinement $b'$ with value $V(\tilde{b}'_1)$]

Sample beliefs according to the current policy, construct a new model, and keep a new belief only if $V(\tilde{b}'_1) > V(\tilde{b}_1)$.
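A minimal sketch of this refinement test, with hypothetical `build_model` and `solve` helpers (e.g. value iteration over the current grid); a candidate belief is kept only if it improves the value at the grid belief it refines:

    import numpy as np

    def refine(grid, candidates, build_model, solve):
        """grid: (n x m) grid beliefs; candidates: (b_new, idx) pairs, where
        b_new was sampled under the policy near grid belief idx."""
        V = solve(build_model(grid))
        for b_new, idx in candidates:
            trial = np.vstack([grid, b_new])
            V_trial = solve(build_model(trial))
            if V_trial[idx] > V[idx]:          # keep only value-improving beliefs
                grid, V = trial, V_trial
        return grid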
The Optimal Policy

[Figure: the original distribution over person positions and its reconstruction using E-PCA with 6 bases; legend: robot position, true person position]
Policy Comparison

[Bar chart: average number of actions to find the person, on a 0-250 scale, for the Closest, Densest, Maximum Likelihood, E-PCA, and Refined E-PCA policies, with the fully observable MDP shown as a baseline. E-PCA: 72 states; Refined E-PCA: 260 states.]
Nick’s Thesis Contributions

Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.

POMDPs can scale to bigger, more complicated real-world problems.

POMDPs can be used for real, deployed robots.