# Finding Approximate POMDP Solutions through Belief Compression

Based on slides by Nicholas Roy, MIT.
Conventional trajectories may not be robust to localisation error.

*Figure: a navigation trajectory showing the estimated robot position, the robot position distribution, the true robot position, and the goal position.*
## Perception and Control

Conventional control algorithms couple perception directly to control over an assumed world state.

*Figure: world state → perception → control.*

Two extremes:

- Assumed full observability: a probabilistic perception model produces $P(x)$ and the controller acts on $\arg\max P(x)$.
- Exact POMDP planning: the controller acts on the full distribution $P(x)$.

The approach here sits in between: the probabilistic perception model produces $P(x)$, and the controller acts on a compressed $P(x)$.
## Main Insight

Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.

*Figure: probabilistic perception model → $P(x)$ → low-dimensional $P(x)$ → control, acting on the world state.*
## Belief Space Structure

The controller may occasionally be globally uncertain, but not usually.

One compact representation keeps only the most likely state and the entropy of the belief:

$$\tilde{b} = \langle \arg\max_s b(s),\; H(b) \rangle$$

Discretising this representation gives a low-dimensional belief-space MDP (the Augmented MDP, or AMDP, compared below).
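As a minimal sketch of this compression (the helper name is illustrative, not from the slides):

```python
import numpy as np

def amdp_statistic(b):
    """Compress a discrete belief b (a probability vector over states)
    into the pair (most likely state, entropy)."""
    b = np.asarray(b, dtype=float)
    ml_state = int(np.argmax(b))               # arg max_s b(s)
    nz = b[b > 0]
    entropy = float(-np.sum(nz * np.log(nz)))  # H(b)
    return ml_state, entropy

# A belief concentrated near state 2 -> (2, low entropy)
print(amdp_statistic([0.05, 0.15, 0.60, 0.15, 0.05]))
```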
## A Hard Navigation Problem

*Figure: bar chart of average distance to goal (distance in m, 0 to 9) for the Maximum Likelihood and AMDP controllers.*
## Dimensionality Reduction

### Principal Components Analysis

PCA factors a matrix of sampled beliefs:

Original Beliefs ≈ Characteristic Beliefs × Weights
Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$ with $m \ll n$.

*Figure: a collection of beliefs (probability of being in each state) drawn from a 200-state problem.*

*Figure: the PCA reconstruction of one sample distribution using m = 9 bases.*
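A minimal PCA sketch over sampled belief vectors (numpy SVD; the function and variable names are illustrative):

```python
import numpy as np

def pca_compress(B, m):
    """B: (num_beliefs, n) matrix of sampled beliefs.
    Returns the mean, the m characteristic beliefs U, and the
    weights W such that B is approximately mean + W @ U."""
    mean = B.mean(axis=0)
    _, _, Vt = np.linalg.svd(B - mean, full_matrices=False)
    U = Vt[:m]            # m characteristic beliefs (bases)
    W = (B - mean) @ U.T  # one m-dimensional weight vector per belief
    return mean, U, W

# 50 beliefs over a 200-state problem, compressed to m = 9 weights each
rng = np.random.default_rng(0)
B = rng.dirichlet(np.full(200, 0.1), size=50)
mean, U, W = pca_compress(B, m=9)
B_hat = mean + W @ U  # reconstruction; can go negative, unlike a true belief
```

The poor reconstructions in low-probability regions are what motivate E-PCA below.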
### E-PCA

Many real-world POMDP distributions are characterised by large regions of low probability.

Idea: use a fitting criterion that is (exponentially) stronger in low-probability regions, i.e. exponential-family PCA (E-PCA).
*Figure (Example E-PCA): reconstructions of a sample belief (probability of being in each state) using 1, 2, 3, and 4 bases.*
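The slides don't show the fitting procedure; a minimal gradient-descent sketch of the standard E-PCA loss (exponential link, so reconstructions stay positive and errors in low-probability regions are penalised strongly) might look like this, with all names illustrative:

```python
import numpy as np

def epca(B, m, lr=1e-3, iters=5000):
    """Fit B ~= exp(W @ U) by gradient descent on the loss
    sum(exp(W @ U) - B * (W @ U)), whose gradient w.r.t. W @ U
    is exp(W @ U) - B.  B: (num_beliefs, n) matrix of beliefs."""
    rng = np.random.default_rng(0)
    k, n = B.shape
    W = rng.normal(scale=0.01, size=(k, m))  # low-dimensional weights
    U = rng.normal(scale=0.01, size=(m, n))  # m bases
    for _ in range(iters):
        G = np.exp(W @ U) - B                # gradient at the linear layer
        W -= lr * (G @ U.T)
        U -= lr * (W.T @ G)
    return U, W

# Back-project weights w with exp(w @ U), then renormalise to sum to one.
```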
## Example Reduction

*Figure: an example belief and its E-PCA reduction.*

## Finding Dimensionality

E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
## Planning

Apply E-PCA to beliefs from the original POMDP to obtain the low-dimensional belief space $\tilde{B}$, then discretise $\tilde{B}$ into a discrete belief-space MDP.

*Figure: original POMDP → (E-PCA) → low-dimensional belief space $\tilde{B}$ → (discretise) → discrete belief-space MDP with states $S_1$, $S_2$, $S_3$.*
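One plausible discretisation (a regular grid over the E-PCA weight vectors; the variable-resolution alternative used for the larger problem appears below):

```python
import numpy as np

def grid_cell(w, cell_size):
    """Map a low-dimensional belief (an E-PCA weight vector w) to a
    discrete grid cell; each occupied cell becomes one MDP state."""
    return tuple(np.floor(np.asarray(w) / cell_size).astype(int))
```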
## Model Parameters

### Reward function

Back-project the low-dimensional belief to the high-dimensional belief $p(s)$, then compute the expected reward:

$$R(\tilde{b}) = E_b[R(s)] = \sum_{s \in S} p(s) R(s)$$

*Figure: a back-projected belief over states $s_1$, $s_2$, $s_3$.*
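Continuing the earlier E-PCA sketch (illustrative names; the back-projection is the exponential-link decoder):

```python
import numpy as np

def reward_lowdim(w, U, R):
    """Expected reward of a compressed belief: back-project the
    E-PCA weights w to a full belief, then take E_b[R(s)].
    U: bases from epca() above; R: per-state reward vector."""
    b = np.exp(w @ U)     # back-project through the exponential link
    b /= b.sum()          # renormalise to a proper belief
    return float(b @ R)   # sum_s p(s) R(s)
```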
### Transition function

1. For each low-dimensional belief $\tilde{b}_i$ and action $a$:
2. Recover the full belief $b_i$.
3. Propagate $b_i$ according to the action.
4. Propagate according to the observation.
5. Recover the compressed posterior $\tilde{b}_j$.
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to the probability of the observation.

The propagation happens in the full-dimensional belief space; only the endpoints $\tilde{b}_i$ and $\tilde{b}_j$ live in the low-dimensional space.

$$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z_{\tilde{b}_j}|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\, p(s_l \mid s_m, a)\, b_i(s_m)$$

where $Z_{\tilde{b}_j}$ is the set of observations whose posterior falls in $\tilde{b}_j$.

## Initial Distribution

*Figures: the initial belief and goal state; the true (hidden) robot position and the goal position during execution.*
## Policy Comparison

*Figure: bar chart of average distance to goal (distance in m, 0 to 9) for Maximum Likelihood, AMDP, and E-PCA (6 bases).*
## People Finding

### People Finding as a POMDP

The robot's own position is fully observable; the position of the person is unknown.

*Figure: robot position and true person position.*

*Figure (Finding and Tracking People): robot position and true person position during a search.*

The belief space is factored:

- 2 dimensions: the fully-observable robot position
- 6 dimensions: the distribution over person positions

A regular grid over this space gives $\approx 10^{16}$ states.
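For scale (an illustrative resolution, not given on the slides): discretising each of the 8 dimensions into 100 cells would give $100^8 = 10^{16}$ states, far too many to solve directly; hence the variable-resolution grid below.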
## Variable Resolution

Instead of a regular grid, build a non-regular grid from sampled beliefs.

*Figure: sampled beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$ with transitions such as $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$.*

Compute the model parameters using nearest-neighbour.
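A minimal nearest-neighbour sketch (illustrative names), which would replace the regular-grid `cell_of` in the transition sketch above:

```python
import numpy as np

def nearest_sample(w, samples):
    """Index of the sampled low-dimensional belief closest to w;
    transitions are accumulated onto that sample's MDP state."""
    d = np.linalg.norm(np.asarray(samples) - np.asarray(w), axis=1)
    return int(np.argmin(d))
```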
## Refining the Grid

1. Sample beliefs according to the current policy.
2. Construct a new model from the samples.
3. Keep a new belief $\tilde{b}'_1$ if $V(\tilde{b}'_1) > V(\tilde{b}_1)$.

*Figure: the value $V(\tilde{b}_1)$ under the old grid and $V(\tilde{b}'_1)$ under the refined grid.*
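The slides give only the acceptance rule; one way the refinement loop might be organised (purely illustrative, reusing `nearest_sample` from above):

```python
def refine_grid(samples, V, sample_beliefs, rebuild_and_solve):
    """samples: list of low-dimensional beliefs; V: their values.
    rebuild_and_solve(samples) -> value per sample for a rebuilt model."""
    for w_new in sample_beliefs():              # beliefs seen under the policy
        V_new = rebuild_and_solve(samples + [w_new])
        j = nearest_sample(w_new, samples)      # the existing nearby belief
        if V_new[-1] > V[j]:                    # keep w_new if it improves value
            samples, V = samples + [w_new], V_new
    return samples, V
```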
## The Optimal Policy

*Figure: the original distribution over person positions and its reconstruction using E-PCA with 6 bases, with the robot position and true person position marked.*
## Policy Comparison

*Figure: bar chart of the average number of actions to find the person (0 to 250) for the Closest, Densest, Maximum Likelihood, E-PCA, and Refined E-PCA strategies, with the fully observable MDP shown as a baseline.*

E-PCA: 72 states. Refined E-PCA: 260 states.
## Nick’s Thesis Contributions

- Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
- POMDPs can scale to bigger, more complicated real-world problems.
- POMDPs can be used for real, deployed robots.
