Finding Approximate POMDP Solutions through Belief Compression
Based on slides by Nicholas Roy, MIT

Reliable Navigation
Conventional trajectories may not be robust to localisation error.
[Figure labels: estimated robot position, robot position distribution, true robot position, goal position]

Perception and Control
[Diagram labels: control algorithms, perception, control, world state]

Perception and Control
Assumed full observability: Probabilistic Perception -> P(x) -> argmax P(x) -> Control (with Model and World state)
Exact POMDP planning: Probabilistic Perception -> P(x) -> Control (with Model and World state)

Perception and Control
[Diagram: Probabilistic Perception -> P(x) -> Compressed P(x) -> Control, with Model and World state; labelled "Assume full observability" and "Exact POMDP planning"]

Main Insight
Good policies for real world POMDPs can be found by planning over low-dimensional representations of the belief space.
[Diagram: Probabilistic Perception -> P(x) -> Low-dimensional P(x) -> Control, with Model and World state]

Belief Space Structure
The controller may be globally uncertain... but not usually.

Coastal Navigation
Represent beliefs using $\tilde{b} = \langle \arg\max_s b(s);\ H(b) \rangle$
Discretise into low-dimensional belief space MDP

A Hard Navigation Problem
[Figure: average distance to goal, in metres (axis 0 to 9), for Maximum Likelihood and AMDP]

Dimensionality Reduction

Principal Components Analysis
[Diagram labels: original beliefs, characteristic beliefs, weights]

Principal Components Analysis
Given belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.
[Figure: collection of beliefs drawn from a 200-state problem; axes: state vs. probability of being in state]

Principal Components Analysis
Given belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.
[Figure: one sample distribution; m = 9 gives this representation for one sample distribution; axes: state vs. probability of being in state]

Principal Components Analysis
Many real world POMDP distributions are characterised by large regions of low probability.
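
The linear PCA compression described above can be sketched in a few lines. This is a minimal illustration, not the method from the slides: the beliefs are synthetic unimodal distributions over 200 states (a hypothetical stand-in for beliefs sampled from a real problem), and the helper names `make_beliefs`, `fit_pca`, `compress` and `reconstruct` are invented for this example. It compresses each belief to m = 9 weights and reports how well the full belief is recovered, including the most negative reconstructed value, since a linear reconstruction is not constrained to stay non-negative in low-probability regions.

```python
import numpy as np

def make_beliefs(n_states=200, n_beliefs=500, seed=0):
    """Synthetic unimodal beliefs (assumption: stand-in for sampled robot beliefs)."""
    rng = np.random.default_rng(seed)
    states = np.arange(n_states)
    means = rng.uniform(20, n_states - 20, size=n_beliefs)
    widths = rng.uniform(2.0, 8.0, size=n_beliefs)
    b = np.exp(-0.5 * ((states[None, :] - means[:, None]) / widths[:, None]) ** 2)
    return b / b.sum(axis=1, keepdims=True)  # normalise each row to sum to 1

def fit_pca(beliefs, m):
    """Return the mean belief and the top-m principal directions, shape (m, n)."""
    mean = beliefs.mean(axis=0)
    _, _, vt = np.linalg.svd(beliefs - mean, full_matrices=False)
    return mean, vt[:m]

def compress(b, mean, bases):
    """Project full beliefs (n values) onto m weights."""
    return (b - mean) @ bases.T

def reconstruct(b_tilde, mean, bases):
    """Map m weights back to an n-dimensional (approximate) belief."""
    return b_tilde @ bases + mean

beliefs = make_beliefs()
mean, bases = fit_pca(beliefs, m=9)
b_tilde = compress(beliefs, mean, bases)
b_hat = reconstruct(b_tilde, mean, bases)

print("compressed shape:", b_tilde.shape)
print("mean absolute reconstruction error:", np.abs(beliefs - b_hat).mean())
print("most negative reconstructed value:", b_hat.min())
```

Running the sketch with a larger `m` lowers the reconstruction error, at the cost of a higher-dimensional planning space.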
Idea: Create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA)

Example E-PCA
[Figure: E-PCA reconstructions using 1, 2, 3 and 4 bases; axes: state vs. probability of being in state]

Example Reduction

Finding Dimensionality
E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.

Planning
Original POMDP (states S1, S2, S3) -> E-PCA -> low-dimensional belief space $\tilde{B}$ -> Discretise -> discrete belief space MDP

Model Parameters
Reward function
Back-project to high dimensional belief p(s) over s1, s2, s3
Compute expected reward from belief:
$R(\tilde{b}) = R(b) = E_b(R(s)) = \sum_{s \in S} p(s)\, R(s)$

Model Parameters
1. For each belief $\tilde{b}_i$ and action $a$
2. Recover full belief $b_i$
3. Propagate according to action
4. Propagate according to observation
5. Recover $\tilde{b}_j$
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to probability of observation
[Diagram: $\tilde{b}_i$ -> $\tilde{b}_j$ in low dimension; $b_i$ -> $b_j$ in full dimension]
$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z_b|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\, p(s_l \mid s_m, a)\, b_i(s_m)$

Robot Navigation Example
[Figure labels: initial distribution, goal state, true (hidden) robot position, goal position]

Robot Navigation Example
[Figure labels: true robot position, goal position]

Policy Comparison
[Figure: average distance to goal, in metres (axis 0 to 9), for Maximum Likelihood, AMDP and E-PCA with 6 bases]

People Finding

People Finding as a POMDP
Fully observable robot
Position of person unknown
[Figure labels: robot position, true person position]

Finding and Tracking People
[Figure labels: robot position, true person position]

People Finding as a POMDP
Factored belief space
2 dimensions: fully-observable robot position
6 dimensions: distribution over person positions
Regular grid gives ≈ $10^{16}$ states

Variable Resolution
Non-regular grid using samples
[Diagram: sampled beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$ with transitions $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$]
Compute model parameters using nearest-neighbour

Refining the Grid
[Diagram: $V(\tilde{b}_1)$ at $\tilde{b}$ and $V(\tilde{b}'_1)$ at $\tilde{b}'$]
Sample beliefs according to policy
Construct new model
Keep new belief if $V(\tilde{b}'_1) > V(\tilde{b}_1)$

The Optimal Policy
[Figure labels: original distribution; reconstruction using E-PCA and 6 bases; robot position; true person position]

Policy Comparison
[Figure: average number of actions to find person (average time to find person, axis 0 to 250) for Closest, Densest, Maximum Likelihood, E-PCA, Refined E-PCA and Fully observable MDP]
E-PCA: 72 states
Refined E-PCA: 260 states

Nick’s Thesis Contributions
Good policies for real world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
POMDPs can scale to bigger, more complicated real-world problems.
POMDPs can be used for real deployed robots.
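
The Model Parameters steps from earlier in the deck (recover the full belief, propagate it through an action and an observation, map the result back to a grid point, and accumulate the observation probability) can be sketched as follows. This is a rough illustration under stated assumptions, not the implementation behind the slides: the E-PCA compress and recover steps are replaced by a nearest-neighbour lookup among a fixed set of full-dimensional sample beliefs, the POMDP is a small random one, and the function name `build_belief_mdp` and all array layouts are hypothetical.

```python
import numpy as np

def build_belief_mdp(beliefs, trans, obs, reward):
    """Build transition and reward tables over a grid of sample beliefs.

    beliefs: (B, S) sample beliefs (assumption: already recovered to full dimension)
    trans:   (A, S, S) with trans[a, m, l] = p(s_l | s_m, a)
    obs:     (S, Z) with obs[l, k] = p(z_k | s_l)
    reward:  (S,) state rewards R(s)
    """
    n_b, n_a, n_z = beliefs.shape[0], trans.shape[0], obs.shape[1]
    t_model = np.zeros((n_b, n_a, n_b))
    r_model = beliefs @ reward                 # R(b) = sum_s p(s) R(s)
    for i, b in enumerate(beliefs):
        for a in range(n_a):
            predicted = b @ trans[a]           # sum_m p(s_l | s_m, a) b_i(s_m)
            for k in range(n_z):
                joint = predicted * obs[:, k]
                p_z = joint.sum()              # probability of observing z_k
                if p_z <= 0.0:
                    continue
                posterior = joint / p_z        # full-dimensional b_j
                # Nearest grid belief by L1 distance (stand-in for "Recover b_j").
                j = np.argmin(np.abs(beliefs - posterior).sum(axis=1))
                t_model[i, a, j] += p_z
    return t_model, r_model

# Small random POMDP, purely for illustration.
rng = np.random.default_rng(0)
n_s, n_a, n_z, n_b = 6, 2, 3, 40
trans = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
obs = rng.dirichlet(np.ones(n_z), size=n_s)
grid = rng.dirichlet(np.ones(n_s), size=n_b)
reward = np.zeros(n_s)
reward[-1] = 1.0

t_model, r_model = build_belief_mdp(grid, trans, obs, reward)
print("transition table shape:", t_model.shape)
print("rows sum to one:", np.allclose(t_model.sum(axis=2), 1.0))
```

The resulting tables define an ordinary discrete MDP over the grid points, which is what makes standard value iteration applicable after the belief space has been discretised.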