Solving Factored MDPs with Continuous and Discrete Variables

Carlos Guestrin (Intel Research, Berkeley), Milos Hauskrecht (Department of Computer Science), Branislav Kveton (Intelligent Systems Program)







Introduction

Hybrid Markov Decision Processes

- Many real-world stochastic planning problems have continuous and discrete variables, naturally formulated as hybrid MDPs (HMDPs)
- There are few methods for solving hybrid MDPs

Hybrid MDPs are Complex to Solve

- Traditional solution techniques are affected by the curse of dimensionality
  - Discrete-state MDPs: state and action spaces grow exponentially with the number of variables
  - Continuous-state MDPs: state and action spaces are infinitely large
- Often, no closed-form representation for the value function exists
- Naïve discretization often leads to exponential complexity

Factored Hybrid MDPs

- Multiagent factored hybrid MDP (HMDP) is a 4-tuple (X, A, P, R):
  - X is a vector of state variables (discrete or continuous)
  - A is a vector of action variables (discrete or continuous)
  - Continuous variables are restricted to [0, 1]
  - P is a transition model represented by DBN
  - R is a reward function, a sum of local rewards

[Figure: dynamic Bayesian network with action nodes A1, A2, A3, state variables X1, X2, X3, next-state variables X'1, X'2, X'3, and local rewards R1, R2]

Representation of Conditional Probabilities

- Parametric representation of transition model
- Discrete child with discrete parents:
  - Tabular, decision trees, noisy-or, etc.
- Discrete child with continuous and discrete parents:

$$P(X_i' = j \mid \mathrm{Par}(X_i')) = \frac{d_j(\mathrm{Par}(X_i'))}{\sum_u d_u(\mathrm{Par}(X_i'))}$$

  where d_j is a discriminant function and the denominator is the normalizing factor
- Continuous child with continuous and discrete parents (mixture of beta distributions):

$$P(X_i' \mid \mathrm{Par}(X_i')) = \mathrm{Beta}\left(X_i' \mid h_i^1(\mathrm{Par}(X_i')),\ h_i^2(\mathrm{Par}(X_i'))\right)$$

  where both moments satisfy h_i^1 > 0 and h_i^2 > 0

Optimal Policy and Value Function

- Value function of an optimal policy satisfies the Bellman-Hamilton-Jacobi fixed point equation:

$$V^*(x) = \sup_a \left[ R(x, a) + \gamma \sum_{x'_D} \int_{x'_C} p(x' \mid x, a)\, V^*(x')\, dx'_C \right]$$

- Value function V(x) difficult to compute and represent
- Closed-form solution of the value function may not exist due to the recursive integral definition
- Consequence: approximate solutions


Approximate LP for HMDPs

Linear Value Function

- Value function represented as a linear combination of k basis functions:

$$V^{w}(x) = \sum_{i=1}^{k} w_i f_i(x)$$

- Basis functions f_i(x) depend on continuous and discrete variables. Optimization is performed over weights w

HALP Formulation

- Hybrid approximate LP (HALP) formulation:

$$\text{minimize}_w \ \sum_i w_i \alpha_i \quad \text{subject to:} \quad \sum_i w_i F_i(x, a) - R(x, a) \ge 0, \quad \forall x \in X,\ a \in A$$

- where
  - α_i is state relevance weight:

$$\alpha_i = \sum_{x_D} \int_{x_C} \psi(x)\, f_i(x)\, dx_C$$

  - F_i(x, a) is a difference between basis function f_i(x) and its discounted backprojection:

$$F_i(x, a) = f_i(x) - \gamma \sum_{x'_D} \int_{x'_C} p(x' \mid x, a)\, f_i(x')\, dx'_C$$

Quality of HALP Approximation

- Proposition 1. Let w̃ be an optimal solution of the HALP. Then, for any Lyapunov function L(x):

$$\left\| V^* - H\tilde{w} \right\|_{1,\psi} \le \frac{2\, \psi^{T} L}{1 - \kappa} \min_w \left\| V^* - Hw \right\|_{\infty, 1/L}$$

- Analogous to de Farias and Van Roy 2001 result for approximate LP for discrete MDPs

Representational and Computational Challenges

- Constraints require representation of backprojections, functions of continuous and discrete variables
- HALP requires solution of (linear) convex problem with infinite number of constraints

Choice of Representation

- Continuous basis functions defined as polynomials:

$$f_i(x_i) = \prod_{x_j \in x_i} x_j^{m_{j,i}}$$

- Basis function decomposition along continuous and discrete factors:

$$f_i(x_i) = f_{iD}(x_{iD})\, f_{iC}(x_{iC})$$

- Mixture of betas transition model for continuous factors
- Closed-form representation of the objective function
- Decomposition of the constraints along continuous and discrete functions and closed-form representation:

$$\sum_{x'_D} \int_{x'_C} p(x' \mid x, a)\, f_i(x')\, dx'_C = \left( \sum_{x'_{iD}} p(x'_{iD} \mid x, a)\, f_{iD}(x'_{iD}) \right) \left( \int_{x'_{iC}} p(x'_{iC} \mid x, a)\, f_{iC}(x'_{iC})\, dx'_{iC} \right)$$


Factored ε-HALP Algorithm

Factored ε-HALP Formulation

- HALP formulation contains infinite number of constraints, one for each state x and action a
- Discretization of continuous state and action variables to (1/ε + 1) equally spaced values
- Total number of points per factor exponential only in the dimension of factor
- Number of constraints is finite, although exponential in the number of variables

Efficient Solution for Factored ε-HALP

1. Discretize continuous state and action variables
2. Identify subsets of variables X_i and A_i (X_j and A_j) that the functions F_i(x, a) (R_j(x, a)) depend on
3. Compute F_i(x_i, a_i) (and R_j(x_j, a_j)) for all possible configurations of X_i and A_i (X_j and A_j)
4. Calculate state relevance weights α_i
5. Use ALP algorithm for factored discrete-valued variables to find the vector of optimal weights ŵ (Guestrin et al. 2001)

Near Feasibility Implies Near Optimality

- Solution of ε-HALP likely violates constraints in the HALP
- Proposition 2. Let w̃ be an optimal solution of the HALP and ŵ be an optimal solution of the ε-HALP, such that solution ŵ is δ-infeasible:

$$\sum_i \hat{w}_i F_i(x, a) - R(x, a) \ge -\delta, \quad \forall x \in X,\ a \in A$$

  Then:

$$\left\| V^* - H\hat{w} \right\|_{1,\psi} \le \left\| V^* - H\tilde{w} \right\|_{1,\psi} + \frac{2\delta}{1 - \gamma}$$

Quality of ε-HALP Approximation

- Theorem 1. Let ŵ be an optimal solution of the ε-HALP satisfying the δ-infeasibility condition. Then, for any Lyapunov function L(x):

$$\left\| V^* - H\hat{w} \right\|_{1,\psi} \le \frac{2\delta}{1 - \gamma} + \frac{2\, \psi^{T} L}{1 - \kappa} \min_w \left\| V^* - Hw \right\|_{\infty, 1/L}$$

Achieving δ-Infeasibility

- Appropriate choice of ε-grid to achieve δ-infeasibility:

$$\left| \sum_i \hat{w}_i F_i(x, a) - R(x, a) - \left( \sum_i \hat{w}_i F_i(x_G, a_G) - R(x_G, a_G) \right) \right| \le \delta$$

  where (x_G, a_G) is the closest ε-grid point to the state-action pair (x, a)
- Lipschitz modulus of the discretized functions:

$$\varepsilon \le \frac{\delta}{M K_{\max}}$$

  where M is the number of factors and K_max is the worst-case Lipschitz constant over functions w_i F_i(x, a) and R_j(x, a)

Summary of Factored ε-HALP Algorithm

- Discretize continuous variables using a regular ε-spaced grid
- Formulate a linear program with constraints restricted only to grid points
- Solve the LP using an ALP algorithm for factored discrete MDPs


Experimental Results

Irrigation Network Example

- Irrigation network is a network of irrigation channels connected by regulation devices

[Figure: large irrigation network in n-ring and n-ring-of-rings topologies, with inflow and outflow regulation devices. A regulation device is represented by a discrete action node; an irrigation channel is represented by a continuous variable.]

- Transition functions represent water flows between channels given actions at regulation devices
- Objective is the operation of valves to maintain optimal water levels
- Reward function characterizes preferred water levels

Experimental Results

- Continuous formulation of the irrigation network problem cannot be solved exactly by any MDP solver
- Evaluation of solution quality (mean and standard deviation) and running time (in seconds):

ε-HALP                              Alternative solutions
ε       Mean    Std    Time         Method       Mean    Std
1       42.8    3.0    2            Random       35.9    2.7
1/2     60.3    3.0    21           Local        55.4    2.5
1/4     61.9    2.9    184          Global 1     60.4    3.0
1/8     72.2    3.5    1068         Global 4     66.0    3.6
1/16    73.8    3.0    13219        Global 16    68.2    3.2

- The quality of the ε-HALP solution beats alternative approximate optimization techniques on the large irrigation network example

Quality of ε-HALP Approximation

n-ring

        n = 6           n = 9           n = 12          n = 15          n = 18
ε       Mean   Time     Mean   Time     Mean   Time     Mean   Time     Mean   Time
1       28.4   1        37.5   1        46.9   1        55.6   2        64.5   3
1/2     33.5   3        43.0   5        52.6   9        62.9   17       72.1   28
1/4     35.1   11       45.2   21       54.2   43       64.2   63       74.5   85
1/8     40.1   46       51.4   85       62.2   118      73.2   168      84.9   193
1/16    40.4   331      51.8   519      63.7   709      75.5   963      86.8   1285

n-ring-of-rings

        n = 6           n = 9           n = 12          n = 15          n = 18
ε       Mean   Time     Mean   Time     Mean   Time     Mean   Time     Mean   Time
1       14.8   1        16.2   2        17.5   4        18.5   5        19.7   6
1/2     38.6   12       50.5   25       44.0   103      75.8   69       87.6   107
1/4     40.1   82       53.6   184      66.7   345      79.0   590      93.1   861
1/8     48.0   581      62.4   1250     76.1   2367     90.5   3977     104.5  6377
1/16    47.1   4736     62.3   11369    77.6   22699    92.4   35281    107.8  53600

- Solution quality improves with higher grid resolution ε
- Time complexity grows polynomially with higher grid resolution
- Time complexity grows polynomially with network topology size n

Conclusions

- HALP provides effective formulation for solving hybrid MDPs
  - Including bounds on the quality of the solution
- Factored hybrid MDPs allow for closed-form representation of HALP constraints
  - Number of constraints remains infinite
- Exploit factorization for efficient discretization, ε-HALP
  - Provide bounds on the effect of discretization
  - Lipschitz constant grows linearly in the number of variables
- Using factored LP decomposition to solve ε-HALP
  - For fixed tree-width, running time is polynomial in the number of variables and discretization level 1/ε

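The three summary steps of the ε-HALP (discretize, restrict the constraints to grid points, solve the LP) can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the poster: a single continuous state x in [0, 1], one binary action, the hypothetical transition model `beta_params`, the hypothetical `reward`, a discount factor of 0.9, a uniform state relevance density, and the basis f_0(x) = 1, f_1(x) = x. The backprojection uses the closed-form moments of a beta distribution, which is what makes polynomial basis functions convenient. The LP is solved by brute-force vertex enumeration over two weights, not by the factored ALP algorithm of Guestrin et al. that the poster uses.

```python
from itertools import combinations, product

GAMMA = 0.9          # discount factor (assumed value)
DEGREES = (0, 1)     # polynomial basis: f_0(x) = 1, f_1(x) = x


def beta_moment(m, a, b):
    """E[x^m] for x ~ Beta(a, b), i.e. prod_{r<m} (a + r) / (a + b + r)."""
    result = 1.0
    for r in range(m):
        result *= (a + r) / (a + b + r)
    return result


def beta_params(x, action):
    """Hypothetical transition model; both beta parameters stay positive."""
    if action == 1:                       # valve open: level tends to rise
        return 2.0 + 4.0 * x, 2.0
    return 2.0, 2.0 + 4.0 * (1.0 - x)     # valve closed: level tends to fall


def reward(x, action):
    """Hypothetical reward: prefer levels near 0.5; opening the valve costs 0.1."""
    return 1.0 - 4.0 * (x - 0.5) ** 2 - 0.1 * action


def F(i, x, action):
    """F_i(x, a) = f_i(x) - gamma * E[f_i(x') | x, a], in closed form."""
    a, b = beta_params(x, action)
    return x ** DEGREES[i] - GAMMA * beta_moment(DEGREES[i], a, b)


def grid(eps):
    """The 1/eps + 1 equally spaced values in [0, 1]."""
    n = round(1 / eps)
    return [k / n for k in range(n + 1)]


def solve_eps_halp(eps, alpha=(1.0, 0.5)):
    """Minimize alpha . w subject to sum_i w_i F_i(x, a) >= R(x, a) on the grid.

    alpha holds the relevance weights of f_0 and f_1 under a uniform density.
    Vertex enumeration is only sensible for two weights.
    """
    rows = [(F(0, x, a), F(1, x, a), reward(x, a))
            for x, a in product(grid(eps), (0, 1))]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel constraints do not define a vertex
        w = ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
        if all(p * w[0] + q * w[1] >= r - 1e-9 for p, q, r in rows):
            obj = alpha[0] * w[0] + alpha[1] * w[1]
            if best is None or obj < best[0]:
                best = (obj, w)
    return best


def max_violation(w, eps_fine=1 / 64):
    """Estimate delta: the largest constraint violation found on a finer grid."""
    worst = max(reward(x, a) - w[0] * F(0, x, a) - w[1] * F(1, x, a)
                for x, a in product(grid(eps_fine), (0, 1)))
    return max(0.0, worst)


objective, weights = solve_eps_halp(0.25)
print(objective, weights, max_violation(weights))
```

The weights returned satisfy every constraint at the grid points, while `max_violation` gives a rough empirical estimate of the δ by which they violate the constraints between grid points. Rerunning with a smaller ε should shrink that estimate, at the cost of more constraints.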