Topic 1: Markov Chains. Introduction, Definition, and a Simple Example

Consider a system that can be in any one of a finite or countably infinite number of states, and let S denote this set of states. The set S is called the state space of the system. Let X_t denote the state of the system at time t. In a stochastic system, X_t is a random variable defined on a common probability space, so the state of the system follows a random process. In the first part of the class we assume that time changes in a discrete manner, hence we only need to know the sequence {X_n}_{n=0}^∞ to characterize the evolution of the system.

The simplest example is that of independent and identically distributed random variables. In this case, X_n describes the outcome of a repeated experiment at the nth trial. In such systems, the future states of the system are independent of past and present states. In most other systems that arise in practice, past and present states influence the evolution of the system, and hence the future states. Obviously, if present and past states determine the future states of the system uniquely, there is no randomness in the system from that time on.

Many systems have the property that, given the present state, the past states have no influence on the future. This property is called the Markov property, and it is defined mathematically as follows:

P(X_{n+1} = x_{n+1} | X_n = x_n, X_{n-1} = x_{n-1}, ..., X_0 = x_0) = P(X_{n+1} = x_{n+1} | X_n = x_n)

for every choice of n and values x_0, ..., x_{n+1}. Note that the Markov property in effect says that given the present, the future is independent of the past (the present state carries the minimum information needed for a probabilistic description, as we will see later). Under this condition the sequence {X_n}_{n=0}^∞ is said to form a Markov chain, and it is completely identified by its 1-step transition probabilities P(X_{n+1} = x_{n+1} | X_n = x_n). In this class we will restrict attention to Markov chains with stationary transition probabilities, i.e.
those such that P(X_{n+1} = y | X_n = x) = P(X_{m+1} = y | X_m = x) for all n, m (also called the temporally homogeneous case). Needless to say, the transitions of a stationary Markov chain whose state space is finite and of size N can be fully described by an N × N matrix, P = [p(i, j)]_{N×N} = [P(X_{n+1} = S_j | X_n = S_i)]_{N×N}. Markov chains are generally depicted by a graph, where nodes represent the states and edges represent the transition probabilities.

1 Example

Example 1. Consider a machine that at the start of each day is either broken down or operating. If the machine is broken in the morning, it is successfully repaired with probability p (so it is operating the next day); assume the probability of failure for an operating machine by the end of the day is q. Let π_0(0) denote the probability that the machine is broken down initially, i.e. at the start of day 0 (π_0(0) = 0 implies that the machine is operational at the start of day 0, i.e. a deterministic initial state). Finally, assume that repairs and breakdowns happen independently.

Figure 1: Transition diagram for Example 1: states B (broken) and G (good), with B → G probability p, B → B probability 1 − p, G → B probability q, and G → G probability 1 − q.

Let the random variable X_n represent the state of the machine at the start of day n, where X_n = 0 in state B and X_n = 1 in state G. According to the above description,

P(X_{n+1} = 1 | X_n = 0) = p,   P(X_{n+1} = 1 | X_n = 1) = 1 − q,   P(X_0 = 0) = π_0(0).

Since there are only two states, we can compute

P(X_{n+1} = 0 | X_n = 0) = 1 − p,   P(X_{n+1} = 0 | X_n = 1) = q,   P(X_0 = 1) = 1 − π_0(0).

The transition matrix is written as

P = [p(i, j)] = [ 1 − p     p   ]
                [   q     1 − q ]

1.1 Sample-path Evolution

We are interested in P(X_n = 0) and P(X_n = 1). We have

P(X_{n+1} = 0) = P(X_{n+1} = 0 | X_n = 1) P(X_n = 1) + P(X_{n+1} = 0 | X_n = 0) P(X_n = 0)
             = q (1 − P(X_n = 0)) + (1 − p) P(X_n = 0)
             = (1 − p − q) P(X_n = 0) + q.                                        (1)

On the other hand, we know P(X_0 = 0) = π_0(0). Putting n = 0 in Equation (1) we have P(X_1 = 0) = (1 − p − q) π_0(0) + q.
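The recursion in Equation (1) is easy to iterate numerically; a minimal Python sketch (the function name and the sample values of p and q below are our illustrative choices):

```python
def broken_prob(p, q, pi0_broken, n):
    """Iterate recursion (1), P(X_{k+1} = 0) = (1 - p - q) P(X_k = 0) + q,
    for n steps, starting from P(X_0 = 0) = pi0_broken."""
    prob = pi0_broken
    for _ in range(n):
        prob = (1 - p - q) * prob + q
    return prob
```

For example, with p = 0.3, q = 0.1 and a machine that starts broken (π_0(0) = 1), one step of the recursion gives P(X_1 = 0) = (1 − 0.3 − 0.1) · 1 + 0.1 = 0.7.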
Now we can use an induction-like procedure to see that

P(X_n = 0) = (1 − p − q)^n π_0(0) + q Σ_{i=0}^{n−1} (1 − p − q)^i
           = (1 − p − q)^n π_0(0) + q (1 − (1 − p − q)^n) / (1 − (1 − p − q)).     (2)

First we consider the case 0 < p + q < 2. Then by the formula for the sum of a finite geometric progression we have

P(X_n = 0) = q/(p + q) + (1 − p − q)^n (π_0(0) − q/(p + q)),
P(X_n = 1) = p/(p + q) + (1 − p − q)^n (π_0(1) − p/(p + q)).                       (3)

1.1.1 Limit behavior

The assumption 0 < p + q < 2 implies that |1 − p − q| < 1. As a result, starting from any initial distribution [π_0(0), π_0(1)], as n → ∞, P(X_n = 0) converges to a fixed point (sometimes called the steady-state distribution):

lim_{n→∞} P(X_n = 0) = q/(p + q)   and   lim_{n→∞} P(X_n = 1) = p/(p + q).        (4)

Define P^(n)(x, y) = P(X_n = y | X_0 = x) (we will see later why this is an appropriate notation). Similarly, we can show that for all x ∈ {0, 1},

lim_{n→∞} P^(n)(x, 0) = q/(p + q)   and   lim_{n→∞} P^(n)(x, 1) = p/(p + q).      (5)

1.1.2 Stationary Distribution

Another interesting observation is that we can choose π_0(0) so that P(X_n = 0) and P(X_n = 1) are independent of n. What is such a choice? Such a choice of initial probability distribution is called a stationary distribution. The above discussion highlights a desirable behavior of the evolution of the machine when 0 < p + q < 2. In this scenario, we see that for all x ∈ {0, 1},

lim_{n→∞} P(X_n = 0) = lim_{n→∞} P^(n)(x, 0) = π_0(0)   and   lim_{n→∞} P(X_n = 1) = lim_{n→∞} P^(n)(x, 1) = π_0(1),     (6)

where π_0 is the stationary distribution. This implies that, as time evolves, the probability of being in a certain state becomes more and more independent of the initial state.

1.2 Time Averages and Ergodicity

Define the random variable T_00 to be the random time for the machine to break down again, i.e. T_00 = min{n : n ≥ 1 and X_n = 0 | X_0 = 0}. Furthermore, define the random variable N_n(0) to be the number of days that the machine was broken down prior to day n.

Exercise 1. Show that E(T_00) = 1 + p/q.

From Section 1.1.2, π(0) = 1/E(T_00).
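Exercise 1 and the identity π(0) = 1/E(T_00) can be checked by simulation. The sketch below simulates the machine day by day and estimates both the long-run fraction of broken days and the mean return time to the broken state (the state coding, parameter values, and seed are our illustrative choices):

```python
import random

def machine_stats(p, q, n_days, seed=1):
    """Simulate the two-state machine (0 = broken, 1 = good), started broken.

    Returns (fraction of broken days, average return time to state 0)."""
    rng = random.Random(seed)
    state, broken_days = 0, 0
    returns, last_visit = [], 0
    for day in range(1, n_days + 1):
        if state == 0:
            state = 1 if rng.random() < p else 0   # repair succeeds w.p. p
        else:
            state = 0 if rng.random() < q else 1   # breakdown happens w.p. q
        if state == 0:
            broken_days += 1
            returns.append(day - last_visit)       # time since last broken day
            last_visit = day
    return broken_days / n_days, sum(returns) / len(returns)
```

With p = 0.3 and q = 0.1, the two estimates should be close to π(0) = q/(p + q) = 0.25 and E(T_00) = 1 + p/q = 4, respectively.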
On the other hand, from the strong law of large numbers we conclude that for large n we can write N_n(0) ≈ n/E(T_00). More precisely,

lim_{n→∞} N_n(0)/n = 1/E(T_00) = π(0).

This relationship establishes that the time average of broken days, N_n(0)/n, converges to the ensemble average π(0). This reflects another desirable property of the above Markov chain, known as ergodicity. Notice that similar arguments are valid for T_11, π(1), and N_n(1), defined appropriately.

Notice that if p = 0 (irreparable damage) while q > 0, then π(1) = 0, which corresponds to the operational state of the machine being merely transient. In other words, no matter how small the probability of breakdown, the machine will eventually fail permanently (since p = 0). In this case, state 0 is called an absorbing state. This is in contrast with the case p > 0 and q > 0, in which both states are visited infinitely often; such states are called recurrent.

1.3 Multi-step Transition Probabilities

Notice that since repairs and breakdowns are independent in time, the system follows a Markov chain (states B and G summarize the whole past). Using this fact we can compute the joint distribution of X_0, X_1, ..., X_n. For instance, let n = 2 and let x_i ∈ {0, 1} for i = 0, 1, 2. Then

P(X_0 = x_0, X_1 = x_1, X_2 = x_2) = P(X_0 = x_0, X_1 = x_1) × P(X_2 = x_2 | X_1 = x_1).     (7)

The following table gives the joint distribution of X_0, X_1, and X_2 for various values of x_0, x_1, x_2.

Exercise 2. Show that P(X_1 = x_1, X_2 = x_2 | X_0 = x_0) = P(X_1 = x_1 | X_0 = x_0) P(X_2 = x_2 | X_1 = x_1).

Exercise 3. Fill in the following table.

x_0   x_1   x_2   P(X_2 = x_2, X_1 = x_1 | X_0 = x_0)
 0     0     0
 0     0     1
 0     1     0
 0     1     1
 1     0     0
 1     0     1
 1     1     0
 1     1     1

Exercise 4. Using the above result, calculate P(X_2 = 1 | X_0 = 0) and create a similar table:

x_0   x_2   P(X_2 = x_2 | X_0 = x_0)
 0     0
 0     1
 1     0
 1     1

Exercise 5. Using Exercise 4, check that the (i, j)th element of the matrix P^2 is equal to P^(2)(i, j) = Σ_k P(i, k) P(k, j); i.e.
P^2 is the two-step transition probability matrix. This can be generalized to the Chapman-Kolmogorov equation:

P(X_{m+n} = j | X_0 = i) = Σ_k P(X_{m+n} = j | X_n = k) P(X_n = k | X_0 = i),

which is equivalent to P^(n+m) = P^(n) P^(m).

1.4 Extensions and Counter-Examples

Here we are interested in whether the desirable properties discussed in Sections 1.1 and 1.2 extend to all Markov chains. In other words, we pose the following questions:

Q1. Is it always possible to find a stationary distribution for a Markov chain?
Q2. Is the stationary distribution unique, or can there be more than one?
Q3. Is the stationary distribution π(y) always a limit point for P^(n)(x, y) (independent of x)?
Q4. Is the average time spent in a state always inversely proportional to the stationary distribution of that state? How about the sample mean of the return time?

Through the following examples, we discuss the answers to these questions.

1.4.1 Existence of a stationary distribution? (Q1)

As mentioned before, a finite (of size N) temporally homogeneous Markov chain can be fully described by an N × N matrix P. From the definition of stationary distribution, a probability distribution vector π = (π(0), π(1), ..., π(N − 1)) (i.e. Σ_{i=0}^{N−1} π(i) = 1) is a stationary distribution associated with matrix P if it is a solution of πP = π.

Exercise 6. Why?

Theorem 1. Let P be the transition probability matrix of a finite Markov chain. The equation πP = π has at least one solution.

Proof. Since P is a transition probability matrix, we always have Σ_y P(x, y) = 1. In other words, 1 is an eigenvalue of P, associated with the right eigenvector 1_N (the all-ones vector). Since a matrix A and its transpose A^T have the same eigenvalues, 1 is also an eigenvalue of P^T, so there exists a left eigenvector π with πP = π. This in turn implies that the equation πP = π has at least one solution. □

1.4.2 Uniqueness of stationary distribution?
(Q2) Here we show that if p = q = 0, then there is more than one stationary distribution for the two-state Markov chain studied before. In fact, it is easy to show that any probability distribution vector π_0 is a stationary distribution for this system.

Exercise 7. Why?

1.4.3 Limiting behavior of P^n(x, y)? (Q3)

Here we provide counterexamples showing scenarios where P^n(x, y) does not converge to π(y).

Counter-example 1. Consider the case p = q = 1.
(i) What is the stationary distribution? Is it unique?
(ii) Pick an arbitrary initial distribution (π_0(0), π_0(1)) and calculate P(X_n = 0). (Hint: consider the cases where n is odd or even.)

Exercise 8. (iii) Show that

P^n(0, 0) = { 1 if n is even; 0 if n is odd }   and   P^n(1, 0) = { 0 if n is even; 1 if n is odd }.

(iv) What is lim_{n→∞} P^n(0, 0)?

The problem here is the fact that the chain is periodic. A chain which does not have this property is called aperiodic.

Counter-example 2. Consider the case p = q = 0. We saw before that any vector (π_0(0), π_0(1)) is a stationary distribution.
(i) Calculate P(X_n = 0).

Exercise 9. (ii) Calculate P^n(0, 0) and P^n(0, 1).

The problem here is the fact that the chain can be decomposed into two non-communicating absorbing states. A chain which does not have this property is called irreducible. In the first part of the course, we will discuss these issues in detail.

1.4.4 Relating lim_{n→∞} N_n(y)/n, E(T_yy), and π(y)? (Q4)

Counter-example 3. Consider the case p = q = 0. As discussed, there are uncountably many stationary distributions for this chain.
(i) Calculate E(T_00).

Exercise 10. (ii) Calculate N_n(0)/n. Notice that N_n(0)/n is a random variable. Can you say anything about the distribution of this random variable?

Here the problems arise due to the fact that the chain is not irreducible.

2 Controlled Markov Chains

Now, in Example 1, assume that p and q can be controlled. For instance, consider a case where no, low, or high quality maintenance schedules can be deployed.
The choice of maintenance schedule can be modeled as a control action u ∈ {0, 1, 2}, where 0, 1, and 2 correspond to the no, low, and high maintenance schedules, respectively. It is clear that the choice of control at time n, i.e. u(n), will affect the transition probability matrix P(u(n)), and hence the time averages and limiting behavior of the chain. Furthermore, it is natural to assume that there is a cost associated with the application of a particular control at time n, given the state of the system. The central question we will try to address is how to find the sequence of control actions in such a problem.

Note that in such problems, the choice of control could be open loop vs. closed loop. Mathematically, this means that a sequence (or time-varying function) π = {g_0, g_1, ...} is usually used to describe the full plan of actions into the future: in a closed-loop policy each g_n maps the available observations up to time n to an action u(n), while in an open-loop policy the actions are fixed in advance. Which one do you think results in optimal solutions? Why?

From this, the question of perfect or imperfect observation arises: does the controller observe the state X_n exactly, or only a (possibly noisy) function of it? Note that in the case of perfect observation, it is intuitive that there is no loss of performance if one restricts attention to Markov policies. How would you justify this statement?

Example 2. Consider the machine studied in Example 1, where the probabilities of transition between the "bad" and "good" states can be manipulated every morning in the following manner: by spending 1 unit of maintenance cost, the transition probability from the "bad" state to the "good" state can be increased to 1 − ε, while the transition probability from the "good" state to the "bad" state is reduced to δ. When the machine operates in the good state during time slot n, it produces A units of reward. What is the optimal policy when considering the total reward over a horizon of T = 5? (This is the kind of problem we will address in the second part of the course.) How about if one accounts for inflation, i.e. when 1 unit of reward (cost) at time t, t ≥ 0, is only worth β^t units of reward (cost), for some known 0 < β < 1?
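The finite-horizon question in Example 2 is solved by backward induction (dynamic programming). The sketch below is one way to set it up; the numerical values of ε, δ, p, q, and A are illustrative assumptions, with the per-morning maintenance cost taken to be 1 as in the problem statement:

```python
def optimal_value(eps=0.1, delta=0.05, p=0.3, q=0.2, A=2.0, T=5):
    """Backward induction for the controlled machine over horizon T.

    States: 0 = bad, 1 = good.  Actions each morning:
      a = 0: no maintenance, transitions as in Example 1 (repair w.p. p,
             breakdown w.p. q);
      a = 1: maintain at cost 1, so bad -> good w.p. 1 - eps and
             good -> bad w.p. delta.
    Reward A is earned for each day the machine operates in the good state.
    Returns [V_0(bad), V_0(good)], the optimal expected total rewards."""
    V = [0.0, 0.0]                                   # terminal values V_T = 0
    for _ in range(T):
        newV = [0.0, 0.0]
        for x in (0, 1):
            reward = A if x == 1 else 0.0
            good_no = p if x == 0 else 1 - q               # P(next = good), a = 0
            good_mt = 1 - eps if x == 0 else 1 - delta     # P(next = good), a = 1
            v_no = reward + good_no * V[1] + (1 - good_no) * V[0]
            v_mt = reward - 1.0 + good_mt * V[1] + (1 - good_mt) * V[0]
            newV[x] = max(v_no, v_mt)                # pick the better action
        V = newV
    return V
```

The same loop handles the discounted variant by multiplying the continuation terms by β.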
Such discounting allows us to consider T → ∞.

Example 3. Consider N machines of the type studied in Example 1. Assume that every morning you wake up, pick one machine to take to work, and leave the others at home. If the machine you bring to work is functional (in the "good" state) that day, you earn C; if not, you lose l. If you want to maximize your earnings over the next month, which machine will you pick? Why? Notice that here you do not get to see the states of all the machines, only the one you take to work! Hence this is a Partially Observable Markov Decision Process (POMDP), or an MDP with imperfect observation. (This is the kind of problem we will discuss in the last part of the course.)
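In Example 3 the machines left at home evolve unobserved, so a decision-maker must carry a belief, the probability that each machine is currently in the good state, and propagate it through the transition probabilities. A minimal sketch of this propagation step (the function name is ours; p and q are as in Example 1):

```python
def propagate_belief(b_good, p, q, days=1):
    """Propagate b = P(machine is good) through `days` unobserved transitions:
    b' = (1 - b) p + b (1 - q)."""
    for _ in range(days):
        b_good = (1 - b_good) * p + b_good * (1 - q)
    return b_good
```

A machine last seen in the good state has belief 1 − q after one unobserved day, and as the days pass the belief drifts toward the stationary value p/(p + q): the machines left at home become statistically indistinguishable over time.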