                                                       Topic 1
                                               Markov Chains
                             Introduction, Definition, and a Simple Example
                                           (notes by kmb15358)

    Consider a system that can be in any one of a finite or countably infinite number of states. Let S
denote this set of states. The set S is called the state space of the system. Let Xt denote the state of the
system at time t. In a stochastic system, Xt is a random variable defined on a common probability space,
hence the state of the system follows a random process. In the first part of the class we assume that
time changes in a discrete manner, hence we only need to know the sequence {Xn}n≥0 to characterize
the evolution of the system. The simplest example is that of independent and identically distributed
random variables. In this case, Xn describes the outcome of a repeated experiment at the nth trial. In
such systems, the future states of the system are independent of the past and present states. In most other
systems that arise in practice, past and present states influence the evolution of the system, and hence the
future states. Obviously, if the present and past states determine the future states of the system uniquely,
there is no randomness in the system from that time on.
    Many systems have the property that, given the present state, the past states have no influence on the
future. This property is called the Markov property, and it is defined mathematically as follows:

        P (Xn+1 = xn+1 |Xn = xn , Xn−1 = xn−1 , . . . , X0 = x0 ) = P (Xn+1 = xn+1 |Xn = xn )

for every choice of n and values x0 , · · · , xn+1 .
    Note that the Markov property in effect says that, given the present, the future is independent of
the past (minimum information for probabilistic description, as we will see later). Under this condition the
sequence {Xn}n≥0 is said to form a Markov chain and is completely identified by its 1-step transition
probabilities P (Xn+1 = xn+1 |Xn = xn ). In this class we will restrict attention to Markov chains with
stationary transition probabilities, i.e. those such that P (Xn+1 = y|Xn = x) = P (Xm+1 = y|Xm =
x) for all n, m (also called the temporally homogeneous case). Needless to say, the transitions of a
stationary Markov chain whose state space is finite and of size N can be fully described by a square
matrix of size N , i.e. P = [p(i, j)]N ×N = [P (Xn+1 = Sj |Xn = Si )]N ×N .
    Markov chains are often represented by a graph, where nodes represent the states and edges
represent the transition probabilities.
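This representation can be sketched in code: a finite, temporally homogeneous chain is nothing more than a transition matrix together with a sampling rule that depends only on the current state. The matrix values below are illustrative, not from the notes.

```python
import random

# A finite, temporally homogeneous Markov chain is fully described by its
# transition matrix: P[i][j] = P(X_{n+1} = j | X_n = i).
# The two states and the probabilities here are illustrative placeholders.
P = [
    [0.7, 0.3],   # transition probabilities out of state 0
    [0.4, 0.6],   # transition probabilities out of state 1
]

def step(state, P):
    """Sample the next state given only the current one (Markov property)."""
    u, cum = random.random(), 0.0
    for j, prob in enumerate(P[state]):
        cum += prob
        if u < cum:
            return j
    return len(P[state]) - 1

def sample_path(x0, P, n):
    """Generate a sample path X_0, X_1, ..., X_n."""
    path = [x0]
    for _ in range(n):
        path.append(step(path[-1], P))
    return path

random.seed(0)
path = sample_path(0, P, 10)
```

Note that the sampler never looks at anything but the last entry of the path; that is exactly the Markov property in procedural form.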

1    Example
Example 1 Consider a machine that at the start of each day is either broken down or operating. If the
machine is broken in the morning, it will be successfully repaired with probability p (so that it is operating the
next day); and assume that the probability of failure for an operating machine at the end of the day is q.
Finally, let π0 (0) denote the probability that the machine is broken down initially, i.e. at the start of the
0th day (π0 (0) = 0 implies that the machine is operational at the start of the 0th day, i.e. a deterministic
initial state). Assume also that repairs and breakdowns happen independently.

                          Figure 1: Example 1. The two-state transition graph: nodes G and B, with
                          self-loop probabilities 1 − q at G and 1 − p at B (and transition probabilities
                          q from G to B and p from B to G).

    Let random variable Xn represent the state of the machine at the start of day n; where Xn (B) = 0
and Xn (G) = 1. According to the above description

                                       P (Xn+1 = 1|Xn = 0) = p

                                       P (Xn+1 = 0|Xn = 1) = q

                                                   P (X0 = 0) = π0 (0)

Since there are only two states, we can compute

                                       P (Xn+1 = 0|Xn = 0) = 1 − p

                                       P (Xn+1 = 1|Xn = 1) = 1 − q

                                                   P (X0 = 1) = 1 − π0 (0)

    The transition matrix is written as

                                              ⎡ 1 − p     p   ⎤
                             P = [P (i, j)] = ⎣   q     1 − q ⎦

where rows and columns are indexed by the states 0 (B) and 1 (G).
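A minimal sketch of this matrix in code, using illustrative values p = 0.6 and q = 0.2 (these numbers are assumptions for demonstration, not from the notes):

```python
# Transition matrix for Example 1; states ordered 0 = broken (B), 1 = operating (G).
# p = repair probability, q = failure probability (illustrative values).
p, q = 0.6, 0.2
P = [
    [1 - p, p],    # from broken: stays broken w.p. 1-p, repaired w.p. p
    [q, 1 - q],    # from operating: fails w.p. q, keeps running w.p. 1-q
]

# Each row is a probability distribution over the next state.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
```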

1.1     Sample-path Evolution

We are interested in P (Xn = 0) and P (Xn = 1).

        P (Xn+1 = 0) = P (Xn+1 = 0|Xn = 1)P (Xn = 1) + P (Xn+1 = 0|Xn = 0)P (Xn = 0)

                       = q(1 − P (Xn = 0)) + (1 − p)P (Xn = 0)

                       = (1 − p − q)P (Xn = 0) + q.                                                (1)

   On the other hand, we know P (X0 = 0) = π0 (0). Putting n = 0 in Equation (1) we have

                                   P (X1 = 0) = (1 − p − q)π0 (0) + q.

   Now we can use an induction-like procedure to see

                                                      n−1
               P (Xn = 0) = (1 − p − q)^n π0 (0) + q  Σ  (1 − p − q)^i
                                                      i=0
                                                      1 − (1 − p − q)^n
                          = (1 − p − q)^n π0 (0) + q  ------------------                           (2)
                                                       1 − (1 − p − q)
   First we consider the case when 0 < p + q < 2. Then, since 1 − (1 − p − q) = p + q, the formula for
the sum of a finite geometric progression gives
                                      q                              q
                       P (Xn = 0) =      + (1 − p − q)^n (π0 (0) −      ),
                                     p+q                            p+q
                                      p                              p
                       P (Xn = 1) =      + (1 − p − q)^n (π0 (1) −      ),                         (3)
                                     p+q                            p+q
where π0 (1) = 1 − π0 (0).
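The closed form (3) can be checked against the one-step recursion (1) numerically. The values p, q, and π0(0) below are illustrative assumptions:

```python
# Verify the closed form of P(X_n = 0) against iterating recursion (1).
# p, q, pi0 are illustrative values, not from the notes.
p, q, pi0 = 0.6, 0.2, 0.5

def p_broken_recursive(n):
    x = pi0                       # P(X_0 = 0)
    for _ in range(n):
        x = (1 - p - q) * x + q   # Equation (1)
    return x

def p_broken_closed(n):
    # Equation (3): q/(p+q) + (1-p-q)^n (pi0(0) - q/(p+q))
    return q / (p + q) + (1 - p - q) ** n * (pi0 - q / (p + q))

for n in range(20):
    assert abs(p_broken_recursive(n) - p_broken_closed(n)) < 1e-12
```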

1.1.1    Limit behavior

The assumption 0 < p + q < 2 implies that |1 − p − q| < 1. As a result, it can be shown that, starting
from any initial distribution [π0 (0), π0 (1)], as n → ∞, P (Xn = 0) converges to a fixed point (sometimes
called the steady-state distribution):
                                              q                                        p
                     lim P (Xn = 0) =                and       lim P (Xn = 1) =                    (4)
                     n→∞                     p+q              n→∞                     p+q
   Define P (n) (x, y) = P (Xn = y|X0 = x) (we will see later why this is an appropriate notation).
Similarly, we can show that for all x ∈ {0, 1}
                                              q                                       p
                      lim P (n) (x, 0) =             and       lim P (n) (x, 1) =                  (5)
                     n→∞                     p+q              n→∞                    p+q
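This limit can be observed directly by raising the transition matrix to a high power: every row of P^n approaches (q/(p+q), p/(p+q)). The values p = 0.6 and q = 0.2 are illustrative assumptions:

```python
# Powers of P for the two-state chain: P^n converges to a matrix whose
# rows are both (q/(p+q), p/(p+q)). p, q are illustrative values.
p, q = 0.6, 0.2
P = [[1 - p, p], [q, 1 - q]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = [[1.0, 0.0], [0.0, 1.0]]   # P^0 = identity
for _ in range(100):            # compute P^100
    Pn = matmul(Pn, P)

limit = [q / (p + q), p / (p + q)]
for x in (0, 1):                # both rows converge to the same limit,
    for y in (0, 1):            # independent of the initial state x
        assert abs(Pn[x][y] - limit[y]) < 1e-9
```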

1.1.2   Stationary Distribution

Another interesting observation is that we can choose π0 (0) so that P (Xn = 0) and P (Xn = 1) are
independent of n. What is such a choice? From Equation (3), the choice π0 (0) = q/(p + q) (and hence
π0 (1) = p/(p + q)) makes the (1 − p − q)^n term vanish, so P (Xn = 0) = q/(p + q) for all n.
   Such a choice of initial probability distribution is called a stationary distribution.
   The above discussion underlines a desirable behavior of the evolution of the machine when 0 <
p + q < 2. Writing π = (π(0), π(1)) = (q/(p + q), p/(p + q)) for the stationary distribution, we see that
for all x ∈ {0, 1}

 lim P (Xn = 0) = lim P (n) (x, 0) = π(0)         and      lim P (Xn = 1) = lim P (n) (x, 1) = π(1).
n→∞                 n→∞                                    n→∞                 n→∞
This implies that, as time evolves, the probability of being in a certain state becomes more and more
independent of the initial state.

1.2     Time Averages and Ergodicity

Let T00 denote the random time for the machine to break down again, i.e., given X0 = 0,

                                    T00 = min{n ≥ 1 : Xn = 0}.

Furthermore, define the random variable Nn (0) to be the number of days that the machine was broken
down prior to the nth day.

Exercise 1 Show that E(T00 ) = 1 + p/q.

    From Section 1.1.2 and Exercise 1, π(0) = q/(p + q) = 1/E(T00 ). On the other hand, from the strong
law of large numbers we conclude that for large n we can write Nn (0)/n ≈ 1/E(T00 ). More precisely,

                                        Nn (0)       1
                                    lim        =          = π(0).
                                    n→∞   n       E(T00 )

This relationship establishes that the time average of broken days (Nn (0)/n) converges to the statistical
average (π(0)). This reflects another desirable property of the above Markov chain, which is known as
ergodicity. Notice that similar arguments are valid for T11 , π(1), and Nn (1), defined appropriately.
    Notice that if p = 0 (unrepairable damage) while q > 0, then π(1) = 0, which corresponds to the
operational state of the machine being only transient. In other words, no matter how small
the probability of breakdown, the machine will eventually fail permanently (since p = 0). In this case,
state 0 is called an absorbing state. This is in contrast to the case when p > 0 and q > 0, in which both
states are visited infinitely often; in such a situation, both states are called recurrent.
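The ergodic relation above can be illustrated by simulation: over a long run, the fraction of broken days approaches π(0) = q/(p+q), and the average return time to state 0 approaches E(T00) = 1 + p/q. The values p = 0.6 and q = 0.2 are illustrative assumptions:

```python
import random

# Simulate the two-state machine and compare time averages with the
# stationary quantities. p, q are illustrative values (not from the notes).
random.seed(1)
p, q = 0.6, 0.2
P = [[1 - p, p], [q, 1 - q]]

n, x, broken_days = 200_000, 0, 0
returns, last_visit = [], 0      # X_0 = 0 (broken)
for t in range(1, n + 1):
    x = 0 if random.random() < P[x][0] else 1
    if x == 0:
        broken_days += 1
        returns.append(t - last_visit)   # length of this return cycle to 0
        last_visit = t

time_avg = broken_days / n                  # should approach pi(0) = q/(p+q)
mean_return = sum(returns) / len(returns)   # should approach E(T00) = 1 + p/q
```

With these values, q/(p+q) = 0.25 and 1 + p/q = 4, and the simulated averages land close to both, consistent with lim Nn(0)/n = 1/E(T00) = π(0).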

1.3      Multi-step Transition Probabilities

Notice that since repairs and breakdowns are independent in time, the system follows a Markov chain
(states B and G summarize the whole past). Using this fact we can compute the joint distribution of
X0 , X1 , . . . , Xn .
    For instance, let n = 2 and let xi ∈ {0, 1} for i = 0, 1, 2. Then

                          P (X0 = x0 , X1 = x1 , X2 = x2 )

                            = P (X2 = x2 |X1 = x1 , X0 = x0 ) × P (X0 = x0 , X1 = x1 )

                            = P (X2 = x2 |X1 = x1 ) P (X1 = x1 |X0 = x0 ) P (X0 = x0 ).                 (7)

      The following table gives the joint distribution of X0 , X1 , and X2 for various values of x0 , x1 , x2 .

Exercise 2 Show that

              P (X1 = x1 , X2 = x2 |X0 = x0 ) = P (X1 = x1 |X0 = x0 )P (X2 = x2 |X1 = x1 )

Exercise 3 Fill the following table.

                                x0   x1        x2        P (X2 = x2 , X1 = x1 |X0 = x0 )
                                0    0         0
                                0    0         1
                                0    1         0
                                0    1         1
                                1    0         0
                                1    0         1
                                1    1         0
                                1    1         1

Exercise 4 Using the above result, calculate

                                                     P (X2 = 1|X0 = 0)

      Create a similar table:
                                          x0        x2     P (X2 = x2 |X0 = x0 )
                                          0         0
                                          0         1
                                          1         0
                                          1         1

Exercise 5 Using Exercise 4, check that the (i, j)th element of matrix P 2 is equal to P (2) (i, j) =
Σk P (i, k)P (k, j); i.e. P 2 is the two-step transition probability matrix.

    This can be generalized to the Chapman-Kolmogorov equation:

                P (Xm+n = j|X0 = i) =  Σ  P (Xm+n = j|Xn = k)P (Xn = k|X0 = i)
                                       k
which is equivalent to:
                                      P (n+m) = P (n) P (m)
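A quick numeric check of the Chapman-Kolmogorov relation for the two-state chain, with illustrative values p = 0.6 and q = 0.2:

```python
# Verify P^(n+m) = P^(n) P^(m) for the two-state machine chain.
# p, q are illustrative values, not from the notes.
p, q = 0.6, 0.2
P = [[1 - p, p], [q, 1 - q]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matpow(A, n):
    R = [[1.0, 0.0], [0.0, 1.0]]   # identity = A^0
    for _ in range(n):
        R = matmul(R, A)
    return R

n, m = 3, 4
lhs = matpow(P, n + m)                       # P^(n+m)
rhs = matmul(matpow(P, n), matpow(P, m))     # P^(n) P^(m)
for i in (0, 1):
    for j in (0, 1):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-12
```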

1.4     Extensions and Counter-Examples

Here we are interested in whether the desirable properties discussed in Sections 1.1 and 1.2 can be
extended to all Markov chains. In other words, we pose the following questions:

 Q1. Is it always possible to find a stationary distribution for a Markov chain?

 Q2. Is the stationary distribution unique? Or can there be more than one?

 Q3. Is the stationary distribution π(y) always a limit point for P (n) (x, y) (independent of x)?

 Q4. Is the long-run average fraction of time spent in a state always equal to the stationary
        probability of that state? And is the mean return time to a state always its inverse?

    Through the following examples, we discuss the answers to the above questions:

1.4.1    Existence of a stationary distribution? (Q1)

As mentioned before, a finite (of size N ) temporally homogeneous Markov chain can be fully described
by an N × N matrix P. From the definition of stationary distribution, a probability distribution vector
π = (π(0), π(1), . . . , π(N − 1)) (i.e. Σi π(i) = 1, summing over i = 0, . . . , N − 1) is a stationary
distribution associated with matrix P if it is a solution of πP = π.

Exercise 6 Why?

Theorem 1 Consider matrix P to be the transition probability matrix for a finite Markov chain. The
equation πP = π has at least one solution.

Proof. Since matrix P is a transition probability matrix, we always have Σy P (x, y) = 1. In other
        words, 1 is an eigenvalue of matrix P associated with the right eigenvector 1N (the all-ones
        vector). This implies that 1 is also an eigenvalue of P T (matrices A and AT have the same
        eigenvalues), so P has a left eigenvector π with eigenvalue 1. This in turn implies that the
        equation πP = π has at least one solution.
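For the two-state chain, the balance equation πP = π together with π(0) + π(1) = 1 can be solved by hand, giving π = (q/(p+q), p/(p+q)). A small sketch verifying this, with illustrative values p = 0.6 and q = 0.2:

```python
# Check that pi = (q/(p+q), p/(p+q)) solves pi P = pi for the two-state chain.
# p, q are illustrative values, not from the notes.
p, q = 0.6, 0.2
P = [[1 - p, p], [q, 1 - q]]

pi = (q / (p + q), p / (p + q))

# pi is a left eigenvector of P for eigenvalue 1: (pi P)(j) = pi(j)
for j in (0, 1):
    assert abs(sum(pi[i] * P[i][j] for i in (0, 1)) - pi[j]) < 1e-9

# and pi is a probability vector
assert abs(sum(pi) - 1.0) < 1e-12
```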

1.4.2    Uniqueness of stationary distribution? (Q2)

Here we show that if p = q = 0, then there is more than one stationary distribution for the two-state
Markov chain studied before. In fact, it is easy to show that any probability distribution vector π0 is a
stationary distribution for this system.

Exercise 7 Why?

1.4.3    Limiting behavior of P n (x, y)? (Q3)

Here we provide counter examples to show scenarios where

                                              P n (x, y) → π( y).

Counter Example 1. Consider the case when p = q = 1.

        (i) What is the stationary distribution? Is it unique?

        (ii) Pick an arbitrary initial distribution (π0 (0), π0 (1)). Calculate P (Xn = 0).
        (Hint: consider the cases when n is odd or even.)

Exercise 8 (iii) Show:

        P n (0, 0) = 1 if n is even, 0 if n is odd;     P n (1, 0) = 0 if n is even, 1 if n is odd.

(iv) What is limn→∞ P n (0, 0)?

The problem here is the fact that the chain is periodic (it alternates deterministically between the
two states). A chain which does not have this property is called aperiodic.
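The oscillation in Counter Example 1 can be seen by computing successive powers of P for p = q = 1:

```python
# Counter Example 1 (p = q = 1): the chain alternates deterministically,
# so P^n(0,0) oscillates between 0 and 1 instead of converging.
P = [[0.0, 1.0], [1.0, 0.0]]   # p = q = 1

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0 = identity
history = []
for _ in range(6):             # record P^n(0,0) for n = 1, ..., 6
    Pn = matmul(Pn, P)
    history.append(Pn[0][0])

# history is 0, 1, 0, 1, 0, 1: zero for odd n, one for even n,
# so lim P^n(0,0) does not exist.
```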

Counter Example 2. Consider the case when p = q = 0. We saw before that any vector
     (π0 (0), π0 (1)) is a stationary distribution.

     (i) Calculate P (Xn = 0).

     Exercise 9 (ii) Calculate P n (0, 0) and P n (0, 1).

      The problem here is the fact that the chain can be decomposed into two non-communicating
     absorbing states. A chain which does not have this property is called irreducible.

In the first part of the course, we will try to discuss these issues in detail.

1.4.4     Relating limn→∞ Nn (y)/n, E(Tyy ), and π(y)? (Q4)

Counter Example 3. Consider the case where p = q = 0. As discussed, there are uncountably
        infinitely many stationary distributions for this chain.

        (i) Calculate E(T00 ).

        Exercise 10 (ii) Calculate Nn (0)/n. Notice that Nn (0)/n is a random variable. Can you
        talk about the distribution of this r.v.?

Here the problems arise due to the fact that the MC is not irreducible.

2    Controlled Markov Chains
Now, in Example 1, assume that p and q can be controlled. For instance, consider a case where no,
low, or high quality maintenance schedules can be deployed. The choice of maintenance schedule can be
modeled as a control action u ∈ {0, 1, 2}, where 0, 1, and 2 correspond to the no, low, and high maintenance
schedules. It is clear that the choice of control at time n, i.e. u(n), will affect the transition probability
matrix P (u(n)), and hence the time averages and limiting behavior of the chain.
    Furthermore, it is natural to assume that there is a cost associated with application of a particular
control at time n, given the state of the system. The central question we will try to address is how to
find the sequence of control actions in such a problem.
    Note that in such problems, the choice of control could be open loop vs. closed loop. Mathematically,
this means that a sequence of (possibly time-varying) functions π = {g0 , g1 , . . .} is usually used to describe
the full plan of actions (a closed-loop policy) into the future.

    Which one do you think results in optimal solutions? Why?
    From this, the question of perfect or imperfect observation will arise. This mathematically means:

    Note that in the case of perfect observation, it is intuitive that there is no loss of performance if one
restricts attention to Markov policies. How would you justify this statement?

Example 2 Consider the machine studied in Example 1, where the probability of transitions between
the “bad” and “good” states can be manipulated every morning in the following manner: by spending 1
unit of maintenance cost, the transition probability from the “bad” state to the “good” state can be increased
to 1 − ε, while the transition probability from the “good” to the “bad” state is reduced to δ. When the
machine operates in the good state during time slot n it produces A units of reward. What is the optimal
policy when considering the total reward over a horizon of T = 5? (This is the kind of problem we will
address in the second part of the course.)

   How about if one accounts for inflation, i.e. when 1 unit of reward (cost) at time t, t ≥ 0, is only
worth β^t units of reward (cost), for some known 0 < β < 1? This allows us to consider T → ∞.
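Finite-horizon problems of this kind are solved by backward induction (dynamic programming). The sketch below works Example 2 for T = 5 with two actions, no maintenance and maintenance; every numeric value (p, q, ε, δ, A) is an illustrative assumption, not from the notes:

```python
# Backward-induction sketch for Example 2. States: 0 = bad, 1 = good.
# Actions: 0 = no maintenance (free), 1 = maintenance (costs 1 unit).
# All numbers below are illustrative assumptions.
p, q = 0.3, 0.4          # baseline repair / failure probabilities
eps, delta = 0.05, 0.05  # improved probabilities under maintenance
A, T = 2.0, 5            # reward per good slot, horizon

# P_u[u][x][y] = P(next state = y | state = x, action = u)
P_u = [
    [[1 - p, p], [q, 1 - q]],              # u = 0: no maintenance
    [[eps, 1 - eps], [delta, 1 - delta]],  # u = 1: maintenance
]

def reward(x, u):
    """One-slot reward: A if operating in the good state, minus maintenance cost."""
    return (A if x == 1 else 0.0) - (1.0 if u == 1 else 0.0)

V = [0.0, 0.0]     # terminal value V_T = 0
policy = []        # policy[n][x] = optimal action at time n in state x
for n in reversed(range(T)):
    # Q[x][u] = immediate reward + expected value-to-go
    Q = [[reward(x, u) + sum(P_u[u][x][y] * V[y] for y in (0, 1))
          for u in (0, 1)] for x in (0, 1)]
    policy.insert(0, [max((0, 1), key=lambda u: Q[x][u]) for x in (0, 1)])
    V = [max(Q[x]) for x in (0, 1)]
```

The discounted, infinite-horizon variant mentioned above is obtained by multiplying the value-to-go term by β and iterating until V converges.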

Example 3 Consider N machines of the type studied in Example 1. Assume that every morning you wake
up, pick one machine to take to work, and leave the others at home. If the machine you brought to work
is functional (in the “good” state) that day, you earn C; if not, you lose l. If you want to maximize your
earnings over the next month, which machine will you pick? Why? Notice that here you do not get
to see the states of all the machines, only of the one you take to work! Hence this is a Partially Observable
Markov Decision Problem (POMDP), or an MDP with imperfect observation. (This is the kind of problem
we will discuss in the last part of the course.)

