

									 Computational Game Theory, Fall 2009                                                                          November 9

                      Lecture 3: Solving Extensive Form Games
 Lecturer: Peter Bro Miltersen                                                                            Scribe: Line Juhl

1    Lecture Overview
This lecture covered extensive form games and how to solve them in the two-player, zero-sum
case. Converting an extensive form game into strategic form and then solving it using
linear programming is not an efficient algorithm, since the conversion yields a matrix of exponential
size in the size of the tree. Instead, a more compact way of representing the solution is presented,
which leads to a polynomial time algorithm for solving two-player, zero-sum extensive form games.

2    Converting Extensive Form Games into Strategic Form
As an example of such a conversion, we consider an example from last lecture, namely the basic
endgame of poker. Figure 1 shows the game tree constructed last lecture.

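   [Figure 1 shows the game tree. A chance node deals Player 1 a heart with probability 1/4 and
   another card with probability 3/4. Player 1 (I) then bets (b) or checks (c). After a bet,
   Player 2 (II), who cannot tell the two cases apart, calls (C) or folds (F). Payoffs to Player 1
   with a heart: 3 after bet and call, 1 after bet and fold, 1 after a check; without a heart:
   −3, 1 and −1, respectively. The behavior probabilities marked on the edges are 1 for b and 0
   for c with a heart, 1/6 for b and 5/6 for c without one, and 1/2 each for C and F.]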

                       Figure 1: Extensive form of basic endgame of poker.

   If we want to solve this game (finding the value and a maximin strategy for both players), one
way to do so is to convert the game into strategic form and then solve it using linear programming.
The corresponding strategic form (constructed last lecture) is given by the matrix

                                                                C        F
                                                      b b     −3/2       1
                                                      b c       0      −1/2
                                                      c b      −2        1
                                                      c c     −1/2     −1/2

   [Figure 2 shows part of the game tree: chance edges labelled 1/6, Player 1 (I) nodes whose
   outgoing actions are labelled 1 to 6, and Player 2 (II) nodes below them.]

                                   Figure 2: A dice game in extensive form.

When solving the matrix game above, one might first want to reduce the matrix using the notion of dominance.
We say that a row $r_1$ weakly dominates another, non-identical row $r_2$ if each entry in $r_1$ is larger
than or equal to the corresponding entry in $r_2$. Intuitively, any probability mass that a
strategy puts on $r_2$ can be moved to $r_1$ instead, since each entry gives at least the same payoff. It is therefore
safe to remove the dominated row: an optimal strategy that does not use it exists.
For our matrix game, row 3 is weakly dominated by row 1 (a payoff of −3/2 is always better
than −2, while the payoff of 1 is the same in both rows). We therefore remove row 3. Similarly,
row 4 is weakly dominated by row 2. We end up with

                                                                C        F
                                                      b b     −3/2       1
                                                      b c       0      −1/2
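
The dominance check can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the lecture: the function name `weakly_dominated_rows` and the dictionary layout are assumptions, and the payoffs are those of the strategic form of the basic endgame of poker.

```python
from fractions import Fraction as F

# Strategic form of the basic endgame of poker (payoffs to Player 1).
# Rows: Player 1's pure strategies; columns: Player 2 calls (C) or folds (F).
rows = {
    "bb": [F(-3, 2), F(1)],
    "bc": [F(0), F(-1, 2)],
    "cb": [F(-2), F(1)],
    "cc": [F(-1, 2), F(-1, 2)],
}

def weakly_dominated_rows(matrix):
    """Return {dominated row: a non-identical row that weakly dominates it}."""
    dominated = {}
    for r2, v2 in matrix.items():
        for r1, v1 in matrix.items():
            if r1 != r2 and v1 != v2 and all(a >= b for a, b in zip(v1, v2)):
                dominated[r2] = r1
                break
    return dominated

print(weakly_dominated_rows(rows))
```

Exact fractions are used so that comparisons such as −3/2 ≥ −2 are not affected by rounding.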

This game is easily solved using linear programming, which gives (1/6, 5/6, 0, 0) (matching the four
rows of the original matrix) as the optimal mixed strategy for Player 1 and (1/2, 1/2) as the optimal
mixed strategy for Player 2. The value of the game is −1/4. Intuitively, b c seems like the best strategy
(bet when holding a heart, check otherwise), and not surprisingly this strategy is used 5 out
of 6 times. However, it would not make sense to use it every time, since Player 2 would then
switch to always folding when Player 1 bets, causing Player 1 to lose more money. Player
1 therefore needs to bluff occasionally, which is not very surprising to poker players.
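
The stated solution can be checked without an LP solver. The following sketch (variable names are assumptions of the sketch) computes what Player 1's mixed strategy guarantees against every column and what Player 2's mixed strategy concedes against every row; if the two bounds coincide, they equal the value of the game.

```python
from fractions import Fraction as F

A = [[F(-3, 2), F(1)],       # bb
     [F(0), F(-1, 2)],       # bc
     [F(-2), F(1)],          # cb
     [F(-1, 2), F(-1, 2)]]   # cc
x = [F(1, 6), F(5, 6), F(0), F(0)]  # Player 1's mixed strategy over the rows
y = [F(1, 2), F(1, 2)]              # Player 2's mixed strategy over (C, F)

# Expected payoff to Player 1 against each pure column when he plays x.
against_columns = [sum(x[i] * A[i][j] for i in range(4)) for j in range(2)]
# Expected payoff to Player 1 for each pure row when Player 2 plays y.
against_rows = [sum(A[i][j] * y[j] for j in range(2)) for i in range(4)]

lower_bound = min(against_columns)  # what x guarantees Player 1
upper_bound = max(against_rows)     # the most y lets Player 1 obtain
print(lower_bound, upper_bound)
```

Both bounds come out as −1/4, which confirms the value and the optimality of both strategies.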
    Since this example does not illustrate very well that such a conversion gives a blowup that is
exponential in the number of nodes in the tree, we consider another example. This time two players each roll
a die, and Player 2 tries to get a higher number than Player 1, who starts. The extensive form (or at
least some of it) of the game is shown in Figure 2. Each state has six outgoing actions, one
for each possible roll. Player 1 announces a number to Player 2 after studying his die (possibly lying), and
then Player 2 decides what to do. A corresponding matrix game would have a row for each
pure strategy of Player 1, giving $6^6 = 46656$ rows in the matrix, as seen below.

                                                 1  1  1(3)  1(4)  1(5)  1(6)
                                                 1  1  1(3)  1(4)  1(5)  2(6)
                                                              ...
                                                 6  6  6(3)  6(4)  6(5)  6(6)
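
The row count can be reproduced by enumerating the pure strategies directly. In this sketch a pure strategy of Player 1 is modelled (as an assumption) as a tuple giving the announced number for each of the six possible rolls.

```python
from itertools import product

# One announcement (1..6) for each possible roll (1..6) of Player 1's die.
pure_strategies = list(product(range(1, 7), repeat=6))

print(len(pure_strategies))   # number of rows in the matrix
print(pure_strategies[0])     # first row: always announce 1
print(pure_strategies[-1])    # last row: always announce 6
```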

3     Representing and Finding Solutions
As seen in the previous section, converting from extensive form to strategic form gives an exponential
blowup, possibly resulting in an LP that is infeasible to solve in practice. A related
problem is the representation of the result. For an n × m game matrix, the optimal strategy of the row
player is given as an n-tuple of probabilities, one for each pure strategy (summing to 1), specifying the mixed
strategy. This representation suffers from the same exponential blowup.
    We therefore seek other ways to both represent solutions and to find them. First we address
the problem of giving the solution in a more compact way.

Definition 1 A behavior strategy is

    • a map from information sets of a player to probability distributions on actions of those
      information sets, or stated differently it is

    • an assignment of probabilities of actions belonging to a player (where they sum to 1 for each
      information set).

    This strategy corresponds to “delaying” the decision of which action to take until the relevant
information set is reached when traversing the game tree. See the red numbers in Figure 1 for
a specific behavior strategy. A mixed strategy, in contrast, forces us to fix the choices at all
information sets from the beginning, which gives many more possibilities to consider.
    Playing the game according to the behavior strategy is done by traversing the game tree and
letting each player take an action when reaching their information set according to the probability
distribution on the actions belonging to the information set.
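
Averaging over such plays gives the expected payoff of a pair of behavior strategies, which can also be computed exactly by one traversal of the tree. The sketch below does this for the basic endgame of poker with the behavior strategy of Figure 1; the tuple encoding of nodes and the information set names `"I:heart"`, `"I:other"` and `"II"` are assumptions of the sketch.

```python
from fractions import Fraction as F

# Node encodings (assumed for this sketch):
#   ("leaf", payoff to Player 1)
#   ("chance", [(probability, child), ...])
#   ("move", information set, {action: child})
def after_bet(call_payoff):
    # Player 2's decision after a bet; folding always costs Player 2 one unit.
    return ("move", "II", {"C": ("leaf", call_payoff), "F": ("leaf", 1)})

poker = ("chance", [
    (F(1, 4), ("move", "I:heart", {"b": after_bet(3), "c": ("leaf", 1)})),
    (F(3, 4), ("move", "I:other", {"b": after_bet(-3), "c": ("leaf", -1)})),
])

# Behavior strategies: information set -> distribution over its actions.
behavior = {
    "I:heart": {"b": F(1), "c": F(0)},
    "I:other": {"b": F(1, 6), "c": F(5, 6)},
    "II": {"C": F(1, 2), "F": F(1, 2)},
}

def expected_payoff(node):
    kind = node[0]
    if kind == "leaf":
        return F(node[1])
    if kind == "chance":
        return sum(p * expected_payoff(child) for p, child in node[1])
    _, info_set, children = node
    return sum(behavior[info_set][a] * expected_payoff(child)
               for a, child in children.items())

print(expected_payoff(poker))
```

The result is −1/4, the value of the game found from the strategic form.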
    The following theorem by Kuhn tells us that for games of perfect recall (no forgetful players),
mixed and behavior strategies can express precisely the same strategies.

Theorem 2 (Kuhn 1953) For an extensive form game of perfect recall of an arbitrary number
of players, mixed strategies and behavior strategies are behaviorally equivalent.

    Here behaviorally equivalent means that an outside observer cannot distinguish play according
to a mixed strategy from play according to the corresponding behavior strategy. They simulate each other perfectly.
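
For the basic endgame of poker the conversion from a mixed to a behavior strategy is particularly simple, and the sketch below illustrates it. It relies on an assumption that holds in this game but not in general: Player 1 moves at most once on every path, so the behavior probability of an action is just the total probability of the pure strategies choosing it. In general one has to condition on the pure strategies that allow the information set to be reached. The names `to_behavior`, `"heart"` and `"other"` are hypothetical.

```python
from fractions import Fraction as F

# Mixed strategy over Player 1's pure strategies; each pure strategy lists
# the action with a heart first, then the action without one.
mixed = {("b", "b"): F(1, 6), ("b", "c"): F(5, 6),
         ("c", "b"): F(0), ("c", "c"): F(0)}

def to_behavior(mixed_strategy):
    info_sets = ["heart", "other"]
    behavior = {h: {"b": F(0), "c": F(0)} for h in info_sets}
    for pure, prob in mixed_strategy.items():
        # Add the strategy's probability to the action it picks at each set.
        for h, action in zip(info_sets, pure):
            behavior[h][action] += prob
    return behavior

print(to_behavior(mixed))
```

The output assigns probability 1 to betting with a heart and probability 1/6 to betting without one.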
    Since the size of a behavior strategy is bounded by the number of edges in the tree, such
strategies are preferred when dealing with games of extensive form.
    We have now represented the solution in a more compact way and move on to consider the
following problem.

Algorithmic problem Given a two-player, zero-sum game in extensive form, compute its value and
maximin/minimax behavior strategies.
    We present here three possible algorithms, of which only the last one is a polynomial time algorithm.

Algorithm 1:

    1. Convert to strategic form (exponential time).

    2. Compute maximin/minimax mixed strategies (exponential time, since the size of the matrix
       is already exponential).

    3. Convert to behaviorally equivalent maximin/minimax strategies (as given in the constructive
       proof of Theorem 2).

   Algorithm 1 uses the theory already known, but has exponential running time in the size of the
game tree.

Algorithm 2:

    1. Write Nash equation conditions (for each information set) as a mathematical program, roughly
       the size of the tree.

    2. Solve the program.

   This algorithm is somewhat better than Algorithm 1, since the program does not suffer from
the exponential blowup. However, solving such programs can be hard, since the resulting program is
not linear: variables of the program (the probabilities used in the behavior strategy) are often
multiplied by each other, as seen in the toy example in Figure 3, where Player 1 has more than one
choice along the paths to γ and β. The Nash equations will therefore involve “$p_D \cdot p_d$”, where $p_D$ is
the behavior probability of D and $p_d$ is the behavior probability of d. Such terms can contain an
arbitrary number of factors, corresponding to the number of choices along the path.

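   [Figure 3 shows the game tree. At the root, Player 1 (I) chooses D (behavior probability 0.2)
   or U (0.8); U ends the game in leaf α. After D, Player 2 (II) chooses L or R; R ends the game
   in leaf δ. After L, Player 1 chooses d (0.9) or u (0.1), ending in leaf γ or β, respectively.]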

                                        Figure 3: A toy example.
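
To make the non-linearity concrete, the expected payoff to Player 1 in the game of Figure 3 can be written out in terms of behavior probabilities. The symbols $p_U$, $p_u$, $p_L$ and $p_R$ are names assumed here by analogy with $p_D$ and $p_d$, and α, β, γ, δ stand for the payoffs at the corresponding leaves.

```latex
\mathbb{E}[\text{payoff}]
  = p_U\,\alpha
  + p_D\,p_R\,\delta
  + p_D\,p_L\,p_d\,\gamma
  + p_D\,p_L\,p_u\,\beta
```

The last two terms contain the product of two of Player 1's own variables, $p_D$ and $p_d$ (or $p_u$), so the payoff is not linear in his behavior strategy.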

4     A Polynomial Time Algorithm
The last algorithm is due to Koller, Megiddo and von Stengel. For this algorithm,
we need to define two helpful constructions: the sequence form and realization plans.

Definition 3 The sequence form of a two-player, zero-sum extensive form game is given by the following
two items.

   • Sets $S_i$ of sequences for each player, i = 1, 2. Formally, the set of sequences for Player i is
     obtained by taking every path from the root to a node of the tree and keeping only the actions
     of Player i on that path.

   • A payoff matrix with a row for every $\sigma \in S_1$ and a column for every $\tau \in S_2$. The entries
     $a_{\sigma\tau}$ of the matrix are given by

     $$a_{\sigma\tau} = \sum_{\text{leaves } l \text{ consistent with } \sigma \text{ and } \tau} \mathrm{weight}(l),$$

     where for a leaf l

     $$\mathrm{weight}(l) = \mathrm{payoff}(l) \cdot \prod_{e \text{ chance edge on path from root to } l} p_e.$$

     Here $p_e$ is the probability of the chance edge e.

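   This definition is best viewed through an example or two. Let us again consider the basic
endgame of poker (Figure 1) and the toy game from Figure 3. For the poker game, we have

                                       $S_1 = \{\emptyset, b_\heartsuit, c_\heartsuit, b, c\}$,
                                       $S_2 = \{\emptyset, C, F\}$

as the sets of sequences, where ∅ is the empty sequence and the subscript ♥ marks the actions
Player 1 takes when holding a heart. For the game in Figure 3 we get

                                     $S_1 = \{\emptyset, D, U, Dd, Du\}$,
                                     $S_2 = \{\emptyset, L, R\}$.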

   For basic endgame of poker we get the following payoff matrix.

                                                   ∅        C        F
                                         ∅         0        0        0
                                         b_♥       0       3/4      1/4
                                         c_♥      1/4       0        0
                                         b         0      −9/4      3/4
                                         c       −3/4       0        0

   In this example, each pair of sequences leads to at most one leaf, so the sum consists of a single
term for each nonzero entry. Pairs of sequences that do not lead to a leaf have the entry 0.
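
The construction of the payoff matrix can be sketched as follows for the poker game. The list of leaves is written out by hand from Figure 1; the labels `"b_h"` and `"c_h"` (bet and check when holding a heart) and the use of the empty string for the empty sequence are conventions assumed by the sketch.

```python
from fractions import Fraction as F

S1 = ["", "b_h", "c_h", "b", "c"]   # "" is the empty sequence
S2 = ["", "C", "F"]

# Each leaf: (Player 1's sequence, Player 2's sequence,
#             product of chance probabilities on the path, payoff).
leaves = [
    ("b_h", "C", F(1, 4), 3), ("b_h", "F", F(1, 4), 1), ("c_h", "", F(1, 4), 1),
    ("b", "C", F(3, 4), -3), ("b", "F", F(3, 4), 1), ("c", "", F(3, 4), -1),
]

# a[sigma][tau] = sum of weight(l) over leaves consistent with sigma and tau.
A = {sigma: {tau: F(0) for tau in S2} for sigma in S1}
for sigma, tau, chance_prob, payoff in leaves:
    A[sigma][tau] += chance_prob * payoff

for sigma in S1:
    print(repr(sigma), [str(A[sigma][tau]) for tau in S2])
```

The matrix has one row per sequence rather than one row per pure strategy, which is what keeps its size linear in the size of the tree.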

Definition 4 A realization plan for a player i is an assignment of a real number to each of his sequences,
$r : S_i \to \mathbb{R}$. This number is called the realisation weight of the sequence. The realisation plan
corresponding to a behavior strategy assigns to each sequence the product of the behavior probabilities
of the actions in that sequence.

One way to view realization weights is that they simply correspond to a change of variables (from
the behavior strategy probabilities) that turns the non-linear program of Algorithm 2 into a linear
program.

   As an example we again consult the game from Figure 3, and find the realisation weights of the
two sequences Dd and D. The red numbers in the figure are the behavior probabilities.

    $$\begin{aligned}
    r(Dd) &= 0.2 \cdot 0.9 = 0.18,\\
    r(D) &= 0.2,\\
    p(d) &= \frac{r(Dd)}{r(D)} = \frac{0.18}{0.2} = 0.9,
    \end{aligned}$$

where p(d) denotes the probability given to the action d in the behavior strategy. Note that we
can go back and forth between behavior strategies and realisation plans by simple multiplication
and division (unless we divide by 0, but that will never be an issue, since the path containing this
action will never be taken).
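
Both directions of this conversion can be sketched for Player 1 in the game of Figure 3. Sequences are represented as strings of single-letter actions, and the helper names `realization_plan` and `behavior_probability` are hypothetical; exact fractions stand in for the decimals 0.2, 0.8, 0.9 and 0.1.

```python
from fractions import Fraction as F

behavior = {"D": F(2, 10), "U": F(8, 10), "d": F(9, 10), "u": F(1, 10)}
sequences = ["", "D", "U", "Dd", "Du"]   # "" is the empty sequence

def realization_plan(behavior):
    """Realisation weight of a sequence = product of its behavior probabilities."""
    plan = {}
    for seq in sequences:
        weight = F(1)
        for action in seq:
            weight *= behavior[action]
        plan[seq] = weight
    return plan

def behavior_probability(plan, seq):
    """Probability of the last action of seq, or None if its prefix has weight 0."""
    parent = plan[seq[:-1]]
    return plan[seq] / parent if parent != 0 else None

r = realization_plan(behavior)
print(r["Dd"], behavior_probability(r, "Dd"))
```

Returning `None` when the prefix has weight 0 reflects the remark above: the probability is then irrelevant, because that part of the tree is never reached.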
    The next lemma connects realization plans and behavior strategies.

Lemma 5 For a two-player, zero-sum game in extensive form the following holds.

  1. The set of realisation plans of Player 1 corresponding to some behavior strategy is a bounded
     non-empty polytope
                                        X = {x | Ex = e, x ≥ 0}.

  2. The set of realisation plans of Player 2 corresponding to some behavior strategy is a bounded
     non-empty polytope
                                        Y = {y | F y = f, y ≥ 0}.

  3. The expected payoff to Player 1 when he plays according to x and Player 2 plays according to y
     is $x^T A y$, where A is the sequence form payoff matrix.

   The matrices E and F and the vectors e and f are constructed using the fact that the probability
mass entering a node must be equal to the probability mass leaving the node. For our game from
Figure 3 we therefore have the following equations for Player 1.

    $$\begin{aligned}
    x_\emptyset &= 1,\\
    x_D + x_U &= x_\emptyset,\\
    x_{Dd} + x_{Du} &= x_D,\\
    x_\emptyset, x_D, x_U, x_{Dd}, x_{Du} &\ge 0.
    \end{aligned}$$

   The first three lines correspond to $Ex = e$.
   A formal proof of Lemma 5 is omitted since all three items are straightforward. It is, however,
a very good exercise to go through the details of the proof and also verify a few examples!
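
Two such verifications are sketched below. The first checks item 1 on the game from Figure 3, the second checks item 3 on the basic endgame of poker. The particular E and e are one possible choice, with every equation moved to the form "mass leaving minus mass entering equals 0"; this sign convention and all variable names are assumptions of the sketch.

```python
from fractions import Fraction as F

# Item 1, game from Figure 3; sequences ordered (empty, D, U, Dd, Du).
E = [[1, 0, 0, 0, 0],     # x_empty = 1
     [-1, 1, 1, 0, 0],    # x_D + x_U - x_empty = 0
     [0, -1, 0, 1, 1]]    # x_Dd + x_Du - x_D = 0
e = [1, 0, 0]
x_toy = [F(1), F(1, 5), F(4, 5), F(9, 50), F(1, 50)]  # from 0.2, 0.8, 0.9, 0.1
Ex = [sum(c * v for c, v in zip(row, x_toy)) for row in E]

# Item 3, poker; rows (empty, b_h, c_h, b, c), columns (empty, C, F),
# where b_h and c_h are the actions taken when holding a heart.
A = [[0, 0, 0],
     [0, F(3, 4), F(1, 4)],
     [F(1, 4), 0, 0],
     [0, F(-9, 4), F(3, 4)],
     [F(-3, 4), 0, 0]]
x = [F(1), F(1), F(0), F(1, 6), F(5, 6)]  # bet with a heart, bluff 1/6 of the time
y = [F(1), F(1, 2), F(1, 2)]              # call half of the time
payoff = sum(x[i] * A[i][j] * y[j] for i in range(5) for j in range(3))

print(Ex == e, payoff)
```

The value of $x^T A y$ obtained this way, −1/4, agrees with the value of the game computed from the strategic form.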
   The next theorem follows naturally.

Theorem 6 For a two-player, zero-sum game in extensive form with payoff matrix A (from the
sequence form), the maximin realisation plan, r, is given by

    $$r = \arg\max_{x \in X} \min_{y \in Y} x^T A y.$$

   Finally, we are ready to give Algorithm 3.

Algorithm 3:      (Koller, Megiddo, von Stengel 1996)

  1. Convert the game to sequence form. In particular, compute the payoff matrix A and the
     matrices and vectors E, e, F, f defining the valid realization plans.

  2. Compute the maximin expression of Theorem 6 using linear programming (possible due to
     the proof of the generalised maximin theorem).
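
One standard way to obtain a single linear program in step 2 is to dualize the inner minimization; the following is a sketch under that assumption, not necessarily the derivation used in the lecture. The vector q, with one free variable for each row of F, is introduced by the sketch. For a fixed x, LP duality gives the first line, and substituting it into the maximin expression gives the program in the second line.

```latex
\begin{aligned}
\min_{y}\,\bigl\{\, (x^T A)\,y \;:\; F y = f,\ y \ge 0 \,\bigr\}
  &= \max_{q}\,\bigl\{\, f^T q \;:\; F^T q \le A^T x \,\bigr\}, \\
\max_{x \in X}\,\min_{y \in Y}\, x^T A y
  &= \max_{x,\,q}\,\bigl\{\, f^T q \;:\; F^T q \le A^T x,\ E x = e,\ x \ge 0 \,\bigr\}.
\end{aligned}
```

Strong duality applies because Y is a non-empty bounded polytope, so the inner minimum is attained.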

   Since the number of sequences is linear in the number of nodes, we avoid the exponential blowup
when constructing the payoff matrix. This gives us a polynomial time algorithm in the number of
nodes, which is useful for solving games of extensive form. The existence of such an algorithm was
an open problem for quite a while.
   Using Algorithm 3, we obtain the linear programs for our two running examples; see Table 1.

             Basic endgame of poker                       Game from Figure 3
 Variables   x_∅, x_{b_♥}, x_{c_♥}, x_b, x_c              x_∅, x_D, x_U, x_{Dd}, x_{Du}
             (the realisation weights)                    (the realisation weights)
             q_0 (the value)                              q_0 (the value)
             q_h (the contribution to the value           q_h (the contribution to the value
             from plays through h, the information        from plays through h, the information
             set owned by Player 2)                       set owned by Player 2)
 Program     max q_0                                      max q_0

             subject to                                   subject to

             ∅: q_0 ≤ q_h + (1/4) x_{c_♥} − (3/4) x_c     ∅: q_0 ≤ q_h + α · x_U
             C: q_h ≤ (3/4) x_{b_♥} − (9/4) x_b           L: q_h ≤ γ · x_{Dd} + β · x_{Du}
             F: q_h ≤ (1/4) x_{b_♥} + (3/4) x_b           R: q_h ≤ δ · x_D

             x_∅ = 1                                      x_∅ = 1
             x_{b_♥} + x_{c_♥} = x_∅                      x_D + x_U = x_∅
             x_b + x_c = x_∅                              x_{Dd} + x_{Du} = x_D
             x_∅, x_{b_♥}, x_{c_♥}, x_b, x_c ≥ 0          x_∅, x_D, x_U, x_{Dd}, x_{Du} ≥ 0

                             Table 1: Using Algorithm 3 on two examples.

   The intuition behind the first constraint in the poker game is that the value is bounded by
the contribution to the value through h plus the contribution from plays that do not go through h
(Player 1 checking, with or without a heart). The same applies to the other example.
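
As a sanity check on the poker program in Table 1, the sketch below verifies a candidate solution by hand instead of calling an LP solver. It checks that the realisation plan is feasible, evaluates the constraints to get q_0, and then bounds the optimum from above by computing Player 1's best response to the plan y = (1, 1/2, 1/2) of Player 2. The candidate values and all names are assumptions of the sketch; `b_h` and `c_h` denote betting and checking when holding a heart.

```python
from fractions import Fraction as F

# Candidate realisation plan for Player 1.
x = {"": F(1), "b_h": F(1), "c_h": F(0), "b": F(1, 6), "c": F(5, 6)}

# Largest q_h allowed by constraints C and F, then q_0 from the root constraint.
q_h = min(F(3, 4) * x["b_h"] - F(9, 4) * x["b"],    # constraint C
          F(1, 4) * x["b_h"] + F(3, 4) * x["b"])    # constraint F
q_0 = q_h + F(1, 4) * x["c_h"] - F(3, 4) * x["c"]

# Realisation-plan constraints for Player 1.
feasible = (x[""] == 1
            and x["b_h"] + x["c_h"] == x[""]
            and x["b"] + x["c"] == x[""]
            and min(x.values()) >= 0)

# Upper bound: against y_C = y_F = 1/2, Player 1's best response picks the
# better sequence in each of his two information sets.
y_C = y_F = F(1, 2)
best_heart = max(F(3, 4) * y_C + F(1, 4) * y_F, F(1, 4))     # bet vs. check
best_other = max(-F(9, 4) * y_C + F(3, 4) * y_F, -F(3, 4))   # bet vs. check
upper_bound = best_heart + best_other

print(feasible, q_0, upper_bound)
```

Since the feasible objective value q_0 and the upper bound coincide at −1/4, the candidate plan is optimal for the program.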
   In general, the linear programs arising are quite intuitive. The reader is invited to try some
more examples, in particular examples with more information sets belonging to Player 2.

