Computational Game Theory, Fall 2009                              November 9

Lecture 3: Solving Extensive Form Games

Lecturer: Peter Bro Miltersen        Scribe: Line Juhl

1 Lecture Overview

This lecture covered extensive form games and how to solve them, for the two-player, zero-sum case. The method of converting an extensive form game into strategic form and then solving it using linear programming is not an efficient algorithm, since the conversion yields a matrix of exponential size in the size of the tree. Instead a better way of representing the solution is presented, finally leading to a polynomial time algorithm for solving a two-player, zero-sum extensive form game.

2 Converting Extensive Form Games into Strategic Form

As an example of such a conversion, we consider an example from last lecture, namely the basic endgame of poker. Figure 1 shows the game tree constructed last lecture.

[Figure 1: Extensive form of the basic endgame of poker. Chance deals Player 1 a heart with probability 1/4, otherwise (probability 3/4) a losing card. Player 1 then bets (b) or checks (c); after a bet, Player 2, who cannot see the card, calls (C) or folds (F). Betting and being called yields +3 with the heart and -3 without, a fold after a bet yields +1, and checking yields +1 with the heart and -1 without. The red numbers mark a specific behavior strategy.]

If we want to solve this game (finding the value and a maximin strategy for both players), one way to do so is to convert the game into strategic form and then solve it using linear programming. The corresponding strategic form (constructed last lecture) is given by the matrix

         C      F
bb'    -3/2     1
bc'      0    -1/2
cb'     -2      1
cc'    -1/2   -1/2

where a row such as bc' means: bet when holding the heart, check otherwise.

[Figure 2: A dice game in extensive form (only partially drawn). Chance gives each player one of six rolls, each with probability 1/6; Player 1 then announces a number from 1 to 6, and Player 2 responds.]

When solving this game, one might first want to reduce the matrix by using the notion of dominance. We say that one row r1 weakly dominates another non-identical row r2 if each entry in r1 is larger than or equal to the corresponding entry in r2. Intuitively, any probability mass put into r2 by a strategy can be moved to r1 instead, since each entry gives at least the same payoff. It is therefore safe to remove the dominated row, since an optimal strategy not using the dominated row exists.
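The dominance reduction just described is mechanical. As a sketch (with the strategic form stored as a plain list of rows, and helper names of my own choosing), one might write:

```python
def weakly_dominates(r1, r2):
    """True if row r1 weakly dominates the non-identical row r2,
    i.e. every entry of r1 is >= the corresponding entry of r2."""
    return r1 != r2 and all(a >= b for a, b in zip(r1, r2))

def remove_dominated_rows(matrix):
    """Repeatedly drop any row weakly dominated by a remaining row."""
    rows = list(matrix)
    changed = True
    while changed:
        changed = False
        for row in rows:
            if any(weakly_dominates(other, row) for other in rows):
                rows.remove(row)
                changed = True
                break
    return rows

# Strategic form of the basic poker endgame (rows bb', bc', cb', cc').
M = [[-1.5, 1.0], [0.0, -0.5], [-2.0, 1.0], [-0.5, -0.5]]
print(remove_dominated_rows(M))  # rows cb' and cc' are removed
```

Running this on the poker matrix removes exactly rows 3 and 4, matching the reduction carried out by hand below.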
For our matrix game we see that row 3 is weakly dominated by row 1 (a payoff of -3/2 is always better than -2, while the payoff of 1 does not change anything). We therefore remove row 3. Similarly, row 4 is weakly dominated by row 2. We end up with

         C      F
bb'    -3/2     1
bc'      0    -1/2

This game is easily solved using linear programming, and gives us (1/6, 5/6, 0, 0) (matching the four rows in the original matrix) as the optimal mixed strategy for Player 1 and (1/2, 1/2) as the optimal mixed strategy for Player 2. The value of the game is -1/4. Intuitively bc' seems like the best strategy (bet when having a heart, check otherwise), and not surprisingly we therefore use this strategy 5 out of 6 times. However, it would not make sense to use it every time, since Player 2 would then change his strategy to always fold when Player 1 bets, causing Player 1 to lose more money. Player 1 therefore needs to bluff occasionally, which is not very surprising to poker players.

Since this example did not illustrate very well the fact that such a conversion gives an exponential blowup in the number of nodes of the tree, we consider another example. This time two players each roll a die, and Player 2 tries to get a higher number than Player 1, who starts. The extensive form (or at least some of it) of the game is seen in Figure 2. Each state has six outgoing actions corresponding to each possible roll. Player 1 tells a number to Player 2 after studying his die (possibly lying) and then Player 2 decides what to do. A corresponding matrix game would have a row for each possible pure strategy of Player 1, i.e. for each way of choosing an announcement for each of the six possible rolls, thus giving 6^6 = 46656 rows in the matrix, as seen below (a(r) denotes announcing a after rolling r).

1(1) 1(2) 1(3) 1(4) 1(5) 1(6)
1(1) 1(2) 1(3) 1(4) 1(5) 2(6)
...
6(1) 6(2) 6(3) 6(4) 6(5) 6(6)

3 Representing and Finding Solutions

As seen in the previous section, converting from extensive form to strategic form gives an exponential blowup, thus possibly resulting in an LP that is practically infeasible to solve. Another related problem is the representation of the result.
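Before addressing representation, it is worth seeing the linear programming step of the previous section concretely. The sketch below solves the reduced 2x2 poker matrix; it assumes SciPy's linprog is available, and the variable layout (x1, x2, v for the two row probabilities and the value) is just one convenient choice:

```python
from scipy.optimize import linprog

# Reduced strategic form of the basic poker endgame
# (rows bb', bc'; columns C, F).
A = [[-1.5, 1.0],
     [0.0, -0.5]]

# Maximise v subject to sum_i x_i * A[i][j] >= v for every column j,
# sum_i x_i = 1, x >= 0.  linprog minimises, so we minimise -v.
c = [0.0, 0.0, -1.0]                          # variables: x1, x2, v
A_ub = [[-A[0][j], -A[1][j], 1.0] for j in range(2)]
b_ub = [0.0, 0.0]
A_eq = [[1.0, 1.0, 0.0]]
b_eq = [1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None), (0, None), (None, None)])

x1, x2, v = res.x
print(round(x1, 4), round(x2, 4), round(v, 4))  # 0.1667 0.8333 -0.25
```

The solver recovers the maximin strategy (1/6, 5/6) and the value -1/4 found above.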
For an n × m game matrix, the optimal solution is given as an n-tuple with probabilities for each pure strategy (summing to 1) specifying the mixed strategy. This also suffers from the exponential blowup. We therefore seek other ways both to represent solutions and to find them. First we address the problem of giving the solution in a more compact way.

Definition 1 A behavior strategy is

• a map from information sets of a player to probability distributions on the actions of those information sets, or stated differently it is

• an assignment of probabilities to the actions belonging to a player (where they sum to 1 for each information set).

This strategy corresponds to "delaying" the decision of which action to take until the involved information set is reached when traversing the game tree. See the red numbers in Figure 1 for a specific behavior strategy. Mixed strategies, in contrast, force us to consider all options from the beginning, giving us quite a few more possibilities. Playing the game according to a behavior strategy is done by traversing the game tree and letting each player, on reaching one of his information sets, take an action according to the probability distribution on the actions belonging to that information set. The following theorem by Kuhn tells us that for games of perfect recall (no forgetful players), mixed and behavior strategies can express precisely the same behavior.

Theorem 2 (Kuhn 1953) For an extensive form game of perfect recall with an arbitrary number of players, mixed strategies and behavior strategies are behaviorally equivalent.

Here behaviorally equivalent means that playing a mixed or a behavior strategy cannot be distinguished by somebody viewing from the outside: they simulate each other perfectly. Since the size of a behavior strategy is bounded by the number of edges in the tree, such strategies are preferred when dealing with games in extensive form. We have now represented the solution in a more compact way and move on to consider the following problem.
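The traversal just described can be sketched in code. The tree encoding and the strategy dictionaries below are hypothetical conveniences, not notation from the lecture; each internal node names its owner and information set, and each player's behavior strategy maps information sets to distributions over actions:

```python
import random

def play(node, strategies):
    """Walk the game tree, sampling one action per decision node
    according to the behavior strategies (chance moves are fixed)."""
    while "payoff" not in node:
        if node["owner"] == "chance":
            dist = node["probs"]
        else:
            dist = strategies[node["owner"]][node["infoset"]]
        actions, probs = zip(*dist.items())
        choice = random.choices(actions, probs)[0]
        node = node["children"][choice]
    return node["payoff"]

# Toy tree: Player 1 bets or checks; checking ends the game.
tree = {"owner": 1, "infoset": "I1",
        "children": {"b": {"owner": 2, "infoset": "II",
                           "children": {"C": {"payoff": 3},
                                        "F": {"payoff": 1}}},
                     "c": {"payoff": 1}}}
strategies = {1: {"I1": {"b": 0.5, "c": 0.5}},
              2: {"II": {"C": 0.5, "F": 0.5}}}
print(play(tree, strategies))   # prints a sampled payoff, 1 or 3
```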
Algorithmic problem Given a two-player, zero-sum game in extensive form, compute the value and maximin/minimax behavior strategies.

We present here three possible algorithms, where only the last one is a polynomial time algorithm.

Algorithm 1:

1. Convert to strategic form (exponential time).

2. Compute maximin/minimax mixed strategies (exponential time, since the size of the matrix is already exponential).

3. Convert to behaviorally equivalent maximin/minimax behavior strategies (as given in the constructive proof of Theorem 2).

Algorithm 1 uses the theory already known, but has exponential running time in the size of the game tree.

Algorithm 2:

1. Write the Nash equilibrium conditions (for each information set) as a mathematical program, roughly the size of the tree.

2. Solve the program.

This algorithm is somewhat better than Algorithm 1, since the program does not suffer from the exponential blowup. However, solving such a program can be hard, since it is not linear: variables of the program (the probabilities used for the behavior strategy) often get multiplied by each other, as seen in the toy example in Figure 3, where Player 1 has more than one choice along the path to γ and β. The Nash conditions will therefore involve the product p_D · p_d, where p_D is the behavior probability of D and p_d is the behavior probability of d. Such terms can consist of an arbitrary number of multiplications, corresponding to the number of choices along the path.

[Figure 3: A toy example. Player I first chooses between D (behavior probability 0.2) and U (0.8); U ends the game with payoff α. After D, Player II chooses between L and R, where R gives payoff δ. After L, Player I chooses between d (0.9), giving payoff γ, and u (0.1), giving payoff β.]

4 A Polynomial Time Algorithm

The last algorithm is due to Koller, Megiddo and von Stengel. In order for this algorithm to work, we need to define two new helpful constructions, the sequence form and realization plans.

Definition 3 The sequence form of a two-player, zero-sum extensive game is given by the following two items.

• Sets S_i of sequences for each player, i = 1, 2.
Formally, a sequence for Player i is obtained by taking a path from the root to some node and keeping only the actions belonging to Player i; S_i is the set of all sequences arising this way, including the empty sequence ∅.

• A payoff matrix with a row for every σ ∈ S_1 and a column for every τ ∈ S_2. The entry a_στ of the matrix is given by

a_στ = Σ_{leaves l consistent with σ and τ} weight(l),

where for a leaf l

weight(l) = payoff(l) · Π_{chance edges e on the path from the root to l} p_e.

Here p_e is the probability of the chance edge e. This definition is best viewed through an example or two. Let us again consider the basic endgame of poker (Figure 1) and the toy game from Figure 3. For the poker game, we have

S_1 = {∅, b, c, b', c'},   S_2 = {∅, C, F}

as the sets of sequences (b, c are the actions after the first chance move and b', c' the actions after the second). For the game in Figure 3 we get

S_1 = {∅, D, U, Dd, Du},   S_2 = {∅, L, R}.

For the basic endgame of poker we get the following payoff matrix.

         ∅      C      F
∅        0      0      0
b        0     3/4    1/4
c       1/4     0      0
b'       0    -9/4    3/4
c'     -3/4     0      0

In this example, each pair of sequences leads to at most one leaf, so the sum consists of only one term for each entry. The pairs not leading to a leaf have the entry 0.

Definition 4 A realization plan for Player i is an assignment of real numbers to his sequences, r : S_i → R. The number r(σ) is called the realization weight of the sequence σ. The realization plan corresponding to a behavior strategy assigns to each sequence the product of the behavior probabilities of the actions in that sequence.

One way to view realization weights is that they simply correspond to a change of variables (from the behavior strategy probabilities) that turns the non-linear program of Algorithm 2 into a linear one!

As an example we again consult the game from Figure 3 and find the realization weights of the two sequences Dd and D. The red numbers in the figure are the behavior probabilities.

r(Dd) = 0.2 · 0.9 = 0.18,   r(D) = 0.2,

p(d) = r(Dd)/r(D) = 0.18/0.2 = 0.9,

where p(d) denotes the probability given to the action d in the behavior strategy.
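The back-and-forth between behavior probabilities and realization weights is just multiplication and division, as the example shows. A sketch (the string encoding of sequences as action names is an ad hoc convenience):

```python
# Behavior probabilities from Figure 3: D/U at the root, d/u after D-L.
behavior = {"D": 0.2, "U": 0.8, "d": 0.9, "u": 0.1}

def realization_weight(sequence, behavior):
    """The realization weight of a sequence is the product of the
    behavior probabilities of its actions (1.0 for the empty sequence)."""
    w = 1.0
    for action in sequence:
        w *= behavior[action]
    return w

r_D = realization_weight("D", behavior)     # 0.2
r_Dd = realization_weight("Dd", behavior)   # 0.2 * 0.9 = 0.18

# Going back: divide the weight of a sequence by that of its prefix.
p_d = r_Dd / r_D
print(round(r_D, 4), round(r_Dd, 4), round(p_d, 4))  # 0.2 0.18 0.9
```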
Note that we can go back and forth between behavior strategies and realization plans by simple multiplication and division (unless we divide by 0, but that will never be an issue, since the path containing this action will never be taken). The next lemma connects realization plans and behavior strategies.

Lemma 5 For a two-player, zero-sum game in extensive form the following holds.

1. The set of realization plans of Player 1 corresponding to some behavior strategy is a bounded non-empty polytope X = {x | Ex = e, x ≥ 0}.

2. The set of realization plans of Player 2 corresponding to some behavior strategy is a bounded non-empty polytope Y = {y | Fy = f, y ≥ 0}.

3. The expected payoff to Player 1 when he plays by x and Player 2 plays by y is x^T A y, where A is the sequence form payoff matrix.

The matrices E and F and the vectors e and f are constructed using the fact that the probability mass entering a node must be equal to the probability mass leaving the node. For our game from Figure 3 we therefore have the following equations for Player 1.

x_∅ = 1,
x_D + x_U = x_∅,
x_Dd + x_Du = x_D,
x_∅, x_D, x_U, x_Dd, x_Du ≥ 0.

The first three lines correspond to Ex = e. A formal proof of Lemma 5 is omitted since all three items are straightforward. It is, however, a very good exercise to go through the details of the proof and also verify a few examples! The next theorem follows naturally.

Theorem 6 For a two-player, zero-sum game in extensive form with payoff matrix A (from the sequence form), the maximin realization plan r is given by

r = arg max_{x ∈ X} min_{y ∈ Y} x^T A y.

Finally, we are ready to give Algorithm 3.

Algorithm 3: (Koller, Megiddo, von Stengel 1996)

1. Convert the game to sequence form. In particular, compute the payoff matrix A and the matrices and vectors E, e, F, f defining the valid realization plans.

2. Compute the maximin expression of Theorem 6 using linear programming (possible due to the proof of the generalized maximin theorem).
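As a small check of Lemma 5, one can write down E and e for the game of Figure 3 explicitly and verify that the realization plan induced by the red behavior probabilities satisfies Ex = e. A sketch, with sequences ordered (∅, D, U, Dd, Du):

```python
# Rows of E encode: x_empty = 1, x_D + x_U = x_empty, x_Dd + x_Du = x_D.
E = [[1, 0, 0, 0, 0],
     [-1, 1, 1, 0, 0],
     [0, -1, 0, 1, 1]]
e = [1, 0, 0]

# Realization plan induced by the behavior probabilities of Figure 3
# (0.2 on D, 0.8 on U, 0.9 on d, 0.1 on u).
x = [1.0, 0.2, 0.8, 0.2 * 0.9, 0.2 * 0.1]

# Check Ex = e.
Ex = [sum(E[i][j] * x[j] for j in range(5)) for i in range(3)]
print(all(abs(Ex[i] - e[i]) < 1e-9 for i in range(3)))  # True
```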
Since the number of sequences is linear in the number of nodes, we avoid the exponential blowup when constructing the payoff matrix. This gives us a polynomial time algorithm in the number of nodes, which is useful for solving games in extensive form. The existence of such an algorithm was an open problem for quite a while.

Using the algorithm we find the linear programs for our two running examples, see Table 1.

Basic endgame of poker:
  Variables: x_∅, x_b, x_c, x_b', x_c' (the realization weights), q_0 (the value), and q_h (the contribution to the value from plays through the information set h owned by Player 2).
  Program:
    max q_0 subject to
    ∅: q_0 ≤ q_h + (1/4) x_c - (3/4) x_c'
    C: q_h ≤ (3/4) x_b - (9/4) x_b'
    F: q_h ≤ (1/4) x_b + (3/4) x_b'
    x_∅ = 1
    x_b + x_c = x_∅
    x_b' + x_c' = x_∅
    x_∅, x_b, x_c, x_b', x_c' ≥ 0

Game from Figure 3:
  Variables: x_∅, x_D, x_U, x_Dd, x_Du (the realization weights), q_0 (the value), and q_h (the contribution to the value from plays through the information set h owned by Player 2).
  Program:
    max q_0 subject to
    ∅: q_0 ≤ q_h + α · x_U
    L: q_h ≤ γ · x_Dd + β · x_Du
    R: q_h ≤ δ · x_D
    x_∅ = 1
    x_D + x_U = x_∅
    x_Dd + x_Du = x_D
    x_∅, x_D, x_U, x_Dd, x_Du ≥ 0

Table 1: Using Algorithm 3 on two examples.

The intuition behind the first constraint in the poker game is that the value is bounded by the contribution to the value through h plus the contribution from not going through h (Player 1 taking action c or c'). The same applies to the other example. In general, the linear programs arising are quite intuitive. The reader is invited to try some more examples, in particular examples with more information sets belonging to Player 2.
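As a sanity check, the poker program of Table 1 can be handed to an off-the-shelf LP solver. The sketch below assumes SciPy's linprog is available, substitutes x_∅ = 1, and writes x_b2, x_c2 for x_b', x_c':

```python
from scipy.optimize import linprog

# Variables, in order: x_b, x_c, x_b2, x_c2, q0, qh.
c = [0, 0, 0, 0, -1, 0]                     # maximise q0
A_ub = [
    [0.0, -0.25, 0.0, 0.75, 1.0, -1.0],     # q0 <= qh + x_c/4 - 3*x_c2/4
    [-0.75, 0.0, 2.25, 0.0, 0.0, 1.0],      # C: qh <= 3*x_b/4 - 9*x_b2/4
    [-0.25, 0.0, -0.75, 0.0, 0.0, 1.0],     # F: qh <= x_b/4 + 3*x_b2/4
]
b_ub = [0.0, 0.0, 0.0]
A_eq = [[1, 1, 0, 0, 0, 0],                 # x_b + x_c = x_empty = 1
        [0, 0, 1, 1, 0, 0]]                 # x_b2 + x_c2 = x_empty = 1
b_eq = [1.0, 1.0]
bounds = [(0, None)] * 4 + [(None, None)] * 2
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

x_b, x_c, x_b2, x_c2, q0, qh = res.x
print(round(q0, 4))          # -0.25, the value of the game
print(round(x_b2, 4))        # 0.1667: bluff with probability 1/6
```

The solver recovers the value -1/4 and the bluffing weight x_b' = 1/6 found via the strategic form in Section 2, now from a program whose size is linear in the game tree.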