Opponent Modeling in Poker
Karen Glocer, Mark Deckert
June 15, 2007

1 Introduction

In recent years much progress has been made on computer gameplay in games of complete information such as chess and Go. Computers have surpassed the ability of top chess players and are well on their way to doing so at Go. Games of incomplete information, on the other hand, are far less studied. Despite significant financial incentives, computerized poker players still perform at a level well below that of poker professionals. They have only come close to optimal play in 2-player poker, which is much simpler than the normal 9 or 10 player version. We believe that complex games of incomplete information like poker represent an important future application of machine learning. In particular, we believe that online learning techniques are a good fit for many important problems posed by poker.

Currently the most popular form of poker is No-Limit Holdem. A yearly series of tournaments, called the World Series of Poker, sees the largest prize pools. Its championship and largest tournament is a No-Limit Holdem tournament whose top prize now exceeds $10,000,000. However, you don't have to win the World Series to earn money playing No-Limit Holdem. With the prevalence of online games, just about anything you can predict more effectively with a computer will help you turn a profit. For this reason we have chosen to explore the game of No-Limit Holdem and, more specifically, to use online expert algorithms to develop opponent player models, a critical component of successful gameplay and a problem area in current poker research.

2 The Basics of No-Limit Holdem

No-Limit Holdem is a "Big Bet" form of poker where players have the freedom to bet any amount of their chips at any decision point in the game. This ability to make different sized bets adds a tremendous amount of complexity to the gameplay (in comparison to limit poker, where bet sizes are fixed).
No-Limit Holdem is played with up to 10 players, each of whom is dealt 2 cards. These two cards form the player's "hand." There are 4 rounds of betting, between which community cards are dealt. Players use any combination of their hand and the community cards to form a 5-card poker hand. They make betting decisions based on this hand and their beliefs about the other players' hands. In order to continue to the "showdown," where players compare hands and the best hand wins, each player must match the bets made by other players so that every remaining player has put the same amount of chips in the pot. If a player runs out of chips during a hand, he can only vie for the part of the pot he can match and is allowed to continue to the showdown regardless of further bets made. The state of having all of one's chips in the pot is called being "All In" and is critical to our model of poker. Should a player have chips remaining but not wish to match a bet made by another player, he may fold and give up his chance of winning the pot. The majority of No-Limit Holdem pots end with one person making a bet that no one wishes to match. This leads to the phenomenon of bluffing, where a player with a poor hand makes a large bet in an attempt to win the pot without showing his cards.

At the beginning of each hand, the pot is started with forced bets called "blinds." Play proceeds sequentially around the table, allowing each player the decision to "fold," match the current bet ("call"), or make a bet in addition to the current bet ("raise"). Once all bets have been matched by the players who have not folded, that round of betting is complete.

Tournament poker is distinguished by the blinds growing larger over time, forcing more and more money into the pot until one person eventually ends up with all the chips. This prevents players from playing a single strategy throughout the course of the tournament and leads us to believe that algorithms designed to track the best expert will be necessary.
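The rule that an all-in player can only win the part of the pot he can match is easy to misread, so a small sketch may help. The function below is an illustration written for this section, not part of the original study; the name `split_into_pots` and the player names are hypothetical. It layers the chips committed in one hand into a main pot and side pots and lists who is eligible for each.

```python
def split_into_pots(contributions, folded=()):
    """Split chips committed in one hand into a main pot and side pots.

    contributions: dict of player name -> chips put into the pot this hand.
    folded: names of players who folded (their chips stay in, but they cannot win).
    Returns a list of (pot_size, eligible_players), main pot first.
    """
    folded = set(folded)
    # Each distinct amount committed by a live player caps one pot layer.
    caps = sorted({amt for p, amt in contributions.items()
                   if p not in folded and amt > 0})
    pots = []
    previous_cap = 0
    for cap in caps:
        # Every player pays into this layer only the part of their chips
        # that lies between the previous cap and this one.
        size = sum(min(amt, cap) - min(amt, previous_cap)
                   for amt in contributions.values())
        eligible = sorted(p for p, amt in contributions.items()
                          if p not in folded and amt >= cap)
        pots.append((size, eligible))
        previous_cap = cap
    return pots


# Hypothetical hand: "short" is all in for 30, the other two each commit 100.
example = split_into_pots({"short": 30, "big1": 100, "big2": 100})
```

In this made-up hand the short stack competes only for the first layer, while the two larger stacks also contest a side pot between themselves.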
3 Related Work

Although we have no doubt whatsoever that extensive work has been done on opponent modeling in poker, the published literature is sparse. The cause, alas, is the almighty dollar. Most people who work in this field have a strong disincentive to publish because they generally use their results in actual poker games and have no desire to give their strategies to their opponents for free. The only academic group devoted to the study of poker is the games group at the University of Alberta. Their primary focus is on heads-up poker, the situation where there are only two players, and generally they restrict themselves to limit poker as well. To date they have focused on optimal two-player poker and have done only initial exploratory work on opponent modeling.

In [3], Billings et al. discuss game-theoretically optimal strategies for 2-player limit hold'em. Limit hold'em has the same rules as the no-limit hold'em discussed previously, except that the size of the bets is limited. In the first two rounds of betting, bets are always in increments of the big blind, and in subsequent rounds bets are in increments of twice the big blind. This limits the possibilities, though not enough to make it an easy problem. Billings et al. use approximations to achieve computational feasibility and create what they call "pseudo-optimal" strategies for the non-approximated full game. The bots they create are competitive with strong players. Game-theoretically correct strategies do not adapt to opponents, but instead use mixed strategies so that an opponent cannot exploit their play. This differs from our own goal of modeling and exploiting opponent tendencies. This paper also illustrates the complexity of even 2-player limit hold'em. The game we are exploring is more complex, both due to the multiplayer setting and due to the variable sized bets allowed in no-limit hold'em, which is why we chose to make the all-in or fold simplification.
In [2], another paper covering the limit version of Texas Hold'em, opponent modeling is discussed in detail. As mentioned, there are many factors which may affect an opponent's decision, and these factors change over time. We agree with Davidson that a good baseline model of the average opponent is desirable. As more data is collected for a particular player, the model should deviate from this baseline. Davidson et al. attempt to use neural networks for opponent modeling, but neural networks don't yield easily interpretable models and they don't have the ability to adapt to shifting strategies.

The primary work to date on opponent modeling was done by Billings et al. [1]. The setting is the standard limit Texas Hold'em game, as opposed to the simplified scenario of all-in or fold poker. Instead of trying to estimate the probability of the player calling an all-in bet, Billings et al. attempt to model expected hand strength given both the current actions of the player and the player's previous actions. The estimation begins post-flop. At this point there are 1081 possible starting hands for an opponent (the two-card combinations of the 47 cards not visible to the modeling player), and the authors assign two sets of weights to each possible hand that reflect the probability of the opponent having that hand. The first set of weights reflects betting actions but is not player specific. For example, if a player raises post-flop, then weights on stronger hands are increased and weights on weaker hands are decreased. The second set of weights reflects the player's betting history, such as the frequency of folding, calling, or raising before the flop. For the rest of the hand, each action of the player results in updated weights that now reflect a rather complex calculation of expected hand strength.

This approach differs fundamentally from ours in many ways. To begin with, all-in or fold poker represents a significant simplification. Also, the weights in this opponent model are determined heuristically, with no learning involved.
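To make the weight-table idea concrete, here is a minimal sketch of the non-player-specific reweighting step, written for illustration only. It is not the procedure of Billings et al.: `toy_strength` is a placeholder that ignores the board, the multiplier range of 0.5 to 1.5 is arbitrary, and renormalizing the weights is our own choice. What it does show is where the count of 1081 comes from and how a raise shifts weight toward stronger hands.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def possible_opponent_hands(known_cards):
    """All two-card hands the opponent could hold, given the cards we can see."""
    unseen = [card for card in DECK if card not in known_cards]
    return list(combinations(unseen, 2))


def toy_strength(hand):
    """Placeholder strength in [0, 1]: average card rank, ignoring the board."""
    return sum(RANKS.index(card[0]) for card in hand) / (2 * (len(RANKS) - 1))


def reweight_after_raise(weights, strength=toy_strength):
    """Shift weight toward stronger hands after a raise, then renormalize."""
    # Multiplier is 0.5 for the weakest hand and 1.5 for the strongest
    # (an arbitrary illustrative choice, not a value from the paper).
    scaled = {hand: w * (0.5 + strength(hand)) for hand, w in weights.items()}
    total = sum(scaled.values())
    return {hand: w / total for hand, w in scaled.items()}


# Our two hole cards plus a three-card flop leave 47 unseen cards.
known = ["As", "Kd", "7h", "8h", "2c"]
hands = possible_opponent_hands(known)
uniform = {hand: 1.0 / len(hands) for hand in hands}
after_raise = reweight_after_raise(uniform)
```

A real system would replace `toy_strength` with an evaluator that accounts for the community cards, which is where most of the complexity lies.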
4 Simplification: The All-in or Fold strategy

Given the complex decision structure created by variable sized bets, we require a simplification to study the game. The idea for our simplification comes from the foremost poker strategy author, David Sklansky, who posits in his book "Tournament Poker" that an effective strategy can be formed by limiting one's betting decisions to either folding or betting everything, commonly called "going all in". This strategy blunts all of the best poker players' weapons, forcing them to either accept tremendous risk or fold away their chips. Given this simplified strategy, we believe an effective opponent model can be developed by using online learning to predict whether or not a player will call an all-in bet. With this information, it should be possible to develop a reasonable deterministic strategy using these opponent models and the game state. Though we believe that this research paves the way to making a strong computer poker tournament player, the goal of our research is only to model players and predict their behavior in all-in situations with a degree of accuracy that will allow a computer to make profitable decisions.

5 The learning problem: Know thine enemy

A common saying among successful poker players is, "Poker is not a card game played with people but a people game played with cards." The point is that knowing your opponent's tendencies is just as critical to successful play as the ability to judge the value of your hand and bet accordingly. We wish to predict whether a player will call or fold when faced with an all-in bet. A player's decision should depend on several factors:
• The player's position at the table. For competent players, this is a huge factor in how a hand will be played. We are referring to where the player sits with respect to the dealer, because this determines the order in which the players make their moves. However, position is also important with respect to surrounding players.
Generally it is better to have a loose player to your right and a tight player to your left.
• The player's stack size. This is the amount of money the player has at this point in the game. This is important because of the blinds, but also because the player with the large stack can use it to bully other players, while the player with the small stack will face pressure to take risks in the hope of doubling up.
• The blinds. If the blinds are large with respect to the player's stack, the player will be forced to either play or be eaten by the blinds. Similarly, a player can afford to be more picky about which hand to play if the blinds are small with respect to the stack. Note that what is important is the ratio.
• Previous actions of other players. This will determine the pot size and the pot odds, but it also contains other important information: for example, which players bet, which called, and which folded, not to mention their relative positions.

The above factors constitute the state of the game. These do not need to be learned. However, the following factors are key in predicting whether a player will call or fold, and require some estimation. As an aside, one of the things that separates a good poker player from a mediocre one is the ability to make good estimates of the following variables just by observing play:
• Whether the player is passive or aggressive.
• Whether the player is tight or loose.
• Whether the player is aware of positional strategies.
• Whether the player is aware of other players.
• The player's hand strength.

It is not an accident that hand strength was listed last. While it is not the least important variable, it must be emphasized that playing good poker requires taking all of the above variables into account.
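The observable part of this list (position, stack, blinds, and the pot created by earlier actions) can be collected into a small record. The sketch below is only one possible encoding and is not the feature set used in our experiments; the class name `GameState`, its field names, and the example numbers are all hypothetical. It computes the stack-to-blinds ratio and turns pot odds into the equity a call needs to break even.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    seats_after_dealer: int  # position: 1 = first seat to the dealer's left
    stack: float             # chips the modeled player has left
    small_blind: float
    big_blind: float
    pot: float               # chips already in the pot, including the all-in bet
    to_call: float           # chips the player must add to call (assumed <= stack)

    def stack_in_blind_rounds(self):
        """How many rounds of blinds the stack can pay for (the ratio that matters)."""
        return self.stack / (self.small_blind + self.big_blind)

    def break_even_equity(self):
        """Share of the final pot the caller pays; a call needs at least this equity."""
        return self.to_call / (self.pot + self.to_call)


# Made-up situation: 1500 chips, blinds 50/100, facing a 300-chip all-in.
state = GameState(seats_after_dealer=3, stack=1500, small_blind=50,
                  big_blind=100, pot=450, to_call=300)
```

The estimated characteristics (passive or aggressive, tight or loose, and so on) are deliberately left out of this record, since they must be inferred from observed play rather than read off the table.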
What we want to predict is the probability of a player calling an all-in bet given the state of the game, the characteristics of the player we are modeling, and the characteristics of the player who goes all in: P(call | game state, player behavior, all-in player). More generally, let C be a random variable that reflects whether the player calls. We want to find the conditional probability P(C | X1, ..., XN), where X1, ..., XN stand for the conditioning variables above.

Most importantly, predicting the probability of a call means that any future poker bot won't have to estimate its opponent's hand strength, a significant problem in most other poker work and something only a very experienced poker player can do well. Instead, if the probability of a call is 10%, a naive but effective strategy would be to assume that the player's hand is among the top 10% of hands. Hand ranking is a much easier problem than estimating expected hand strength. The naive approach amounts to putting a uniform distribution on all hands in the top 10%. A more sophisticated approach, which we have not yet considered, would put a non-uniform distribution on the possible hole cards of any caller given the estimated probability of calling.

6 Data

In order to test our learning model, we collected histories of actual games played online for real money. This is not a trivial operation. The difficulty of collecting good data is that, while an all-in or fold strategy can be effective (someone purportedly won the World Series playing this strategy, after all), it is very uncommon. As such, the hand history of a typical tournament is not especially rich in the specific type of all-in situations we wish to study. While players go all in fairly often, especially towards the end of tournaments, it doesn't happen often enough to allow us to model the other players in the all-in or fold context. As a result, the best option was simply to play all-in or fold strategies in tournaments ourselves and keep hand histories of the entire tournament.
We did this for 19 tournaments, and in the process we collected data for 347 distinct players. For better or worse, the properties of this data dictated our final approach. The primary factor was the paucity of data for each player. This is an important point, so we will elaborate. Recall that each time we went all in, all of the other players still in the hand had to decide whether to call this all-in bet or fold. We instantiate an opponent model for each player we try to model. For each individual model, we get one example each time that player has to react to an all-in bet. Unfortunately, because tournaments are brief and players are regularly reshuffled between tables, we never got more than 18 examples for a single player. On the bright side, we generally saw each player play many other hands that would not constitute examples, and we were still able to learn from those hands. In particular, we were able to use these additional hands to estimate player characteristics like looseness and aggressiveness, both of which affect the calling probability.

The other aspect of our implementation affected by the paucity of data is the number of experts. If there are N experts, online expert algorithms converge in O(log N) iterations. Because we have so few examples to consider, we need the algorithm to converge extremely rapidly, and therefore we can use only very few experts.

7 Experimental Results

Experimental results in Figures 1, 2, 3, 4, and 5 demonstrate that lower loss is achieved by modeling the factors attended to by individual players. Final weight vectors show the importance of position and recent player behavior.

8 Conclusion

The experimental data suggest that player-specific modeling which adapts rapidly to changing tournament conditions is effective. For players with a reasonable number of data points, player-specific models show significantly less loss than the baseline average player model.
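As a reminder of what learning with a handful of experts looks like, the following is a minimal sketch of an exponential-weights (Hedge-style) update over experts that each output a call probability. It is a generic stand-in, not the exact algorithm or expert set behind the figures: the three fixed-probability experts, the squared loss, the learning rate `eta`, and the six outcomes are all invented for illustration.

```python
import math


def predict(weights, expert_probs):
    """Weighted average of the experts' call probabilities."""
    return sum(w * p for w, p in zip(weights, expert_probs)) / sum(weights)


def update(weights, expert_probs, called, eta=2.0):
    """Exponential-weights update with squared loss; returns normalized weights."""
    outcome = 1.0 if called else 0.0
    scaled = [w * math.exp(-eta * (p - outcome) ** 2)
              for w, p in zip(weights, expert_probs)]
    total = sum(scaled)
    return [w / total for w in scaled]


# Three hypothetical experts that always predict a fixed call probability.
expert_names = ["baseline", "tight", "loose"]
expert_probs = [0.5, 0.1, 0.8]
weights = [1.0 / 3] * 3

# Made-up reactions of one opponent to six all-in bets (True = called).
for called in [False, False, True, False, False, False]:
    weights = update(weights, expert_probs, called)

best_expert = expert_names[weights.index(max(weights))]
next_call_probability = predict(weights, expert_probs)
```

With so few examples per player, the choice of `eta` matters: a larger value moves weight to the best-fitting expert faster, at the cost of reacting more strongly to a single unusual call.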
Moving forward, we plan to expand the number of experts to include more specialized elements of play which are only attended to by expert players. With well-tuned experts, an increased learning rate should hopefully start to mirror the quick player assessment, based on very few data points, that expert players are capable of. With accurate all-in calling probabilities, it becomes feasible to build a bot which estimates the overall expected value (EV) of an all-in play. Based on observations by human tournament professionals, this EV threshold should be tuned to something near 1.05 (a 5% advantage), and a successful bot should emerge.

References

[1] Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron. Opponent modeling in poker. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), pages 493–498, Madison, WI, 1998. AAAI Press.
[2] A. Davidson. Opponent modeling in poker: Learning and acting in a hostile environment, 2002.
[3] Darse Billings et al. Approximating game-theoretic optimal strategies for full-scale poker, 2002.

Figure 1: Player Specific Weights
Figure 2: Player Specific Weights
Figure 3: Weights of Aggregate Player
Figure 4: Improved Loss for Player Specific Model
Figure 5: Improved Loss for Player Specific Model