					                Opponent Modeling in Poker
                   Karen Glocer             Mark Deckert
                               June 15, 2007


1    Introduction
In recent years much progress has been made on computer play in games of
complete information such as chess and Go. Computers have surpassed the
ability of top chess players and are well on their way to doing so at Go. Games
of incomplete information, on the other hand, are far less studied. Despite
significant financial incentives, computerized poker players still perform at a
level well below that of poker professionals. They have come close to optimal
play only in 2-player poker, which is much simpler than the usual 9- or
10-player version. We believe that complex games of incomplete information
like poker represent an important future application of machine learning. In
particular, we believe that online learning techniques are a good fit for many
important problems posed by poker.
    Currently the most popular form of poker is No-Limit Hold’em. A yearly
series of tournaments, called the World Series of Poker, sees the largest prize
pools. Its championship and largest tournament is a No-Limit Hold’em
tournament whose top prize now exceeds $10,000,000. However, you don’t have
to win the World Series to earn money playing No-Limit Hold’em. With the
prevalence of online games, almost anything a computer can help you predict
more accurately can help you turn a profit. For this reason we have chosen to
explore the game of No-Limit Hold’em and, more specifically, to use online
expert algorithms to develop opponent models, a critical component of
successful gameplay and a problem area in current poker research.


2    The Basics of No-Limit Hold’em
No-Limit Hold’em is a “Big Bet” form of poker in which players are free
to bet any amount of their chips at any decision point in the game. This ability
to make bets of different sizes adds a tremendous amount of complexity to the
gameplay (in comparison to limit poker, where bet sizes are fixed). No-Limit
Hold’em is played with up to 10 players, each of whom is dealt 2 cards. These
two cards form the player’s “hand.” There are 4 rounds of betting, between which
community cards are dealt. Players use any combination of their hand and the
community cards to form a 5-card poker hand. They make betting decisions
based on this hand and their beliefs about the other players’ hands.
    To continue to the “showdown,” where players compare hands and the best
hand wins, each player must match the bets made by the other players, so that
every remaining player has put the same number of chips in the pot. If a player
runs out of chips during a hand, he can only vie for the part of the pot he can
match, and he is allowed to continue to the showdown regardless of further bets
made. The state of having all of one’s chips in the pot is called being “All In”
and is critical to our model of poker. Should a player have chips remaining but
not wish to match a bet made by another player, he may fold and give up his
chance of winning the pot. The majority of No-Limit Hold’em pots end with one
person making a bet that no one wishes to match. This leads to the phenomenon
of bluffing, where a player with a poor hand makes a large bet in an attempt to
win the pot without showing his cards. At the beginning of each hand, the pot
is started with forced bets called “blinds.” Play proceeds sequentially around
the table, giving each player the choice to “fold,” match the current bet
(“call”), or make a bet in addition to the current bet (“raise”). Once all bets
have been matched by the players who have not folded, that round of betting is
complete. Tournament poker is distinguished by the blinds growing larger over
time, forcing more and more money into the pot until one person eventually
ends up with all the chips. This prevents players from playing a single strategy
throughout the course of the tournament and leads us to believe that algorithms
designed to track the best expert will be necessary.
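
The all-in rule described above can be made concrete with a small sketch. The
Python function below is only an illustration: the name split_pots and the
dictionary input are hypothetical, and folded players' chips are ignored for
simplicity. It splits the chips put in by the players still in the hand into a
main pot and side pots, so that an all-in player competes only for the part of
the pot he was able to match.

```python
def split_pots(contributions):
    """Split chips into a main pot and side pots.

    contributions: dict mapping player name -> chips that player put in
    (assumption: only players still contesting the hand are listed).
    Returns a list of (pot_size, eligible_players), main pot first.
    """
    pots = []
    remaining = dict(contributions)
    while remaining:
        # Every remaining player can match at least the smallest contribution.
        level = min(remaining.values())
        eligible = sorted(remaining)
        pots.append((level * len(eligible), eligible))
        # Players who contributed more carry the excess into the next side pot.
        remaining = {p: c - level for p, c in remaining.items() if c > level}
    return pots


# Player A is all in for 100 chips; B and C each put in 300.
pots = split_pots({"A": 100, "B": 300, "C": 300})
print(pots)
```

In this example A can win only the first pot of 300 chips, while B and C also
contest a side pot of 400 chips.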


3     Related Work
Although we have no doubt that extensive work has been done on opponent
modeling in poker, the published literature on the subject is sparse. The cause,
alas, is the almighty dollar. Most people who work in this field have a strong
disincentive to publish because they generally use their results in actual
poker games and have no desire to give their strategies to their opponents for
free. The only academic group devoted to the study of poker is the games group
at the University of Alberta. Their primary focus is on heads-up poker, the
situation where there are only two players, and they generally restrict
themselves to limit poker as well. To date they have focused on optimal
two-player poker and have done only initial exploratory work on opponent
modeling.
    In [3], Billings et al. discuss game-theoretically optimal strategies for
2-player limit Hold’em. Limit Hold’em has the same rules as the No-Limit
Hold’em discussed previously, except that the size of the bets is limited. In
the first two rounds of betting, bets are always in increments of the big blind,
and in subsequent rounds they are in increments of twice the big blind. This
limits the possibilities, though not enough to make it an easy problem. Billings
et al. use approximations to achieve computational feasibility and create what
they call “pseudo-optimal” strategies for the non-approximated full game. The
bots they create are competitive with strong players. Game-theoretically correct
strategies do not adapt to opponents, but instead use mixed strategies such that
an opponent cannot exploit their play. This differs from our own goal of
modeling and exploiting opponent tendencies. This paper also illustrates the
complexity of even 2-player limit Hold’em. The game we are exploring is more
complex both because of the multiplayer setting and because of the variable
bet sizes allowed in No-Limit Hold’em, which is why we chose to make the all-in
or fold simplification.
    In [2], another paper covering the limit version of Texas Hold’em, opponent
modeling is discussed in detail. As mentioned, there are many factors which
may affect an opponent’s decision and these factors change over time. We agree
with Davidson that a good baseline average opponent model is desirable. As
more data is collected for a particular player, the model should deviate from
this baseline. Davidson et al. attempt to use neural networks for opponent
modeling, but neural networks don’t yield easily interpretable models and they
don’t have the ability to adapt to shifting strategies.
    The primary work to date on opponent modeling was done by Billings et
al. [1]. The setting is the standard limit Texas Hold’em game, as opposed to
the simplified scenario of all-in or fold poker. Instead of trying to estimate the
probability of the player calling an all-in bet, Billings et al. attempt to model
expected hand strength given both the current actions of the player and the
player’s previous actions. The estimation begins post-flop. At this point there
are 1081 possible opponent hands (the number of two-card combinations that can
be drawn from the 47 unseen cards), and the authors assign two sets of weights
to each possible hand that reflect the probability of the opponent having that
hand. The first set of weights reflects betting actions but is not player
specific. For example, if a player raises post-flop, then weights on stronger
hands are increased and weights on weaker hands are decreased. The second set
of weights reflects the player’s betting history, such as the frequency of
folding, calling, or raising before the flop. For the rest of the hand, each
action of the player results in updated weights that now reflect a rather
complex calculation of expected hand strength. This approach differs
fundamentally from ours in several ways. To begin with, all-in or fold poker
represents a significant simplification. Also, the weights in this opponent
model are determined heuristically, with no learning involved.
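
To make the weight-table idea concrete, the sketch below enumerates the
post-flop opponent hands and applies one heuristic update after a raise. It is
not the procedure of Billings et al.: the strength measure toy_strength, the
scaling factor, and the function names are all hypothetical stand-ins chosen
for brevity. What it does show is where the figure 1081 comes from, namely the
number of two-card combinations among the 47 cards we cannot see after the
flop.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]  # 52 cards such as "As" or "7h"


def opponent_hands(known_cards):
    """All two-card hands the opponent could hold given the cards we can see."""
    unseen = [c for c in DECK if c not in known_cards]
    return list(combinations(unseen, 2))


def toy_strength(hand):
    """Crude stand-in for hand strength in [0, 1]: mean rank of the two cards."""
    return sum(RANKS.index(card[0]) for card in hand) / (2 * (len(RANKS) - 1))


def reweight_after_raise(weights, factor=2.0):
    """Hypothetical rule: a raise scales strong hands up and weak hands down."""
    scaled = {
        hand: w * (factor if toy_strength(hand) >= 0.5 else 1.0 / factor)
        for hand, w in weights.items()
    }
    total = sum(scaled.values())
    return {hand: w / total for hand, w in scaled.items()}  # renormalize


known = ["As", "Kd", "7h", "8h", "2c"]  # our two hole cards plus three flop cards
hands = opponent_hands(known)
weights = {hand: 1.0 / len(hands) for hand in hands}  # uniform starting weights
weights = reweight_after_raise(weights)
print(len(hands))  # 1081
```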


4    Simplification: The All-in or Fold strategy
Given the complex decision structure created by variable sized bets, we require
a simplification to study the game. The idea for our simplification comes from
the foremost poker strategy author, David Sklansky, who posits in his book
“Tournament Poker” that an effective strategy can be formed by limiting one’s
betting decisions to either folding or betting everything, commonly called “going
all in.” This strategy blunts all of the best poker players’ weapons, forcing them
to either accept tremendous risk or fold away their chips. Given this simplified
strategy, we believe an effective opponent model can be developed by using
online learning to predict whether or not a player will call an all-in bet. With
this information, it should be possible to develop a reasonable deterministic
strategy using these opponent models and the game state.
    Though we believe that this research paves the way to making a strong
computer poker tournament player, the goal of our research is only to model
players and predict their behavior in all-in situations with a degree of accuracy
that will allow a computer to make profitable decisions.


5     The learning problem: Know thine enemy
A common saying among successful poker players is that “Poker is not a
card game played with people but a people game played with cards.” The point
is that knowing your opponent’s tendencies is just as critical to successful
play as the ability to judge the value of your hand and bet accordingly. We wish
to predict whether a player will call or fold when faced with an all-in bet. A
player’s decision should depend on several factors:

    • The player’s position at the table. For competent players, this is a huge
      factor in how a hand will be played. We are referring to where the player
      is with respect to the dealer, because this determines the order in which
      the players make their moves. However, position is also important with
      respect to surrounding players. Generally it is better to have a loose player
      to your right and a tight player to your left.

    • The player’s stack size. This is the amount of money the player has at
      this point in the game. This is important because of the blinds, but also
      because the player with the large stack can use it to bully other players,
      while the player with the small stack will face pressure to take risks with
      the hope of doubling up.

    • The blinds. If the blinds are large with respect to the player’s stack, the
      player will be forced to either play or be eaten by the blinds. Similarly, a
      player can afford to be more picky about which hand to play if the blinds
      are small with respect to the stack. Note that what is important is the
      ratio.

    • Previous actions of other players. This will determine the pot size and the
      pot odds, but it also contains other important information. For example,
      which players bet, which called, and which folded, not to mention their
      relative positions.

   The above factors constitute the state of the game. These do not need
to be learned. However, the following factors are key in predicting whether a
player will call or fold, and require some estimation. As an aside, one of the
things that separates a good poker player from a mediocre one is the ability to
make good estimates of the following variables just by observing play:

    • Whether the player is passive or aggressive.

    • Whether the player is tight or loose.

    • Whether the player is aware of positional strategies.

    • Whether the player is aware of other players.

    • The player’s hand strength.

It is not an accident that hand strength is listed last. While it is not
the least important variable, it must be emphasized that playing good poker
requires taking all of the above variables into account. What we want to predict
is the probability of a player calling an all-in bet given the state of the game,
the characteristics of the player we are modeling, and the characteristics of the
player who goes all in: $P(\text{call} \mid \text{game state}, \text{player behavior}, \text{all-in player})$.
More generally, let $C$ be a random variable that reflects whether the player
calls, and let $X_1, \ldots, X_N$ be the variables listed above. We want to find
the conditional probability $P(C \mid X_1, \ldots, X_N)$.
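
As an illustration of how the observable part of the conditioning variables
might be encoded, the sketch below packs the game state into a small feature
vector. The class name GameState, its field names, and the particular
normalizations are assumptions made for this example; the paper does not
specify an encoding. The one point taken directly from the text is that the
blinds enter as a ratio to the player's stack rather than as an absolute
amount.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    seats_after_dealer: int  # 0 means the player is on the dealer button
    num_players: int         # players at the table (at least 2)
    stack: float             # chips the player has
    big_blind: float         # current big blind
    pot: float               # chips in the pot, including the all-in bet
    to_call: float           # chips the player must add to call

    def features(self):
        """Return [relative position, stack in big blinds, pot odds]."""
        return [
            self.seats_after_dealer / (self.num_players - 1),
            self.stack / self.big_blind,   # the stack-to-blind ratio
            self.pot / self.to_call,       # pot odds as a pot-to-call ratio
        ]


state = GameState(seats_after_dealer=3, num_players=10, stack=1500.0,
                  big_blind=100.0, pot=2250.0, to_call=1500.0)
x = state.features()
print(x)
```

The estimated characteristics (tight or loose, passive or aggressive, and so
on) would be appended to such a vector once they have been learned from
observed play.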
    Most importantly, predicting the probability of a call means that any future
poker bot won’t have to estimate its opponent’s hand strength, a significant
problem in most other poker work and something only a very experienced poker
player can do well. Instead, if the probability of a call is 10%, a naive but
effective strategy would be to assume that the player’s hand is among the top
10% of hands. Hand ranking is a much easier problem than estimating expected
hand strength. The naive approach amounts to putting a uniform distribution
on all hands in the top 10%. A more sophisticated approach, which we have
not yet considered, would put a non-uniform distribution on the possible
hole cards of any caller given the estimated probability of calling.
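
The naive strategy can be sketched in a few lines. Given a list of starting
hand classes ordered from strongest to weakest, the function below keeps
adding classes until they cover the estimated calling probability, counting
each class by its number of card combinations out of the 1326 possible
two-card hands. The ordering in TOY_RANKING is illustrative only and covers
just a handful of classes; a real ranking would list all 169. The function
names are hypothetical.

```python
TOTAL_COMBOS = 1326  # number of distinct two-card hands, C(52, 2)


def combos(hand_class):
    """Card combinations in a class: pairs 6, suited 4, offsuit 12."""
    if len(hand_class) == 2:  # pocket pair such as "AA"
        return 6
    return 4 if hand_class.endswith("s") else 12


def calling_range(ranked_classes, call_prob):
    """Strongest classes whose combinations cover call_prob of all hands."""
    budget = call_prob * TOTAL_COMBOS
    chosen, used = [], 0
    for hand_class in ranked_classes:
        if used >= budget:
            break
        chosen.append(hand_class)
        used += combos(hand_class)
    return chosen


# Illustrative ordering only (an assumption, not an authoritative ranking).
TOY_RANKING = ["AA", "KK", "QQ", "AKs", "JJ", "AKo", "TT", "AQs", "99", "AQo"]

# A very tight player estimated to call 2% of the time.
tight_range = calling_range(TOY_RANKING, 0.02)
print(tight_range)
```

Under the naive approach every combination inside the returned range is
treated as equally likely, which corresponds to the uniform distribution
described above.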


6     Data
In order to test our learning model, we collected histories of actual games played
online for real money. This was not a trivial task. The difficulty
of collecting good data is that, while an all-in or fold strategy can be effective
(someone purportedly won the World Series playing this strategy, after all), it is
very uncommon. As such, the hand history of a typical tournament is not
especially rich in the specific type of all-in situations we wish to study. While
players go all in fairly often, especially towards the end of tournaments, it
doesn’t happen often enough to allow us to model the other players in the all-in
or fold context.
    As a result, the best option was simply to play all-in or fold strategies in
tournaments ourselves and keep hand histories of the entire tournament. We did
this for 19 tournaments, and in the process we collected data for 347 distinct
players. For better or worse, the properties of this data dictated our final
approach. The primary factor was the paucity of data for each player. This is
an important point, so we will elaborate.
    Recall that each time we went all-in, all of the other players still in the
hand had to decide whether to call this all-in bet or fold. We instantiate an
opponent model for each player we try to model. For each individual model,
we get an example when a player is in the position of reacting to an all-in
bet. Unfortunately, because tournaments are brief and players are regularly
reshuffled between tables, we never got more than 18 examples for a single
player. On the bright side, we generally saw each player play many other hands
that would not constitute examples, and we were still able to learn from those
hands. In particular, we were able to use these additional hands to estimate
player characteristics like looseness and aggressiveness, both of which affect the
calling probability.
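
One simple way to turn the additional hands into player characteristics is
sketched below. The definitions are assumptions made for illustration, not
the ones used in our experiments: looseness is taken to be the fraction of
hands in which the player voluntarily called or raised, and aggressiveness
the share of raises among all calls and raises. The function name and the
input format are likewise hypothetical.

```python
def player_tendencies(hands):
    """Estimate (looseness, aggressiveness) from one player's observed hands.

    hands: list of per-hand action lists, e.g. [["fold"], ["call", "raise"]].
    """
    played = sum(
        1 for actions in hands
        if any(a in ("call", "raise") for a in actions)
    )
    raises = sum(actions.count("raise") for actions in hands)
    calls = sum(actions.count("call") for actions in hands)
    looseness = played / len(hands) if hands else 0.0
    aggressiveness = raises / (raises + calls) if raises + calls else 0.0
    return looseness, aggressiveness


history = [["fold"], ["call"], ["raise", "raise"], ["fold"], ["call", "fold"]]
loose, aggressive = player_tendencies(history)
print(loose, aggressive)
```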
    The other aspect of our implementation affected by the paucity of data is
the number of experts. If there are N experts, the online expert algorithms we
use converge in O(log N) iterations. Because we have so few examples to
consider, we need the algorithm to converge extremely rapidly, and therefore we
can use only a very small number of experts.
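
For readers unfamiliar with expert algorithms, the following is a minimal
sketch of an exponentially weighted forecaster, one standard member of this
family. It is a stand-in: the squared loss, the learning rate eta, and the
two toy experts are assumptions made for the example and should not be read
as the exact configuration behind our results. Each expert supplies a calling
probability, the forecaster predicts their weighted average, and after each
observed call or fold the weights of poorly performing experts shrink.

```python
import math


class ExpWeights:
    """Exponentially weighted average forecaster over a fixed set of experts."""

    def __init__(self, n_experts, eta=2.0):
        self.w = [1.0 / n_experts] * n_experts  # start from uniform weights
        self.eta = eta                          # learning rate (assumed value)

    def predict(self, expert_preds):
        """Weighted average of the experts' calling probabilities."""
        return sum(w * p for w, p in zip(self.w, expert_preds))

    def update(self, expert_preds, outcome):
        """outcome is 1 if the player called and 0 if the player folded."""
        losses = [(p - outcome) ** 2 for p in expert_preds]
        scaled = [w * math.exp(-self.eta * l) for w, l in zip(self.w, losses)]
        total = sum(scaled)
        self.w = [s / total for s in scaled]    # renormalize to sum to 1


# Two toy experts: one always predicts a fold, the other always predicts a call.
model = ExpWeights(n_experts=2)
for _ in range(5):
    model.update([0.1, 0.9], outcome=1)  # the player calls five times in a row
p = model.predict([0.1, 0.9])
print(model.w, p)
```

With only a handful of examples per player, the weights must move quickly,
which is why both the number of experts and the learning rate matter here.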


7    Experimental Results
The experimental results in Figures 1, 2, 3, 4, and 5 demonstrate that improved
loss is achieved by modeling the factors attended to by individual players. The
final weight vectors show the importance of position and recent player behavior.


8    Conclusion
The experimental data suggest that player-specific modeling that adapts
rapidly to changing tournament conditions is effective. For players with a
reasonable number of data points, player-specific models show significantly less
loss than the baseline average player model. Moving forward, we plan to expand
the number of experts to include more specialized elements of play that are
attended to only by expert players. With well-tuned experts, an increased
learning rate should hopefully start to mirror the quick player assessment,
based on very few data points, that expert players are capable of. With accurate
all-in calling probabilities, it becomes feasible to build a bot that estimates
the overall expected value (EV) of an all-in play. Based on observations by human
tournament professionals, the EV threshold for making the play should be tuned
to something near 1.05 (a 5% advantage), and a successful bot should emerge.
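
A rough sketch of such an EV estimate follows. It rests on several
simplifying assumptions that are ours for this example: there is a single
opponent left to act, that opponent has at least as many chips as we do, and
the EV is expressed as the ratio of our expected stack after the play to our
stack before it, so that 1.05 means a 5% expected gain. The function name and
the sample numbers are hypothetical.

```python
def all_in_ev_ratio(stack, pot, p_call, equity_when_called):
    """Expected stack after going all in, divided by the current stack.

    stack: our chips before the bet
    pot: chips already in the pot (blinds and earlier bets)
    p_call: estimated probability that the opponent calls
    equity_when_called: our probability of winning the showdown if called
    """
    uncontested = stack + pot                          # opponent folds
    called = equity_when_called * (2 * stack + pot)    # a loss leaves us with 0
    expected = (1 - p_call) * uncontested + p_call * called
    return expected / stack


EV_THRESHOLD = 1.05  # threshold suggested in the text

ratio = all_in_ev_ratio(stack=1000, pot=150, p_call=0.1, equity_when_called=0.3)
go_all_in = ratio >= EV_THRESHOLD
print(ratio, go_all_in)
```

In this example the opponent folds often enough that the play clears the
threshold even though we are an underdog when called.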


References
[1] Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron. Opponent
    modeling in poker. In Proceedings of the 15th National Conference on
    Artificial Intelligence (AAAI-98), pages 493–498, Madison, WI, 1998. AAAI
    Press.

[2] A. Davidson. Opponent modeling in poker: Learning and acting in a hostile
    environment, 2002.


[3] Darse Billings et al. Approximating game-theoretic optimal strategies for
    full-scale poker, 2002.


Figure 1: Player Specific Weights

Figure 2: Player Specific Weights

Figure 3: Weights of Aggregate Player

Figure 4: Improved Loss for Player Specific Model

Figure 5: Improved Loss for Player Specific Model