Games
CPS 170
Ron Parr


Why Study Games?

• Many human activities can be modeled as games
  – Negotiations
  – Bidding
  – TCP/IP
  – Military confrontations
  – Pursuit/Evasion
• Games are used to train the mind
  – Human game-playing, animal play-fighting




Why Are Games Good for AI?

• Games typically have concise rules
• Well-defined starting and end points
• Sensing and effecting are simplified
  – Not true for sports games
  – See RoboCup
• Games are fun!
• Downside: getting taken seriously (not)
  – See robot search and rescue


History of Games in AI

• Computer games have been around almost as long as computers (perhaps longer)
  – Chess: Turing (and others) in the 1950s
  – Checkers: Samuel's learning program, 1950s
• Usually start with naïve optimism
• Follow with naïve pessimism
• Simon: computer chess champ by 1967
• Many, e.g., Kasparov, predicted that a computer would never be champion
Games Today

• Computers perform at champion level
  – Backgammon, Checkers, Chess, Othello
• Computers perform well
  – Bridge
• Computers still do badly
  – Go, Hex


Game Setup

• Most commonly, we study games that are:
  – 2-player
  – Alternating
  – Zero-sum
  – Perfect information
• Examples: Checkers, chess, backgammon
• Assumptions can be relaxed at some expense
• Economics studies the case where the number of agents is very large
  – Individual actions don't change the dynamics




Zero Sum Games

• Assign values to different outcomes
• Win = 1, Loss = -1
• With zero-sum games, every gain comes at the other player's expense
• The sum of both players' scores must be 0
• Are any games truly zero sum?


Characterizing Games

• Two-player games are very much like search
  – Initial state
  – Successor function
  – Terminal test
  – Objective function (heuristic function)
• Unlike search
  – Terminal states are often a large set
  – Full search to terminal states is usually impossible
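To make the search-style formulation concrete, here is a minimal interface sketch in Python; the class and method names are illustrative assumptions, not part of the lecture:

  class Game:
      """A hypothetical two-player, zero-sum, perfect-information game."""

      def initial_state(self):
          # Starting position of the game
          raise NotImplementedError

      def successors(self, state):
          # Successor function: yield the states reachable in one move
          raise NotImplementedError

      def is_terminal(self, state):
          # Terminal test: has the game ended?
          raise NotImplementedError

      def utility(self, state):
          # Objective function at a terminal state: +1 win, -1 loss, 0 draw (for max)
          raise NotImplementedError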
Game Trees

[Figure: a partial tic-tac-toe game tree. Player 1 moves at the root, Player 2 at the next level, and the bottom row shows terminal positions.]

[Figure: an abstract game tree. Max nodes at the root choose among actions A1, A2, A3; min nodes below choose among A11, A12, A21, A22, A31, A32; the leaves are terminal nodes.]




Minimax

• Max player tries to maximize his return
• Min player tries to minimize his return
• This is optimal for both (zero sum)

  minimax(n_max) = max_{s ∈ successors(n)} minimax(s)
  minimax(n_min) = min_{s ∈ successors(n)} minimax(s)


Minimax Values

[Figure: minimax values on the abstract tree. The leaf values are 3, 12, 2, 4, 15, 2; the min nodes take values 3, 2, and 2; the max root takes value 3.]
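A minimal sketch of the minimax recurrence above in Python; the nested-list tree encoding, and the grouping of the figure's leaves into pairs, are assumptions for illustration:

  # Minimax on a toy tree: a leaf is a number, an internal node is a
  # list of children. Levels alternate max/min, starting with max.
  def minimax(node, is_max=True):
      if isinstance(node, (int, float)):   # terminal node: return its value
          return node
      values = [minimax(child, not is_max) for child in node]
      return max(values) if is_max else min(values)

  # The tree from the figure: min nodes over leaves (3, 12), (2, 4),
  # and (15, 2) take values 3, 2, 2, so the max root's value is 3.
  tree = [[3, 12], [2, 4], [15, 2]]
  assert minimax(tree) == 3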
Minimax Properties

• Minimax can be run depth first
  – Time O(b^m)
  – Space O(bm)
• Assumes that the opponent plays optimally
• Based on a worst-case analysis
• What if this is incorrect?


Minimax in the Real World

• Search trees are too big
• Alternating turns double the depth of the search
  – 2 ply = 1 full turn
• Branching factors are too high
  – Chess: 35
  – Go: 361
• Search from the start never terminates in non-trivial games




Evaluation Functions

• Like heuristic functions
• Try to estimate the value of a node without expanding all the way to termination
• Using evaluation functions
  – Do a depth-limited search
  – Treat the evaluation function as if it were terminal
• What's wrong with this?
• How do you pick the depth?
• How do you manage your time?
  – Iterative deepening, quiescence


Desiderata for Evaluation Functions

• Would like to put the same ordering on nodes (even if values aren't totally right)
• Is this a reasonable thing to ask for?
• What if you have a perfect evaluation function?
• How are evaluation functions made in practice? (see the sketch below)
  – Buckets
  – Linear combinations
    • Chess pieces (material)
    • Board control (positional, strategic)
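A minimal sketch of a linear-combination evaluation in Python, using the standard textbook material weights; the piece-list representation is an illustrative assumption:

  # Material evaluation: a weighted sum of piece counts.
  # Positive scores favor white.
  PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

  def material_eval(white_pieces, black_pieces):
      score = sum(PIECE_VALUE.get(p, 0) for p in white_pieces)   # kings unscored
      score -= sum(PIECE_VALUE.get(p, 0) for p in black_pieces)
      return score

  # White is up a knight for a pawn:
  print(material_eval(['Q', 'R', 'R', 'N', 'B', 'P', 'P'],
                      ['Q', 'R', 'R', 'B', 'P', 'P', 'P']))     # prints 2

A positional (board control) term would enter the same sum as another weighted feature.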
Search Control Issues

• Horizon effects
  – Sometimes something interesting is just beyond the horizon
  – How do you know?
• When to generate more nodes?
• If you selectively extend your frontier, how do you decide where?
• If you have a fixed amount of total game time, how do you allocate it?


Pruning

• The most important search control method is figuring out which nodes you don't need to expand
• Use the fact that we are doing a worst-case analysis to our advantage
  – Max player cuts off search when he knows min player can force a provably bad outcome
  – Min player cuts off search when he knows max can force a provably good (for max) outcome




Alpha-beta pruning

[Figure: the minimax tree from before, with leaf values 3, 12, 2, 4, 15, 2 and min-node values 3 and 2; alpha-beta finds the root value 3 without expanding every leaf.]


How to prune

• We still do (bounded) DFS
• Expand at least one path to the "bottom"
• If the current node is a max node, and min can force a lower value, then prune siblings
• If the current node is a min node, and max can force a higher value, then prune siblings
Max node pruning

[Figure: pruning at a max node; once one child already guarantees max at least as much as min's best alternative, the node's remaining children can be cut off.]


Implementing alpha-beta

  # cutoff (the depth test), eval_fn (the static evaluation), and
  # successors are assumed to be supplied by the game.
  def max_value(state, alpha, beta):
      if cutoff(state):
          return eval_fn(state)
      for s in successors(state):
          alpha = max(alpha, min_value(s, alpha, beta))
          if alpha >= beta:       # min can force <= beta elsewhere: prune
              return beta
      return alpha

  def min_value(state, alpha, beta):
      if cutoff(state):
          return eval_fn(state)
      for s in successors(state):
          beta = min(beta, max_value(s, alpha, beta))
          if beta <= alpha:       # max can force >= alpha elsewhere: prune
              return alpha
      return beta
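As a self-contained sanity check, here is the same procedure specialized to a toy tree; the nested-list encoding and the leaf values (taken from the earlier figure) are illustrative:

  import math

  # Alpha-beta on a toy tree: a leaf is a number, an internal node is a
  # list of children. Levels alternate max/min, starting with max.
  def alphabeta(node, alpha=-math.inf, beta=math.inf, is_max=True):
      if isinstance(node, (int, float)):
          return node
      for child in node:
          value = alphabeta(child, alpha, beta, not is_max)
          if is_max:
              alpha = max(alpha, value)
          else:
              beta = min(beta, value)
          if alpha >= beta:        # cutoff: skip the remaining siblings
              break
      return alpha if is_max else beta

  tree = [[3, 12], [2, 4], [15, 2]]
  assert alphabeta(tree) == 3      # same value as plain minimax; the leaf 4 is never expanded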




Amazing facts about alpha-beta

• Empirically, alpha-beta has the effect of reducing the branching factor by half for many problems
• This effectively doubles the horizon that can be searched
• Alpha-beta makes the difference between novice and expert computer players


What About Probabilities?

[Figure: a game tree with max nodes at the top, chance nodes below them (outcome probabilities such as P=0.5/0.5, P=0.6/0.4, and P=0.9/0.1), and min nodes beneath the chance nodes.]
Expectiminimax

• n random outcomes per chance node
• O(b^m n^m) time

  eminimax(n_max) = max_{s ∈ successors(n)} eminimax(s)
  eminimax(n_min) = min_{s ∈ successors(n)} eminimax(s)
  eminimax(n_chance) = ∑_{s ∈ successors(n)} p(s) · eminimax(s)

  (a code sketch of these three cases follows the next slide)


Expectiminimax is nasty

• High branching factor
• Randomness makes evaluation functions difficult
  – Hard to predict many steps into the future
  – Values tend to smear together
  – Preserving order is not sufficient
• Pruning is problematic
  – Need to prune based upon a bound on an expectation
  – Need a priori bounds on the evaluation function
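A minimal sketch of the three expectiminimax cases in Python; the tagged-tuple node encoding is an assumption for illustration:

  # Expectiminimax on a toy tree. A node is a numeric leaf, or a tagged
  # tuple: ('max', children), ('min', children), or
  # ('chance', [(probability, child), ...]).
  def eminimax(node):
      if isinstance(node, (int, float)):
          return node
      kind, children = node
      if kind == 'max':
          return max(eminimax(c) for c in children)
      if kind == 'min':
          return min(eminimax(c) for c in children)
      # chance node: probability-weighted average over outcomes
      return sum(p * eminimax(c) for p, c in children)

  # Max chooses between a fair coin flip (expected value 0) and a
  # min node worth 3:
  tree = ('max', [('chance', [(0.5, 10), (0.5, -10)]), ('min', [3, 7])])
  assert eminimax(tree) == 3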




Multiplayer Games

• Things sort-of generalize
• We can maintain a vector of possible values for each player at each node (see the sketch below)
• Assume that each player acts greedily
• What's wrong with this?
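A sketch of that vector idea in Python, in the style often called max-n; the tree encoding and payoffs are illustrative assumptions:

  # Max-n search for multiplayer games: a leaf holds a payoff vector
  # (one entry per player); the player to move greedily picks the
  # child whose vector maximizes their own component.
  def maxn(node, player, num_players):
      if isinstance(node, tuple):              # leaf: payoff vector
          return node
      children = [maxn(child, (player + 1) % num_players, num_players)
                  for child in node]
      return max(children, key=lambda vec: vec[player])

  # Three players; player 0 moves at the root, player 1 at the next level.
  tree = [[(1, 2, 0), (0, 0, 3)],
          [(2, 1, 0), (0, 3, 0)]]
  print(maxn(tree, player=0, num_players=3))   # prints (1, 2, 0)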
Conclusions

• Game tree search is a special kind of search
• It relies heavily on heuristic evaluation functions
• Alpha-beta is a big win
• Most successful players use alpha-beta
• Final thought: there is a tradeoff between search effort and evaluation function effort
• When is it better to invest in your evaluation function?

				