
Games
CPS 170
Ron Parr


Why Study Games?

• Many human activities can be modeled as games
  – Negotiations
  – Bidding
  – TCP/IP
  – Military confrontations
  – Pursuit/Evasion
• Games are used to train the mind
  – Human game-playing, animal play-fighting

Why Are Games Good for AI?

• Games typically have concise rules
• Well-defined starting and end points
• Sensing and effecting are simplified
  – Not true for sports games
  – See robocup
• Games are fun!
• Downside: Getting taken seriously (not)
  – See robo search and rescue


History of Games in AI

• Computer games have been around almost as long as computers (perhaps longer)
  – Chess: Turing (and others) in the 1950s
  – Checkers: Samuel's 1950s learning program
• Usually start with naïve optimism
• Follow with naïve pessimism
• Simon: Computer chess champ by 1967
• Many, e.g., Kasparov, predicted that a computer would never be champion
Games Today

• Computers perform at champion level
  – Backgammon, Checkers, Chess, Othello
• Computers perform well
  – Bridge
• Computers still do badly
  – Go, Hex


Game Setup

• Most commonly, we study games that are:
  – 2 player
  – Alternating
  – Zero-sum
  – Perfect information
• Examples: Checkers, chess, backgammon
• Assumptions can be relaxed at some expense
• Economics studies the case where the number of agents is very large
  – Individual actions don't change the dynamics




Zero Sum Games

• Assign values to different outcomes
• Win = 1, Loss = -1
• With zero-sum games, every gain comes at the other player's expense
• The sum of both players' scores must be 0
• Are any games truly zero sum?


Characterizing Games

• Two-player games are very much like search
  – Initial state
  – Successor function
  – Terminal test
  – Objective function (heuristic function)
• Unlike search
  – Terminal states are often a large set
  – Full search to terminal states is usually impossible
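The search-style decomposition above can be sketched as a minimal game interface. All names here (Game, initial_state, successors, is_terminal, utility) are illustrative assumptions, not an API the lecture prescribes:

```python
# Hypothetical sketch of the search-style decomposition of a game:
# initial state, successor function, terminal test, objective function.

class Game:
    def initial_state(self):
        """Return the starting position."""
        raise NotImplementedError

    def successors(self, state):
        """Yield (move, next_state) pairs legal from this state."""
        raise NotImplementedError

    def is_terminal(self, state):
        """Terminal test: has the game ended?"""
        raise NotImplementedError

    def utility(self, state):
        """Objective function: value of a terminal state to the max player."""
        raise NotImplementedError
```

A concrete game (checkers, chess, backgammon) would subclass this and fill in the four methods; everything that follows in the lecture operates only through this interface.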
Game Trees

[Figure: a partial tic-tac-toe game tree — Player 1 (x) moves at the root, Player 2 (o) at the next level, expanding down to terminal positions.]

[Figure: an abstract game tree — a max node at the root with actions A1, A2, A3; min nodes below with actions A11, A12, A21, A22, A31, A32; terminal nodes at the leaves.]




Minimax

• Max player tries to maximize his return
• Min player tries to minimize his return
• This is optimal for both (zero sum)

minimax(n_max) = max_{s ∈ successors(n)} minimax(s)
minimax(n_min) = min_{s ∈ successors(n)} minimax(s)


Minimax Values

[Figure: a two-ply tree — leaf values 3, 12 / 2, 4 / 15, 2; the three min nodes take values 3, 2, and 2; the max root takes value 3.]
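The two recurrences translate directly into code. A minimal sketch (the nested-list tree encoding is an illustrative assumption, not from the slides):

```python
# Minimax over an explicit game tree, following the recurrences above.
# A tree is either a number (terminal value) or a list of subtrees;
# levels alternate between max and min, with max to move at the root.

def minimax(node, is_max=True):
    if isinstance(node, (int, float)):     # terminal node: return its value
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The tree from the Minimax Values slide: leaves 3,12 / 2,4 / 15,2.
tree = [[3, 12], [2, 4], [15, 2]]
print(minimax(tree))  # min nodes yield 3, 2, 2; the max root picks 3
```

Running it on the slide's example tree reproduces the value 3 at the root.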
Minimax Properties

• Minimax can be run depth first
  – Time O(b^m)
  – Space O(bm)
• Assumes that the opponent plays optimally
• Based on a worst-case analysis
• What if this is incorrect?


Minimax in the Real World

• Search trees are too big
• Alternating turns double the depth of the search
  – 2 ply = 1 full turn
• Branching factors are too high
  – Chess: 35
  – Go: 361
• Search from the start never terminates in non-trivial games




Evaluation Functions

• Like heuristic functions
• Try to estimate the value of a node without expanding all the way to termination
• Using evaluation functions
  – Do a depth-limited search
  – Treat the evaluation function as if it were terminal
• What's wrong with this?
• How do you pick the depth?
• How do you manage your time?
  – Iterative deepening, quiescence


Desiderata for Evaluation Functions

• Would like to put the same ordering on nodes (even if values aren't totally right)
• Is this a reasonable thing to ask for?
• What if you have a perfect evaluation function?
• How are evaluation functions made in practice?
  – Buckets
  – Linear combinations
    • Chess pieces (material)
    • Board control (positional, strategic)
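A linear-combination evaluator of the kind described can be sketched as a weighted sum of features. The piece weights are conventional chess material values and the mobility feature is an illustrative stand-in for "board control"; none of the specifics are from the slides:

```python
# Sketch of a linear-combination evaluation function: weighted sum of a
# material feature and a positional (mobility) feature. Weights are
# illustrative assumptions.

PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material(counts):
    """Material feature: weighted sum of piece counts."""
    return sum(PIECE_VALUES[p] * n for p, n in counts.items())

def evaluate(my_counts, opp_counts, my_mobility=0, opp_mobility=0,
             w_material=1.0, w_mobility=0.1):
    """Linear combination of material difference and mobility difference."""
    return (w_material * (material(my_counts) - material(opp_counts))
            + w_mobility * (my_mobility - opp_mobility))

# Up a rook against a knight, equal mobility: 5 - 3 = 2
print(evaluate({'R': 1}, {'N': 1}))  # 2.0
```

Note that this only needs to order positions correctly for the search to choose the same moves; the absolute values matter less, which is exactly the desideratum raised above.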
Search Control Issues

• Horizon effects
  – Sometimes something interesting is just beyond the horizon
  – How do you know?
• When to generate more nodes?
• If you selectively extend your frontier, how do you decide where?
• If you have a fixed amount of total game time, how do you allocate it?


Pruning

• The most important search control method is figuring out which nodes you don't need to expand
• Use the fact that we are doing a worst-case analysis to our advantage
  – The max player cuts off search when he knows the min player can force a provably bad outcome
  – The min player cuts off search when he knows max can force a provably good (for max) outcome




Alpha-beta pruning

[Figure: the same two-ply tree — leaf values 3, 12 / 2, 4 / 15, 2; min values 3 and 2; root value 3 — used to show which nodes alpha-beta can skip.]


How to prune

• We still do (bounded) DFS
• Expand at least one path to the "bottom"
• If the current node is a max node, and min can force a lower value, then prune its siblings
• If the current node is a min node, and max can force a higher value, then prune its siblings
Max node pruning

[Figure: a max-node pruning example with values 2 and 4 — once the max node finds a child worth 4, exceeding the bound of 2 established above it, its remaining children are pruned.]


Implementing alpha-beta

max_value(state, alpha, beta)
  if cutoff(state) then return eval(state)
  for each s in successors(state) do
    alpha = max(alpha, min_value(s, alpha, beta))
    if alpha >= beta then return beta
  end
  return alpha

min_value(state, alpha, beta)
  if cutoff(state) then return eval(state)
  for each s in successors(state) do
    beta = min(beta, max_value(s, alpha, beta))
    if beta <= alpha then return alpha
  end
  return beta
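The max_value/min_value pseudocode translates directly to Python. A sketch over the same nested-list trees used earlier (the `visited` log is an added illustrative device to show which leaves the pruning skips):

```python
import math

# Alpha-beta over an explicit game tree: a number is a leaf, a list is an
# internal node, levels alternate max/min. Mirrors the pseudocode above,
# including the fail-hard cutoff returns.

def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf, visited=None):
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)          # record each leaf we actually evaluate
        return node
    if is_max:
        for child in node:
            alpha = max(alpha, alphabeta(child, False, alpha, beta, visited))
            if alpha >= beta:
                return beta               # cutoff: min will never allow this branch
        return alpha
    else:
        for child in node:
            beta = min(beta, alphabeta(child, True, alpha, beta, visited))
            if beta <= alpha:
                return alpha              # cutoff: max already has something better
        return beta

tree = [[3, 12], [2, 4], [15, 2]]
seen = []
print(alphabeta(tree, visited=seen))      # 3, same answer as plain minimax
print(seen)                               # leaf 4 is never evaluated
```

On the slide's example tree, the second min node is cut off after its first leaf (2), so the leaf 4 is pruned while the root value 3 is unchanged.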




Amazing facts about alpha-beta

• Empirically, alpha-beta has the effect of reducing the branching factor by half for many problems
• This effectively doubles the horizon that can be searched
• Alpha-beta makes the difference between novice and expert computer players


What About Probabilities?

[Figure: a tree with max nodes at the top, chance nodes below with outcome probabilities (P=0.5/0.5, P=0.6/0.4, P=0.9/0.1), and min nodes beneath.]
Expectiminimax

• n random outcomes per chance node
• O(b^m · n^m) time

eminimax(n_max) = max_{s ∈ successors(n)} eminimax(s)
eminimax(n_min) = min_{s ∈ successors(n)} eminimax(s)
eminimax(n_chance) = Σ_{s ∈ successors(n)} p(s) · eminimax(s)


Expectiminimax is nasty

• High branching factor
• Randomness makes evaluation functions difficult
  – Hard to predict many steps into the future
  – Values tend to smear together
  – Preserving order is not sufficient
• Pruning is problematic
  – Need to prune based upon a bound on an expectation
  – Need a priori bounds on the evaluation function
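The three recurrences can be sketched together in one recursion. The tuple-based tree encoding here is an illustrative assumption: a leaf is a number; otherwise a node is ('max', children), ('min', children), or ('chance', [(p, child), ...]):

```python
# Expectiminimax sketch over an explicit tree with max, min, and chance
# nodes, following the three recurrences above. Node encoding is
# illustrative, not from the lecture.

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                        # terminal value
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    if kind == 'chance':
        # expected value: sum of p(s) * eminimax(s) over outcomes
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# A max node choosing between a sure 3 and a 50/50 gamble over 10 and -10.
tree = ('max', [3, ('chance', [(0.5, 10), (0.5, -10)])])
print(expectiminimax(tree))  # 3: the gamble's expected value is 0
```

The example also hints at why pruning is hard here: to cut off a chance node early you need a priori bounds on the leaf values, since an unseen outcome could still swing the expectation.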




Multiplayer Games

• Things sort-of generalize
• We can maintain a vector of possible values for each player at each node
• Assume that each player acts greedily
• What's wrong with this?


Conclusions

• Game tree search is a special kind of search
• It relies heavily on heuristic evaluation functions
• Alpha-beta is a big win
• Most successful players use alpha-beta
• Final thought: there is a tradeoff between search effort and evaluation function effort
• When is it better to invest in your evaluation function?
