

Games
CPS 170
Ron Parr

Why Study Games?
• Many human activities can be modeled as games
– Negotiations
– Bidding
– TCP/IP
– Military confrontations
– Pursuit/Evasion
• Games are used to train the mind
– Human game-playing, animal play-fighting

Why Are Games Good for AI?
• Games typically have concise rules
• Well-defined starting and end points
• Sensing and effecting are simplified
– Not true for sports games
– See RoboCup
• Games are fun!
• Downside: Getting taken seriously (not)
– See robo search and rescue

History of Games in AI
• Computer games have been around almost as long as computers (perhaps longer)
– Chess: Turing (and others) in the 1950s
– Checkers: Samuel, 1950s learning program
• Usually start with naïve optimism
• Follow with naïve pessimism
• Simon: Computer chess champ by 1967
• Many, e.g., Kasparov, predicted that a computer would never be champion

Games Today
• Computers perform at champion level
– Backgammon, Checkers, Chess, Othello
• Computers perform well
– Bridge
• Computers still do badly
– Go, Hex

Game Setup
• Most commonly, we study games that are:
– 2 player
– Alternating
– Zero-sum
– Perfect information
• Examples: Checkers, chess, backgammon
• Assumptions can be relaxed at some expense
• Economics studies case where number of agents is very large
– Individual actions don’t change the dynamics

Zero Sum Games
• Assign values to different outcomes
• Win = 1, Loss = -1
• With zero sum games every gain comes at the other player’s expense
• Sum of both players’ scores must be 0
• Are any games truly zero sum?

Characterizing Games
• Two-player games are very much like search
– Initial state
– Successor function
– Terminal test
– Objective function (heuristic function)
• Unlike search
– Terminal states are often a large set
– Full search to terminal states usually impossible

Game Trees
[Figure: tic-tac-toe game tree. Player 1 moves at the root position, Player 2 replies at the next level, and Player 1 moves again at the level below.]

Game Trees
[Figure: abstract game tree. Player 1 (max node) chooses among A1, A2, A3; Player 2 (min nodes) then chooses among A11, A12, A21, A22, A31, A32, which lead to terminal nodes.]
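
Minimax
• Max player tries to maximize his return
• Min player tries to minimize Max’s return
• This is optimal for both (zero sum)

$$\mathrm{minimax}(n_{\max}) = \max_{s \in \mathrm{successors}(n_{\max})} \mathrm{minimax}(s)$$
$$\mathrm{minimax}(n_{\min}) = \min_{s \in \mathrm{successors}(n_{\min})} \mathrm{minimax}(s)$$

Minimax Values
[Figure: two-ply tree. Leaves 3, 12 | 2, 4 | 15, 2; the three min nodes take values 3, 2, 2; the max node at the root takes value 3.]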
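
The recursive definition can be tried directly on the example tree with leaves 3, 12, 2, 4, 15, 2. This is a minimal sketch, not a full game player: the nested-list tree encoding and the function name `minimax` are assumptions made for illustration.

```python
from typing import List, Union

# Assumed encoding: a leaf is an int payoff (for Max); an inner node is a list of subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, is_max: bool) -> int:
    """Return the minimax value of `node`; levels alternate between Max and Min."""
    if isinstance(node, int):  # terminal node: payoff from Max's point of view
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Max root with three Min children, leaves as in the example tree.
example_tree: Tree = [[3, 12], [2, 4], [15, 2]]
print(minimax(example_tree, is_max=True))  # 3
```

The three min nodes evaluate to 3, 2, and 2, so the max root takes the value 3.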

Minimax Properties
• Minimax can be run depth first (b = branching factor, m = maximum depth)
– Time O(b^m)
– Space O(bm)
• Assumes that opponent plays optimally
• Based on a worst-case analysis
• What if this is incorrect?

Minimax in the Real World
• Search trees are too big
• Alternating turns double depth of the search
– 2 ply = 1 full turn
• Branching factors are too high
– Chess: 35
– Go: 361
• Search from start never terminates in non-trivial games
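
To get a feel for these numbers, the b^m growth can be computed for the branching factors quoted above. A rough sketch assuming a uniform tree (every node has exactly b successors); `tree_size` is a name introduced here for illustration.

```python
def tree_size(b: int, plies: int) -> int:
    """Leaves of a uniform tree with branching factor b, searched `plies` deep."""
    return b ** plies


for name, b in [("Chess", 35), ("Go", 361)]:
    for plies in (2, 4, 8):  # 2 ply = 1 full turn
        print(f"{name}: {plies} ply -> {tree_size(b, plies):.2e} leaves")
```

Even four full turns of chess (8 ply) already give on the order of 10^12 leaves under this assumption.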

Evaluation Functions
• Like heuristic functions
• Try to estimate value of a node without expanding all the way to termination
• Using evaluation functions
– Do a depth-limited search
– Treat the evaluation function’s value as if it were a terminal value
• What’s wrong with this?
• How do you pick the depth?
• How do you manage your time?
• Iterative deepening, quiescence

Desiderata for Evaluation Functions
• Would like to put the same ordering on nodes (even if values aren’t totally right)
• Is this a reasonable thing to ask for?
• What if you have a perfect evaluation function?
• How are evaluation functions made in practice?
– Buckets
– Linear combinations
    • Chess pieces (material)
    • Board control (positional, strategic)
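
A depth-limited search with a linear-combination evaluation function might look like the sketch below. The feature names, the weights, and the tiny hand-built tree are all hypothetical; they are chosen so that a 1-ply search and a 2-ply search disagree, which is one answer to "What's wrong with this?".

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical weights: material counts far more than board control.
WEIGHTS: Dict[str, float] = {"material": 1.0, "control": 0.1}


@dataclass
class Node:
    features: Dict[str, float]          # hand-picked features, from Max's viewpoint
    children: List["Node"] = field(default_factory=list)


def linear_eval(node: Node) -> float:
    """Evaluation function: weighted sum of the node's features."""
    return sum(WEIGHTS[name] * value for name, value in node.features.items())


def depth_limited_minimax(node: Node, depth: int, is_max: bool) -> float:
    """Search `depth` plies, then treat the evaluation as if it were a terminal value."""
    if depth == 0 or not node.children:
        return linear_eval(node)
    values = [depth_limited_minimax(c, depth - 1, not is_max) for c in node.children]
    return max(values) if is_max else min(values)


# Move A wins a piece now but loses three after the reply; move B is quiet.
a = Node({"material": 1, "control": 0}, [Node({"material": -3, "control": 0})])
b = Node({"material": 0, "control": 2}, [Node({"material": 0, "control": 1})])
root = Node({"material": 0, "control": 0}, [a, b])

print(depth_limited_minimax(root, 1, True))  # 1.0 (prefers A)
print(depth_limited_minimax(root, 2, True))  # 0.1 (prefers B)
```

At depth 1 the capture in move A looks best; one ply deeper the reply is visible and move B is preferred.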

Search Control Issues
• Horizon effects
– Sometimes something interesting is just beyond the horizon
– How do you know?
• When to generate more nodes?
• If you selectively extend your frontier, how do you decide where?
• If you have a fixed amount of total game time, how do you allocate this?

Pruning
• The most important search control method is figuring out which nodes you don’t need to expand
• Use the fact that we are doing a worst-case analysis to our advantage
– Max player cuts off search when he knows min player can force a provably bad outcome
– Min player cuts off search when he knows max can force a provably good (for max) outcome

Alpha-beta pruning
[Figure: the two-ply example tree. Max node at the root with value 3; min nodes with values 3, 2, 2; leaves 3, 12 | 2, 4 | 15, 2.]

How to prune
• We still do (bounded) DFS
• Expand at least one path to the “bottom”
• If current node is max node, and min can force a lower value, then prune siblings
• If current node is min node, and max can force a higher value, then prune siblings

Max node pruning
[Figure: example of pruning at a max node.]

Implementing alpha-beta
max_value(state, alpha, beta)
    if cutoff(state) then return eval(state)
    for each s in successors(state) do
        alpha = max(alpha, min_value(s, alpha, beta))
        if alpha >= beta then return beta
    end
    return alpha

min_value(state, alpha, beta)
    if cutoff(state) then return eval(state)
    for each s in successors(state) do
        beta = min(beta, max_value(s, alpha, beta))
        if beta <= alpha then return alpha
    end
    return beta
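
The max_value/min_value scheme can be run on the example tree with leaves 3, 12, 2, 4, 15, 2 to see which leaves are never evaluated. This is a sketch under assumptions: the nested-list tree encoding, the single function with an `is_max` flag, and the `visited` list are introduced here for illustration, and the min step updates beta with `min(beta, ...)`.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]  # assumed encoding: int leaf or list of subtrees


def alphabeta(node: Tree, alpha: float, beta: float,
              is_max: bool, visited: List[int]) -> float:
    """Alpha-beta search; records each leaf that actually gets evaluated."""
    if isinstance(node, int):
        visited.append(node)
        return node
    if is_max:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False, visited))
            if alpha >= beta:      # Min already has a better option elsewhere
                return beta
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, alpha, beta, True, visited))
        if beta <= alpha:          # Max already has a better option elsewhere
            return alpha
    return beta


example_tree: Tree = [[3, 12], [2, 4], [15, 2]]
visited: List[int] = []
value = alphabeta(example_tree, -math.inf, math.inf, True, visited)
print(value, visited)  # 3 [3, 12, 2, 15, 2]
```

The leaf 4 is pruned: once the second min node sees 2, it cannot beat the 3 that Max already has from the first min node.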
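
Amazing facts about alpha-beta
• Empirically, alpha-beta has the effect of roughly halving the exponent of the search cost for many problems (an effective branching factor near the square root of b when good moves are tried first)
• This effectively doubles the horizon that can be searched
• Alpha-beta makes the difference between novice and expert computer players

What About Probabilities?
[Figure: game tree with max nodes at the top, chance nodes in the middle whose branches carry probabilities (P=0.5/P=0.5, P=0.6/P=0.4, P=0.9/P=0.1), and min nodes below.]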

Expectiminimax
• n random outcomes per chance node
• O(b^m n^m) time

$$\mathrm{eminimax}(n_{\max}) = \max_{s \in \mathrm{successors}(n_{\max})} \mathrm{eminimax}(s)$$
$$\mathrm{eminimax}(n_{\min}) = \min_{s \in \mathrm{successors}(n_{\min})} \mathrm{eminimax}(s)$$
$$\mathrm{eminimax}(n_{\mathrm{chance}}) = \sum_{s \in \mathrm{successors}(n_{\mathrm{chance}})} \mathrm{eminimax}(s)\, p(s)$$

Expectiminimax is nasty
• High branching factor
• Randomness makes evaluation fns difficult
– Hard to predict many steps into future
– Values tend to smear together
– Preserving order is not sufficient
• Pruning is problematic
– Need to prune based upon bound on an expectation
– Need a priori bounds on the evaluation function
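
The three expectiminimax cases (max, min, chance) can be sketched as follows. The tagged-tuple tree encoding and the leaf values are assumptions made for illustration; only the branch probabilities (0.5/0.5, 0.6/0.4, 0.9/0.1) echo the figure.

```python
def expectiminimax(node) -> float:
    """Value of a tree with max, min, and chance nodes.

    Assumed encoding: a number is a leaf; ("max", [children]) and
    ("min", [children]) are player nodes; ("chance", [(p, child), ...])
    is a chance node whose probabilities sum to 1.
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: each outcome weighted by its probability p(s).
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Max chooses among three chance nodes; Min moves after the random outcome.
game = ("max", [
    ("chance", [(0.5, ("min", [2, 7])), (0.5, ("min", [4, 9]))]),
    ("chance", [(0.6, ("min", [1, 3])), (0.4, ("min", [5, 8]))]),
    ("chance", [(0.9, ("min", [1, 6])), (0.1, ("min", [10, 12]))]),
])
print(expectiminimax(game))  # 3.0
```

The first chance node is worth 0.5·2 + 0.5·4 = 3, the others about 2.6 and 1.9, so Max picks the first. Note that the third branch contains the largest leaves yet has the lowest value, which is why preserving the order of evaluations is not sufficient once expectations are involved.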

Multiplayer Games
• Things sort-of generalize
• We can maintain a vector of possible values for each player at each node
• Assume that each player acts greedily
• What’s wrong with this?

Conclusions
• Game tree search is a special kind of search
• Rely heavily on heuristic evaluation functions
• Alpha-beta is a big win
• Most successful players use alpha-beta
• Final thought: Tradeoff between search effort and evaluation function effort
• When is it better to invest in your evaluation function?