The Implementation of Machine
Learning in the Game of Checkers
Billy Melicher
Computer Systems lab 08
10/29/08
Abstract
• Machine learning uses past information
to predict future states
• Can be used in any situation where past
outcomes are a guide to future ones
• Adapts as it gains experience
Introduction
• Checkers is used to
explore machine
learning
• Checkers has many
tactical aspects that
make it a good
subject of study
Background
• Minimax
• Heuristics
• Learning
Minimax
• Method of adversarial search
• Every position (board) can be assigned a fitness
value (heuristic)
• Each player chooses the outcome that is best
for them from the choices they have
Minimax
[Figure: minimax illustration, image taken from Wikipedia]
Minimax
• The search tree grows exponentially with depth
• Can only evaluate a limited number of moves
(plies) into the future
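As a rough illustration of the depth-limited search described above, here is a minimal Python sketch. The `children` and `heuristic` callbacks and the toy tree are assumptions made for the example; they are not taken from the project's code.

```python
def minimax(state, depth, maximizing, children, heuristic):
    """Return the minimax value of `state`, looking `depth` plies ahead.

    `children(state)` lists the positions reachable in one move and
    `heuristic(state)` scores a position; both are hypothetical callbacks.
    """
    moves = children(state)
    if depth == 0 or not moves:
        # Ply limit reached (or no moves left): fall back on the heuristic.
        return heuristic(state)
    values = [
        minimax(child, depth - 1, not maximizing, children, heuristic)
        for child in moves
    ]
    # Each player picks the outcome that is best for them.
    return max(values) if maximizing else min(values)


# Toy game tree: nested lists are positions, numbers are leaf scores.
def toy_children(node):
    return node if isinstance(node, list) else []


def toy_heuristic(node):
    return node if not isinstance(node, list) else 0


tree = [[3, 5], [2, 9]]
best = minimax(tree, 2, True, toy_children, toy_heuristic)
```

In the toy tree the minimizing player would answer the first move with 3 and the second with 2, so the maximizing player prefers the first move.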
Heuristic
• Heuristics predict the outcome of a board
• Assign a fitness value to a board: the higher
the value, the better the expected outcome
• Not perfect
• Requires expertise in the game to create
Heuristics
• H(s) = c_0 F_0(s) + c_1 F_1(s) + … + c_n F_n(s)
• H(s) = heuristic value of board s; each F_i is a
feature of the board and each c_i is its weight
• Has many different terms
• In checkers, the terms could be:
• Number of checkers
• Number of kings
• Number of checkers on an edge
• How far checkers have advanced up the board
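The weighted sum above could be sketched as follows. The board layout (a list of 32 square codes), the three features, and the weight values are all placeholder assumptions chosen for illustration, not the features or weights used in the project.

```python
# Hypothetical square codes for a 32-square board.
EMPTY, MY_MAN, MY_KING, OPP_MAN, OPP_KING = range(5)


def men_diff(board):
    return board.count(MY_MAN) - board.count(OPP_MAN)


def king_diff(board):
    return board.count(MY_KING) - board.count(OPP_KING)


def advancement(board):
    # Assumes square i lies on row i // 4 and own men move toward higher rows.
    return sum(i // 4 for i, sq in enumerate(board) if sq == MY_MAN)


FEATURES = [men_diff, king_diff, advancement]
WEIGHTS = [1.0, 2.5, 0.1]  # placeholder values, not learned weights


def heuristic(board, weights=WEIGHTS, features=FEATURES):
    """H(s) = sum of c_i * F_i(s) over all features."""
    return sum(c * f(board) for c, f in zip(weights, features))


board = [EMPTY] * 32
board[0] = MY_MAN
board[9] = MY_MAN
board[20] = MY_KING
board[30] = OPP_MAN
value = heuristic(board)
```

Keeping the features in a list makes it easy to add or drop terms without touching the function that combines them.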
Learning by Rote
• Stores every game played
• Records the moves made from each board
• Relates each move made from a particular
board to the eventual outcome of the game
• More likely to make moves that result in a
win, less likely to make moves resulting in a
loss
• Works well in the endgame, less well in the midgame
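One way the bookkeeping above might look is sketched below. The `RoteMemory` class, the smoothed win rate, and the weighted random choice are assumptions made for illustration; the slides do not specify how moves are actually weighted.

```python
import random
from collections import defaultdict


class RoteMemory:
    """Hypothetical store of win/loss counts per (board, move)."""

    def __init__(self):
        # stats[board_key][move] = [wins, losses]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record_game(self, history, won):
        """`history` is a list of (board_key, move) pairs made by the learner."""
        for board_key, move in history:
            self.stats[board_key][move][0 if won else 1] += 1

    def win_rate(self, board_key, move):
        wins, losses = self.stats.get(board_key, {}).get(move, (0, 0))
        # Smoothed so that a move never seen before scores 0.5.
        return (wins + 1) / (wins + losses + 2)

    def choose(self, board_key, legal_moves, rng=random):
        # Moves with better records are proportionally more likely.
        weights = [self.win_rate(board_key, m) for m in legal_moves]
        return rng.choices(legal_moves, weights=weights, k=1)[0]


memory = RoteMemory()
memory.record_game([("board1", "a"), ("board2", "c")], won=True)
memory.record_game([("board1", "b")], won=False)
picked = memory.choose("board1", ["a", "b"])
```

Because unseen boards carry no information, a table like this helps most where the same positions recur, which is consistent with it working better late in the game.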
How I store data
I convert each checkerboard into a 32-digit base-5 number, where
each digit corresponds to a playable square and the value of the
digit indicates what occupies that square.
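A minimal sketch of such an encoding follows. The mapping of digit values to pieces (0 for empty through 4 for an opposing king) and the square ordering are assumptions; the slides state only that there are 32 digits in base 5.

```python
# Assumed digit values; the original mapping is not given.
EMPTY, MY_MAN, MY_KING, OPP_MAN, OPP_KING = range(5)


def encode(board):
    """Pack a list of 32 square codes (each 0..4) into one integer."""
    if len(board) != 32:
        raise ValueError("expected 32 playable squares")
    key = 0
    for square in board:
        key = key * 5 + square  # append one base-5 digit
    return key


def decode(key):
    """Recover the 32 square codes from an encoded integer."""
    squares = []
    for _ in range(32):
        key, digit = divmod(key, 5)
        squares.append(digit)
    return squares[::-1]  # digits come out least significant first


board = [EMPTY] * 32
board[0] = MY_MAN
board[31] = OPP_KING
key = encode(board)
```

Five values per square cover empty, plus a man or a king for each side. The largest key is just under 5**32, which needs about 75 bits, so it does not fit in a 64-bit integer; Python integers handle this without extra work.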
Learning by Generalization
• Uses a heuristic function to guide moves
• Changes the heuristic function after games
based on the outcome
• Good in mid game but not as good in early
and end games
• Requires identifying the features that affect
the game
Development
• Use of the minimax algorithm with alpha-beta
pruning
• Use of both learning by rote and learning by
generalization
• Temporal difference learning
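For reference, alpha-beta pruning can be sketched as below. This is a generic textbook version, not the project's code; the callbacks and toy tree are assumptions. The `alpha` and `beta` here are search bounds and are unrelated to the learning-rate alpha used in temporal difference learning.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing, children, heuristic):
    """Minimax value of `state`, skipping branches that cannot change it."""
    moves = children(state)
    if depth == 0 or not moves:
        return heuristic(state)
    if maximizing:
        value = -math.inf
        for child in moves:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer already has a better option elsewhere
        return value
    value = math.inf
    for child in moves:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, heuristic))
        beta = min(beta, value)
        if beta <= alpha:
            break  # the maximizer already has a better option elsewhere
    return value


seen = []  # leaves that were actually evaluated


def toy_children(node):
    return node if isinstance(node, list) else []


def toy_heuristic(node):
    seen.append(node)
    return node


tree = [[3, 5], [2, 9]]
result = alphabeta(tree, 2, -math.inf, math.inf, True,
                   toy_children, toy_heuristic)
```

In this toy tree the leaf 9 is never evaluated: once the second branch is known to be worth at most 2, it cannot beat the 3 already guaranteed by the first branch.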
Temporal Difference Learning
• In temporal difference learning, you adjust the
heuristic based on the difference between the
heuristic at one time and at another
• Repeated updates move the heuristic toward an
equilibrium near the ideal function
• U(s) ← U(s) + α( R(s) + γU(s') - U(s) )
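When U is a weighted sum of features, the update above is commonly applied to the weights themselves. The sketch below uses the standard gradient form of that update, which is an assumption here: the slides say only that the weights of the heuristic are adjusted. The function name and default parameter values are hypothetical.

```python
def td_update(weights, features_s, features_next, reward,
              alpha=0.1, gamma=1.0):
    """Return new weights after one temporal difference step.

    U(s) is taken to be sum(w_i * f_i(s)). Each weight moves in
    proportion to its feature value and to the TD error
    R(s) + gamma * U(s') - U(s).
    """
    u_s = sum(w * f for w, f in zip(weights, features_s))
    u_next = sum(w * f for w, f in zip(weights, features_next))
    td_error = reward + gamma * u_next - u_s
    return [w + alpha * td_error * f for w, f in zip(weights, features_s)]


old_weights = [1.0, 2.0]
new_weights = td_update(old_weights, features_s=[1, 0],
                        features_next=[2, 0], reward=0.0, alpha=0.5)
```

In this example the current board is valued at 1 and the next board at 2, so the first weight is raised to pull U(s) toward the later estimate. The second weight is untouched because its feature is zero on the current board.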
Temporal Difference Learning
• There is no proof that predictions made closer to
the end of the game are more accurate, but
intuitively they should be
• Changes heuristic so that it better predicts the
value of all boards
• Adjusts the weights of the heuristic
Alpha Value
• The alpha value scales down each change to the
heuristic as the amount of data grows
• Diminishing returns: each new observation counts for less
• Necessary for ensuring rare occurrences do not
change heuristic too much
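The slides do not give a formula for alpha, so the schedule below is purely an assumption: alpha shrinks as 1 / (1 + n), where n is the number of times a situation has already been seen. With this choice, the running estimate equals the average of the targets seen so far.

```python
def learning_rate(times_seen, alpha0=1.0):
    """Hypothetical decaying alpha: large at first, smaller with more data."""
    return alpha0 / (1 + times_seen)


# Move an estimate toward a series of targets, one of which is an outlier.
targets = [10, 10, 10, 50, 10]
estimate = 0.0
for n, target in enumerate(targets):
    estimate += learning_rate(n) * (target - estimate)
```

Here the outlier of 50 arrives when alpha has already dropped to 0.25, so it moves the estimate by only a quarter of the gap instead of replacing it outright.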
Results
• The value of each weight reaches an equilibrium
• The weights change to reflect what the program has learned
• Occasionally requires programmer intervention
when it reaches a false equilibrium
Results
[Figure: plot of Value of Weight (vertical axis, 0 to 16) against a horizontal axis running from 0 to 25]
During the course of a game the value of
this particular weight centers around 10.
Results
• Learning by rote requires a large data set
• It requires large amounts of memory
• The stored data is needed to determine the alpha
value in temporal difference learning
Results
[Figure: plot of Number of Boards in Data Base (vertical axis, 0 to 120) against a horizontal axis running from 0.5 to 3.5]
• The rote-learning database grows with the
number of games played, but with diminishing
returns and at a large cost in memory