# Evolving Strategies for the Prisoner's Dilemma

```
Evolving Strategies for the Prisoner’s Dilemma

Jennifer Golbeck
University of Maryland, College Park
Department of Computer Science

July 23, 2002
Overview
•   Previous Research
•   Prisoner’s Dilemma
•   The Genetic Algorithm
•   Results
•   Conclusions
Previous Research
Axelrod
• Robert Axelrod’s experiments of the 1980s
served as the starting point for this research
• Implementation closely adheres to the
configuration of his experiments
• Same model for the Prisoner’s Dilemma
• Minor variation in the implementation of
the Genetic Algorithm
Prisoner’s Dilemma
The Prisoner’s Dilemma Model
• The basic two-player Prisoner’s Dilemma
• Both players are arrested for the same crime
• Each has a choice
– Confess - Cooperate with the authorities (admit to
doing the crime)
– Deny - Defect against the other player (claim the other
person is responsible)
• No knowledge of “opponent’s” action
Payoff Matrix
• Optimization
• If both players cooperate, they each get
3 points (the value in Axelrod’s model)
• If both players Defect, they each get
1 point (the value in Axelrod’s model)
• If there is a mixed outcome, the Defector
gets 5 points and the cooperator gets 0
points
Iterated Game
• In simulation, the endpoint of the game is
unknown to the players, making it essentially an
infinitely iterated game
• Each player has a memory of the previous three
rounds on which to base his strategy
• Strategies are deterministic - for a given history h
players will always make the same move
• With 4 possible configurations in each round and a
history of 3, each strategy consists of 4^3 = 64
moves
Previous Results
• Axelrod tournaments
• Using the three-round history model, teams
submitted strategies to compete in a
round-robin tournament
• Tit for Tat
• Pavlov strategy, developed after these
tournaments, was shown to be an effective
strategy as well.
The Genetic Algorithm
The Model
•   Darwinian Survival of the Fittest
•   Genetic representation of entities
•   Fitness function
•   Select most fit individuals to reproduce
•   Mutate
•   Traits of most fit will be passed on
•   Over time, the population will evolve to be
more fit, optimal
GA’s and the Prisoner’s Dilemma
• Population: 20 individuals
• Chromosome: 64-bit string where each bit
represents the Cooperate or Defect move
played in response to a specific history
String Position     Represented History       Move   String Position     Represented History       Move
0                   CCCCCC                    C      32                  DCCCCC                    D
1                   CCCCCD                    D      33                  DCCCCD                    C
2                   CCCCDC                    D      34                  DCCCDC                    D
3                   CCCCDD                    D      35                  DCCCDD                    D
4                   CCCDCC                    C      36                  DCCDCC                    C
5                   CCCDCD                    C      37                  DCCDCD                    C
6                   CCCDDC                    C      38                  DCCDDC                    D
7                   CCCDDD                    D      39                  DCCDDD                    D

8                   CCDCCC                    C      40                  DCDCCC                    D
9                   CCDCCD                    D      41                  DCDCCD                    C
10                  CCDCDC                    D      42                  DCDCDC                    C
11                  CCDCDD                    D      43                  DCDCDD                    C
12                  CCDDCC                    D      44                  DCDDCC                    C
13                  CCDDCD                    D      45                  DCDDCD                    D
14                  CCDDDC                    C      46                  DCDDDC                    D
15                  CCDDDD                    C      47                  DCDDDD                    C

16                  CDCCCC                    C      48                  DDCCCC                    C
17                  CDCCCD                    C      49                  DDCCCD                    D
18                  CDCCDC                    D      50                  DDCCDC                    C
19                  CDCCDD                    C      51                  DDCCDD                    C
20                  CDCDCC                    C      52                  DDCDCC                    C
21                  CDCDCD                    D      53                  DDCDCD                    C
22                  CDCDDC                    D      54                  DDCDDC                    D
23                  CDCDDD                    C      55                  DDCDDD                    D

24                  CDDCCC                    C      56                  DDDCCC                    C
25                  CDDCCD                    C      57                  DDDCCD                    C
26                  CDDCDC                    D      58                  DDDCDC                    D
27                  CDDCDD                    C      59                  DDDCDD                    D
28                  CDDDCC                    D      60                  DDDDCC                    C
29                  CDDDCD                    C      61                  DDDDCD                    C
30                  CDDDDC                    C      62                  DDDDDC                    D
31                  CDDDDD                    D      63                  DDDDDD                    C
GA’s and PD II
• Fitness: Each player competes against every
other for 64 consecutive rounds, and a
cumulative score is maintained
• Selection: Roulette wheel selection
• Reproduction: Random point crossover with
replacement
• Mutation rate: 0.001
• Generations: 200,000
Simulation and Results
Hypothesis
• Past research has looked at which strategy was
“best”. This research looks at what makes a
“good” strategy.
• Tit for Tat and Pavlov both perform very well,
and share two traits
– Defend against Defectors
– Cooperate with other cooperators
Hypothesis

• All populations evolve over time to possess
and exhibit these two traits

• This behavior evolves regardless of the
initial makeup of the population
Experiment I
• Five Initial Populations
–   All “Always Cooperate (Confess)” (AllC)
–   All “Always Defect (Deny)” (AllD)
–   All Tit for Tat
–   All Pavlov
–   All Randomly generated (independently)
Experiment II
• Controls: Tit for Tat and Pavlov
– Statistically equal performance
• Support the hypothesis by showing:
– Traits are not present in other initial
populations
– Over time, populations evolve to exhibit those
traits and perform as well as Tit For Tat and
Pavlov
Experiment II
• To show that the hypothesized traits evolve,
populations must demonstrate
– In the presence of Defectors, evolved populations
perform identically to the controls
– In the presence of cooperators, evolved
populations perform identically to controls
Part 1: Defend Against Defectors I

• Mix each initial population with a small set of AllD
– Tit for Tat and Pavlov (controls) perform at about 80% of
maximum
– All others perform significantly worse than Tit for Tat and
Pavlov
– AllC and Random populations perform significantly worse
than their normal behavior
– This shows that a priori, the AllC and random populations
cannot defend against Defectors
Part 1: Defend Against Defectors II

• Evolve each population and then mix with
small set of AllD
– All populations now perform as well as
each other, and as well as the TFT and Pavlov
controls
– Fitness at about 80% of maximum
Part 2: Cooperate with Cooperators
• As before, each startup population is mixed with a
small set of AllC
– TFT, Pavlov, do very well
– AllC does exceptionally well
– Others do significantly worse
• Evolve and then add AllC
– All populations perform as well as each other
– Identical performance to TFT and Pavlov
[Figure: Performance of Different Experiments. Vertical axis runs from 0.4 to 0.9.
Conditions: Unevolved Population inoculated with AllC, Unevolved Population
inoculated with AllD, Evolved Population, Evolved Population inoculated with AllC,
Evolved Population inoculated with AllD.
Series: Pavlov, Tit For Tat, Random, AllC, AllD.]
Conclusions
Conclusions I
• Performance measures show that AllC, AllD,
and random populations do not generally
possess defensive or cooperative traits a
priori
• After evolution, all populations have changed
to incorporate both traits
• Evolved strategies perform as well as TFT
Conclusions II
• In both experiments there is no statistical
difference between the performance of
evolved populations before and after the
introduction of AllC or AllD players
• Indicates that not only do the populations
exhibit hypothesized traits in experimental
conditions, but it is their normal behavior to
do so.
Future Work
Non-deterministic Players
• This work shows results for players with
deterministic strategies
• Much previous research has been done on
stochastic strategies
• Preliminary results suggest that the findings
presented here apply to stochastic strategies as
well, but a formal study is necessary.

```
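The 64-move chromosome and the iterated game described in the slides can be illustrated with a minimal Python sketch. Several details are assumptions rather than the author's implementation: the history string is read as three (own move, opponent move) pairs with the oldest round first, every match starts from an all-cooperate history, and the payoffs 3 and 1 are taken from Axelrod's standard matrix (only 5 and 0 appear in the slides). The names `history_index` and `play_match` are hypothetical.

```python
# Payoffs as (my_points, opponent_points). 5 and 0 come from the slides;
# 3 and 1 are assumed from Axelrod's standard matrix.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def history_index(history):
    """Map a 6-character history such as 'CCDCDD' to a position 0..63 (C=0, D=1)."""
    return int(history.replace("C", "0").replace("D", "1"), 2)


def play_match(strategy_a, strategy_b, rounds=64, start="CCCCCC"):
    """Play two 64-move strategies against each other and return both scores."""
    # Assumed layout: each player's history holds (own move, opponent move) pairs.
    hist_a, hist_b = start, start
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a[history_index(hist_a)]
        move_b = strategy_b[history_index(hist_b)]
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        # Drop the oldest round and append the newest one.
        hist_a = hist_a[2:] + move_a + move_b
        hist_b = hist_b[2:] + move_b + move_a
    return score_a, score_b


ALL_C = "C" * 64
ALL_D = "D" * 64
# Tit for Tat: repeat the opponent's most recent move (last history character).
TIT_FOR_TAT = "".join("CD"[i & 1] for i in range(64))
# Pavlov: cooperate if both players made the same move last round, else defect.
PAVLOV = "".join("C" if (i & 3) in (0, 3) else "D" for i in range(64))
```

Under these assumptions, `history_index("DCCCCC")` gives 32, matching the position shown for that history in the chromosome table, and `play_match(TIT_FOR_TAT, ALL_D)` shows Tit for Tat losing only the first round before defecting in return.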
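The genetic operators named in the slides (roulette wheel selection, random-point crossover, and a mutation rate of 0.001) can be sketched as follows. This is an illustration under stated assumptions, not the author's code: "with replacement" is read as drawing parents with replacement, each crossover yields one child, the mutation rate is applied per bit, and all function names are hypothetical.

```python
import random

MUTATION_RATE = 0.001  # value listed on the slide; assumed to be a per-bit rate


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    spin = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(parent_a, parent_b, rng):
    """Single random-point crossover producing one child."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Flip each move (C <-> D) independently with probability `rate`."""
    flip = {"C": "D", "D": "C"}
    return "".join(flip[m] if rng.random() < rate else m for m in chromosome)


def next_generation(population, fitnesses, rng):
    """Build a new population of the same size from fitness-weighted parents."""
    children = []
    for _ in range(len(population)):
        # Assumption: parents are drawn with replacement, so one individual
        # may be chosen more than once.
        mother = roulette_select(population, fitnesses, rng)
        father = roulette_select(population, fitnesses, rng)
        children.append(mutate(crossover(mother, father, rng), rng))
    return children
```

A fitness list for `next_generation` would come from summing each individual's round-robin scores over 64-round matches, as the fitness slide describes. Passing an explicit `random.Random` instance keeps runs reproducible.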