

    Evolving Strategies for the Prisoner’s Dilemma

           Jennifer Golbeck
 University of Maryland, College Park
  Department of Computer Science

            July 23, 2002
•   Previous Research
•   Prisoner’s Dilemma
•   The Genetic Algorithm
•   Results
•   Conclusions
Previous Research
• Robert Axelrod’s experiments of the 1980s
  served as the starting point for this research
• Implementation closely adheres to the
  configuration of his experiments
• Same model for the Prisoner’s Dilemma
• Minor variation in the implementation of
  the Genetic Algorithm
Prisoner’s Dilemma
 The Prisoner’s Dilemma Model
• The basic two-player Prisoner’s Dilemma
• Both players are arrested for the same crime
• Each has a choice
   – Confess: Cooperate with the authorities (admit to
     the crime)
   – Deny: Defect against the other player (claim the other
     person is responsible)
• Neither player knows the other’s action in advance
                  Payoff Matrix
• Optimization: each player seeks the highest score for himself
• If both players
  cooperate, they each
  receive 3 points
• If both players defect,
  each receives 1 point
• If there is a mixed
  outcome, the Defector
  gets 5 points and the
  cooperator gets 0
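The payoff rules above fit in a small lookup table. This Python sketch is illustrative only; the names PAYOFF and payoff are hypothetical, not taken from the original implementation. The two assertions spell out why the game is a dilemma: in a single round, defecting earns more than cooperating whatever the opponent does, yet mutual cooperation (3 points each) beats mutual defection (1 point each).

```python
# Points from the payoff slide, keyed by (my move, opponent's move).
# "C" = Cooperate, "D" = Defect; values are (my points, opponent's points).
PAYOFF = {
    ("C", "C"): (3, 3),  # both cooperate
    ("D", "D"): (1, 1),  # both defect
    ("D", "C"): (5, 0),  # I defect, the opponent cooperates
    ("C", "D"): (0, 5),  # I cooperate, the opponent defects
}


def payoff(my_move: str, their_move: str) -> tuple[int, int]:
    """Return (my points, opponent's points) for one round."""
    return PAYOFF[(my_move, their_move)]


# In a single round, defecting always earns more than cooperating ...
for their_move in "CD":
    assert payoff("D", their_move)[0] > payoff("C", their_move)[0]

# ... yet both players do better under mutual cooperation than mutual defection.
assert payoff("C", "C")[0] > payoff("D", "D")[0]
```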
               Iterated Game
• In simulation, the endpoint of the game is
  unknown to the players, making it essentially an
  infinitely iterated game
• Each player has a memory of the previous three
  rounds on which to base his strategy
• Strategies are deterministic: for a given history h,
  a player always makes the same move
• With 4 possible outcomes in each round (CC, CD, DC, DD) and a
  history of 3 rounds, each strategy comprises 4³ = 64 moves
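The count of 4³ = 64 follows from enumerating every possible three-round memory. A minimal Python sketch, assuming a history is written as six C/D characters (two moves per round, as in the chromosome table); HISTORIES and make_lookup are hypothetical names used only for illustration.

```python
from itertools import product

# Each round is a pair of moves, so there are 2 * 2 = 4 possible outcomes.
ROUND_OUTCOMES = ["".join(pair) for pair in product("CD", repeat=2)]

# A three-round memory is any sequence of three round outcomes.
HISTORIES = ["".join(rounds) for rounds in product(ROUND_OUTCOMES, repeat=3)]


def make_lookup(strategy: str) -> dict[str, str]:
    """Map each 6-character history to the move a deterministic strategy plays."""
    if len(strategy) != len(HISTORIES):
        raise ValueError("a strategy needs exactly one move per history")
    return dict(zip(HISTORIES, strategy))


print(ROUND_OUTCOMES)        # ['CC', 'CD', 'DC', 'DD']
print(len(HISTORIES))        # 64, i.e. 4 ** 3
always_cooperate = make_lookup("C" * 64)
print(always_cooperate["DDDDDD"])  # C
```

Because the strategy is a fixed table, the same history always produces the same move, which is exactly what "deterministic" means here.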
              Previous Results
• Axelrod tournaments
• Using the three-round history model, teams
  submitted strategies that competed in a
  round-robin tournament
• Tit for Tat won both of Axelrod’s tournaments
• The Pavlov strategy, developed after these
  tournaments, was also shown to be an effective
  strategy.
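Tit for Tat and Pavlov both decide using only the previous round. The sketch below states each as a rule on the last pair of moves; the function names are illustrative. Pavlov is written in its usual win-stay, lose-shift form: keep the previous move after earning 3 or 5 points, switch after earning 0 or 1, which works out to cooperating exactly when both players made the same move last round.

```python
def tit_for_tat(my_last: str, their_last: str) -> str:
    """Tit for Tat: copy the opponent's previous move."""
    return their_last


def pavlov(my_last: str, their_last: str) -> str:
    """Pavlov (win-stay, lose-shift).

    After CC (3 points) or DC (5 points) the player keeps its move;
    after CD (0 points) or DD (1 point) it switches. Both cases reduce
    to: cooperate if and only if the two previous moves matched.
    """
    return "C" if my_last == their_last else "D"


# Response of each rule to every possible previous round (my move, their move).
for mine, theirs in [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]:
    print(mine + theirs, tit_for_tat(mine, theirs), pavlov(mine, theirs))
# CC C C
# CD D D
# DC C D
# DD D C
```

Both rules answer an exploited round (CD) with D and a cooperative round (CC) with C, the two shared traits discussed in the results.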
The Genetic Algorithm
                   The Model
•   Darwinian Survival of the Fittest
•   Genetic representation of entities
•   Fitness function
•   Select most fit individuals to reproduce
•   Mutate
•   Traits of most fit will be passed on
•   Over time, the population evolves to be
    more fit, approaching optimal strategies
   GAs and the Prisoner’s Dilemma
• Population: 20 individuals
• Chromosome: 64-bit string where each bit
  represents the Cooperate or Defect move
  played for a specific history
String Position     Represented History       Move   String Position     Represented History       Move
0                   CCCCCC                    C      32                  DCCCCC                    D
1                   CCCCCD                    D      33                  DCCCCD                    C
2                   CCCCDC                    D      34                  DCCCDC                    D
3                   CCCCDD                    D      35                  DCCCDD                    D
4                   CCCDCC                    C      36                  DCCDCC                    C
5                   CCCDCD                    C      37                  DCCDCD                    C
6                   CCCDDC                    C      38                  DCCDDC                    D
7                   CCCDDD                    D      39                  DCCDDD                    D

8                   CCDCCC                    C      40                  DCDCCC                    D
9                   CCDCCD                    D      41                  DCDCCD                    C
10                  CCDCDC                    D      42                  DCDCDC                    C
11                  CCDCDD                    D      43                  DCDCDD                    C
12                  CCDDCC                    D      44                  DCDDCC                    C
13                  CCDDCD                    D      45                  DCDDCD                    D
14                  CCDDDC                    C      46                  DCDDDC                    D
15                  CCDDDD                    C      47                  DCDDDD                    C

16                  CDCCCC                    C      48                  DDCCCC                    C
17                  CDCCCD                    C      49                  DDCCCD                    D
18                  CDCCDC                    D      50                  DDCCDC                    C
19                  CDCCDD                    C      51                  DDCCDD                    C
20                  CDCDCC                    C      52                  DDCDCC                    C
21                  CDCDCD                    D      53                  DDCDCD                    C
22                  CDCDDC                    D      54                  DDCDDC                    D
23                  CDCDDD                    C      55                  DDCDDD                    D

24                  CDDCCC                    C      56                  DDDCCC                    C
25                  CDDCCD                    C      57                  DDDCCD                    C
26                  CDDCDC                    D      58                  DDDCDC                    D
27                  CDDCDD                    C      59                  DDDCDD                    D
28                  CDDDCC                    D      60                  DDDDCC                    C
29                  CDDDCD                    C      61                  DDDDCD                    C
30                  CDDDDC                    C      62                  DDDDDC                    D
31                  CDDDDD                    D      63                  DDDDDD                    C
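The string positions in the table follow a simple rule: read the six-move history as a binary number with C = 0 and D = 1 (CCCCCD is position 1, DCCCCC is position 32, DDDDDD is position 63). The sketch below encodes that mapping and checks it against the example chromosome, whose 64 moves are copied from the table. The function names are hypothetical.

```python
def history_to_position(history: str) -> int:
    """Read a 6-move C/D history as binary, with C = 0 and D = 1."""
    if len(history) != 6 or set(history) - {"C", "D"}:
        raise ValueError("history must be six characters, each C or D")
    return int(history.replace("C", "0").replace("D", "1"), 2)


def position_to_history(position: int) -> str:
    """Inverse mapping: string position 0..63 -> 6-move history."""
    if not 0 <= position < 64:
        raise ValueError("position must be in the range 0..63")
    return format(position, "06b").replace("0", "C").replace("1", "D")


# Moves from the table, eight positions per chunk (0-7, 8-15, ..., 56-63).
EXAMPLE_CHROMOSOME = (
    "CDDDCCCD" "CDDDDDCC" "CCDCCDDC" "CCDCDCCD"
    "DCDDCCDD" "DCCCCDDC" "CDCCCCDD" "CCDDCCDC"
)


def next_move(chromosome: str, history: str) -> str:
    """Move the chromosome plays after the given three-round history."""
    return chromosome[history_to_position(history)]


print(history_to_position("DCCCCC"))            # 32
print(position_to_history(21))                  # CDCDCD
print(next_move(EXAMPLE_CHROMOSOME, "DCDCDC"))  # C (position 42)
```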
            GAs and PD II
• Fitness: Each player competes against every
  other for 64 consecutive rounds, and a
  cumulative score is maintained
• Selection: Roulette wheel selection
• Reproduction: Random point crossover with
• Mutation rate: 0.001

• Generations: 200,000
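The selection, crossover, and mutation steps can be sketched as follows. This is an illustration under stated assumptions, not the original code: roulette wheel (fitness-proportional) selection via random.Random.choices, single-point crossover at a uniformly random cut applied to every selected pair (the crossover detail on the slide is cut off, so any crossover probability is unknown), and independent per-move mutation at the stated rate of 0.001. Names such as next_generation are hypothetical.

```python
import random

CHROMOSOME_LENGTH = 64
MUTATION_RATE = 0.001  # per-move rate, from the slide


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover at a uniformly random cut (assumed variant)."""
    cut = rng.randint(1, CHROMOSOME_LENGTH - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, rng, rate=MUTATION_RATE):
    """Flip each move (C <-> D) independently with probability `rate`."""
    flip = {"C": "D", "D": "C"}
    return "".join(flip[m] if rng.random() < rate else m for m in chromosome)


def next_generation(population, fitnesses, rng):
    """Breed a new population of the same size from the current one."""
    children = []
    while len(children) < len(population):
        mother = roulette_select(population, fitnesses, rng)
        father = roulette_select(population, fitnesses, rng)
        for child in crossover(mother, father, rng):
            children.append(mutate(child, rng))
    return children[: len(population)]


rng = random.Random(42)
population = ["".join(rng.choice("CD") for _ in range(CHROMOSOME_LENGTH))
              for _ in range(20)]
# Placeholder scores; in the experiments fitness is the round-robin total.
fitnesses = [rng.randint(1, 100) for _ in population]
population = next_generation(population, fitnesses, rng)
print(len(population))  # 20
```

A full run would repeat the fitness evaluation and next_generation call once per generation, 200,000 times according to the slide.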
Simulation and Results
• Past research has looked at which strategy was
  “best”. This research looks at what makes a
  “good” strategy.
• Tit for Tat and Pavlov both perform very well,
  and share two traits
  – Defend against Defectors
  – Cooperate with other cooperators

• All populations evolve over time to possess
  and exhibit these two traits

• This behavior evolves regardless of the
  initial makeup of the population
                      Experiment I
• Five Initial Populations
   –   All “Always Cooperate (Confess)” (AllC)
   –   All “Always Defect (Deny)” (AllD)
   –   All Tit for Tat
   –   All Pavlov
   –   All randomly generated (each individual independently)
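The five initial populations can be built as 64-move chromosomes. This sketch reuses the table’s position mapping (C = 0, D = 1) and adds one assumption not stated on the slides: each round is stored as (own move, opponent’s move), with the most recent round last, so the final two characters of a history are the player’s and the opponent’s previous moves. Under that assumption Tit for Tat reduces to the repeating string "CD" and Pavlov to "CDDC". POPULATION_SIZE comes from the slides; the other names are hypothetical.

```python
import random

POPULATION_SIZE = 20  # from the slides


def all_histories():
    """All 64 histories, in string-position order (C = 0, D = 1)."""
    return [format(i, "06b").replace("0", "C").replace("1", "D") for i in range(64)]


def build(rule):
    """Turn a rule mapping history -> move into a 64-move chromosome."""
    return "".join(rule(h) for h in all_histories())


# ASSUMPTION: h[4] is the player's own last move and h[5] the opponent's.
ALL_C = build(lambda h: "C")
ALL_D = build(lambda h: "D")
TIT_FOR_TAT = build(lambda h: h[5])                      # copy the opponent
PAVLOV = build(lambda h: "C" if h[4] == h[5] else "D")   # win-stay, lose-shift

TEMPLATES = {"AllC": ALL_C, "AllD": ALL_D, "TFT": TIT_FOR_TAT, "Pavlov": PAVLOV}


def initial_population(kind, rng):
    """Return POPULATION_SIZE chromosomes of the requested kind."""
    if kind == "Random":
        # Each individual's 64 moves are drawn independently.
        return ["".join(rng.choice("CD") for _ in range(64))
                for _ in range(POPULATION_SIZE)]
    return [TEMPLATES[kind]] * POPULATION_SIZE


populations = {kind: initial_population(kind, random.Random(0))
               for kind in ["AllC", "AllD", "TFT", "Pavlov", "Random"]}
print(TIT_FOR_TAT[:8], PAVLOV[:8])  # CDCDCDCD CDDCCDDC
```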
              Experiment II
• Controls: Tit for Tat and Pavlov
  – Statistically equal performance
• Support the hypothesis by showing:
  – The traits are not present in the other initial
    populations
  – Over time, populations evolve to exhibit those
    traits and perform as well as Tit for Tat and
    Pavlov
               Experiment II
• To show that the hypothesized traits evolve,
  populations must demonstrate
  – In the presence of Defectors, evolved populations
    perform identically to the controls
  – In the presence of cooperators, evolved
    populations perform identically to controls
      Part 1: Defend against Defectors I

• Mix each initial population with a small set of AllD
   – Tit for Tat and Pavlov (controls) perform at about 80% of
     maximum
   – All others perform significantly worse than Tit for Tat and
     Pavlov
   – AllC and Random populations perform significantly worse
     than they normally do
   – This shows that, a priori, the AllC and random populations
     cannot defend against Defectors
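The inoculation step combines the fitness rule (every player meets every other player for 64 rounds, with cumulative scores) with a handful of added AllD or AllC players. The sketch below assumes details the slides do not give: the three-round memory starts as if all earlier rounds were mutual cooperation, three intruders are added, and performance is measured as the original members’ mean points per round so that runs with different numbers of opponents are comparable. Histories are assumed to be stored as (own move, opponent’s move) pairs, most recent last, which makes Tit for Tat the string "CD" * 32. All names are hypothetical.

```python
ROUNDS = 64  # rounds per pairing, from the fitness slide
# (my move, their move) -> my points, from the payoff slide
POINTS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def move(chromosome, history):
    """Look up the next move from the last three (own, opponent) pairs."""
    key = "".join(mine + theirs for mine, theirs in history[-3:])
    return chromosome[int(key.replace("C", "0").replace("D", "1"), 2)]


def play_match(a, b, rounds=ROUNDS):
    """Play two chromosomes against each other; return both cumulative scores."""
    # ASSUMPTION: the initial memory is three rounds of mutual cooperation.
    hist_a, hist_b = [("C", "C")] * 3, [("C", "C")] * 3
    score_a = score_b = 0
    for _ in range(rounds):
        move_a, move_b = move(a, hist_a), move(b, hist_b)
        score_a += POINTS[(move_a, move_b)]
        score_b += POINTS[(move_b, move_a)]
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b


def round_robin(players):
    """Cumulative score of each player against every other player."""
    scores = [0] * len(players)
    for i in range(len(players)):
        for j in range(i + 1, len(players)):
            s_i, s_j = play_match(players[i], players[j])
            scores[i] += s_i
            scores[j] += s_j
    return scores


def mean_points_per_round(population, intruder=None, count=0):
    """Original members' mean points per round after adding `count` intruders."""
    players = population + [intruder] * count
    scores = round_robin(players)
    rounds_each = (len(players) - 1) * ROUNDS
    return sum(scores[: len(population)]) / (len(population) * rounds_each)


ALL_C, ALL_D, TIT_FOR_TAT = "C" * 64, "D" * 64, "CD" * 32
print(mean_points_per_round([ALL_C] * 20))                  # 3.0
print(mean_points_per_round([ALL_C] * 20, ALL_D, 3))        # below 3.0
print(mean_points_per_round([TIT_FOR_TAT] * 20, ALL_D, 3))  # above inoculated AllC
```

Under these assumptions the AllC population loses every round against the added defectors, while Tit for Tat is exploited only once per AllD opponent and then defends itself, so it keeps a higher score.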
  Part 1: Defend against Defectors II

• Evolve each population and then mix it with a
  small set of AllD
  – All populations now perform equally well, both
    relative to each other and relative to TFT and Pavlov
  – Fitness at about 80% of maximum
    Part 2: Cooperate with Cooperators
• As before, each initial population is mixed with a
  small set of AllC
   – TFT and Pavlov do very well
   – AllC does exceptionally well
   – Others do significantly worse
• Evolve and then add AllC
   – All populations perform equally well
   – Identical performance to TFT and Pavlov
       Performance of Different Experiments
Figure: performance of the Pavlov, Tit for Tat, Random, AllC, and AllD
initial populations under five conditions: unevolved population inoculated
with AllC; unevolved population inoculated with AllD; evolved population;
evolved population inoculated with AllC; evolved population inoculated
with AllD.
               Conclusions I
• Performance measures show that AllC, AllD,
  and random populations do not generally
  possess defensive or cooperative traits a priori
• After evolution, all populations have changed
  to incorporate both traits
• Evolved strategies perform as well as TFT
  and Pavlov, the traditional “best” strategies
                Conclusions II
• In both experiments there is no statistically significant
  difference between the performance of
  evolved populations before and after the
  introduction of AllC or AllD players
• This indicates that the populations not only
  exhibit the hypothesized traits under experimental
  conditions, but also do so as their normal behavior
Future Work
      Non-deterministic Players
• This work shows results for players with
  deterministic strategies
• Much previous research has been done on
  stochastic strategies
• Preliminary results indicate that the findings
  presented here apply to stochastic strategies as
  well, but a formal study is necessary.
