Evolving Strategies for the Prisoner’s Dilemma

           Jennifer Golbeck
 University of Maryland, College Park
  Department of Computer Science

            July 23, 2002
•   Previous Research
•   Prisoner’s Dilemma
•   The Genetic Algorithm
•   Results
•   Conclusions
Previous Research
• Robert Axelrod’s experiments of the 1980s
  served as the starting point for this research
• Implementation closely adheres to the
  configuration of his experiments
• Same model for the Prisoner’s Dilemma
• Minor variation in the implementation of
  the Genetic Algorithm
Prisoner’s Dilemma
 The Prisoner’s Dilemma Model
• The basic two-player Prisoner’s Dilemma
• Both players are arrested for the same crime
• Each has a choice
   – Confess: Cooperate with the authorities (admit to
     doing the crime)
   – Deny: Defect against the other player (claim the other
     person is responsible)
• Neither player knows the other’s choice in advance
                  Payoff Matrix
• Optimization: each player tries to maximize his
  own score
• If both players cooperate, they each receive
  3 points
• If both players defect, each receives 1 point
• If the outcome is mixed, the defector gets
  5 points and the cooperator gets 0
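The payoff rules above can be written as a small lookup table. This is a minimal Python sketch; the names `PAYOFF` and `payoff` are illustrative and do not come from the original work.

```python
# Payoff table from the slide: (points for player A, points for player B).
# "C" = Cooperate, "D" = Defect.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("D", "D"): (1, 1),  # mutual defection
    ("D", "C"): (5, 0),  # A defects against a cooperator
    ("C", "D"): (0, 5),  # A cooperates with a defector
}


def payoff(move_a: str, move_b: str) -> tuple[int, int]:
    """Return the points earned by players A and B in a single round."""
    return PAYOFF[(move_a, move_b)]


# The dilemma: whatever B does, A scores more by defecting (5 > 3 and 1 > 0),
# yet mutual defection (1, 1) is worse for both than mutual cooperation (3, 3).
for b in "CD":
    assert payoff("D", b)[0] > payoff("C", b)[0]
```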
               Iterated Game
• In simulation, the endpoint of the game is
  unknown to the players, making it essentially an
  infinitely iterated game
• Each player has a memory of the previous three
  rounds on which to base his strategy
• Strategies are deterministic: for a given history
  h, a player always makes the same move
• With 4 possible outcomes in each round and a
  history of 3 rounds, each strategy must specify
  a move for each of 4^3 = 64 possible histories
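You can check the 4^3 = 64 count by enumerating every three-round history. This sketch makes the following assumptions:

- A history is written as six C/D characters: three rounds, two moves each.
- C is read as bit 0 and D as bit 1, with the leftmost character most significant.

That ordering is inferred from the chromosome table below, where position 1 is CCCCCD and position 32 is DCCCCC. The names `HISTORIES` and `history_index` are hypothetical.

```python
from itertools import product

ROUNDS = 3           # rounds of memory
MOVES_PER_ROUND = 2  # both players' moves are remembered

# Every possible history, in the same order as the chromosome table:
# product() varies the rightmost character fastest, matching C=0, D=1 binary order.
HISTORIES = ["".join(h) for h in product("CD", repeat=ROUNDS * MOVES_PER_ROUND)]


def history_index(history: str) -> int:
    """Map a 6-character C/D history to its chromosome position (0-63)."""
    return int(history.replace("C", "0").replace("D", "1"), 2)


print(len(HISTORIES))  # prints 64
```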
              Previous Results
• Axelrod tournaments
• Using the three-round history model, teams
  submitted strategies to compete in a round-robin
  tournament
• Tit for Tat won both tournaments
• Pavlov strategy, developed after these
  tournaments, was shown to be an effective
  strategy as well.
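As a hedged illustration, both strategies can be tabulated into 64-move lookup tables of the kind this model uses. The slides do not say which characters of a history belong to which player. This sketch therefore assumes, hypothetically, that each round is stored as an (own move, opponent move) pair with the most recent round last. Pavlov is written in its usual win-stay, lose-shift form. The names `tit_for_tat`, `pavlov`, and `to_chromosome` are illustrative.

```python
from itertools import product

# ASSUMPTION (not stated in the slides): a history is 6 characters, read as
# three (own move, opponent move) pairs with the most recent round last.
HISTORIES = ["".join(h) for h in product("CD", repeat=6)]


def tit_for_tat(history: str) -> str:
    """Repeat the opponent's most recent move."""
    return history[-1]


def pavlov(history: str) -> str:
    """Win-stay, lose-shift: after CC (3 pts) or DC (5 pts) repeat own move;
    after CD (0 pts) or DD (1 pt) switch. Equivalently, cooperate exactly
    when both players made the same move in the last round."""
    own_last, opp_last = history[-2], history[-1]
    return "C" if own_last == opp_last else "D"


def to_chromosome(rule) -> str:
    """Apply a rule to all 64 histories, giving a 64-character C/D string."""
    return "".join(rule(h) for h in HISTORIES)


TFT = to_chromosome(tit_for_tat)
PAVLOV = to_chromosome(pavlov)
```

Under this layout, both strategies depend only on the last round. The first four positions of each chromosome therefore already show the whole pattern: TFT reads CDCD and PAVLOV reads CDDC.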
The Genetic Algorithm
                   The Model
•   Darwinian Survival of the Fittest
•   Genetic representation of entities
•   Fitness function
•   Select the fittest individuals to reproduce
•   Mutate
•   Traits of the fittest are passed on
•   Over time, the population evolves to be
    fitter, approaching an optimum
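The selection, crossover, and mutation steps above can be sketched for C/D chromosomes as follows. This is a generic illustration, not the author's code:

- The caller supplies the fitness values.
- Roulette-wheel selection and single-point crossover are used because they are the operators listed later in this talk.
- All function names are hypothetical.

```python
import random

FLIP = {"C": "D", "D": "C"}


def roulette_select(population, fitnesses, rng):
    """Fitness-proportional ("roulette wheel") choice of one parent."""
    if sum(fitnesses) == 0:  # all zero: fall back to a uniform choice
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover at a randomly chosen cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rate, rng):
    """Flip each C/D gene independently with probability `rate`."""
    return "".join(FLIP[g] if rng.random() < rate else g for g in chromosome)


def next_generation(population, fitnesses, rng, mutation_rate=0.001):
    """Breed a new population of the same size from the current one."""
    children = []
    while len(children) < len(population):
        mom = roulette_select(population, fitnesses, rng)
        dad = roulette_select(population, fitnesses, rng)
        for child in crossover(mom, dad, rng):
            children.append(mutate(child, mutation_rate, rng))
    return children[:len(population)]


# Usage: one generation for 20 random 64-move strategies.
rng = random.Random(0)
population = ["".join(rng.choice("CD") for _ in range(64)) for _ in range(20)]
fitnesses = [s.count("C") for s in population]  # placeholder fitness, illustration only
population = next_generation(population, fitnesses, rng)
```

In the actual model, the placeholder fitness would be replaced by tournament scores from the Prisoner's Dilemma games.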
   GAs and the Prisoner’s Dilemma
• Population: 20 individuals
• Chromosome: 64-bit string where each bit
  represents the move (Cooperate or Defect)
  played after one specific history
String Position     Represented History       Move   String Position     Represented History       Move
0                   CCCCCC                    C      32                  DCCCCC                    D
1                   CCCCCD                    D      33                  DCCCCD                    C
2                   CCCCDC                    D      34                  DCCCDC                    D
3                   CCCCDD                    D      35                  DCCCDD                    D
4                   CCCDCC                    C      36                  DCCDCC                    C
5                   CCCDCD                    C      37                  DCCDCD                    C
6                   CCCDDC                    C      38                  DCCDDC                    D
7                   CCCDDD                    D      39                  DCCDDD                    D

8                   CCDCCC                    C      40                  DCDCCC                    D
9                   CCDCCD                    D      41                  DCDCCD                    C
10                  CCDCDC                    D      42                  DCDCDC                    C
11                  CCDCDD                    D      43                  DCDCDD                    C
12                  CCDDCC                    D      44                  DCDDCC                    C
13                  CCDDCD                    D      45                  DCDDCD                    D
14                  CCDDDC                    C      46                  DCDDDC                    D
15                  CCDDDD                    C      47                  DCDDDD                    C

16                  CDCCCC                    C      48                  DDCCCC                    C
17                  CDCCCD                    C      49                  DDCCCD                    D
18                  CDCCDC                    D      50                  DDCCDC                    C
19                  CDCCDD                    C      51                  DDCCDD                    C
20                  CDCDCC                    C      52                  DDCDCC                    C
21                  CDCDCD                    D      53                  DDCDCD                    C
22                  CDCDDC                    D      54                  DDCDDC                    D
23                  CDCDDD                    C      55                  DDCDDD                    D

24                  CDDCCC                    C      56                  DDDCCC                    C
25                  CDDCCD                    C      57                  DDDCCD                    C
26                  CDDCDC                    D      58                  DDDCDC                    D
27                  CDDCDD                    C      59                  DDDCDD                    D
28                  CDDDCC                    D      60                  DDDDCC                    C
29                  CDDDCD                    C      61                  DDDDCD                    C
30                  CDDDDC                    C      62                  DDDDDC                    D
31                  CDDDDD                    D      63                  DDDDDD                    C
            GAs and PD II
• Fitness: Each player competes against every
  other for 64 consecutive rounds, and a
  cumulative score is maintained
• Selection: Roulette Wheel selection
• Reproduction: Random point crossover with
• Mutation rate 0.001
• Generations: 200,000
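The fitness step can be sketched as below: every individual plays every other individual for 64 rounds, and scores accumulate. The slides leave two details open, so both are labeled assumptions:

- How moves are chosen before three rounds of history exist. Here a fictitious all-cooperate history, `START_HISTORY`, is used.
- Which characters of a history belong to which player. Here each round is an (own move, opponent move) pair, with the most recent round last.

Chromosome positions follow the ordering of the table above. Self-play is excluded, and the names are hypothetical.

```python
from itertools import combinations

PAYOFF = {("C", "C"): (3, 3), ("D", "D"): (1, 1),
          ("D", "C"): (5, 0), ("C", "D"): (0, 5)}
ROUNDS_PER_GAME = 64
START_HISTORY = "CCCCCC"  # ASSUMPTION: fictitious history for the opening rounds


def index(history):
    """Chromosome position of a history: C -> 0, D -> 1, leftmost bit most significant."""
    return int(history.replace("C", "0").replace("D", "1"), 2)


def play_game(a, b, rounds=ROUNDS_PER_GAME):
    """Play two 64-character strategies against each other; return (score_a, score_b).
    ASSUMPTION: each history lists (own move, opponent move) pairs, oldest first."""
    hist_a = hist_b = START_HISTORY
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a[index(hist_a)]
        move_b = b[index(hist_b)]
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        # Slide each player's three-round window forward by one round.
        hist_a = hist_a[2:] + move_a + move_b
        hist_b = hist_b[2:] + move_b + move_a
    return score_a, score_b


def round_robin_fitness(population):
    """Cumulative score of each individual over games against every other individual."""
    totals = [0] * len(population)
    for i, j in combinations(range(len(population)), 2):
        si, sj = play_game(population[i], population[j])
        totals[i] += si
        totals[j] += sj
    return totals
```

For example, `play_game("C" * 64, "D" * 64)` returns `(0, 320)`, because the unconditional cooperator is exploited in every round.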
Simulation and Results
• Past research has looked at which strategy was
  “best”. This research looks at what makes a
  “good” strategy.
• Tit for Tat and Pavlov both perform very well,
  and share two traits
  – Defend against Defectors
  – Cooperate with other cooperators

• All populations evolve over time to possess
  and exhibit these two traits

• This behavior evolves regardless of the
  initial makeup of the population
                      Experiment I
• Five Initial Populations
   –   All “Always Cooperate (Confess)” (AllC)
   –   All “Always Defect (Deny)” (AllD)
   –   All Tit for Tat
   –   All Pavlov
   –   All randomly generated (independently)
              Experiment II
• Controls: Tit for Tat and Pavlov
  – Statistically equal performance
• Support the hypothesis by showing that:
  – The traits are not present in the other initial
    populations
  – Over time, populations evolve to exhibit those
    traits and perform as well as Tit for Tat and
    Pavlov
               Experiment II
• To show that the hypothesized traits evolve,
  populations must demonstrate
  – In the presence of Defectors, evolved populations
    perform identically to the controls
  – In the presence of cooperators, evolved
    populations perform identically to controls
      Part 1: Defend against Defectors I

• Mix each initial population with a small set of AllD
   – Tit for Tat and Pavlov (controls) perform at about 80% of
     maximum fitness
   – All others perform significantly worse than Tit for Tat and
     Pavlov
   – AllC and Random populations perform significantly worse
     than their normal behavior
   – This shows that a priori, the AllC and random populations
     cannot defend against Defectors
  Part 1: Defend against Defectors II

• Evolve each population and then mix with
  small set of AllD
  – All populations now perform as well as one
    another, and as well as the TFT and Pavlov controls
  – Fitness at about 80% of maximum
    Part 2: Cooperate with Cooperators
• As before, each initial population is mixed with a
  small set of AllC
   – TFT and Pavlov do very well
   – AllC does exceptionally well
   – Others do significantly worse
• Evolve and then add AllC
   – All populations perform as well as one another
   – Identical performance to TFT and Pavlov
       Performance of Different Experiments
[Figure: chart comparing the Pavlov, Tit for Tat, Random, AllC, and AllD populations across five conditions: unevolved population inoculated with AllC, unevolved population inoculated with AllD, evolved population, evolved population inoculated with AllC, and evolved population inoculated with AllD]
               Conclusions I
• Performance measures show that the AllC, AllD,
  and random populations do not generally
  possess defensive or cooperative traits a priori
• After evolution, all populations have changed
  to incorporate both traits
• Evolved strategies perform as well as TFT
  and Pavlov, traditional “best” strategies
                Conclusions II
• In both experiments there is no statistically
  significant difference between the performance of
  evolved populations before and after the
  introduction of AllC or AllD players
• This indicates that the populations not only
  exhibit the hypothesized traits under experimental
  conditions, but also do so as their normal
  behavior.
Future Work
      Non-deterministic Players
• This work shows results for players with
  deterministic strategies
• Much previous research has been done on
  stochastic strategies
• Preliminary results show that the findings
  presented here apply to stochastic strategies as
  well, but a formal study is necessary.
