Repeated Games Prisoner's Dilemma


```
Lecture 9

Repeated Games: Prisoner’s Dilemma

BESS/TSM
EC4010 - Economic Theory - Module 2 - Game Theory

Pedro C. Vicente
Trinity College Dublin
Introduction
Prisoner’s Dilemma

- We now consider a setting in which players repeatedly engage in the same
  strategic form game
- We focus on the following Prisoner’s Dilemma (C for cooperate; D for
  defect); in each cell, the first payoff is player 1’s and the second is
  player 2’s:

                    Player 2
                    C       D
    Player 1   C    2, 2    0, 3
               D    3, 0    1, 1

- Consider the following strategy (grim-trigger strategy):
  - choose C in the first period and keep choosing C as long as the other
    player has chosen C in every previous period
  - if in any period the other player chooses D, then choose D in every
    subsequent period

Nash Equilibrium (NE)

- How should a player respond if her opponent uses this strategy?
  - If she chooses C in every period, then the outcome is (C,C) and her
    payoff is 2 in every period
  - If she switches to D, she obtains a payoff of 3 in that period (a
    short-term gain) and a payoff of at most 1 in every subsequent period
    (a long-term loss), because her opponent then chooses D forever
  - As long as she values the future sufficiently, the stream of payoffs
    (3,1,1,...) is worse than (2,2,2,...), so she is better off choosing C
    in every period (choosing C in every period is a best response)
  - The grim-trigger strategy is also a best response to itself (it yields
    the same outcome as always choosing C); thus the pair of grim-trigger
    strategies is a NE when players are sufficiently patient
- Another NE is for both players to choose D in every period (always
  choosing D is a best response to an opponent who always chooses D)

Repeated Games
Preferences

- The outcome of a repeated game is a sequence of outcomes of a strategic
  form game
- How does a player evaluate such sequences? We assume that she evaluates
  each sequence of outcomes in the repeated game by the discounted sum of
  the associated sequence of payoffs
- Each player i has a payoff function $u_i$ for the strategic form game and a
  discount factor $\delta_i$ between 0 and 1 such that she evaluates the
  sequence $(a^1, a^2, \ldots, a^T)$ of outcomes ($a^t$ is the action profile
  at time t) of the strategic form game by the sum

    $$u_i(a^1) + \delta_i u_i(a^2) + \delta_i^2 u_i(a^3) + \cdots + \delta_i^{T-1} u_i(a^T) = \sum_{t=1}^{T} \delta_i^{t-1} u_i(a^t)$$

- If $\delta_i$ is close to 0, then player i cares very little about the
  future (she is very impatient); if $\delta_i$ is close to 1, she weighs
  future payoffs almost as heavily as current ones; we assume all players
  have the same discount factor: $\delta_i = \delta$ for all i

Preferences (cont’d)

- Take an infinite stream of payoffs $(w^1, w^2, \ldots)$ and let c be the
  constant payoff that, received in every period, makes the player
  indifferent to the given stream; the constant stream $(c, c, \ldots)$ has
  discounted sum $c/(1-\delta)$, so

    $$c = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} w^t$$

- We call $(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} w^t$ the discounted
  average of the stream $(w^1, w^2, \ldots)$

Repeated Games

Definition (Repeated Game): Let G be a strategic game; denote the set of
players by N and the set of actions and payoff function of each player i by
$A_i$ and $u_i$ respectively; the T-period repeated game of G for the
discount factor $\delta$ (including the case $T = \infty$) is the extensive
game with perfect information and simultaneous moves in which:
- The set of players is N
- The set of terminal histories is the set of sequences
  $(a^1, a^2, \ldots, a^T)$ of action profiles in G
- The player function assigns the set of all players to every nonterminal
  history $(a^1, \ldots, a^t)$, for every value of t
- The set of actions available to any player i after any history is $A_i$
- Each player i evaluates each terminal history $(a^1, a^2, \ldots, a^T)$
  according to its discounted average

    $$(1-\delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$$

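Finitely Repeated Prisoner’s Dilemma

- Nash Equilibrium: every NE of a finitely repeated PD generates the
  outcome (D,D) in every period
- Choosing C in some period cannot be part of a NE outcome: consider the
  last period in which a player chooses C; switching to D in that period
  raises her payoff in that period, and because (D,D) is played in every
  later period anyway, the other player has no further opportunity to punish
  the defection (she can still secure 1 in each later period by choosing D)
- SPE: since every SPE is a NE, every SPE also generates (D,D) in every
  period; the strategy pair in which each player chooses D after every
  history is the unique SPE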

Strategies in Infinitely Repeated Prisoner’s Dilemma

- A strategy of player i in an infinitely repeated game of the strategic form
  game G specifies an action of player i (a member of $A_i$) for every
  sequence $(a^1, \ldots, a^t)$ of outcomes of G
- The grim-trigger strategy for an infinitely repeated PD is defined as
  $s_i(\varnothing) = C$ and

    $$s_i(a^1, \ldots, a^t) = \begin{cases} C & \text{if } (a_j^1, \ldots, a_j^t) = (C, \ldots, C) \\ D & \text{otherwise} \end{cases}$$

  for every history $(a^1, \ldots, a^t)$, where j is the other player
  - We can think of this strategy as having two states: one, call it C, in
    which C is chosen; another, call it D, in which D is chosen; initially
    the state is C; if the other player chooses D when the state is C, then
    the state changes to D, where it remains forever
- Tit-for-tat strategy: choose C in the first period and thereafter choose
  whatever the other player chose in the previous period; the length of
  punishment therefore depends on the behavior of the player being punished;
  if she continues to choose D, then tit-for-tat continues to do so; if she
  reverts to C, then tit-for-tat reverts to C also

NE in Infinitely Repeated Prisoner’s Dilemma

- The strategy pair in which each player chooses D after every history is
  still a NE of the infinitely repeated Prisoner’s Dilemma
- Grim-trigger strategies:
  - Suppose that player 1 uses the grim-trigger strategy; if player 2 uses
    the same strategy, then the outcome is (C,C) in every period, with
    payoff stream (2,2,...) and discounted average 2
  - If player 2 adopts a strategy that generates a different sequence of
    outcomes, then in at least one period her action is D; in all subsequent
    periods player 1 chooses D, so player 2’s best response is to choose D
    subsequently as well; from the first period in which she chooses D her
    payoff stream is therefore at best (3,1,1,...), whose discounted
    average is

      $$(1-\delta)\left[3 + \delta + \delta^2 + \delta^3 + \cdots\right] = (1-\delta)\left[3 + \frac{\delta}{1-\delta}\right] = 3(1-\delta) + \delta$$

    thus player 2 cannot increase her payoff by deviating if

      $$3(1-\delta) + \delta \le 2 \iff \delta \ge \frac{1}{2}$$

  - This is the condition for the pair of grim-trigger strategies to be a NE

NE in Infinitely Repeated Prisoner’s Dilemma (cont’d)

- Tit-for-tat strategies:
  - Suppose that player 1 adheres to this strategy; denote by t the first
    period in which player 2 chooses D (then player 1 chooses D in
    period t + 1, and continues to choose D until player 2 reverts to C);
    player 2 then has two options from period t + 1: she can revert to C, in
    which case in period t + 2 she faces the same situation she faced at the
    start of the game, or she can continue to choose D, in which case
    player 1 will continue to do so too; so if player 2’s best response to
    tit-for-tat involves choosing D in some period, then she either
    alternates between D and C, or chooses D in every period
  - Alternating: stream (3,0,3,0,...), with discounted average

      $$(1-\delta) \cdot \frac{3}{1-\delta^2} = \frac{3}{1+\delta}$$

  - Always D: stream (3,1,1,...), with discounted average
    $3(1-\delta) + \delta$
  - The pair of tit-for-tat strategies is a NE if neither deviation beats
    the discounted average of 2 from (C,C) in every period, that is, if
    $2 \ge \frac{3}{1+\delta}$ and $2 \ge 3(1-\delta) + \delta = 3 - 2\delta$;
    both conditions are equivalent to $\delta \ge \frac{1}{2}$

```
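
The equilibrium conditions derived in the slides can be checked numerically. The following is a minimal Python sketch and not part of the original lecture: the function names (`grim_trigger`, `tit_for_tat`, `always_defect`, `alternate_d_c`, `discounted_average`), the truncation of the infinite horizon at 2000 periods, and the sample discount factors are all illustrative assumptions. It simulates play of the stage game from the slides and approximates player 2's discounted average for the deviations discussed above.

```python
# Stage-game payoffs from the slides: PAYOFF[(own action, other's action)]
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def grim_trigger(own_history, other_history):
    """Choose C until the other player has ever chosen D, then D forever."""
    return "D" if "D" in other_history else "C"


def tit_for_tat(own_history, other_history):
    """Choose C first, then copy the other player's previous action."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    """Choose D after every history."""
    return "D"


def alternate_d_c(own_history, other_history):
    """Choose D, C, D, C, ... regardless of what the other player does."""
    return "D" if len(own_history) % 2 == 0 else "C"


def discounted_average(strategy_1, strategy_2, delta, periods=2000):
    """Approximate (1 - delta) * sum_t delta**(t-1) * u_i(a^t) for both
    players by truncating the infinite sum after `periods` periods
    (an assumption; the truncation error is of order delta**periods)."""
    h1, h2 = [], []
    total_1 = total_2 = 0.0
    for t in range(periods):  # t = 0 corresponds to period 1
        a1 = strategy_1(h1, h2)
        a2 = strategy_2(h2, h1)
        total_1 += delta**t * PAYOFF[(a1, a2)]
        total_2 += delta**t * PAYOFF[(a2, a1)]
        h1.append(a1)
        h2.append(a2)
    return (1 - delta) * total_1, (1 - delta) * total_2


if __name__ == "__main__":
    for delta in (0.4, 0.5, 0.6):  # sample discount factors (assumed)
        cooperate = discounted_average(grim_trigger, grim_trigger, delta)[1]
        defect_grim = discounted_average(grim_trigger, always_defect, delta)[1]
        alternate = discounted_average(tit_for_tat, alternate_d_c, delta)[1]
        defect_tft = discounted_average(tit_for_tat, always_defect, delta)[1]
        print(f"delta={delta}: cooperate={cooperate:.4f}, "
              f"always D vs grim={defect_grim:.4f}, "
              f"alternate vs TFT={alternate:.4f}, "
              f"always D vs TFT={defect_tft:.4f}")
```

Each strategy is written as a function of the two action histories, which mirrors the slides' definition of a strategy as a rule assigning an action to every history. The simulated values can be compared with the closed forms $3(1-\delta) + \delta$ and $3/(1+\delta)$: for discount factors below 1/2 at least one deviation should exceed 2, and for discount factors of 1/2 or more neither should.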