Lecture 9
Repeated Games: Prisoner’s Dilemma
BESS/TSM
EC4010 - Economic Theory - Module 2 - Game Theory
Pedro C. Vicente
Trinity College Dublin
Introduction
Prisoner’s Dilemma
We now consider a setting in which players repeatedly engage in the same
strategic form game
We focus on the Prisoner’s Dilemma as follows (C for cooperate; D for
defect):
                 Player 2
                 C       D
Player 1    C    2, 2    0, 3
            D    3, 0    1, 1
Consider the following strategy (grim trigger strategy):
choose C to begin and then as long as the other player chooses C
if in any period the other player chooses D, then choose D in every
subsequent period
Nash Equilibrium (NE)
How should a player respond if her opponent uses this strategy?
If she chooses C in every period, then the outcome is (C,C) and her
payoff is 2 in every period
If she switches to D, she obtains a payoff of 3 in that period (a
short-term gain) and a payoff of 1 in every subsequent period (a
long-term loss)
As long as she values the future sufficiently, the stream of payoffs
(3,1,1,...) is worse than (2,2,2,...), so that she is better off choosing C
in every period (C at all periods is a best response)
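The grim-trigger strategy itself is also a best response (it generates the
same outcome as always choosing C); thus the pair of grim-trigger
strategies is a NE when players are sufficiently patient
Another NE is for both players to choose D in every period: if the other
player always chooses D, then always choosing D is a best response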
Repeated Games
Preferences
The outcome of a repeated game is a sequence of outcomes of a strategic
form game
How does a player evaluate such sequences? We assume that each player
evaluates a sequence of outcomes in the repeated game by the discounted
sum of the associated sequence of payoffs
Each player i has a payoff function $u_i$ for the strategic form game and a
discount factor $\delta_i$ between 0 and 1 such that she evaluates the sequence
$(a^1, a^2, \ldots, a^T)$ of outcomes ($a^t$ is the action profile at time t) of the
strategic form game by the sum
$$u_i(a^1) + \delta_i u_i(a^2) + \delta_i^2 u_i(a^3) + \cdots + \delta_i^{T-1} u_i(a^T) = \sum_{t=1}^{T} \delta_i^{t-1} u_i(a^t)$$
If $\delta_i$ is close to 0, then player i cares very little about the future - she is
very impatient; we assume all players have the same discount factor:
$\delta_i = \delta$ for all i
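The discounted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `discounted_sum` and the example streams are assumptions, not part of the lecture.

```python
def discounted_sum(payoffs, delta):
    """Sum of delta**(t-1) * payoff_t, with periods t numbered from 1."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    # enumerate counts from 0, so k already plays the role of t - 1
    return sum(delta ** k * u for k, u in enumerate(payoffs))


print(discounted_sum([3, 1, 1], 0.5))  # 3 + 0.5*1 + 0.25*1 = 3.75
print(discounted_sum([2, 2, 2], 0.5))  # 2 + 0.5*2 + 0.25*2 = 3.5
```

With a horizon of only three periods and a discount factor of 0.5, the stream (3,1,1) is still worth more than (2,2,2); the comparison changes once the horizon is long and the player is patient.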
Preferences (cont’ed)
Take an infinite stream of payoffs $(w^1, w^2, \ldots)$; the constant payoff c
that, received in every period, makes the player indifferent to the given
stream of payoffs is
$$c = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} w^t$$
We call $(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} w^t$ the discounted average of the stream
$(w^1, w^2, \ldots)$
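A short derivation may help to see where the factor $(1-\delta)$ comes from. It assumes $0 < \delta < 1$, so that the geometric series converges, and uses the constant stream (2, 2, ...) as an example.

```latex
\begin{align*}
\sum_{t=1}^{\infty} \delta^{t-1} c &= \frac{c}{1-\delta}
  && \text{(geometric series, } 0 < \delta < 1\text{)} \\
\frac{c}{1-\delta} &= \sum_{t=1}^{\infty} \delta^{t-1} w^t
  && \text{(indifference between the two streams)} \\
c &= (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} w^t \\
(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} \cdot 2 &= (1-\delta) \cdot \frac{2}{1-\delta} = 2
  && \text{(example: the stream } (2, 2, \ldots)\text{)}
\end{align*}
```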
Repeated Games
Definition (Repeated Game): Let G be a strategic game; denote the set of
players by N and the set of actions and payoff function of each player i by $A_i$
and $u_i$ respectively; the T-period (encompassing the case of $T = \infty$) repeated
game of G for the discount factor $\delta$ is the extensive game with perfect
information and simultaneous moves in which:
The set of players is N
The set of terminal histories is the set of sequences $(a^1, a^2, \ldots, a^T)$ of
action profiles in G
The player function assigns the set of all players to every history
$(a^1, \ldots, a^t)$, for every value of t
The set of actions available to any player i after any history is $A_i$
Each player i evaluates each terminal history $(a^1, a^2, \ldots, a^T)$ according to
its discounted average $(1-\delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$
Finitely Repeated Prisoner’s Dilemma
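Nash Equilibrium: Every NE of a finitely repeated PD generates the
outcome (D,D) in every period
Choosing C in any period cannot survive in a NE outcome when the game is
finitely repeated, because there is not always an opportunity to punish
defection: consider the last period in which some player chooses C; by
deviating to D in that period and choosing D afterwards, she gains in that
period and cannot be made worse off later
SPE: since every subgame perfect equilibrium (SPE) is a NE, every SPE also
generates (D,D) in every period; the strategy pair in which each player
chooses D after every history is the unique SPE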
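The last-period argument can be checked with a small numerical sketch. The horizon `T = 5`, the discount factor `delta = 0.9` and the helper `discounted_sum` are illustrative assumptions; the sketch supposes the other player chooses C throughout, so a switch to D in the final period cannot be punished.

```python
def discounted_sum(payoffs, delta):
    """Sum of delta**(t-1) * payoff_t, with periods t numbered from 1."""
    return sum(delta ** k * u for k, u in enumerate(payoffs))


T, delta = 5, 0.9                              # assumed values, for illustration only
cooperate_throughout = [2] * T                 # (C,C) in all T periods
defect_in_last_period = [2] * (T - 1) + [3]    # choose D only in period T

gain = (discounted_sum(defect_in_last_period, delta)
        - discounted_sum(cooperate_throughout, delta))
print(gain)  # equals delta**(T-1) * (3 - 2), positive for any delta > 0
```

Because the gain is positive whatever the value of the discount factor, patience cannot sustain cooperation in the final period, and the same reasoning then applies to the period before it.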
Strategies in Infinitely Repeated Prisoner’s Dilemma
A strategy of player i in an infinitely repeated game of the strategic form
game G specifies an action of player i (a member of $A_i$) for every sequence
$(a^1, \ldots, a^t)$ of outcomes of G
The grim-trigger strategy for an infinitely repeated PD is defined as
$s_i(\varnothing) = C$ and
$$s_i(a^1, \ldots, a^t) = \begin{cases} C & \text{if } (a_j^1, \ldots, a_j^t) = (C, \ldots, C) \\ D & \text{otherwise} \end{cases}$$
for every history $(a^1, \ldots, a^t)$, where j is the other player
we can think of this strategy as having two states: one, call it C, in
which C is chosen; another, call it D in which D is chosen; initially
the state is C; if when the state is C, the other player chooses D, then
the state changes to D
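Tit-for-tat strategy: choose C in the first period and then choose whatever
the other player chose in the previous period; the length of punishment
therefore depends on the behavior of the player being punished; if she
continues to choose D, then tit-for-tat continues to do so; if she reverts to C,
then tit-for-tat reverts to C also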
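Both strategies can be written as functions of the other player's past actions. The sketch below is a minimal illustration; the function names and the opponent sequence "CCDCCC" are hypothetical and chosen only to show how the two punishments differ.

```python
def grim_trigger(opponent_history):
    """C until the other player has ever chosen D, then D forever."""
    return "D" if "D" in opponent_history else "C"


def tit_for_tat(opponent_history):
    """C in the first period, then copy the other player's previous action."""
    return opponent_history[-1] if opponent_history else "C"


def respond_to(strategy, opponent_actions):
    """Actions chosen by the strategy, period by period, against a fixed sequence."""
    return [strategy(opponent_actions[:t]) for t in range(len(opponent_actions))]


opponent = list("CCDCCC")  # hypothetical opponent who defects once, in period 3
print("".join(respond_to(grim_trigger, opponent)))  # CCCDDD
print("".join(respond_to(tit_for_tat, opponent)))   # CCCDCC
```

Grim trigger never leaves state D once it is reached, whereas tit-for-tat punishes for a single period here because the opponent reverts to C immediately.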
NE in Infinitely Repeated Prisoner’s Dilemma
Both players choosing D in every period is still a NE of the infinitely repeated Prisoner’s Dilemma
Grim-trigger strategies:
Suppose that player 1 uses the grim-trigger strategy; if player 2 uses
the same strategy, then the outcome is (C,C) in every period, with
stream (2, 2, ...), with discounted average 2
If player 2 adopts a strategy that generates a different sequence of
outcomes, then in at least one period her action is D; in all subsequent
periods, 1 chooses D, so 2 goes for D subsequently as well (best
response); meaning (3,1,1,...) from the first period in which 2 chooses
D, whose discounted average is
$$(1-\delta)(3 + \delta + \delta^2 + \delta^3 + \cdots) = (1-\delta)\left(3 + \frac{\delta}{1-\delta}\right) = 3(1-\delta) + \delta;$$
thus player 2 cannot increase her payoff by deviating if
$$3(1-\delta) + \delta \le 2 \iff \delta \ge \tfrac{1}{2}$$
This is the condition for the grim-trigger strategies to be a NE
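The threshold can be checked numerically. The sketch below uses the closed form 3(1 - delta) + delta for the discounted average of the deviation stream (3,1,1,...) and compares it with the cooperative average of 2; the function names and the three sample discount factors are assumptions made for illustration.

```python
def deviation_average(delta):
    """Discounted average of the stream (3, 1, 1, ...): 3(1 - delta) + delta."""
    return 3 * (1 - delta) + delta


def grim_trigger_is_ne(delta):
    """True when deviating does not beat the cooperative discounted average of 2."""
    return deviation_average(delta) <= 2


for delta in (0.25, 0.5, 0.75):
    print(delta, deviation_average(delta), grim_trigger_is_ne(delta))
# 0.25 2.5 False
# 0.5 2.0 True
# 0.75 1.5 True
```

At a discount factor of exactly 1/2 the player is indifferent between deviating and cooperating, which is why the condition holds with a weak inequality.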
NE in Infinitely Repeated Prisoner’s Dilemma (cont’ed)
Tit-for-tat strategies:
Suppose that player 1 adheres to this strategy; denote by t the first
period in which player 2 chooses D (then player 1 chooses D in
period t+1, and continues to choose D until player 2 reverts to C);
then player 2 has two options from t+1: she can revert to C, in which
case in period t+2 she faces the same situation she faced at the start
of the game, or she can continue to choose D, in which case player 1
will continue to do so too; if player 2’s best response to tit-for-tat
implies choosing D at some period, then she either alternates between
D and C, or chooses D in every period
Alternating: stream (3,0,3,0,...), with discounted average
$(1-\delta) \cdot \frac{3}{1-\delta^2} = \frac{3}{1+\delta}$
Always D: stream (3,1,1,...), with discounted average $3(1-\delta) + \delta = 3 - 2\delta$
Tit-for-tat is an equilibrium if $2 \ge \frac{3}{1+\delta}$ and $2 \ge 3 - 2\delta$; both conditions
are equivalent to $\delta \ge \frac{1}{2}$
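The two deviations from tit-for-tat can be compared numerically in the same way. The sketch uses the closed forms 3 / (1 + delta) for the alternating stream (3,0,3,0,...) and 3 - 2 * delta for the stream (3,1,1,...); the function names and sample discount factors are illustrative assumptions.

```python
def alternate_average(delta):
    """Discounted average of the stream (3, 0, 3, 0, ...): 3 / (1 + delta)."""
    return 3 / (1 + delta)


def always_defect_average(delta):
    """Discounted average of the stream (3, 1, 1, ...): 3 - 2 * delta."""
    return 3 - 2 * delta


def tit_for_tat_is_ne(delta):
    """True when neither deviation beats the cooperative discounted average of 2."""
    return max(alternate_average(delta), always_defect_average(delta)) <= 2


for delta in (0.25, 0.5, 0.75):
    print(delta, tit_for_tat_is_ne(delta))
# 0.25 False
# 0.5 True
# 0.75 True
```

Both deviation payoffs fall to 2 at a discount factor of exactly 1/2, so the two conditions give the same threshold.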