Playing the Rock-Paper-Scissors Game with a Genetic Algorithm

Fathelalem F. Ali*, Zensho Nakao†, Yen-Wei Chen†

* Department of Management & Information Systems, Faculty of International Studies, Meio University, Nago-shi, Okinawa 905-8585, Japan. Phone: (+81) 98-051-1207. Email: ali@mis.meio-u.ac.jp
† Department of Electrical & Electronics Engineering, Faculty of Engineering, University of the Ryukyus, Okinawa 903-0213, Japan
Abstract

This paper describes a strategy to follow while playing the Rock-Paper-Scissors game. Instead of making a biased decision, a rule is adopted whereby the outcomes of the game in the last few turns are observed and then a deterministic decision is made. Such a strategy is encoded into a genetic string, and a genetic algorithm (GA) works on a population of such strings. Good strings are produced at later generations. The strategy is found to be successful, and its efficiency is demonstrated by testing it against systematic as well as human strategies.

1. Introduction

Many concepts and examples in game theory can provide good models for constructing abstract evolutionary systems. Though game theory was originally developed by von Neumann and Morgenstern [13] for application to economic theory, it later spread to many other disciplines. Maynard-Smith and Price [17] opened the door to the wide use of game theory in evolutionary ecology. In our current work, we construct an evolutionary system to be applied to the Rock-Paper-Scissors (RPS) game. Rock-Paper-Scissors is a classical two-person game for quickly deciding on a winner. It is a game that children as well as adults play, mathematicians analyze, and a certain species of lizard in California takes very seriously [14]. We use a genetic algorithm [1][2] to train a player that makes use of the historical behavior of the opponent during the past few games to guide its current decision. Rock-Paper-Scissors is a good model for experimental and theoretical investigations of cooperative short-memory behavior.

2. Rock-Paper-Scissors Rule

In its simplest form, each of two players has a choice of Scissors, Paper, or Rock. The two players simultaneously make a choice each. Depending on the two players' choices, a winner is decided according to the rule in Table 1.

Table 1
Player A     Player B     Winner
Scissors     Paper        Player A
Scissors     Rock         Player B
Paper        Rock         Player A

Player A and Player B face each other and simultaneously display their hands in one of the following three shapes: a fist denoting a rock, the forefinger and middle finger extended and spread so as to suggest scissors, or a downward-facing palm denoting a sheet of paper. As in Table 1, the rock wins over the scissors since it can shatter them, the scissors win over the paper since they can cut it, and the paper wins over the rock since it can be wrapped around the latter. A win is rewarded; there is no reward in the case of a tie. If the game is repeated several times, the player who favors one of the options over the others places himself at a disadvantage. The best strategy for each player is to play each of the options with the same frequency of 1/3, in a manner that yields the opponent as little information as possible about any particular decision. The game is made more interesting by playing it repeatedly with the same player or a group of players, thereby permitting partial time histories of behavior to guide future trials.
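For illustration, the cyclic dominance of Table 1 can be rendered compactly in code. The following Python sketch is ours, not the paper's; the encoding S = 0, P = 1, R = 2 anticipates the coding introduced in Section 3.

    # Moves use the base-3 alphabet adopted later in the paper: S = 0, P = 1, R = 2.
    SCISSORS, PAPER, ROCK = 0, 1, 2

    def winner(a: int, b: int) -> int:
        """Return 0 for a tie, 1 if player A wins, 2 if player B wins.

        Under this encoding each move m beats move (m + 1) % 3:
        Scissors (0) cut Paper (1), Paper (1) wraps Rock (2),
        and Rock (2) shatters Scissors (0).
        """
        if a == b:
            return 0                      # tie: no reward
        return 1 if (a - b) % 3 == 2 else 2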



3. The GA Strategy

If we want the computer to play the game, we assume a smart strategy would have two featuring aspects:

• Offensive aspect: gathering information about the favorite choices of the opponent during the course of the game.

• Defensive aspect: giving as little information as possible about the computer's own particular decisions.

We follow the coding approach of Axelrod [3], allowing the decision rule to depend mainly on the outcome of the last three games. In each game, the opponent's behavior has three possibilities: Scissors (S), Paper (P), or Rock (R). So a particular behavior sequence can be coded as a three-letter string. For example, SPR would represent the sequence where the opponent chose Scissors, Paper, and Rock in the last three games. Treating this code as a base-3 integer, the behavioral alphabet is coded as S = 0, P = 1, R = 2. By doing so, the three-letter sequence can be used to generate a number between 0 and 26. Hence, three consecutive Scissors choices of the opponent would decode to 0 (000), while three consecutive Rocks would decode to 26 (222). So we can define a particular strategy to be a ternary string of length 27, where the i-th 0, 1, or 2 corresponds to the i-th behavioral sequence. Using this scheme, for example, a 1 in position 12 would be decoded as Paper. Such a strategy is based on the information gathered from the opponent's behavior in the previous trials of the game (i.e., it enforces the offensive aspect). We go further and add some capricious features to the strategy to enforce its defensive aspect. We assume these would counteract the opponent's attempts to trace the strategy's logic, and at the same time act as a countermeasure against subtle behavior of the opponent. We add four more letters to the string to express the caprice C. The value decoded from the four-letter C is used to calculate the probability of taking a capricious random decision, rather than the deterministic decision encoded in the strategy string:

    P_caprice = a × C

where a is a scaling factor (a << 1).

Since the set of rules generated by a 31-ternary string depends on the past three plays, behavior at the beginning of the game is indeterminate. To get around this problem, we may add three letters to the coding to specify a strategy's premises, or assumptions about pre-game behavior. These three letters are used sequentially to specify the assumed behavior of the opponent prior to the beginning of the game, and later to keep the actual history of the opponent's behavior. Thus, a 34-ternary string represents a particular strategy, with 27 ternaries for the decision rule, four ternaries for the caprice tendency, and three as a history reservoir (Figure 1).
Figure 1: A strategy string (a row of ternary digits, e.g. 2 0 1 … 1 0 1 2 1 0 0 2)
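To make the encoding concrete, a minimal Python sketch of a strategy string and its decoding is given below. The function and constant names (decide, history_index, ALPHA) are ours, and ALPHA = 0.005 assumes the value of a listed in Table 2; the paper itself specifies only the string layout and P_caprice = a × C.

    import random

    S, P, R = 0, 1, 2          # the paper's coding: S = 0, P = 1, R = 2
    ALPHA = 0.005              # scaling factor a << 1 (assumed from Table 2)

    # A strategy is a list of 34 ternary digits (each 0, 1, or 2):
    #   genes[0:27]  - deterministic response to each 3-game history
    #   genes[27:31] - four digits encoding the caprice tendency C
    #   genes[31:34] - assumed pre-game opponent behavior (history reservoir)

    def history_index(last3):
        """Map the opponent's last three moves (oldest first) to 0..26."""
        h0, h1, h2 = last3
        return 9 * h0 + 3 * h1 + h2      # base-3 integer, e.g. RRR -> 26

    def decide(genes, last3):
        """Play the encoded response, or a capricious random move."""
        c = genes[27] * 27 + genes[28] * 9 + genes[29] * 3 + genes[30]
        if random.random() < ALPHA * c:  # P_caprice = a * C
            return random.choice((S, P, R))
        return genes[history_index(last3)]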

4. The Genetic Operators

An elitist selection scheme is used for the selection of parents, where strings with high fitness are inserted in the next generation without undergoing further genetic operations. On the other hand, strings with lower fitness are replaced by offspring reproduced from the strings with higher fitness. For the crossover operation, a uniform crossover was used, where an offspring is generated randomly, ternary by ternary, from two parents.

In mutation, a bit-wise mutation is applied with a small probability; a ternary is mutated to one of the two other values. For example, a 0-ternary is mutated to 1 or 2 (e.g., by flipping a coin).
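As an illustration, the two operators might be coded as follows in Python (a sketch with our own names; the mutation rate default matches Table 2):

    import random

    def uniform_crossover(parent1, parent2):
        """Uniform crossover: copy each ternary of the offspring,
        at random, from one of the two parents."""
        return [random.choice(pair) for pair in zip(parent1, parent2)]

    def mutate(genes, rate=0.01):
        """Ternary-wise mutation: with small probability `rate`, replace
        a ternary by one of the two other values (e.g. 0 becomes 1 or 2)."""
        return [random.choice([v for v in (0, 1, 2) if v != g])
                if random.random() < rate else g
                for g in genes]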

5. Mechanics of the Game

In the computer simulation, we set a tournament of two games: a training game and a one-to-one game. In the training game, the GA uses a randomly generated string (R-player) to play against an opponent (O-player), e.g. a human. Prior to the beginning of the game, the game course (the number of turns to be played) is decided. The GA generates a population of strings randomly, then the game starts between the R-player and the O-player. For each choice of the O-player, in addition to the R-player's choice, the GA makes a choice for each string in the population. The process is repeated game-course times. Then a fitness value is attached to each string in the population, calculated as the percentage of wins to the sum of wins and losses.



Following this step, the GA reproduces a new population through application of the genetic operators on the current population. The game continues for a specified number of generations. The strategy encoded in the string with the best fitness is then adopted for future games between the O-player and the GA-player.

In the second, one-to-one game, the O-player plays against the GA-player, which adopts the best strategy obtained through the training game. The game goes on for several turns, and the player with the fewest losses and most wins is announced the winner. During the one-to-one game, the decoded value of the caprice C is scaled by a factor A, proportional to the number of successive losses of the GA strategy during the game.
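In code terms, reusing the gene layout and ALPHA assumed in the Section 3 sketch, the scaled caprice probability might look as follows; the clamp to 1.0 is our own addition, since the paper does not say how large scaled values are handled:

    def caprice_probability(genes, successive_losses):
        """One-to-one game: the decoded caprice C is scaled by the number
        of successive losses A, so the strategy grows more unpredictable
        while it is on a losing streak."""
        c = genes[27] * 27 + genes[28] * 9 + genes[29] * 3 + genes[30]
        return min(1.0, ALPHA * c * successive_losses)   # clamp is assumed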

Figure 2: Best & average fitness through generations (training game, GA vs systematic player)

The algorithm below shows the mechanics of the genetic algorithm during the training game:

    Begin
        t = 0
        c = 0
        Initialize P(t)
        While (t ≠ maximum_generation) Do
            While (c ≠ game_course) Do
                The O-player and the R-player make a choice each.
                The GA makes a choice for each string in the population.
                Calculate scores.
                c = c + 1
            End
            c = 0
            Evaluate P(t)
            Select P'(t) from P(t)
            Select P''(t) from P(t)
            Crossover P''(t)
            Mutate P''(t)
            Evaluate P''(t)
            P(t+1) = P'(t) ∪ P''(t)
            t = t + 1
        End
    End

In this algorithm, P(t) denotes a population of μ individuals at generation t, initialized randomly. P'(t) is a special set of λ (λ < μ) elitist individuals, P''(t) is a population of (μ − λ) individuals selected randomly from among P(t), and pop-size denotes the population size.
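For illustration, the training loop can be rendered in Python using the helper functions sketched in the earlier sections (winner, decide, uniform_crossover, mutate). This is our sketch, not the authors' code: the defaults follow Table 2, and the systematic opponent is replayed per string, which is equivalent to a shared game course whenever the opponent's choice depends only on the turn number.

    import random

    def fitness(wins, losses):
        """The paper's fitness: wins as a fraction of wins plus losses."""
        return wins / (wins + losses) if wins + losses else 0.0

    def train(o_player_move, pop_size=30, generations=50,
              game_course=150, elite_frac=0.5, mut_rate=0.01):
        """Training game; returns the best strategy string found."""
        pop = [[random.randint(0, 2) for _ in range(34)]
               for _ in range(pop_size)]
        best = pop[0]
        for _ in range(generations):
            scored = []
            for genes in pop:
                wins = losses = 0
                history = list(genes[31:34])      # assumed pre-game behavior
                for turn in range(game_course):
                    o_move = o_player_move(turn)  # O-player's choice this turn
                    w = winner(decide(genes, history[-3:]), o_move)
                    wins += (w == 1)
                    losses += (w == 2)
                    history.append(o_move)        # extend the opponent record
                scored.append((fitness(wins, losses), genes))
            scored.sort(key=lambda fg: fg[0], reverse=True)
            best = scored[0][1]
            # elitist scheme: top strings survive, the rest are rebred
            elite = [g for _, g in scored[:int(pop_size * elite_frac)]]
            pop = elite + [mutate(uniform_crossover(*random.sample(elite, 2)),
                                  mut_rate)
                           for _ in range(pop_size - len(elite))]
        return best

    # Example: a systematic opponent that cycles S, P, R in turn.
    # best_strategy = train(lambda turn: turn % 3)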

6. Computer Simulation

In the computer simulation, we tested the GA strategy against two different opponents: a systematic opponent, and a human.
6.1. GA vs Systematic Player

Here an opponent that makes a systematic choice is challenged by the GA strategy. The systematic choice is made by a function that tends to make a biased choice: it prefers one choice over the others, or makes the choices systematically in turn. Table 2 shows the conditions for this simulation.

Table 2
Number of generations     50
Game course               150
Population size           30
Elite selection rate      50%
Mutation rate             0.01
a                         0.005


The graph of Figure 2 shows the best fitness as well as the population average fitness during a training game under the environment of Table 2. Looking at the graphs in Figure 2, we notice that the GA, after a random start, soon seems to understand the pattern of the behavior of the systematic player, and then develops strategies that challenge the systematic player. The best strategy obtained is then adopted, and five sets of one-to-one games, with game courses of 100, 200, 300, 400, and 500, are carried out. The strategy maintained a high score against the systematic player, as appears in Figure 3.


Figure 3: Performance of GA vs systematic player (one-to-one game)

Figure 4: Best & average fitness through generations during the training game

6.2. GA vs Human

We set up an environment of ten people to play against the proposed GA strategy. Table 3 shows the parameters and simulation environment of the GA during the training game.

Table 3
No. of generations
Game course
Population size
Elite selection rate
Mutation rate             0.015
a                         0.005

Figure 5: Elite GA strategy vs 10 players (30 sets for each player)

Figure 4 shows the best fitness as well as the population average fitness during the training game against player 1. The computer then adopts the best strategy obtained during the training game for the main game. Each player plays 30 sets (30 times), and Figure 5 shows the outcome of the tournament.
7. Conclusion

The GA strategy maintained superiority over the systematic player as well as the human players. For the systematic strategy, it is relatively easy for the GA to predict the pattern of the behavior of the opponent. In contrast, the behavior of human players is continuously vacillating, and is difficult to predict.

Nevertheless, the genetic algorithm, with the adopted rule for coding, developed good strategies that performed very well. A novel feature of our approach is the introduction of the caprice concept, which gives room for capricious behavior away from the encoded strategy when encountering subtle behavior patterns of the opponent that were not predicted during the preliminary training phase. The authors believe there is still a lot to do with experiments for this approach: experiments with a longer historical record of the GA opponent (more than three!), trying the GA against other strategies (e.g., Tit-for-Tat!), to mention but a few. The work demonstrates a machine learning application using genetic algorithms. The problem considered is drawn from the game area: an archetypical problem where decisions are usually non-deterministic and mostly biased.

References

[1] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[2] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd edition, Springer-Verlag, New York, 1996.
[3] R. Axelrod, Genetic Algorithm for the Prisoner Dilemma Problem, in [4], pp. 32-41.
[4] L. Davis (Ed.), Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers, San Mateo, CA, 1987.
[5] C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen (Eds.), Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, Addison-Wesley, 1991.
[6] J. Maynard-Smith, Evolution and the Theory of Games, Cambridge University Press, Cambridge, 1982.
[7] J. Maynard-Smith, G.R. Price, The Logic of Animal Conflict, Nature, Vol. 246, pp. 15-18, London, 1973.
[8] J.H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, Ann Arbor, MI, 1975.
[9] G.H. Burgin, Systems identification by quasilinearization and evolutionary programming, J. Cybern., Vol. 3, No. 2, pp. 6-75, 1973.
[10] L.J. Fogel, Autonomous automata, Ind. Res., Vol. 4, pp. 14-19, 1962.
[11] I. Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart, Germany, 1973.
[12] H.-P. Schwefel, G. Rudolph, Contemporary evolution strategies, 3rd Int. Conf. on Artificial Life (Lecture Notes in Artificial Intelligence, Vol. 929), F. Morán, A. Moreno, J.J. Merelo, and P. Chacón, Eds., Springer, Berlin, Germany, pp. 893-907, 1995.
[13] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1947.
[14] B. Sinervo, C.M. Lively, The Rock-Paper-Scissors game and the evolution of alternative male strategies, Nature, 380, pp. 240-243, 1996.
[15] J. Nash, Non-cooperative games, Annals of Mathematics, 54, pp. 286-295, 1950.
[16] J.M. Smith, G.A. Parker, The Logic of Animal Conflict, Nature, 246, pp. 15-18, 1973.
[17] J.M. Smith, Evolution and the Theory of Games, Cambridge University Press, London, 1982.
[18] S. Stahl, A Gentle Introduction to Game Theory, American Mathematical Society, Mathematical World, Vol. 13, 1999.


								