Assignment 3: Genetic algorithms

This week your task is to familiarize yourself with genetic algorithms (GA), and apply the method to evolving strategies in the game of iterated Prisoner's dilemma. Readings on GA's can be found in the course folder.

Alternatively, you can apply GA to a problem of your choice.

Prisoner's dilemma

Prisoner's dilemma is a non-cooperative two-person non-zero-sum game that has been extensively studied in the fields interested in decision processes, for instance in economics, psychology, political science and sociology, just to name few.

The story goes: Two criminals are arrested under the suspicion of having committed a crime together. However, the police does not have sufficient proof in order to have them convicted. The two prisoners are isolated from each other, and the police visit each of them to offer a deal: the one who testifies against the other one will be freed. If neither testifies against the other (cooperate), both of them will get only a small punishment because of lack of proof. However, if one of them betrays the other one by confessing to the police (defect), the defector will gain more, since he is freed, and the one who remained silent will receive the full punishment. If both defect, both will be punished more severely than if they had refused to talk. The dilemma resides in the fact that each prisoner has a choice between only two options, but cannot make a good decision without knowing what the other one will do.

The payoff matrix of the game is as follows --- the larger the number, the higher the payoff. The prisoner 1's payoff is listed first.

Prisoner 2

cooperate defect

Prisoner 1 cooperate 3, 3 0, 5

defect 5, 0 1, 1

	Prisoner 2
	cooperate	defect
Prisoner 1	cooperate	3, 3	0, 5
defect	5, 0	1, 1

The task

Originally the dilemma was formulated as a one-time affair. However, in the studies exploiting Prisoner's dilemma this "game" is played repeatedly; the interest lies in how players change their behavior as they observe the opponent's decisions. Note, that the decisions are made simultaneously; the players only get to know their opponent's decision after they both have made their choices.

Your task is to implement a genetic algorithm that learns good strategies against any kind of a player: always defecting, always cooperating, cooperating after the other player cooperates etc. The goodness is measured as the cumulative payoff. The next chapter is to give you an idea how to code the strategies as chromosomes for your GA. Of course, you can adopt any other method of coding strategies that you find convenient.

Test your GA against different opponents and by varying the genetic operators (and the memory length if you choose to use the method described below), and be prepared to present your results in the class.

Some hints on implementation

There is a very simple method of representing strategies as bit strings.

Let us assume a memory length M, and mark 'defect' as 0, and 'cooperate' as 1. The memory stores both players' actions for the last M plays of the game (=game history). Since, the memory of length M requires 2M bits to encode these actions, a total of 2^(2M) game histories can be encoded. These game histories can be indexed by natural numbers from zero to 2^(2M)-1, so that the index 0 represents the history 000...0 (both players have defected all the time), and 2^(2M)-1 represents the history 1111...1 (both players have cooperated all the time).

For example, if M=3, 6 bits are needed to record all possible game histories of 3 plays. This gives 64 different game histories, which can be indexed by numbers 0, ..., 63, so that 0 represents the history 000000, and 63 represents the history 111111, and the indices in between all other possible combinations of 0 and 1 of length 6.

Now, the strategy is a mapping from all possible game histories of length M to actions (defect=0 or cooperate=1). This can be represented simply as an array of length 2^(2M), so that each element stores the action to be given as a response to the game history corresponding to the index of the element.

For instance (assume M=3), the strategy 10011...1 means that the player chooses to cooperate as a response to the history with index 0 (both players have defected the last 3 times), but chooses to defect as a response to history with index 1 (000001, i.e., one player cooperated in the last round, and both players defected at the two previous rounds), ..., and finally chooses to cooperate if both players have cooperated the last 3 rounds.

The population for the GA is then a collection of different strategies, that is, bit strings representing responses to different game histories.