MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.001-Structure and Interpretation of Computer Programs Spring semester, 2005 Project 2-Prisoner's Dilemma Issued: Monday, February 21 To Be Completed By: Friday, March 11, 6: 00 pm Reading: Sections 2. 1, 2.2. 1 and 2. 2. 2 in Structure and Interpretation of Computer Programs Code to load for this project o a link to the system code file prisoner. scm is provided from the Projects link on the projects section Purpose Project 2 focuses on the use of higher order procedures, together with data structures You will also further develop and demonstrate your ability to write clear, intelligible well-documented procedures, as well as test cases for your procedures A Fable In the mid-1920S, the Nebraska State Police achieved what may still be their finest moment. After a 400-mile car chase over dirt roads and through corn fields, they finall caught up with the notorious bank robbers bunny and Clod The two criminals were brought back to the police station in Omaha for further interrogation. Bunny and Clod were questioned in separate rooms, and each was offered the same deal by the police. The deal went as follows(since both are the same, we need only describe the version presented to Bunny) Bunny, here's the offer that we are making to both you and Clod. If you both hold out on us and don' t confess to bank robbery then we admit that we don' t have enough proof to convict you. However, we will be able to jail you both for one year, for reckless driving and endangerment of corn. If you turn state's witness and help us convict Clod(assuming he doesnt confess ), then you will go free, and Clod will get twenty years in prison. On the other hand, if you don' t confess and Clod does, then he will go free and you will get twenty years What happens if both Clod and I confess? asked Bunny Then you both get ten years, responded the polic Bunny, who had been a math major at Cal Tech before turning to crime, reasoned this way: Suppose Clod intends to confess. Then if I don,' t confess, I'll get twenty years, but
MASSACHVSETTS INSTITVTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.001 – Structure and Interpretation of Computer Programs Spring Semester, 2005 Project 2 – Prisoner's Dilemma • Issued: Monday, February 21 • To Be Completed By: Friday, March 11, 6:00 pm • Reading: Sections 2.1, 2.2.1 and 2.2.2 in Structure and Interpretation of Computer Programs • Code to load for this project: o A link to the system code file prisoner.scm is provided from the Projects link on the projects section. Purpose Project 2 focuses on the use of higher order procedures, together with data structures. You will also further develop and demonstrate your ability to write clear, intelligible, well-documented procedures, as well as test cases for your procedures. A Fable In the mid-1920's, the Nebraska State Police achieved what may still be their finest moment. After a 400-mile car chase over dirt roads and through corn fields, they finally caught up with the notorious bank robbers Bunny and Clod. The two criminals were brought back to the police station in Omaha for further interrogation. Bunny and Clod were questioned in separate rooms, and each was offered the same deal by the police. The deal went as follows (since both are the same, we need only describe the version presented to Bunny): “Bunny, here's the offer that we are making to both you and Clod. If you both hold out on us and don't confess to bank robbery, then we admit that we don't have enough proof to convict you. However, we will be able to jail you both for one year, for reckless driving and endangerment of corn. If you turn state's witness and help us convict Clod (assuming he doesn't confess), then you will go free, and Clod will get twenty years in prison. On the other hand, if you don't confess and Clod does, then he will go free and you will get twenty years.” “What happens if both Clod and I confess?” asked Bunny. “Then you both get ten years,” responded the police. Bunny, who had been a math major at Cal Tech before turning to crime, reasoned this way: “Suppose Clod intends to confess. Then if I don't confess, I'll get twenty years, but
if I do confess, I'll only get ten years. On the other hand, suppose Clod intends to hold out on the cops. Then if I don,'t confess, I'll go to jail for a year, but if I do confess, I'll go free. So no matter what Clod intends to do. i am better off confessing than holding out So id better confess Naturally, Clod employed the very same reasoning. Both criminals confessed, and both went to jail for ten years(Well, actually they didn,'t go to jail. when they were in court, and heard that they had both turned state's witness, they strangled each other. But thats another story. The police, of course, were triumphant, since the criminals would hay een free in a year had both remained silent The prisoner's dilemma The Bunny and Clod story is an example of a situation known in mathematical game theory as the"prisoner's dilemma. A prisoner's dilemma al ways involves two game players, " and each has a choice between"cooperating and"defecting. If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. (In the case of the Bunny and Clod story, cooperating means cooperating with one's partner-ie holding out on the police -and"defecting" means confessing to bank robbery. Before formalizing the prisoner's dilemma situation, we need to introduce some basic game theory notation A Crash Course in Game Theory In game theory, we differentiate between a game, and a play. a game refers to the set of possible choices and outcomes for the entire range of situations. a play refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a two-person binary-choice game is represented by a two-by-two matrix. Here is a hypothetical game matrix Cooperates B defects A cooperates A gets 5 A gets 2 B gets 5 b gets 3 A defects a gets 3 A gets 1 B gets 2 B gets The two players in this case are called A and B, and the choices are called"cooperate and"defect. Players A and B can play a single game by separately(and secretly hoosing either to cooperate or to defect. Once each player has made a choice, he nces it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player a chooses to cooperate while Player B defects, then a gets 2 points and B gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public
if I do confess, I'll only get ten years. On the other hand, suppose Clod intends to hold out on the cops. Then if I don't confess, I'll go to jail for a year, but if I do confess, I'll go free. So no matter what Clod intends to do, I am better off confessing than holding out. So I'd better confess.” Naturally, Clod employed the very same reasoning. Both criminals confessed, and both went to jail for ten years (Well, actually they didn't go to jail. When they were in court, and heard that they had both turned state's witness, they strangled each other. But that's another story.) The police, of course, were triumphant, since the criminals would have been free in a year had both remained silent. The Prisoner's Dilemma The Bunny and Clod story is an example of a situation known in mathematical game theory as the “prisoner's dilemma.” A prisoner's dilemma always involves two “game players,” and each has a choice between “cooperating” and “defecting.” If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. (In the case of the Bunny and Clod story, “cooperating” means cooperating with one's partner - i.e. holding out on the police - and “defecting” means confessing to bank robbery.) Before formalizing the prisoner's dilemma situation, we need to introduce some basic game theory notation. A Crash Course in Game Theory In game theory, we differentiate between a game, and a play. A game refers to the set of possible choices and outcomes for the entire range of situations. A play refers to a specific set of choices by the players, with the associated outcome for that particular scenario. Thus, in game theory, a two-person binary-choice game is represented by a two-by-two matrix. Here is a hypothetical game matrix. B cooperates B defects A cooperates A gets 5 A gets 2 B gets 5 B gets 3 A defects A gets 3 A gets 1 B gets 2 B gets 1 The two players in this case are called A and B, and the choices are called “cooperate” and “defect.” Players A and B can play a single game by separately (and secretly) choosing either to cooperate or to defect. Once each player has made a choice, he announces it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in the example above, if Player A chooses to cooperate while Player B defects, then A gets 2 points and B gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public
knowledge, for instance, Player A knows before the game even starts that if he and B both choose to defect, they will each get I point In an iterated game, the two players play repeatedly; thus after finishing one game, A and B may play another(Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a"play, which constitutes a single"round " of the larger iterated game. ) There are a number of ways in which iterated games may be played; in the simplest situation, A and B play for some fixed number of rounds(say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both a and b are able to study the sults of the previous nine rounds An Analysis of a simple game matrix The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player A's point of view(Player B's point of view is identical) Suppose B cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose b defects. I still do better by cooperating (since I get two points instead of one). So no matter what B does, I am better off cooperating Player B will, of course, reason the same way, and both will choose to cooperate In the terminology of game theory, both a and b have a dominant choice -i. e, a choice that gives a preferred outcome no matter what the other player chooses to do. The matrix shown above, by the way, does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. Well see an example of a prisoner's dilemma game very shortly To re-cap: in any particular game using the above matrix, we would expect both players to cooperate, and in an iterated game, we would expect both players to cooperate repeatedly, on every round The Prisoner's Dilemma game matrix Now consider the following game matrix B defects A cooperates gets 3 B gets 3 B gets 5 A defects a gets 5 B gets 0 B gets 1 what Plava farers A and b both have a dominant choice -namely, defection. No matter layer B does, Player A improves his own score by defecting, and vice versa
knowledge; for instance, Player A knows before the game even starts that if he and B both choose to defect, they will each get 1 point. In an iterated game, the two players play repeatedly; thus after finishing one game, A and B may play another. (Admittedly, there is a little confusion in the terminology here; thus we refer to each iteration as a “play,” which constitutes a single “round” of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, A and B play for some fixed number of rounds (say 200), and before each round, they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both A and B are able to study the results of the previous nine rounds. An Analysis of a Simple Game Matrix The game depicted by the matrix above is a particularly easy one to analyze. Let's examine the situation from Player A's point of view (Player B's point of view is identical): “Suppose B cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose B defects. I still do better by cooperating (since I get two points instead of one). So no matter what B does, I am better off cooperating.” Player B will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both A and B have a dominant choice - i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. The matrix shown above, by the way, does not represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly. To re-cap: in any particular game using the above matrix, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round. The Prisoner's Dilemma Game Matrix Now consider the following game matrix: B cooperates B defects A cooperates A gets 3 A gets 0 B gets 3 B gets 5 A defects A gets 5 A gets 1 B gets 0 B gets 1 In this case, Players A and B both have a dominant choice - namely, defection. No matter what Player B does, Player A improves his own score by defecting, and vice versa
However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of winning only one point each, they could win three points each. So the"rational"choice of mutual defection has a puzzlin self-destructive flavor The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let dd be the number of points won when both defect; let D be the number of points won by the cooperating party when the other defects; and let dC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions DC>CC>DD>CD DC+ CD In the second game matrix, we have dc=5. Cc=3. DD=1 CD=0 so both conditions are met. In the bunny and clod story, by the way, you can verify that DC=0, 1,DD=-10,CD=-20 Again, these values satisfy the prisoner's dilemma conditions Axelrod's tournament In the late 1970 s political scientist robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation(Actually, there were two tournaments Their rules and results are described in Axelrod's book: The Evolution of Cooperation. Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds, using the second matrix above. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied The contestants in Axelrod's tournament included professors of political science mathematics, computer science, and economics. The winning program-the program the highest average score- was submitted by Anatol Rapoport, a professor of psychole make up our own Scheme programs to play the iterated prisoner's dilemma game s and at the University of Toronto. In this project, we will pursue Axelrod's investigatio As part of this project, we will be running a similar tournament, but now involving a hree-person prisoner,s dilemma
However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of winning only one point each, they could win three points each. So the “rational” choice of mutual defection has a puzzling self-destructive flavor. The second matrix is an example of a prisoner's dilemma game situation. Just to formalize the situation, let CC be the number of points won by each player when they both cooperate; let DD be the number of points won when both defect; let CD be the number of points won by the cooperating party when the other defects; and let DC be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions: In the second game matrix, we have so both conditions are met. In the Bunny and Clod story, by the way, you can verify that: Again, these values satisfy the prisoner's dilemma conditions. Axelrod's Tournament In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation (Actually, there were two tournaments. Their rules and results are described in Axelrod's book: The Evolution of Cooperation.). Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds, using the second matrix above. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied. The contestants in Axelrod's tournament included professors of political science, mathematics, computer science, and economics. The winning program - the program with the highest average score - was submitted by Anatol Rapoport, a professor of psychology at the University of Toronto. In this project, we will pursue Axelrod's investigations and make up our own Scheme programs to play the iterated prisoner's dilemma game. As part of this project, we will be running a similar tournament, but now involving a three-person prisoner's dilemma
Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples Nasty -a program using the Nasty strategy simply defects on every round of every game Patsy-a program using the Patsy strategy cooperates on every round of every game Spastic-this program cooperates or defects on a random basis Egalitarian-this program cooperates on the first round. On all subsequent rounds Egalitarian examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber her cooperations, Egalitarian will defect; otherwise this strategy will cooperate round it mimics the other player's previous move. Thus, if the other player cooperater Eye-for-Eye-this program cooperates on the first round, and then on every subsequer (defects)on the nth round, then Eye-for-Eye will cooperate( defect)on the(n+ 1 )st round All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game. )Nevertheless, simplicity is not necessarily a disadvantage. Rapoport's first prize program employed the Eye-for-Eye strategy, and achieved the highest average score in a field of far more complicated programs The Two-Player Prisoner's Dilemma Program A Scheme program for an iterated prisoner's dilemma game is provided as part of the code for this project. The procedure play-loop pits two players(or, to be more precise, two"strategies" )against one another for approximately 100 games, then prints out the average score of each player Player strategies are represented as procedures. Each strategy takes two inputs-its own history"(that is, a list of all its previous"plays, where for convenience we will use"c to represent cooperate, and"d"to represent defect)and its opponent's "history. " The strategy returns either the string"c"for"cooperate"or the string"d"for"defect "(Note that we will need to use procedures appropriate for comparing strings when we analyze these results. At the beginning of an iterated game, each history is an empty list. As the game progresses, the histories grow(via extend-history)into lists of "c"s and"d"s, thus each history is stored from most recent to least recent. Note how each strategy must have its own history as its first input. So in play-loop-iter, strato has history as its first input, and strati has historyl as its first input
Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples: Nasty - a program using the Nasty strategy simply defects on every round of every game. Patsy - a program using the Patsy strategy cooperates on every round of every game. Spastic - this program cooperates or defects on a random basis. Egalitarian - this program cooperates on the first round. On all subsequent rounds, Egalitarian examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber her cooperations, Egalitarian will defect; otherwise this strategy will cooperate. Eye-for-Eye - this program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) on the nth round, then Eye-for-Eye will cooperate (defect) on the (n+1)st round. All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. Rapoport's firstprize program employed the Eye-for-Eye strategy, and achieved the highest average score in a field of far more complicated programs. The Two-Player Prisoner's Dilemma Program A Scheme program for an iterated prisoner's dilemma game is provided as part of the code for this project. The procedure play-loop pits two players (or, to be more precise, two “strategies”) against one another for approximately 100 games, then prints out the average score of each player. Player strategies are represented as procedures. Each strategy takes two inputs - its own “history” (that is, a list of all its previous “plays,” where for convenience we will use "c" to represent cooperate, and "d" to represent defect) and its opponent's “history.” The strategy returns either the string “c” for “cooperate” or the string “d” for “defect.” (Note that we will need to use procedures appropriate for comparing strings when we analyze these results.) At the beginning of an iterated game, each history is an empty list. As the game progresses, the histories grow (via extend-history) into lists of “c”'s and “d”'s, thus each history is stored from most recent to least recent. Note how each strategy must have its own history as its first input. So in play-loop-iter, strat0 has history0 as its first input, and strat1 has history1 as its first input