ComputerTournaments one described in chapter 1.It awarded both players 3 points for mutual cooperation,and 1 point for mutual defection. If one player defected while the other player cooperated, the defecting player received 5 points and the cooperating player received 0 points. No entry was disqualified for exceeding the allotted time.In fact,the entire round robin tournament was run five times to get a more stable estimate of the scores for each pair of players.In all,there were 120,000 moves, making for 240,000 separate choices. The fourteen submitted entries came from five disci- plines:psychology,economics,political science,mathe- matics,and sociology.Appendix A lists the names and affil- iations of the people who submitted these entries,and it gives the rank and score of their entries. One remarkable aspect of the tournament was that it allowed people from different disciplines to interact with each other in a common format and language.Most of the entrants were recruited from those who had published arti- cles on game theory in general or the Prisoner's Dilemma in particular. TIT FOR TAT,submitted by Professor Anatol Rapoport of the University of Toronto,won the tournament.This was the simplest of all submitted programs and it turned out to be the best! TIT FOR TAT,of course,starts with a cooperative choice,and thereafter does what the other player did on the previous move.This decision rule is probably the most widely known and most discussed rule for playing the Pris- oner's Dilemma.It is easily understood and easily pro- grammed.It is known to elicit a good degree of coopera- tion when played with humans(Oskamp 1971;W.Wilson 1971).As an entry in a computer tournament,it has the 31
Computer Tournaments one described in chapter 1. It awarded both players 3 points for mutual cooperation, and 1 point for mutual defection. If one player defected while the other player cooperated, the defecting player received 5 points and the cooperating player received 0 points. No entry was disqualified for exceeding the allotted time. In fact, the entire round robin tournament was run five times to get a more stable estimate of the scores for each pair of players. In all, there were 120,000 moves, making for 240,000 separate choices. The fourteen submitted entries came from five disciplines: psychology, economics, political science, mathematics, and sociology. Appendix A lists the names and affiliations of the people who submitted these entries, and it gives the rank and score of their entries. One remarkable aspect of the tournament was that it allowed people from different disciplines to interact with each other in a common format and language. Most of the entrants were recruited from those who had published articles on game theory in general or the Prisoner's Dilemma in particular. TIT FOR TAT, submitted by Professor Anatol Rapoport of the University of Toronto, won the tournament. This was the simplest of all submitted programs and it turned out to be the best! TIT FOR TAT, of course, starts with a cooperative choice, and thereafter does what the other player did on the previous move. This decision rule is probably the most widely known and most discussed rule for playing the Prisoner's Dilemma. It is easily understood and easily programmed. It is known to elicit a good degree of cooperation when played with humans (Oskamp 1971; W. Wilson 1971). As an entry in a computer tournament, it has the 31
The Emergence of Cooperation desirable properties that it is not very exploitable and that it does well with its own twin.It has the disadvantage that it is too generous with the RANDOM rule,which was known by the participants to be entered in the tournament. In addition,TIT FOR TAT was known to be a powerful competitor.In a preliminary tournament,TIT FOR TAT scored second place;and in a variant of that preliminary tournament,TIT FOR TAT won first place.All of these facts were known to most of the people designing pro- grams for the Computer Prisoner's Dilemma Tournament, because they were sent copies of a description of the pre- liminary tournament.Not surprisingly,many of them used the TIT FOR TAT principle and tried to improve upon it. The striking fact is that none of the more complex pro- grams submitted was able to perform as well as the origi- nal,simple TIT FOR TAT. This result contrasts with computer chess tournaments, where complexity is obviously needed.For example,in the Second World Computer Chess Championships,the least complex program came in last (Jennings 1978).It was sub- mitted by Johann Joss of the Eidgenossishe Technische Hochschule of Zurich,Switzerland,who also submitted an entry to the Computer Prisoner's Dilemma Tournament. His entry to the Prisoner's Dilemma Tournament was a small modification of TIT FOR TAT.But his modifica- tion,like the others,just lowered the performance of the decision rule. Analysis of the results showed that neither the discipline of the author,the brevity of the program-nor its length- accounts for a rule's relative success.What does? Before answering this question,a remark on the inter- pretation of numerical scores is in order.In a game of 200 moves,a useful benchmark for very good performance is 32
The Emergence of Cooperation desirable properties that it is not very exploitable and that it does well with its own twin. It has the disadvantage that it is too generous with the RANDOM rule, which was known by the participants to be entered in the tournament. In addition, TIT FOR TAT was known to be a powerful competitor. In a preliminary tournament, TIT FOR TAT scored second place; and in a variant of that preliminary tournament, TIT FOR TAT won first place. All of these facts were known to most of the people designing programs for the Computer Prisoner's Dilemma Tournament, because they were sent copies of a description of the preliminary tournament. Not surprisingly, many of them used the TIT FOR TAT principle and tried to improve upon it. The striking fact is that none of the more complex programs submitted was able to perform as well as the original, simple TIT FOR TAT. This result contrasts with computer chess tournaments, where complexity is obviously needed. For example, in the Second World Computer Chess Championships, the least complex program came in last (Jennings 1978). It was submitted by Johann Joss of the Eidgenossishe Technische Hochschule of Zurich, Switzerland, who also submitted an entry to the Computer Prisoner's Dilemma Tournament. His entry to the Prisoner's Dilemma Tournament was a small modification of TIT FOR TAT. But his modification, like the others, just lowered the performance of the decision rule. Analysis of the results showed that neither the discipline of the author, the brevity of the program—nor its length— accounts for a rule's relative success. What does? Before answering this question, a remark on the interpretation of numerical scores is in order. In a game of 200 moves, a useful benchmark for very good performance is 32
ComputerTournaments 600 points,which is equivalent to the score attained by a player when both sides always cooperate with each other. A useful benchmark for very poor performance is 200 points,which is equivalent to the score attained by a player when both sides never cooperate with each other.Most scores range between 200 and 600 points,although scores from 0 to 1000 points are possible.The winner,TIT FOR TAT,averaged 504 points per game. Surprisingly,there is a single property which distin- guishes the relatively high-scoring entries from the rela- tively low-scoring entries.This is the property of being nice,which is to say never being the first to defect.(For the sake of analyzing this tournament,the definition of a nice rule will be relaxed to include rules which will not be the first to defect before the last few moves,say before move 199.) Each of the eight top-ranking entries (or rules)is nice. None of the other entries is.There is even a substantial gap in the score between the nice entries and the others.The nice entries received tournament averages between 472 and 504,while the best of the entries that were not nice re- ceived only 401 points.Thus,not being the first to defect, at least until virtually the end of the game,was a property which,all by itself,separated the more successful rules from the less successful rules in this Computer Prisoner's Dilemma Tournament. Each of the nice rules got about 600 points with each of the other seven nice rules and with its own twin.This is because when two nice rules play,they are sure to cooper- ate with each other until virtually the end of the game. Actually the minor variations in end-game tactics did not account for much variation in the scores. Since the nice rules all got within a few points of 600 33
Computer Tournaments 600 points, which is equivalent to the score attained by a player when both sides always cooperate with each other. A useful benchmark for very poor performance is 200 points, which is equivalent to the score attained by a player when both sides never cooperate with each other. Most scores range between 200 and 600 points, although scores from 0 to 1000 points are possible. The winner, TIT FOR TAT, averaged 504 points per game. Surprisingly, there is a single property which distinguishes the relatively high-scoring entries from the relatively low-scoring entries. This is the property of being nice, which is to say never being the first to defect. (For the sake of analyzing this tournament, the definition of a nice rule will be relaxed to include rules which will not be the first to defect before the last few moves, say before move 199.) Each of the eight top-ranking entries (or rules) is nice. None of the other entries is. There is even a substantial gap in the score between the nice entries and the others. The nice entries received tournament averages between 472 and 504, while the best of the entries that were not nice received only 401 points. Thus, not being the first to defect, at least until virtually the end of the game, was a property which, all by itself, separated the more successful rules from the less successful rules in this Computer Prisoner's Dilemma Tournament. Each of the nice rules got about 600 points with each of the other seven nice rules and with its own twin. This is because when two nice rules play, they are sure to cooperate with each other until virtually the end of the game. Actually the minor variations in end-game tactics did not account for much variation in the scores. Since the nice rules all got within a few points of 600 33
The Emergence of Cooperation with each other,the thing that distinguished the relative rankings among the nice rules was their scores with the rules which are not nice.This much is obvious.What is not obvious is that the relative ranking of the eight top rules was largely determined by just two of the other seven rules.These two rules are kingmakers because they do not do very well for themselves,but they largely determine the rankings among the top contenders. The most important kingmaker was based on an "out- come maximization"principle originally developed as a possible interpretation of what human subjects do in the Prisoner's Dilemma laboratory experiments (Downing 1975).This rule,called DOWNING,is a particularly in- teresting rule in its own right.It is well worth studying as an example of a decision rule which is based upon a quite sophisticated idea.Unlike most of the others,its logic is not just a variant ofTIT FOR TAT.Instead it is based on a deliberate attempt to understand the other player and then to make the choice that will yield the best long-term score based upon this understanding.The idea is that ifthe other player does not seem responsive to what DOWNING is doing,DOWNING will try to get away with whatever it can by defecting.On the other hand,if the other player does seem responsive,DOWNING will cooperate.To judge the other's responsiveness,DOWNING estimates the probability that the other player cooperates after it (DOWNING)cooperates,and also the probability that the other player cooperates after DOWNING defects.For each move,it updates its estimate of these two conditional probabilities and then selects the choice which will maxi- mize its own long-term payoff under the assumption that it has correctly modeled the other player.If the two condi- tional probabilities have similar values,DOWNING deter- 34
The Emergence of Cooperation with each other, the thing that distinguished the relative rankings among the nice rules was their scores with the rules which are not nice. This much is obvious. What is not obvious is that the relative ranking of the eight top rules was largely determined by just two of the other seven rules. These two rules are kingmakers because they do not do very well for themselves, but they largely determine the rankings among the top contenders. The most important kingmaker was based on an "outcome maximization" principle originally developed as a possible interpretation of what human subjects do in the Prisoner's Dilemma laboratory experiments (Downing 1975). This rule, called DOWNING, is a particularly interesting rule in its own right. It is well worth studying as an example of a decision rule which is based upon a quite sophisticated idea. Unlike most of the others, its logic is not just a variant of TIT FOR TAT. Instead it is based on a deliberate attempt to understand the other player and then to make the choice that will yield the best long-term score based upon this understanding. The idea is that if the other player does not seem responsive to what DOWNING is doing, DOWNING will try to get away with whatever it can by defecting. On the other hand, if the other player does seem responsive, DOWNING will cooperate. To judge the other's responsiveness, DOWNING estimates the probability that the other player cooperates after it (DOWNING) cooperates, and also the probability that the other player cooperates after DOWNING defects. For each move, it updates its estimate of these two conditional probabilities and then selects the choice which will maximize its own long-term payoff under the assumption that it has correctly modeled the other player. If the two conditional probabilities have similar values, DOWNING deter- 34
Computer Tournaments mines that it pays to defect,since the other player seems to be doing the same thing whether DOWNING cooperates or not.Conversely,if the other player tends to cooperate after a cooperation but not after a defection by DOWN- ING,then the other player seems responsive,and DOWNING will calculate that the best thing to do with a responsive player is to cooperate.Under certain circum- stances,DOWNING will even determine that the best strategy is to alternate cooperation and defection. At the start of a game,DOWNING does not know the values of these conditional probabilities for the other play- ers.It assumes that they are both.5,but gives no weight to this estimate when information actually does come in dur- ing the play of the game. This is a fairly sophisticated decision rule,but its imple- mentation does have one flaw.By initially assuming that the other player is unresponsive,DOWNING is doomed to defect on the first two moves.These first two defections led many other rules to punish DOWNING,so things usually got off to a bad start.But this is precisely why DOWNING served so well as a kingmaker.First-ranking TIT FOR TAT and second-ranking TIDEMAN AND CHIERUZZI both reacted in such a way that DOWN- ING learned to expect that defection does not pay but that cooperation does.All of the other nice rules went downhill with DOWNING. The nice rules did well in the tournament largely be- cause they did so well with each other,and because there were enough of them to raise substantially each other's average score.As long as the other player did not defect. each of the nice rules was certain to continue cooperating until virtually the end of the game.But what happened if there was a defection?Different rules responded quite dif- 35
Computer Tournaments mines that it pays to defect, since the other player seems to be doing the same thing whether DOWNING cooperates or not. Conversely, if the other player tends to cooperate after a cooperation but not after a defection by DOWNING, then the other player seems responsive, and DOWNING will calculate that the best thing to do with a responsive player is to cooperate. Under certain circumstances, DOWNING will even determine that the best strategy is to alternate cooperation and defection. At the start of a game, DOWNING does not know the values of these conditional probabilities for the other players. It assumes that they are both .5, but gives no weight to this estimate when information actually does come in during the play of the game. This is a fairly sophisticated decision rule, but its implementation does have one flaw. By initially assuming that the other player is unresponsive, DOWNING is doomed to defect on the first two moves. These first two defections led many other rules to punish DOWNING, so things usually got off to a bad start. But this is precisely why DOWNING served so well as a kingmaker. First-ranking TIT FOR TAT and second-ranking TIDEMAN AND CHIERUZZI both reacted in such a way that DOWNING learned to expect that defection does not pay but that cooperation does. All of the other nice rules went downhill with DOWNING. The nice rules did well in the tournament largely because they did so well with each other, and because there were enough of them to raise substantially each other's average score. As long as the other player did not defect, each of the nice rules was certain to continue cooperating until virtually the end of the game. But what happened if there was a defection? Different rules responded quite dif- 35