Introduction

other player cooperates. The worst a player can do is get S, the sucker's payoff for cooperating while the other player defects. In ordering the other two outcomes, R, the reward for mutual cooperation, is assumed to be better than P, the punishment for mutual defection. This leads to a preference ranking of the four payoffs from best to worst as T, R, P, and S.

The second part of the definition of the Prisoner's Dilemma is that the players cannot get out of their dilemma by taking turns exploiting each other. This assumption means that an even chance of exploitation and being exploited is not as good an outcome for a player as mutual cooperation. It is therefore assumed that the reward for mutual cooperation is greater than the average of the temptation and the sucker's payoff. This assumption, together with the rank ordering of the four payoffs, defines the Prisoner's Dilemma.

Thus two egoists playing the game once will both choose their dominant choice, defection, and each will get less than they both could have gotten if they had cooperated. If the game is played a known finite number of times, the players still have no incentive to cooperate. This is certainly true on the last move since there is no future to influence. On the next-to-last move neither player will have an incentive to cooperate since they can both anticipate a defection by the other player on the very last move. Such a line of reasoning implies that the game will unravel all the way back to mutual defection on the first move of any sequence of plays that is of known finite length (Luce and Raiffa 1957, pp. 94-102). This reasoning does not apply if the players will interact an indefinite number of times. And in most realistic settings, the players cannot be sure when the last interaction between them will take place. As will be
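The two conditions that define the Prisoner's Dilemma lend themselves to a direct check. The sketch below (in Python) uses the conventional illustrative payoffs T = 5, R = 3, P = 1, S = 0, which are an assumption here rather than values fixed by the text; the definition itself requires only the two inequalities being tested.

```python
# Sketch: testing whether a payoff matrix is a Prisoner's Dilemma.
# The numeric payoffs used below are illustrative assumptions; the
# definition requires only the two inequalities checked here.

def is_prisoners_dilemma(T, R, P, S):
    """True if the payoffs satisfy both defining conditions."""
    ranking = T > R > P > S            # best-to-worst order: T, R, P, S
    no_alternation = R > (T + S) / 2   # mutual cooperation beats taking turns exploiting
    return ranking and no_alternation

print(is_prisoners_dilemma(5, 3, 1, 0))  # True
print(is_prisoners_dilemma(5, 2, 1, 0))  # False
```

In the second case the ranking still holds, but taking turns at exploitation would average (5 + 0)/2 = 2.5 per move, better than the reward of 2 for steady mutual cooperation, so the second condition fails.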
shown later, with an indefinite number of interactions, cooperation can emerge. The issue then becomes the discovery of the precise conditions that are necessary and sufficient for cooperation to emerge.

In this book I will examine interactions between just two players at a time. A single player may be interacting with many others, but the player is assumed to be interacting with them one at a time.3 The player is also assumed to recognize another player and to remember how the two of them have interacted so far. This ability to recognize and remember allows the history of the particular interaction to be taken into account by a player's strategy.

A variety of ways to resolve the Prisoner's Dilemma have been developed. Each involves allowing some additional activity that alters the strategic interaction in such a way as to fundamentally change the nature of the problem. The original problem remains, however, because there are many situations in which these remedies are not available. Therefore, the problem will be considered in its fundamental form, without these alterations.

1. There is no mechanism available to the players to make enforceable threats or commitments (Schelling 1960). Since the players cannot commit themselves to a particular strategy, each must take into account all possible strategies that might be used by the other player. Moreover the players have all possible strategies available to themselves.

2. There is no way to be sure what the other player will do on a given move. This eliminates the possibility of metagame analysis (Howard 1971) which allows such options as "make the same choice as the other is about to make." It also eliminates the possibility of reliable reputations such as might be based on watching the other player interact with
third parties. Thus the only information available to the players about each other is the history of their interaction so far.

3. There is no way to eliminate the other player or run away from the interaction. Therefore each player retains the ability to cooperate or defect on each move.

4. There is no way to change the other player's payoffs. The payoffs already include whatever consideration each player has for the interests of the other (Taylor 1976, pp. 69-73).

Under these conditions, words not backed by actions are so cheap as to be meaningless. The players can communicate with each other only through the sequence of their own behavior. This is the problem of the Prisoner's Dilemma in its fundamental form.

What makes it possible for cooperation to emerge is the fact that the players might meet again. This possibility means that the choices made today not only determine the outcome of this move, but can also influence the later choices of the players. The future can therefore cast a shadow back upon the present and thereby affect the current strategic situation.

But the future is less important than the present, for two reasons. The first is that players tend to value payoffs less as the time of their obtainment recedes into the future. The second is that there is always some chance that the players will not meet again. An ongoing relationship may end when one or the other player moves away, changes jobs, dies, or goes bankrupt.

For these reasons, the payoff of the next move always counts less than the payoff of the current move. A natural way to take this into account is to cumulate payoffs over time in such a way that the next move is worth some fraction of the current move (Shubik 1970). The weight (or importance) of the next move relative to the current move will be called w. It represents the degree to which the payoff of each move is discounted relative to the previous move, and is therefore a discount parameter.

The discount parameter can be used to determine the payoff for a whole sequence. To take a simple example, suppose that each move is only half as important as the previous move, making w = 1/2. Then a whole string of mutual defections worth one point each move would have a value of 1 on the first move, 1/2 on the second move, 1/4 on the third move, and so on. The cumulative value of the sequence would be 1 + 1/2 + 1/4 + 1/8 . . . , which would sum to exactly 2. In general, getting one point on each move would be worth 1 + w + w² + w³ . . . . A very useful fact is that the sum of this infinite series for any w greater than zero and less than one is simply 1/(1–w). To take another case, if each move is worth 90 percent of the previous move, a string of 1's would be worth ten points because 1/(1–w) = 1/(1–.9) = 1/.1 = 10. Similarly, with w still equal to .9, a string of 3-point mutual rewards would be worth three times this, or 30 points.

Now consider an example of two players interacting. Suppose one player is following the policy of always defecting (ALL D), and the other player is following the policy of TIT FOR TAT. TIT FOR TAT is the policy of cooperating on the first move and then doing whatever the other player did on the previous move. This policy means that TIT FOR TAT will defect once after each defection of the other player. When the other player is using TIT FOR TAT, a player who always defects will get T on the first move, and P on all subsequent moves. The value (or score) to someone using ALL D when playing with someone using TIT FOR TAT is thus the sum of T for the first move, wP for the second move, w²P for the third move, and so on.4

Both ALL D and TIT FOR TAT are strategies. In general, a strategy (or decision rule) is a specification of what to do in any situation that might arise. The situation itself depends upon the history of the game so far. Therefore, a strategy might cooperate after some patterns of interaction and defect after others. Moreover, a strategy may use probabilities, as in the example of a rule which is entirely random with equal probabilities of cooperation and defection on each move. A strategy can also be quite sophisticated in its use of the pattern of outcomes in the game so far to determine what to do next. An example is one which, on each move, models the behavior of the other player using a complex procedure (such as a Markov process), and then uses a fancy method of statistical inference (such as Bayesian analysis) to select what seems the best choice for the long run. Or it may be some intricate combination of other strategies.

The first question you are tempted to ask is, "What is the best strategy?" In other words, what strategy will yield a player the highest possible score? This is a good question, but as will be shown later, no best rule exists independently of the strategy being used by the other player. In this sense, the iterated Prisoner's Dilemma is completely different from a game like chess. A chess master can safely use the assumption that the other player will make the most feared move. This assumption provides a basis for planning in a game like chess, where the interests of the players are completely antagonistic. But the situations represented by the Prisoner's Dilemma game are quite different. The interests of the players are not in total conflict. Both players can do
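The discount arithmetic above, and the score of ALL D against TIT FOR TAT, can be sketched in a few lines of Python. The payoffs T = 5 and P = 1 used for the final computation are conventional assumptions, not values fixed by the text, which itself specifies only the 1-point and 3-point examples.

```python
# Sketch of the discount-parameter arithmetic and the ALL D vs.
# TIT FOR TAT score.  T = 5 and P = 1 below are assumed values.

def discounted_sum(payoff_per_move, w, moves=10_000):
    """Approximate payoff_per_move * (1 + w + w^2 + ...), which
    equals payoff_per_move / (1 - w) for 0 < w < 1."""
    return sum(payoff_per_move * w**k for k in range(moves))

print(round(discounted_sum(1, 0.5), 6))  # 2.0  -> 1/(1 - 1/2)
print(round(discounted_sum(1, 0.9), 6))  # 10.0 -> 1/(1 - .9)
print(round(discounted_sum(3, 0.9), 6))  # 30.0 -> three times as much

def tit_for_tat(opponent_history):
    """Cooperate on the first move, then echo the other's last move."""
    return 'C' if not opponent_history else opponent_history[-1]

# Five moves of ALL D against TIT FOR TAT: one cooperation, then echoes.
tft_moves, alld_moves = [], []
for _ in range(5):
    tft_moves.append(tit_for_tat(alld_moves))
    alld_moves.append('D')
print(''.join(tft_moves))  # CDDDD

# ALL D's score against TIT FOR TAT: T once, then P forever,
# i.e. T + wP + w^2 P + ... = T + wP/(1 - w).
T, P, w = 5, 1, 0.9
print(round(T + w * P / (1 - w), 6))  # 14.0
```

Note that truncating the series after enough moves is indistinguishable from the infinite sum, since the remaining terms shrink geometrically.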