Equilibrium

At the beginning of each date t, agents choose actions simultaneously. Then each agent i observes the actions ajt chosen by the agents j ∈ Ni and updates his beliefs accordingly. Agent i's information set at date t consists of his signal σi(ω) and the history of actions {ajs : j ∈ Ni, s ≤ t − 1}. Agent i chooses the action ait to maximize the expectation of his short-run payoff U(ait, ω) conditional on the information available.

An agent's behavior can be described more formally as follows. Agent i's choice of action at date t is described by a random variable Xit(ω) and his information at date t is described by a σ-field Fit. Since the agent's choice can only depend on the information available to him, Xit must be measurable with respect to Fit. Since Fit represents the agent's information at date t, it must be the σ-field generated by the random variables σi and {Xjs : j ∈ Ni, s ≤ t − 1}. Note that there is no need to condition explicitly on agent i's past actions, because they are functions of the past actions of agents j ∈ Ni and the signal σi(ω). Finally, since Xit is optimal, there cannot be any other Fit-measurable choice function that yields a higher expected utility. These are the essential elements of our definition of equilibrium, as stated below.

Definition 1. A weak perfect Bayesian equilibrium consists of a sequence of random variables {Xit} and σ-fields {Fit} such that, for each i = 1, ..., n and t = 1, 2, ...,

(i) Xit : Ω → A is Fit-measurable,
(ii) Fit = F(σi, {Xjs : j ∈ Ni, s ≤ t − 1}), and
(iii) E[U(x(ω), ω)] ≤ E[U(Xit(ω), ω)] for any Fit-measurable function x : Ω → A.

Note that our definition of equilibrium does not require optimality "off the equilibrium path". This entails no essential loss of generality as long as it is assumed that the actions of a single agent, who is of measure zero, are not observed by other players. Then a deviation by a single agent has no effect on the subsequent decisions of other agents.
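To see what condition (iii) requires computationally, the following is a minimal sketch, assuming a finite stand-in for Ω and a uniform prior; an agent's information at date t is represented by the set of states it cannot rule out given its signal and the observed neighbour actions. The names best_response, payoff and consistent are illustrative only and not part of the model.

```python
# Minimal sketch of conditions (i) and (iii) in Definition 1, assuming a
# finite stand-in for the state space Omega and a uniform prior.
# All names here (best_response, payoff, consistent) are illustrative.

def best_response(payoff, actions, states, consistent):
    """Return an action maximizing E[U(a, omega)] over the states the agent
    cannot rule out, i.e. the event generated by sigma_i and the neighbours'
    observed actions up to date t - 1."""
    event = [w for w in states if consistent(w)]   # the agent's information at date t
    assert event, "only histories with positive probability are considered"
    return max(actions, key=lambda a: sum(payoff(a, w) for w in event) / len(event))

# Example: two equally likely states; action a pays 1 if it matches the state.
# If the observed history rules out state 0, the best response is action 1.
print(best_response(lambda a, w: 1.0 if a == w else 0.0,
                    actions=[0, 1], states=[0, 1],
                    consistent=lambda w: w == 1))   # -> 1
```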
3. LEARNING WITH TWO (REPRESENTATIVE) AGENTS AND TWO ACTIONS

To fix ideas and illustrate the workings of the basic model, we first consider the special case of two representative agents, A and B, and two actions, 0 and 1. There are three graphs, besides the empty graph NA = NB = ∅:

(i) NA = {B}, NB = {A};
(ii) NA = {B}, NB = ∅;
(iii) NA = ∅, NB = {A}.
Cases (ii) and (iii) are uninteresting because there is no possibility of mutual learning. For example, in case (ii), agent B observes a private signal and chooses the optimal action at date 1. Since he observes no further information, he chooses the same action at every subsequent date. Agent A observes a private signal and chooses the optimal action at date 1. At date 2, he observes agent B's action at date 1, updates his beliefs and chooses the new optimal action at date 2. After that, A receives no additional information, so agent A chooses the same action at every subsequent date. Agent A has learned something from agent B, but that is as far as it goes. In case (i), on the other hand, the two agents learn from each other and learning can continue for an unbounded number of periods. We focus on the network defined in (i) in what follows.

For simplicity, we consider a special information and payoff structure. We assume that Ω = ΩA × ΩB, where Ωi is an interval [a, b] and the generic element is ω = (ωA, ωB). The signals are assumed to satisfy σi(ω) = ωi for all ω ∈ Ω and i = A, B, where the random variables ωA and ωB are independently and continuously distributed, that is, P = PA × PB and Pi has no atoms. There are two actions a = 0, 1 and the payoff function is assumed to satisfy

u(a, ω) = 0 if a = 0, and u(a, ω) = U(ω) if a = 1,

where U(ωA, ωB) is assumed to be a continuous and increasing function. To avoid trivialities, we assume that neither action is weakly dominated.

These assumptions are sufficient for the optimal strategy to have the form of a cutoff rule. To see this, note that for any history that occurs with positive probability, agent i's beliefs at date t take the form of an event {ωi} × Bjt, where the true value of ωj is known to belong to Bjt. Then the payoff to action 1 is ϕi(ωi, Bjt) = E[U(ωA, ωB) | {ωi} × Bjt]. Clearly, ϕi(ωi, Bjt) is increasing in ωi, because U is increasing and the distribution of ωj is independent of ωi, so there exists a cutoff ω∗i(Bjt) such that

ωi > ω∗i(Bjt) ⇒ ϕi(ωi, Bjt) > 0, and ωi < ω∗i(Bjt) ⇒ ϕi(ωi, Bjt) < 0.

We assume that when an agent is indifferent between the two actions, he chooses action 1. The analysis is essentially the same for any other tie-breaking rule.
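The cutoff can be computed numerically. The sketch below finds ω∗i(Bjt) by bisection on ϕi, assuming purely for illustration that ωA and ωB are uniform on [0, 1] and that U(ωA, ωB) = ωA + ωB − 1, so that neither action is dominated; the helper names phi and cutoff are not from the paper.

```python
import numpy as np

# Numerical sketch of the cutoff rule. Illustrative assumptions: omega_A and
# omega_B are uniform on [0, 1] and U(wA, wB) = wA + wB - 1 (symmetric, so the
# same helper serves either agent with its own signal as the first argument).

def phi(U, omega_own, B_other, n_grid=2001):
    """phi_i(omega_i, B_jt) = E[U | {omega_i} x B_jt] for a uniform P_j."""
    lo, hi = B_other
    return float(np.mean(U(omega_own, np.linspace(lo, hi, n_grid))))

def cutoff(U, B_other, a=0.0, b=1.0, tol=1e-8):
    """Find omega_i*(B_jt), the signal at which phi_i changes sign, by bisection."""
    if phi(U, a, B_other) >= 0:
        return a                      # action 1 is optimal for every own signal
    if phi(U, b, B_other) < 0:
        return b                      # action 0 is optimal for every own signal
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if phi(U, mid, B_other) >= 0 else (mid, hi)
    return hi

U = lambda own, other: own + other - 1.0
print(cutoff(U, (0.0, 1.0)))          # ~0.50 with no information about the other signal
print(cutoff(U, (0.5, 1.0)))          # ~0.25 after learning the other signal is >= 0.5
```

Under these assumptions, learning that the other agent's signal lies above 0.5 lowers one's own cutoff from 0.5 to 0.25; this is the channel through which observed actions transmit information.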
The fact that agent i's strategy takes the form of a cutoff rule implies that the set Bit is an interval. This can be proved by induction as follows. At date 1, agent j has a cutoff ω∗j1 and Xj1(ω) = 1 if and only if ωj ≥ ω∗j1. Then at date 2 agent i will know that the true value of ωj belongs to Bj2(ω), where

Bj2(ω) = [ω∗j1, b] if Xj1(ω) = 1, and Bj2(ω) = [a, ω∗j1) if Xj1(ω) = 0.

Now suppose that at some date t, the information set Bjt(ω) ⊆ [a, b] is an interval and agent j's cutoff is ω∗jt(Bit(ω)). Then at date t + 1, agent i knows that ωj belongs to Bj,t+1(ω), where

Bj,t+1(ω) = Bjt(ω) ∩ [ω∗jt(Bit(ω)), b] if Xjt(ω) = 1, and Bj,t+1(ω) = Bjt(ω) ∩ [a, ω∗jt(Bit(ω))) if Xjt(ω) = 0.

Clearly, Bj,t+1(ω) is also an interval. Hence, by induction, Bit(ω) is an interval for all t, and the common knowledge at date t is Bt(ω) = BAt(ω) × BBt(ω). By construction, ω ∈ Bt+1(ω) ⊆ Bt(ω) for every t. Then Bt(ω) ↓ B(ω) = ∩∞t=1 Bt(ω), and {B(ω) : ω ∈ Ω} defines a partition of Ω. Note that ω ∈ B(ω), so B(ω) ≠ ∅.

In the limit, when all learning has ceased, agent A knows that ωB belongs to a set BB(ω) and agent B knows that ωA belongs to BA(ω). Furthermore, because the actions chosen at each date are common knowledge, the sets BA(ω) and BB(ω) are common knowledge.

An interesting question is whether, given their information in the limit, the two agents must agree on which action is best. In the two-person case, we can show directly that both agents must eventually agree, in the sense that they choose different actions only if they are indifferent. The proof is by contradiction. Suppose, contrary to what we want to prove, that for some B and every ω such that B(ω) = B,

E[U(ωA, ωB) | {ωA} × BB] > 0 and E[U(ωA, ωB) | BA × {ωB}] < 0.

Then the same actions must be optimal for every element in the information set (otherwise, more information would be revealed) and this implies

E[U(ωA, ωB) | {ω̄A} × BB] ≥ 0 and E[U(ωA, ωB) | BA × {ω̄B}] ≤ 0,

where ω̄A = inf BA(ω) and ω̄B = sup BB(ω). Then

U(ω̄A, ω̄B) ≥ 0 and U(ω̄A, ω̄B) ≤ 0,
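To see the resulting dynamics in case (i), the self-contained sketch below iterates the interval updates described above: at each date both agents apply their cutoffs and then truncate their beliefs about each other at the observed cutoff. The same illustrative primitives are assumed (uniform signals on [0, 1], U(ωA, ωB) = ωA + ωB − 1), and open versus closed endpoints are ignored since the signal distributions are atomless.

```python
import numpy as np

# Simulation sketch of the two-agent dynamics for case (i). Illustrative
# assumptions: uniform signals on [0, 1] and U(wA, wB) = wA + wB - 1.

def cutoff(U, B_other, a=0.0, b=1.0, n=4001):
    """Grid approximation of omega*(B): smallest own signal with E[U | B] >= 0."""
    own = np.linspace(a, b, n)
    other = np.linspace(B_other[0], B_other[1], n)
    phi = np.array([np.mean(U(w, other)) for w in own])
    return b if phi[-1] < 0 else float(own[np.argmax(phi >= 0)])

def learn(wA, wB, U, T=20):
    BA, BB = (0.0, 1.0), (0.0, 1.0)      # B's belief about omega_A, A's belief about omega_B
    for t in range(1, T + 1):
        cA, cB = cutoff(U, BB), cutoff(U, BA)          # each agent's current cutoff
        xA, xB = int(wA >= cA), int(wB >= cB)          # ties broken in favour of action 1
        new_BB = (max(BB[0], cB), BB[1]) if xB else (BB[0], min(BB[1], cB))
        new_BA = (max(BA[0], cA), BA[1]) if xA else (BA[0], min(BA[1], cA))
        print(f"t={t}: xA={xA} xB={xB}  BA={new_BA}  BB={new_BB}")
        if (new_BA, new_BB) == (BA, BB):               # no new information: learning has ceased
            return xA, xB
        BA, BB = new_BA, new_BB
    return xA, xB

learn(0.65, 0.40, lambda own, other: own + other - 1.0)
# In this run the agents disagree at dates 1-2 and then settle on the same
# action, in line with the claim that in the limit they disagree only if indifferent.
```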