If we consider a sequence of events $E_1, E_2, \ldots$, where $E_1 = S$ and $E_i = \emptyset$ for $i > 1$, then, because the events are mutually exclusive and because $S = \bigcup_{i=1}^{\infty} E_i$, we have, from Axiom 3,

\[
P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)
\]

implying that

\[
P(\emptyset) = 0
\]

That is, the null event has probability 0 of occurring.

Note that it follows that, for any finite sequence of mutually exclusive events $E_1, E_2, \ldots, E_n$,

\[
P\left( \bigcup_{i=1}^{n} E_i \right) = \sum_{i=1}^{n} P(E_i) \tag{3.1}
\]

This equation follows from Axiom 3 by defining $E_i$ as the null event for all values of $i$ greater than $n$. Axiom 3 is equivalent to Equation (3.1) when the sample space is finite. (Why?) However, the added generality of Axiom 3 is necessary when the sample space consists of an infinite number of points.

EXAMPLE 3a

If our experiment consists of tossing a coin and if we assume that a head is as likely to appear as a tail, then we would have

\[
P(\{H\}) = P(\{T\}) = \tfrac{1}{2}
\]

On the other hand, if the coin were biased and we felt that a head were twice as likely to appear as a tail, then we would have

\[
P(\{H\}) = \tfrac{2}{3} \qquad P(\{T\}) = \tfrac{1}{3}
\]

EXAMPLE 3b

If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have

\[
P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \tfrac{1}{6}
\]

From Axiom 3, it would thus follow that the probability of rolling an even number would equal

\[
P(\{2, 4, 6\}) = P(\{2\}) + P(\{4\}) + P(\{6\}) = \tfrac{1}{2}
\]

The assumption of the existence of a set function $P$, defined on the events of a sample space $S$ and satisfying Axioms 1, 2, and 3, constitutes the modern mathematical approach to probability theory. Hopefully, the reader will agree that the axioms are natural and in accordance with our intuitive concept of probability as related to chance and randomness. Furthermore, using these axioms we shall be able to prove that if an experiment is repeated over and over again, then, with probability 1, the proportion of time during which any specific event $E$ occurs will equal $P(E)$. This result, known as the strong law of large numbers, is presented in Chapter 8. In addition, we present another possible interpretation of probability, as a measure of belief, in Section 2.7.
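The strong-law remark can be previewed numerically. Below is a minimal simulation sketch (the trial counts, seed, and helper name are illustrative choices, not from the text): for the fair die of Example 3b, the proportion of rolls landing on an even face settles near $P(\{2, 4, 6\}) = \tfrac{1}{2}$ as the number of rolls grows.

```python
import random

def even_roll_frequency(num_rolls: int, seed: int = 0) -> float:
    """Fraction of fair-die rolls that land on an even face."""
    rng = random.Random(seed)
    evens = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) % 2 == 0)
    return evens / num_rolls

if __name__ == "__main__":
    # The relative frequency approaches P({2, 4, 6}) = 1/2 as num_rolls grows.
    for n in (100, 10_000, 1_000_000):
        print(n, even_roll_frequency(n))
```

This is only an empirical illustration of the convergence result stated above; the formal statement and proof are deferred to Chapter 8.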
Technical Remark. We have supposed that $P(E)$ is defined for all the events $E$ of the sample space. Actually, when the sample space is an uncountably infinite set, $P(E)$ is defined only for a class of events called measurable. However, this restriction need not concern us, as all events of any practical interest are measurable.

2.4 SOME SIMPLE PROPOSITIONS

In this section, we prove some simple propositions regarding probabilities. We first note that, since $E$ and $E^c$ are always mutually exclusive and since $E \cup E^c = S$, we have, by Axioms 2 and 3,

\[
1 = P(S) = P(E \cup E^c) = P(E) + P(E^c)
\]

Or, equivalently, we have Proposition 4.1.

Proposition 4.1.
\[
P(E^c) = 1 - P(E)
\]

In words, Proposition 4.1 states that the probability that an event does not occur is 1 minus the probability that it does occur. For instance, if the probability of obtaining a head on the toss of a coin is $\tfrac{3}{8}$, then the probability of obtaining a tail must be $\tfrac{5}{8}$.

Our second proposition states that if the event $E$ is contained in the event $F$, then the probability of $E$ is no greater than the probability of $F$.

Proposition 4.2. If $E \subset F$, then $P(E) \leq P(F)$.

Proof. Since $E \subset F$, it follows that we can express $F$ as

\[
F = E \cup E^c F
\]

Hence, because $E$ and $E^c F$ are mutually exclusive, we obtain, from Axiom 3,

\[
P(F) = P(E) + P(E^c F)
\]

which proves the result, since $P(E^c F) \geq 0$.

Proposition 4.2 tells us, for instance, that the probability of rolling a 1 with a die is less than or equal to the probability of rolling an odd value with the die.
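As a quick illustration, here is a minimal sketch (the event choices and helper names are ad hoc, not from the text) that checks Propositions 4.1 and 4.2 on the equally likely die of Example 3b, using exact rational arithmetic.

```python
from fractions import Fraction

# Equally likely outcomes on a die: P(event) = |event| / |S|.
S = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

E = frozenset({1})           # roll a 1
F = frozenset({1, 3, 5})     # roll an odd value, so E is contained in F

assert P(S - E) == 1 - P(E)  # Proposition 4.1: P(E^c) = 1 - P(E)
assert P(E) <= P(F)          # Proposition 4.2: E contained in F implies P(E) <= P(F)
```

Fractions are used so that the equalities hold exactly rather than up to floating-point error.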
The next proposition gives the relationship between the probability of the union of two events, expressed in terms of the individual probabilities, and the probability of the intersection of the events.

Proposition 4.3.
\[
P(E \cup F) = P(E) + P(F) - P(EF)
\]

Proof. To derive a formula for $P(E \cup F)$, we first note that $E \cup F$ can be written as the union of the two disjoint events $E$ and $E^c F$. Thus, from Axiom 3, we obtain

\[
P(E \cup F) = P(E \cup E^c F) = P(E) + P(E^c F)
\]

Furthermore, since $F = EF \cup E^c F$, we again obtain from Axiom 3

\[
P(F) = P(EF) + P(E^c F)
\]

or, equivalently,

\[
P(E^c F) = P(F) - P(EF)
\]

thereby completing the proof.

Proposition 4.3 could also have been proved by making use of the Venn diagram in Figure 2.4.

FIGURE 2.4: Venn Diagram

Let us divide $E \cup F$ into three mutually exclusive sections, as shown in Figure 2.5. In words, section I represents all the points in $E$ that are not in $F$ (that is, $EF^c$), section II represents all points both in $E$ and in $F$ (that is, $EF$), and section III represents all points in $F$ that are not in $E$ (that is, $E^c F$).

FIGURE 2.5: Venn Diagram in Sections

From Figure 2.5, we see that

\[
E \cup F = \mathrm{I} \cup \mathrm{II} \cup \mathrm{III} \qquad E = \mathrm{I} \cup \mathrm{II} \qquad F = \mathrm{II} \cup \mathrm{III}
\]

As I, II, and III are mutually exclusive, it follows from Axiom 3 that

\[
P(E \cup F) = P(\mathrm{I}) + P(\mathrm{II}) + P(\mathrm{III}) \qquad P(E) = P(\mathrm{I}) + P(\mathrm{II}) \qquad P(F) = P(\mathrm{II}) + P(\mathrm{III})
\]

which shows that

\[
P(E \cup F) = P(E) + P(F) - P(\mathrm{II})
\]

and Proposition 4.3 is proved, since $\mathrm{II} = EF$.
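Proposition 4.3 is also easy to confirm numerically. The self-contained sketch below (the outcomes, their probabilities, and the events are illustrative choices, not from the text) checks the identity on a small sample space with unequal outcome probabilities.

```python
from fractions import Fraction

# A small sample space with unequal outcome probabilities summing to 1.
prob = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

def P(event):
    """Probability of an event (a set of outcomes) by finite additivity."""
    return sum(prob[o] for o in event)

E = {"a", "b"}
F = {"b", "c"}

# Proposition 4.3: P(E ∪ F) = P(E) + P(F) − P(EF); both sides equal 7/8 here.
assert P(E | F) == P(E) + P(F) - P(E & F)
```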
EXAMPLE 4a

J is taking two books along on her holiday vacation. With probability .5, she will like the first book; with probability .4, she will like the second book; and with probability .3, she will like both books. What is the probability that she likes neither book?

Solution. Let $B_i$ denote the event that J likes book $i$, $i = 1, 2$. Then the probability that she likes at least one of the books is

\[
P(B_1 \cup B_2) = P(B_1) + P(B_2) - P(B_1 B_2) = .5 + .4 - .3 = .6
\]

Because the event that J likes neither book is the complement of the event that she likes at least one of them, we obtain the result

\[
P(B_1^c B_2^c) = P\big( (B_1 \cup B_2)^c \big) = 1 - P(B_1 \cup B_2) = .4
\]

We may also calculate the probability that any one of the three events $E$, $F$, and $G$ occurs, namely,

\[
P(E \cup F \cup G) = P[(E \cup F) \cup G]
\]

which, by Proposition 4.3, equals

\[
P(E \cup F) + P(G) - P[(E \cup F)G]
\]

Now, it follows from the distributive law that the events $(E \cup F)G$ and $EG \cup FG$ are equivalent; hence, from the preceding equations, we obtain

\[
\begin{aligned}
P(E \cup F \cup G) &= P(E) + P(F) - P(EF) + P(G) - P(EG \cup FG) \\
&= P(E) + P(F) - P(EF) + P(G) - P(EG) - P(FG) + P(EGFG) \\
&= P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)
\end{aligned}
\]

In fact, the following proposition, known as the inclusion-exclusion identity, can be proved by mathematical induction:

Proposition 4.4.
\[
P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n)
\]

The summation $\sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r})$ is taken over all of the $\binom{n}{r}$ possible subsets of size $r$ of the set $\{1, 2, \ldots, n\}$.

In words, Proposition 4.4 states that the probability of the union of $n$ events equals the sum of the probabilities of these events taken one at a time, minus the sum of the probabilities of these events taken two at a time, plus the sum of the probabilities of these events taken three at a time, and so on.
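Proposition 4.4 translates directly into a short computation. The sketch below (the sample space and the events are illustrative choices, not from the text) evaluates the inclusion-exclusion sum term by term and compares it with the probability of the union computed directly, assuming equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))   # equally likely outcomes 1, ..., 6

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def union_prob_by_inclusion_exclusion(events):
    """Sum over r of (-1)^(r+1) times the probabilities of all size-r intersections."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for subset in combinations(events, r):
            total += (-1) ** (r + 1) * P(frozenset.intersection(*subset))
    return total

events = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({4, 5})]
# Both sides equal 5/6 for these events.
assert union_prob_by_inclusion_exclusion(events) == P(frozenset.union(*events))
```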
Remarks. 1. For a noninductive argument for Proposition 4.4, note first that if an outcome of the sample space is not a member of any of the sets $E_i$, then its probability does not contribute anything to either side of the equality. Now, suppose that an outcome is in exactly $m$ of the events $E_i$, where $m > 0$. Then, since it is in $\bigcup_i E_i$, its probability is counted once in $P\big(\bigcup_i E_i\big)$; also, as this outcome is contained in $\binom{m}{k}$ subsets of the type $E_{i_1} E_{i_2} \cdots E_{i_k}$, its probability is counted

\[
\binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m}
\]

times on the right of the equality sign in Proposition 4.4. Thus, for $m > 0$, we must show that

\[
1 = \binom{m}{1} - \binom{m}{2} + \binom{m}{3} - \cdots \pm \binom{m}{m}
\]

However, since $1 = \binom{m}{0}$, the preceding equation is equivalent to

\[
\sum_{i=0}^{m} \binom{m}{i} (-1)^i = 0
\]

and the latter equation follows from the binomial theorem, since

\[
0 = (-1 + 1)^m = \sum_{i=0}^{m} \binom{m}{i} (-1)^i (1)^{m-i}
\]

2. The following is a succinct way of writing the inclusion-exclusion identity:

\[
P\Big( \bigcup_{i=1}^{n} E_i \Big) = \sum_{r=1}^{n} (-1)^{r+1} \sum_{i_1 < \cdots < i_r} P(E_{i_1} \cdots E_{i_r})
\]

3. In the inclusion-exclusion identity, going out one term results in an upper bound on the probability of the union, going out two terms results in a lower bound on the probability, going out three terms results in an upper bound on the probability, going out four terms results in a lower bound, and so on. That is, for events $E_1, \ldots, E_n$, we have

\[
P\Big( \bigcup_{i=1}^{n} E_i \Big) \leq \sum_{i=1}^{n} P(E_i) \tag{4.1}
\]

\[
P\Big( \bigcup_{i=1}^{n} E_i \Big) \geq \sum_{i=1}^{n} P(E_i) - \sum_{j < i} P(E_i E_j) \tag{4.2}
\]

\[
P\Big( \bigcup_{i=1}^{n} E_i \Big) \leq \sum_{i=1}^{n} P(E_i) - \sum_{j < i} P(E_i E_j) + \sum_{k < j < i} P(E_i E_j E_k) \tag{4.3}
\]

and so on. To prove the validity of these bounds, note the identity

\[
\bigcup_{i=1}^{n} E_i = E_1 \cup E_1^c E_2 \cup E_1^c E_2^c E_3 \cup \cdots \cup E_1^c \cdots E_{n-1}^c E_n
\]
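As an aside, bounds (4.1)-(4.3) can also be spot-checked numerically. The following sketch (the sample space and events are illustrative choices, not from the text) truncates the inclusion-exclusion sum after one, two, and three terms and compares each truncation with the exact probability of the union.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 11))   # equally likely outcomes 1, ..., 10

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

def truncated_inclusion_exclusion(events, num_terms):
    """Partial inclusion-exclusion sum, keeping only the first `num_terms` terms."""
    total = Fraction(0)
    for r in range(1, num_terms + 1):
        for subset in combinations(events, r):
            total += (-1) ** (r + 1) * P(frozenset.intersection(*subset))
    return total

events = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6}),
          frozenset({4, 5, 6, 7}), frozenset({1, 4, 7, 8})]
exact = P(frozenset.union(*events))   # 8/10 for these events

assert exact <= truncated_inclusion_exclusion(events, 1)   # bound (4.1)
assert exact >= truncated_inclusion_exclusion(events, 2)   # bound (4.2)
assert exact <= truncated_inclusion_exclusion(events, 3)   # bound (4.3)
```

Here the one-, two-, and three-term truncations are 16/10, 5/10, and 9/10, respectively, bracketing the exact value 8/10 from above and below as the remark describes.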