Section 2.3 Axioms of Probability

That is, P(E) is defined as the (limiting) proportion of time that E occurs. It is thus the limiting frequency of E.

Although the preceding definition is certainly intuitively pleasing and should always be kept in mind by the reader, it possesses a serious drawback: How do we know that n(E)/n will converge to some constant limiting value that will be the same for each possible sequence of repetitions of the experiment? For example, suppose that the experiment to be repeatedly performed consists of flipping a coin. How do we know that the proportion of heads obtained in the first n flips will converge to some value as n gets large? Also, even if it does converge to some value, how do we know that, if the experiment is repeatedly performed a second time, we shall obtain the same limiting proportion of heads?

Proponents of the relative frequency definition of probability usually answer this objection by stating that the convergence of n(E)/n to a constant limiting value is an assumption, or an axiom, of the system. However, to assume that n(E)/n will necessarily converge to some constant value seems to be an extraordinarily complicated assumption. For, although we might indeed hope that such a constant limiting frequency exists, it does not at all seem to be a priori evident that this need be the case. In fact, would it not be more reasonable to assume a set of simpler and more self-evident axioms about probability and then attempt to prove that such a constant limiting frequency does in some sense exist?

The latter approach is the modern axiomatic approach to probability theory that we shall adopt in this text. In particular, we shall assume that, for each event E in the sample space S, there exists a value P(E), referred to as the probability of E. We shall then assume that all these probabilities satisfy a certain set of axioms, which, we hope the reader will agree, is in accordance with our intuitive notion of probability.
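The stabilizing behavior of n(E)/n that this discussion appeals to is easy to observe numerically. The following Python sketch (an illustration, not part of the text; the function name `relative_frequency` and the seed are our own choices) flips a simulated fair coin n times and reports the proportion of heads for increasing n. In any single seeded run the proportion settles down near one value, though, as the text stresses, it is the axioms below, not such experiments, that justify this.

```python
import random

def relative_frequency(n, seed=1):
    """Return n(E)/n for E = {heads} after n simulated fair-coin flips."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

# The proportion of heads settles down as n grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

Running this with several different seeds mimics "repeatedly performing the experiment a second time": each run stabilizes, and all runs stabilize near the same value.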
Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three axioms:

Axiom 1
0 ≤ P(E) ≤ 1

Axiom 2
P(S) = 1

Axiom 3
For any sequence of mutually exclusive events E1, E2, ... (that is, events for which EiEj = Ø when i ≠ j),

P(∪∞i=1 Ei) = ∑∞i=1 P(Ei)

We refer to P(E) as the probability of the event E.

Thus, Axiom 1 states that the probability that the outcome of the experiment is an outcome in E is some number between 0 and 1. Axiom 2 states that, with probability 1, the outcome will be a point in the sample space S. Axiom 3 states that, for any sequence of mutually exclusive events, the probability of at least one of these events occurring is just the sum of their respective probabilities.
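For a finite sample space, these axioms can be checked mechanically. Below is a minimal Python sketch (our own illustration; the name `satisfies_axioms` is hypothetical): a probability assignment is given as a dictionary of single-outcome probabilities, and since the single-outcome events are mutually exclusive, Axiom 3 forces P of any event to be the sum of its outcomes' probabilities, so the checks reduce to Axiom 1 on each atom and Axiom 2 on their total.

```python
from fractions import Fraction

def satisfies_axioms(atom_probs):
    """Check Axioms 1 and 2 (with Axiom 3 applied to the mutually
    exclusive single-outcome events) for a finite sample space.
    atom_probs maps each outcome s to P({s})."""
    in_range = all(0 <= p <= 1 for p in atom_probs.values())  # Axiom 1
    total_one = sum(atom_probs.values()) == 1                 # Axiom 2 via additivity
    return in_range and total_one

fair_die = {s: Fraction(1, 6) for s in range(1, 7)}
bad = {1: Fraction(1, 2), 2: Fraction(1, 3)}  # atoms sum to 5/6, not 1
```

Using `Fraction` rather than floats keeps the sums exact, so the Axiom 2 equality test is not disturbed by rounding.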
If we consider a sequence of events E1, E2, ..., where E1 = S and Ei = Ø for i > 1, then, because the events are mutually exclusive and because S = ∪∞i=1 Ei, we have, from Axiom 3,

P(S) = ∑∞i=1 P(Ei) = P(S) + ∑∞i=2 P(Ø)

implying that

P(Ø) = 0

That is, the null event has probability 0 of occurring.

Note that it follows that, for any finite sequence of mutually exclusive events E1, E2, ..., En,

P(∪ni=1 Ei) = ∑ni=1 P(Ei)     (3.1)

This equation follows from Axiom 3 by defining Ei as the null event for all values of i greater than n. Axiom 3 is equivalent to Equation (3.1) when the sample space is finite. (Why?) However, the added generality of Axiom 3 is necessary when the sample space consists of an infinite number of points.

EXAMPLE 3a
If our experiment consists of tossing a coin and if we assume that a head is as likely to appear as a tail, then we would have

P({H}) = P({T}) = 1/2

On the other hand, if the coin were biased and we felt that a head were twice as likely to appear as a tail, then we would have

P({H}) = 2/3,  P({T}) = 1/3

EXAMPLE 3b
If a die is rolled and we suppose that all six sides are equally likely to appear, then we would have P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6. From Axiom 3, it would thus follow that the probability of rolling an even number would equal

P({2, 4, 6}) = P({2}) + P({4}) + P({6}) = 1/2

The assumption of the existence of a set function P, defined on the events of a sample space S and satisfying Axioms 1, 2, and 3, constitutes the modern mathematical approach to probability theory. Hopefully, the reader will agree that the axioms are natural and in accordance with our intuitive concept of probability as related to chance and randomness. Furthermore, using these axioms we shall be able to prove that if an experiment is repeated over and over again, then, with probability 1, the proportion of time during which any specific event E occurs will equal P(E).
This result, known as the strong law of large numbers, is presented in Chapter 8. In addition, we present another possible interpretation of probability—as being a measure of belief—in Section 2.7.
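The computations in Examples 3a and 3b are instances of finite additivity, Equation (3.1): on a finite sample space, the probability of an event is the sum of the probabilities of its individual outcomes. A minimal Python sketch of this (our own illustration; the helper `P` and the dictionary `atom` are hypothetical names, not anything defined in the text):

```python
from fractions import Fraction

# Fair die of Example 3b: each side has probability 1/6.
atom = {s: Fraction(1, 6) for s in range(1, 7)}

def P(event):
    """P(E) as the sum of atom probabilities, per Equation (3.1)."""
    return sum(atom[s] for s in event)

even = {2, 4, 6}
print(P(even))  # 1/2, matching P({2,4,6}) = P({2}) + P({4}) + P({6})
```

The same two lines, with `atom = {"H": Fraction(2, 3), "T": Fraction(1, 3)}`, reproduce the biased coin of Example 3a.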
Technical Remark. We have supposed that P(E) is defined for all the events E of the sample space. Actually, when the sample space is an uncountably infinite set, P(E) is defined only for a class of events called measurable. However, this restriction need not concern us, as all events of any practical interest are measurable.

2.4 SOME SIMPLE PROPOSITIONS

In this section, we prove some simple propositions regarding probabilities. We first note that, since E and Eᶜ are always mutually exclusive and since E ∪ Eᶜ = S, we have, by Axioms 2 and 3,

1 = P(S) = P(E ∪ Eᶜ) = P(E) + P(Eᶜ)

Or, equivalently, we have Proposition 4.1.

Proposition 4.1.
P(Eᶜ) = 1 − P(E)

In words, Proposition 4.1 states that the probability that an event does not occur is 1 minus the probability that it does occur. For instance, if the probability of obtaining a head on the toss of a coin is 3/8, then the probability of obtaining a tail must be 5/8.

Our second proposition states that if the event E is contained in the event F, then the probability of E is no greater than the probability of F.

Proposition 4.2. If E ⊂ F, then P(E) ≤ P(F).

Proof. Since E ⊂ F, it follows that we can express F as

F = E ∪ EᶜF

Hence, because E and EᶜF are mutually exclusive, we obtain, from Axiom 3,

P(F) = P(E) + P(EᶜF)

which proves the result, since P(EᶜF) ≥ 0.

Proposition 4.2 tells us, for instance, that the probability of rolling a 1 with a die is less than or equal to the probability of rolling an odd value with the die.

The next proposition gives the relationship between the probability of the union of two events, expressed in terms of the individual probabilities, and the probability of the intersection of the events.

Proposition 4.3.
P(E ∪ F) = P(E) + P(F) − P(EF)

Proof. To derive a formula for P(E ∪ F), we first note that E ∪ F can be written as the union of the two disjoint events E and EᶜF.
Thus, from Axiom 3, we obtain

P(E ∪ F) = P(E ∪ EᶜF) = P(E) + P(EᶜF)

Furthermore, since F = EF ∪ EᶜF, we again obtain from Axiom 3

P(F) = P(EF) + P(EᶜF)
or, equivalently,

P(EᶜF) = P(F) − P(EF)

thereby completing the proof.

[Figure 2.4: Venn diagram of events E and F. Figure 2.5: Venn diagram of E ∪ F divided into sections I, II, and III.]

Proposition 4.3 could also have been proved by making use of the Venn diagram in Figure 2.4. Let us divide E ∪ F into three mutually exclusive sections, as shown in Figure 2.5. In words, section I represents all the points in E that are not in F (that is, EFᶜ), section II represents all points both in E and in F (that is, EF), and section III represents all points in F that are not in E (that is, EᶜF).

From Figure 2.5, we see that

E ∪ F = I ∪ II ∪ III
E = I ∪ II
F = II ∪ III

As I, II, and III are mutually exclusive, it follows from Axiom 3 that

P(E ∪ F) = P(I) + P(II) + P(III)
P(E) = P(I) + P(II)
P(F) = P(II) + P(III)

which shows that

P(E ∪ F) = P(E) + P(F) − P(II)

and Proposition 4.3 is proved, since II = EF.

EXAMPLE 4a
J is taking two books along on her holiday vacation. With probability .5, she will like the first book; with probability .4, she will like the second book; and with probability .3, she will like both books. What is the probability that she likes neither book?
Solution. Let Bi denote the event that J likes book i, i = 1, 2. Then the probability that she likes at least one of the books is

P(B1 ∪ B2) = P(B1) + P(B2) − P(B1B2) = .5 + .4 − .3 = .6

Because the event that J likes neither book is the complement of the event that she likes at least one of them, we obtain the result

P(B1ᶜB2ᶜ) = P((B1 ∪ B2)ᶜ) = 1 − P(B1 ∪ B2) = .4

We may also calculate the probability that any one of the three events E, F, and G occurs, namely,

P(E ∪ F ∪ G) = P[(E ∪ F) ∪ G]

which, by Proposition 4.3, equals

P(E ∪ F) + P(G) − P[(E ∪ F)G]

Now, it follows from the distributive law that the events (E ∪ F)G and EG ∪ FG are equivalent; hence, from the preceding equations, we obtain

P(E ∪ F ∪ G)
= P(E) + P(F) − P(EF) + P(G) − P(EG ∪ FG)
= P(E) + P(F) − P(EF) + P(G) − P(EG) − P(FG) + P(EGFG)
= P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)

In fact, the following proposition, known as the inclusion–exclusion identity, can be proved by mathematical induction:

Proposition 4.4.
P(E1 ∪ E2 ∪ ··· ∪ En) = ∑i P(Ei) − ∑i1<i2 P(Ei1Ei2) + ··· + (−1)^(r+1) ∑i1<i2<···<ir P(Ei1Ei2 ··· Eir) + ··· + (−1)^(n+1) P(E1E2 ··· En)

The summation ∑i1<i2<···<ir P(Ei1Ei2 ··· Eir) is taken over all of the (n choose r) possible subsets of size r of the set {1, 2, ..., n}. In words, Proposition 4.4 states that the probability of the union of n events equals the sum of the probabilities of these events taken one at a time, minus the sum of the probabilities of these events taken two at a time, plus the sum of the probabilities of these events taken three at a time, and so on.

Remarks. 1. For a noninductive argument for Proposition 4.4, note first that if an outcome of the sample space is not a member of any of the sets Ei, then its probability does not contribute anything to either side of the equality. Now, suppose that an outcome is in exactly m of the events Ei, where m > 0. Then, since it is in ∪i Ei, its