with "heads" facing up or with "tails" facing up. We let s1 = "heads" and s2 = "tails" and assign Pr(s1) = 1/2 and Pr(s2) = 1/2. (If someone objects because the coin could land on its edge, we may let s3 = "edge" and define Pr(s3) = 0. However, with a finite number of events, an event of probability zero can be ignored, so such elementary events are not usually defined.) If a six-sided die is thrown, there are six possible outcomes: for 1 ≤ i ≤ 6, si = "the die lands with side number i facing up," and Pr(si) = 1/6. In general, if there are k possible outcomes and each is considered equally likely, then we let Pr(si) = 1/k for each i. Often, there is no reason to assume all outcomes are equally likely; primarily, this is an assumption used in examples, or used because there is no data to support a better assumption.

If the experiment involves several objects, then an elementary event must take into account what is observed about all of them. For example, if two dice, A and B, are thrown, then the event "A lands with side 1 facing up" is not an elementary event because there are several outcomes associated with B. In this case, the elementary events would be s_ij = "die A lands with side i facing up and die B lands with side j facing up," for 1 ≤ i, j ≤ 6. We will abbreviate this description to "A shows i and B shows j" from here on. There are 36 elementary events, and it is customary to assign a probability of 1/36 to each.

We often need to consider the probability of any one of several specified outcomes occurring, or the probability that the outcome has a particular property. Let S be a subset of the elementary events {s1, ..., sk}. Then S is called an event, and Pr(S) = \sum_{s_i \in S} \Pr(s_i). For example, suppose one die is thrown, and define the event S to be "the number appearing is divisible by 3." Then the probability of S is Pr(S) = Pr({s3, s6}) = Pr(s3) + Pr(s6) = 1/3. Elementary events are also events.

Two special events are the sure event, U = {s1, ..., sk}, which has probability 1, and the impossible event, ∅, which has probability 0. (Recall that ∅ denotes the empty set.) Also, for any event S, there is the complement event "not S," consisting of all the elementary events that are not in S, that is, U − S. Clearly, Pr(not S) = 1 − Pr(S).

Events can be defined in terms of other events by using the logical connectives "and" and "or." The event "S1 and S2" is (S1 ∩ S2), the intersection of S1 and S2. The event "S1 or S2" is (S1 ∪ S2), the union of S1 and S2.

We often need to analyze probabilities based on some degree of partial knowledge about the experiment. These are called conditional probabilities.

Definition 1.4 Conditional probability
The conditional probability of an event S given an event T is defined as

    \Pr(S \mid T) = \frac{\Pr(S \text{ and } T)}{\Pr(T)} = \frac{\sum_{s_i \in S \cap T} \Pr(s_i)}{\sum_{s_j \in T} \Pr(s_j)},    (1.4)

where s_i and s_j range over elementary events.
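These definitions translate directly into a few lines of code. The following Python sketch is ours, not the text's; the conditioning event "an even number shows" is chosen only for illustration. It represents events as sets of elementary events and computes probabilities by summation, as in Equation (1.4):

    from fractions import Fraction

    # One fair die: six elementary events, each with probability 1/6.
    U = {1, 2, 3, 4, 5, 6}
    pr_elem = {s: Fraction(1, 6) for s in U}

    def pr(event):
        """Pr of an event = sum of the probabilities of its elementary events."""
        return sum(pr_elem[s] for s in event)

    def pr_given(S, T):
        """Conditional probability Pr(S | T), per Equation (1.4)."""
        return pr(S & T) / pr(T)

    S = {s for s in U if s % 3 == 0}    # "the number appearing is divisible by 3"
    assert pr(S) == Fraction(1, 3)
    assert pr(U - S) == 1 - pr(S)       # Pr(not S) = 1 - Pr(S)
    print(pr_given(S, {2, 4, 6}))       # Pr(S | "even") = 1/3

Exact fractions are used instead of floating point so that the checks are exact rather than approximate.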
Example 1.3 Conditional probability with two dice

Suppose two dice, A and B, are thrown in the experiment. Let us define three events:

S1: "A shows 1,"
S2: "B shows 6,"
S3: "The sum of the numbers showing is 4 or less."

To get a feel for what the conditional probability means, let's consider the simple case in which all the elementary events have the same probability. For our example, the 36 elementary events are of the form "A shows i and B shows j," for 1 ≤ i, j ≤ 6. Then the conditional probability Pr(S1 | S3) can be interpreted as the answer to the question, "Out of all the elementary events in S3, what fraction of those elementary events are also in S1?"

Let us list all the elementary events in S3:

"A shows 1 and B shows 1,"    "A shows 2 and B shows 1,"
"A shows 1 and B shows 2,"    "A shows 2 and B shows 2,"
"A shows 1 and B shows 3,"    "A shows 3 and B shows 1."

The event S1 consists of 6 elementary events, in which A shows 1 and B shows each of its six possible values. Three of the elementary events in S3 are also in S1, so the answer to the question is 3/6 = 1/2. By an exact calculation from the formula in Equation (1.4), the probability of S1 given S3 is

    \Pr(S_1 \mid S_3) = \frac{3/36}{6/36} = 1/2.

Notice that the conditional probability of S2 given S3 is 0; that is, Pr(S2 | S3) = 0.

In general, the procedure for calculating conditional probabilities given some specified event S is to eliminate all the elementary events that are not in S, then rescale the probabilities of all the remaining elementary events by the same factor so that the rescaled probabilities sum to 1. The required factor is 1/Pr(S).

The conditional probability of an event may be either larger or smaller than the unconditional probability of that event. In Example 1.3 the unconditional probability of S1 is 1/6, and the conditional probability of S1 given S3 is 1/2. On the other hand, the unconditional probability that "the number shown by A is divisible by 3" is 1/3, but in Example 1.3 we see that the conditional probability that "the number shown by A is divisible by 3" given S3 is 1/6.

Definition 1.5 Stochastic independence
Given two events S and T, if

    \Pr(S \text{ and } T) = \Pr(S)\Pr(T),

then S and T are stochastically independent, or simply independent.

If S is stochastically independent of T, then Pr(S | T) = Pr(S) (see Exercise 1.8). That is, knowing that event T has occurred does not influence the probability that event S occurs one way or the other.
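The calculations of Example 1.3 and the test in Definition 1.5 can be checked mechanically. A short Python sketch (ours, not the text's), using the same set-based representation as before:

    from fractions import Fraction

    # Two fair dice: elementary event (a, b) means "A shows a and B shows b."
    U = {(a, b) for a in range(1, 7) for b in range(1, 7)}
    pr_elem = {s: Fraction(1, 36) for s in U}

    def pr(event):
        return sum(pr_elem[s] for s in event)

    def pr_given(S, T):
        return pr(S & T) / pr(T)

    S1 = {(a, b) for (a, b) in U if a == 1}        # "A shows 1"
    S2 = {(a, b) for (a, b) in U if b == 6}        # "B shows 6"
    S3 = {(a, b) for (a, b) in U if a + b <= 4}    # "sum is 4 or less"

    assert pr_given(S1, S3) == Fraction(1, 2)      # as computed in Example 1.3
    assert pr_given(S2, S3) == 0                   # S2 is impossible given S3
    assert pr(S1 & S2) == pr(S1) * pr(S2)          # S1 and S2 are independent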
The property of independence is extremely useful when it exists, because it permits probabilities of different events to be analyzed separately. However, many errors in analysis are made by unjustified assumptions of independence.

Example 1.4 Stochastic independence

Continuing with the events defined in Example 1.3, events S1 and S2 are independent because the probability of each is 1/6, and (S1 and S2) consists of one elementary event, whose probability is 1/36. Notice also that Pr(S1 | S2) = (1/36)/(6/36) = 1/6 = Pr(S1). From the discussion in Example 1.3, we see that S1 and S3 are not independent, and that S2 and S3 are not independent.

Random variables and their expected values are important for many situations that involve probabilities. A random variable is a real-valued variable that depends on which elementary event has occurred; in other words, it is a function defined for elementary events. For example, if the number of operations done by an algorithm depends on the input, and each possible input is an elementary event, then the number of operations is a random variable.

Definition 1.6 Expectation and conditional expectation
Let f(e) be a random variable defined on a set of elementary events e ∈ U. The expectation of f, denoted as E(f), is defined as

    E(f) = \sum_{e \in U} f(e) \Pr(e).

This is also often called the average value of f. The conditional expectation of f given an event S, denoted as E(f | S), is defined as

    E(f \mid S) = \sum_{e \in U} f(e) \Pr(e \mid S) = \sum_{e \in S} f(e) \Pr(e \mid S),

since the conditional probability of any elementary event not in S is 0.

Expectations are often easier to manipulate than the random variables themselves, particularly when several interrelated random variables are involved, due to the following important laws, which are easily proven from the definitions.

Lemma 1.2 (Laws of expectations) For random variables f(e) and g(e) defined on a set of elementary events e ∈ U, and any event S:

    E(f + g) = E(f) + E(g),
    E(f) = \Pr(S)\, E(f \mid S) + \Pr(\text{not } S)\, E(f \mid \text{not } S).
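The second law of Lemma 1.2 can be checked on a small case. A Python sketch (ours; the die and the event "an even number shows" are chosen only for illustration):

    from fractions import Fraction

    U = {1, 2, 3, 4, 5, 6}                   # one fair die
    pr_elem = {s: Fraction(1, 6) for s in U}

    def pr(event):
        return sum(pr_elem[s] for s in event)

    def expect(f, event):
        """Conditional expectation E(f | event); with event = U this is E(f)."""
        return sum(f(s) * pr_elem[s] for s in event) / pr(event)

    f = lambda s: s                          # random variable: the number shown
    S = {2, 4, 6}                            # "an even number shows"

    lhs = expect(f, U)                                         # E(f) = 7/2
    rhs = pr(S) * expect(f, S) + pr(U - S) * expect(f, U - S)  # Lemma 1.2, second law
    assert lhs == rhs == Fraction(7, 2)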
Example 1.5 Conditional probability and order

In Chapter 4 we will consider probabilities in connection with order information gained by doing comparisons. Let's look at an example of that type involving four elements A, B, C, D, which have distinct numerical values, but initially we know nothing about their values or relative values. We will write the letters in order to denote the elementary event that this is their relative order; that is, CBDA is the event that C < B < D < A. There are 24 possible permutations:

ABCD  ACBD  CABD  ACDB  CADB  CDAB
ABDC  ADBC  DABC  ADCB  DACB  DCAB
BACD  BCAD  CBAD  BCDA  CBDA  CDBA
BADC  BDAC  DBAC  BDCA  DBCA  DCBA

We begin by assuming all input permutations are equally likely, so the probability of each one is 1/24. What is the probability that A < B? In other words, defining A < B as an event, what is its probability? Intuitively we expect it to be 1/2, and we can verify that by counting the number of permutations in which A appears before B in the sequence. Similarly, for any pair of elements, the probability that one is less than another is 1/2. For example, the event B < D has probability 1/2.

Now suppose the program compares A and B and discovers that A < B. How does this "affect" the probabilities? To make this question more rigorous, we phrase it as, "What are the probabilities conditioned on the event A < B?" We see by inspection that the event A < B consists of all the elementary events in the first two rows of the table. Therefore the conditional probabilities of these elementary events given A < B are twice their original probabilities, 2/24 = 1/12, while the conditional probabilities, given A < B, of the elementary events in the last two rows are 0.

Recall that before any comparisons, the probability of the event B < D was 1/2. We have not compared B and D. Is the conditional probability of B < D given A < B still 1/2? To answer the question, we check how many sequences in the first two rows have B preceding D. In fact, there are only four cases in which B precedes D in the first two rows, so Pr(B < D | A < B) = 4/12 = 1/3.

Now consider the event C < D. Is its conditional probability different from 1/2? Again checking the first two rows of the table, we see that C precedes D in six cases, so Pr(C < D | A < B) = 1/2. Therefore the events A < B and C < D are stochastically independent. This is what we would expect: The relative order of A and B should not "have any influence" on the order of C and D.

Finally, suppose the program does another comparison and discovers D < C (it already discovered A < B). Let's look at the conditional probabilities given both of these events (which is also the single event "A < B and D < C"). We see by inspection that the event "A < B and D < C" consists of all the elementary events in the second row of the table. To make the conditional probabilities sum to 1, all of these elementary events must have a conditional probability of 1/6. The program has not compared A or B to either of C or D. Does this mean that the conditional probabilities of the events A < C, A < D, B < C, and B < D are unchanged from their original probabilities, which were all 1/2? The answer is worked out in Exercise 1.10.
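The conditional probabilities in Example 1.5 can be verified by brute-force enumeration of the 24 permutations. A Python sketch (ours, not the text's):

    from fractions import Fraction
    from itertools import permutations

    # Each string lists A, B, C, D from smallest to largest value,
    # so "CBDA" is the elementary event C < B < D < A.
    perms = ["".join(p) for p in permutations("ABCD")]   # 24, equally likely

    def lt(p, x, y):
        """True if x < y in the relative order p."""
        return p.index(x) < p.index(y)

    given = [p for p in perms if lt(p, "A", "B")]        # condition on A < B
    assert len(given) == 12                              # the first two rows

    pr_B_lt_D = Fraction(sum(lt(p, "B", "D") for p in given), len(given))
    pr_C_lt_D = Fraction(sum(lt(p, "C", "D") for p in given), len(given))
    assert pr_B_lt_D == Fraction(1, 3)    # Pr(B < D | A < B), as in the text
    assert pr_C_lt_D == Fraction(1, 2)    # Pr(C < D | A < B): still 1/2

The same loop, with the conditioning test replaced by "A < B and D < C," answers the question posed at the end of the example (Exercise 1.10).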
Example 1.6 Expected number of inversions

Consider the same probability space as Example 1.5. Let us define the random variable I(e) to be the number of pairs of elements whose relative key order is opposite their alphabetical order. This is called the number of inversions in the permutation. For example, I(ABCD) = 0; I(ABDC) = 1, because D < C but C precedes D in alphabetical order; I(DCBA) = 6; and so on. By inspection we see that E(I) = 3. Now consider E(I | A < B) and E(I | B < A). Again, by direct count we find they are 2.5 and 3.5, respectively. Since Pr(A < B) = Pr(B < A) = 1/2, Lemma 1.2 tells us that E(I) = (1/2)(2.5 + 3.5) = 3, which is true.

To summarize, conditional probabilities reflect the uncertainties of a situation when we have some partial knowledge. They can be calculated by discarding all the elementary events that are known not to be possible in the current situation, then scaling up the remaining probabilities of elementary events so that they again sum to 1. Any event whose probability does not change as a result of this calculation is (stochastically) independent of the known event. Independent events often involve objects that do not influence each other (like multiple coins or multiple dice).

Summations and Series

There are several summations that occur frequently when analyzing algorithms. Formulas for some of them are listed here and in the next section, with brief hints that may help you to remember them. A note on terminology: A series is the sum of a sequence.

Arithmetic Series: The sum of consecutive integers:

    \sum_{i=1}^{n} i = \frac{n(n+1)}{2}.    (1.5)

How to remember it: Write out the integers from 1 to n. Pair up the first and last, that is, 1 and n; pair up the second and next to last, 2 and n − 1, and so on. Each pair adds up to (n + 1) and there are n/2 pairs, giving the result. (If n is odd, the central element counts as "half a pair.") The same trick works for limits other than 1 and n.

Polynomial Series: First, we consider the sum of squares:

    \sum_{i=1}^{n} i^2 = \frac{2n^3 + 3n^2 + n}{6}.    (1.6)

This can be proved by induction on n. The main thing to remember is that the sum of the first n squares is roughly n^3/3. Equation (1.6) is not used in the text, but you may need it for some of the exercises.

The general case is

    \sum_{i=1}^{n} i^k \approx \frac{n^{k+1}}{k+1}.    (1.7)
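These formulas are easy to check numerically. A Python sketch (ours; the choices n = 100 and k = 3 are arbitrary):

    n, k = 100, 3

    assert sum(range(1, n + 1)) == n * (n + 1) // 2                           # Eq. (1.5)
    assert sum(i * i for i in range(1, n + 1)) == (2*n**3 + 3*n**2 + n) // 6  # Eq. (1.6)

    # Eq. (1.7): the sum of the first n kth powers is roughly n^(k+1)/(k+1).
    exact = sum(i ** k for i in range(1, n + 1))
    approx = n ** (k + 1) / (k + 1)
    print(exact / approx)   # close to 1, and approaches 1 as n grows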