6.042/18.062J Mathematics for Computer Science                      April 28, 2005
Srini Devadas and Eric Lehman                                          Lecture Notes

Random Variables

We've used probability to model a variety of experiments, games, and tests. Throughout, we have tried to compute probabilities of events. We asked, for example, what is the probability of the event that you win the Monty Hall game? What is the probability of the event that it rains, given that the weatherman carried his umbrella today? What is the probability of the event that you have a rare disease, given that you tested positive?

But one can ask more general questions about an experiment. How hard will it rain? How long will this illness last? How much will I lose playing 6.042 games all day? These questions are fundamentally different and not easily phrased in terms of events. The problem is that an event either does or does not happen: you win or lose, it rains or doesn't, you're sick or not. But these new questions are about matters of degree: how much, how hard, how long? To approach these questions, we need a new mathematical tool.

1 Random Variables

Let's begin with an example. Consider the experiment of tossing three independent, unbiased coins. Let C be the number of heads that appear. Let M = 1 if the three coins come up all heads or all tails, and let M = 0 otherwise. Now every outcome of the three coin flips uniquely determines the values of C and M. For example, if we flip heads, tails, heads, then C = 2 and M = 0. If we flip tails, tails, tails, then C = 0 and M = 1. In effect, C counts the number of heads, and M indicates whether all the coins match.

Since each outcome uniquely determines C and M, we can regard them as functions mapping outcomes to numbers. For this experiment, the sample space is:

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Now C is a function that maps each outcome in the sample space to a number as follows:

    C(HHH) = 3        C(THH) = 2
    C(HHT) = 2        C(THT) = 1
    C(HTH) = 2        C(TTH) = 1
    C(HTT) = 1        C(TTT) = 0

Similarly, M is a function mapping each outcome another way:

    M(HHH) = 1        M(THH) = 0
    M(HHT) = 0        M(THT) = 0
    M(HTH) = 0        M(TTH) = 0
    M(HTT) = 0        M(TTT) = 1
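As a quick check of these two tables, here is a short computational sketch. It is my own addition rather than part of the original notes; the choice of Python and the names C, M, and outcome are mine. It enumerates the eight equally likely outcomes and prints C and M for each.

    # Sketch (my own, assumed Python): enumerate the eight outcomes of three
    # coin flips and tabulate the random variables C and M defined above.
    from itertools import product

    def C(outcome):
        # Number of heads in the outcome, e.g. C('HTH') == 2
        return outcome.count('H')

    def M(outcome):
        # 1 if all three coins match, 0 otherwise
        return 1 if outcome in ('HHH', 'TTT') else 0

    for outcome in (''.join(flips) for flips in product('HT', repeat=3)):
        print(outcome, C(outcome), M(outcome))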
The functions C and M are examples of random variables. In general, a random variable is a function whose domain is the sample space. (The codomain can be anything, but we'll usually use a subset of the real numbers.) Notice that the name "random variable" is a misnomer; random variables are actually functions!

1.1 Indicator Random Variables

An indicator random variable (or simply an indicator or a Bernoulli random variable) is a random variable that maps every outcome to either 0 or 1. The random variable M is an example. If all three coins match, then M = 1; otherwise, M = 0.

Indicator random variables are closely related to events. In particular, an indicator partitions the sample space into those outcomes mapped to 1 and those outcomes mapped to 0. For example, the indicator M partitions the sample space into two blocks as follows:

    HHH  TTT        HHT  HTH  HTT  THH  THT  TTH
      M = 1                     M = 0

In the same way, an event partitions the sample space into those outcomes in the event and those outcomes not in the event. Therefore, each event is naturally associated with a certain indicator random variable and vice versa: an indicator for an event E is an indicator random variable that is 1 for all outcomes in E and 0 for all outcomes not in E. Thus, M is an indicator random variable for the event that all three coins match.

1.2 Random Variables and Events

There is a strong relationship between events and more general random variables as well. A random variable that takes on several values partitions the sample space into several blocks. For example, C partitions the sample space as follows:

    TTT        TTH  THT  HTT        THH  HTH  HHT        HHH
    C = 0          C = 1                C = 2            C = 3

Each block is a subset of the sample space and is therefore an event. Thus, we can regard an equation or inequality involving a random variable as an event. For example, the event that C = 2 consists of the outcomes THH, HTH, and HHT. The event C ≤ 1 consists of the outcomes TTT, TTH, THT, and HTT.

Naturally enough, we can talk about the probability of events defined by equations and inequalities involving random variables. For example:

    Pr(M = 1) = Pr(TTT) + Pr(HHH)
              = 1/8 + 1/8
              = 1/4

As another example:

    Pr(C ≥ 2) = Pr(THH) + Pr(HTH) + Pr(HHT) + Pr(HHH)
              = 1/8 + 1/8 + 1/8 + 1/8
              = 1/2

This is pretty wild; one normally thinks of equations and inequalities as either true or false. But when variables are replaced by random variables, there is a probability that the relationship holds!
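These probabilities can also be verified by counting outcomes, since all eight outcomes are equally likely. The sketch below is my own illustration, not part of the notes; the helper prob and the predicates passed to it are assumed names.

    # Sketch (my own, assumed Python): probabilities of events defined by a
    # random variable, computed by counting outcomes in the uniform sample space.
    from fractions import Fraction
    from itertools import product

    sample_space = [''.join(flips) for flips in product('HT', repeat=3)]

    def prob(event_holds):
        # Probability of an event given as a predicate on outcomes;
        # all eight outcomes are equally likely.
        favorable = sum(1 for w in sample_space if event_holds(w))
        return Fraction(favorable, len(sample_space))

    print(prob(lambda w: w in ('HHH', 'TTT')))    # Pr(M = 1) = 1/4
    print(prob(lambda w: w.count('H') >= 2))      # Pr(C >= 2) = 1/2
    print(prob(lambda w: w.count('H') <= 1))      # Pr(C <= 1) = 1/2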
1.3 Conditional Probability

Mixing conditional probabilities and events involving random variables creates no new difficulties. For example, Pr(C ≥ 2 | M = 0) is the probability that at least two coins are heads (C ≥ 2), given that not all three coins are the same (M = 0). We can compute this probability using the definition of conditional probability:

    Pr(C ≥ 2 | M = 0) = Pr(C ≥ 2 ∩ M = 0) / Pr(M = 0)
                      = Pr({THH, HTH, HHT}) / Pr({THH, HTH, HHT, HTT, THT, TTH})
                      = (3/8) / (6/8)
                      = 1/2

The expression C ≥ 2 ∩ M = 0 on the first line may look odd; what is the set operation ∩ doing between an inequality and an equality? But recall that, in this context, C ≥ 2 and M = 0 are events, which are sets of outcomes. So taking their intersection is perfectly valid!
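The same answer can be obtained by brute-force counting, applying the definition Pr(A | B) = Pr(A ∩ B) / Pr(B) directly. This sketch is my own addition, not part of the notes; the names prob, c_at_least_2, and m_is_0 are hypothetical.

    # Sketch (my own, assumed Python): Pr(C >= 2 | M = 0) computed from the
    # definition of conditional probability on the uniform sample space.
    from fractions import Fraction
    from itertools import product

    sample_space = [''.join(flips) for flips in product('HT', repeat=3)]

    def prob(event_holds):
        return Fraction(sum(1 for w in sample_space if event_holds(w)),
                        len(sample_space))

    c_at_least_2 = lambda w: w.count('H') >= 2     # the event C >= 2
    m_is_0 = lambda w: w not in ('HHH', 'TTT')     # the event M = 0

    # Pr(C >= 2 | M = 0) = Pr(C >= 2 and M = 0) / Pr(M = 0)
    print(prob(lambda w: c_at_least_2(w) and m_is_0(w)) / prob(m_is_0))   # 1/2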
1.4 Independence

The notion of independence carries over from events to random variables as well. Random variables R1 and R2 are independent if

    Pr(R1 = x1 ∩ R2 = x2) = Pr(R1 = x1) · Pr(R2 = x2)

for all x1 in the codomain of R1 and x2 in the codomain of R2.

As with events, we can formulate independence for random variables in an equivalent and perhaps more intuitive way: random variables R1 and R2 are independent if and only if

    Pr(R1 = x1 | R2 = x2) = Pr(R1 = x1)    or    Pr(R2 = x2) = 0

for all x1 in the codomain of R1 and x2 in the codomain of R2. In words, the probability that R1 takes on a particular value is unaffected by the value of R2.

As an example, are C and M independent? Intuitively, the answer should be "no". The number of heads, C, completely determines whether all three coins match; that is, whether M = 1. But to verify this intuition, we must find some x1, x2 ∈ R such that:

    Pr(C = x1 ∩ M = x2) ≠ Pr(C = x1) · Pr(M = x2)

One appropriate choice of values is x1 = 2 and x2 = 1. In that case, we have:

    Pr(C = 2 ∩ M = 1) = 0    but    Pr(C = 2) · Pr(M = 1) = 3/8 · 1/4 ≠ 0

The notion of independence generalizes to a set of random variables as follows. Random variables R1, R2, . . . , Rn are mutually independent if

    Pr(R1 = x1 ∩ R2 = x2 ∩ · · · ∩ Rn = xn) = Pr(R1 = x1) · Pr(R2 = x2) · · · Pr(Rn = xn)

for all x1, . . . , xn in the codomains of R1, . . . , Rn.

A consequence of this definition of mutual independence is that the probability of an assignment to a subset of the variables is equal to the product of the probabilities of the individual assignments. Thus, for example, if R1, R2, . . . , R100 are mutually independent random variables with codomain N, then it follows that:

    Pr(R1 = 9 ∩ R7 = 84 ∩ R23 = 13) = Pr(R1 = 9) · Pr(R7 = 84) · Pr(R23 = 13)

(This follows by summing over all possible values of the other random variables; we omit the details.)

1.5 An Example with Dice

Suppose that we roll two fair, independent dice. The sample space for this experiment consists of all pairs (r1, r2) where r1, r2 ∈ {1, 2, 3, 4, 5, 6}. Thus, for example, the outcome (3, 5) corresponds to rolling a 3 on the first die and a 5 on the second. The probability of each outcome in the sample space is 1/6 · 1/6 = 1/36, since the dice are fair and independent.

We can regard the numbers that come up on the individual dice as random variables D1 and D2. So D1(3, 5) = 3 and D2(3, 5) = 5. Then the expression D1 + D2 is another random variable; let's call it T for "total". More precisely, we've defined:

    T(w) = D1(w) + D2(w)    for every outcome w

Thus, T(3, 5) = D1(3, 5) + D2(3, 5) = 3 + 5 = 8. In general, any function of random variables is itself a random variable. For example, √D1 + cos(D2) is a strange, but well-defined, random variable.
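To see T as a function on outcomes concretely, here is a small sketch of my own (not from the notes) that builds the 36 equally likely pairs, defines T(w) = D1(w) + D2(w), and tabulates the distribution of T; the names sample_space and T mirror the text but are my own coding choices.

    # Sketch (my own, assumed Python): two fair independent dice, and the
    # random variable T = D1 + D2 viewed as a function of the outcome (r1, r2).
    from fractions import Fraction
    from itertools import product

    sample_space = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

    def T(outcome):
        # T = D1 + D2, a function of the outcome (r1, r2)
        r1, r2 = outcome
        return r1 + r2

    for total in range(2, 13):
        p = Fraction(sum(1 for w in sample_space if T(w) == total), 36)
        print(total, p)    # e.g. Pr(T = 7) = 1/6

In particular, the tabulation shows Pr(T = 7) = 6/36 = 1/6, a value that reappears in the independence argument below.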
Let's also define an indicator random variable S for the event that the total of the two dice is seven:

    S(w) = 1 if T(w) = 7
    S(w) = 0 if T(w) ≠ 7

So S is equal to 1 when the sum is seven and is equal to 0 otherwise. For example, S(4, 3) = 1, but S(5, 3) = 0.

Now let's consider a couple of questions about independence. First, are D1 and T independent? Intuitively, the answer would seem to be "no", since the number that comes up on the first die strongly affects the total of the two dice. But to prove this, we must find integers x1 and x2 such that:

    Pr(D1 = x1 ∩ T = x2) ≠ Pr(D1 = x1) · Pr(T = x2)

For example, we might choose x1 = 3 and x2 = 2. In this case, we have

    Pr(T = 2 | D1 = 3) = 0

since the total cannot be 2 when one die alone shows 3. On the other hand, we have:

    Pr(T = 2) · Pr(D1 = 3) = Pr({(1, 1)}) · Pr({(3, 1), (3, 2), . . . , (3, 6)})
                           = 1/36 · 6/36
                           ≠ 0

So, as we suspected, these random variables are not independent.

Are S and D1 independent? Once again, intuition suggests that the answer is "no". The number on the first die ought to affect whether or not the sum is equal to seven. But this time intuition turns out to be wrong! These two random variables actually are independent.

Proving that two random variables are independent takes some work. (Fortunately, this is an uncommon task; usually independence is a modeling assumption. Only rarely do random variables unexpectedly turn out to be independent.) In this case, we must show that

    Pr(S = x1 ∩ D1 = x2) = Pr(S = x1) · Pr(D1 = x2)        (1)

for all x1 ∈ {0, 1} and all x2 ∈ {1, 2, 3, 4, 5, 6}. We can work through all these possibilities in two batches:

• Suppose that x1 = 1. Then for every value of x2 we have:

      Pr(S = 1) = Pr((1, 6), (2, 5), . . . , (6, 1)) = 1/6
      Pr(D1 = x2) = Pr((x2, 1), (x2, 2), . . . , (x2, 6)) = 1/6
      Pr(S = 1 ∩ D1 = x2) = Pr((x2, 7 − x2)) = 1/36

  Since 1/6 · 1/6 = 1/36, the independence condition is satisfied.