6.042/18.062J Mathematics for Computer Science
April 14, 2005
Srini Devadas and Eric Lehman
Lecture Notes

Introduction to Probability

Probability is the last topic in this course and perhaps the most important. Many algorithms rely on randomization. Investigating their correctness and performance requires probability theory. Moreover, many aspects of computer systems, such as memory management, branch prediction, packet routing, and load balancing, are designed around probabilistic assumptions and analyses. Probability also comes up in information theory, cryptography, artificial intelligence, and game theory. Beyond these engineering applications, an understanding of probability gives insight into many everyday issues, such as polling, DNA testing, risk assessment, investing, and gambling.

So probability is good stuff.

1 Monty Hall

In the September 9, 1990 issue of Parade magazine, the columnist Marilyn vos Savant responded to this letter:

    Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say number 1, and the host, who knows what’s behind the doors, opens another door, say number 3, which has a goat. He says to you, “Do you want to pick door number 2?” Is it to your advantage to switch your choice of doors?

    Craig F. Whitaker
    Columbia, MD

The letter roughly describes a situation faced by contestants on the 1970s game show Let’s Make a Deal, hosted by Monty Hall and Carol Merrill. Marilyn replied that the contestant should indeed switch. But she soon received a torrent of letters, many from mathematicians, telling her that she was wrong. The problem generated thousands of hours of heated debate.

Yet this is an elementary problem with an elementary solution. Why was there so much dispute? Apparently, most people believe they have an intuitive grasp of probability. (This is in stark contrast to other branches of mathematics; few people believe they have an intuitive ability to compute integrals or factor large integers!) Unfortunately, approximately 100% of those people are wrong. In fact, everyone who has studied probability at
length can name a half-dozen problems in which their intuition led them astray, often embarrassingly so.

The way to avoid errors is to distrust informal arguments and rely instead on a rigorous, systematic approach. In short: intuition bad, formalism good. If you insist on relying on intuition, then there are lots of compelling financial deals we’d love to offer you!

1.1 The Four-Step Method

Every probability problem involves some sort of randomized experiment, process, or game. And each such problem involves two distinct challenges:

1. How do we model the situation mathematically?
2. How do we solve the resulting mathematical problem?

In this section, we introduce a four-step approach to questions of the form, “What is the probability that _____?” In this approach, we build a probabilistic model step by step, formalizing the original question in terms of that model. Remarkably, the structured thinking that this approach imposes reduces many famously confusing problems to near triviality. For example, as you’ll see, the four-step method cuts through the confusion surrounding the Monty Hall problem like a Ginsu knife. However, more complex probability questions may spin off challenging counting, summing, and approximation problems which, fortunately, you’ve already spent weeks learning how to solve!

1.2 Clarifying the Problem

Craig’s original letter to Marilyn vos Savant is a bit vague, so we must make some assumptions in order to have any hope of modeling the game formally:

1. The car is equally likely to be hidden behind each of the three doors.
2. The player is equally likely to pick each of the three doors, regardless of the car’s location.
3. After the player picks a door, the host must open a different door with a goat behind it and offer the player the choice of staying with the original door or switching.
4. If the host has a choice of which door to open, then he is equally likely to select each of them.

In making these assumptions, we’re reading a lot into Craig Whitaker’s letter. Other interpretations are at least as defensible, and some actually lead to different answers. But let’s accept these assumptions for now and address the question, “What is the probability that a player who switches wins the car?”
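
The four assumptions are concrete enough to simulate. The following is a minimal Python sketch, not part of the original notes; the function name `simulate_switch_wins` and the parameters `trials` and `seed` are illustrative choices. It plays the game repeatedly under assumptions 1 through 4 and returns the fraction of games won by a player who always switches, which gives an empirical estimate to set beside the answer obtained from the four-step method.

```python
import random

DOORS = ("A", "B", "C")


def simulate_switch_wins(trials=100_000, seed=0):
    """Estimate the fraction of games won by a player who always switches."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(trials):
        car = rng.choice(DOORS)   # assumption 1: car placed uniformly at random
        pick = rng.choice(DOORS)  # assumption 2: pick is uniform, independent of car
        # Assumption 3: the host opens a door that is neither the pick nor the car.
        options = [d for d in DOORS if d != pick and d != car]
        opened = rng.choice(options)  # assumption 4: uniform among allowed doors
        # Switching means taking the one door that is neither picked nor opened.
        final = next(d for d in DOORS if d != pick and d != opened)
        wins += (final == car)
    return wins / trials


if __name__ == "__main__":
    print(simulate_switch_wins())
```

A simulation only estimates the probability; it does not replace the formal model built in the steps that follow.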

1.3 Step 1: Find the Sample Space

Our first objective is to identify all the possible outcomes of the experiment. A typical experiment involves several randomly determined quantities. For example, the Monty Hall game involves three such quantities:

1. The door concealing the car.
2. The door initially chosen by the player.
3. The door that the host opens to reveal a goat.

Every possible combination of these randomly determined quantities is called an outcome. The set of all possible outcomes is called the sample space for the experiment.

A tree diagram is a graphical tool that can help us work through the four-step approach when the number of outcomes is not too large or the problem is nicely structured. In particular, we can use a tree diagram to help understand the sample space of an experiment. The first randomly determined quantity in our experiment is the door concealing the prize. We represent this as a tree with three branches:

[Tree diagram: a root with three branches, labeled A, B, and C, one for each possible car location.]

In this diagram, the doors are called A, B, and C instead of 1, 2, and 3 because we’ll be adding a lot of other numbers to the picture later. Now, for each possible location of the prize, the player could initially choose any of the three doors. We represent this by adding a second layer to the tree:

[Tree diagram: the first layer branches on the car location (A, B, or C); under each of those branches, a second layer branches on the player’s initial guess (A, B, or C).]

Finally, the host opens a door to reveal a goat. The host has either one choice or two, depending on the position of the car and the door initially selected by the player. For example, if the prize is behind door A and the player picks door B, then the host must open door C. However, if the prize is behind door A and the player picks door A, then the host could open either door B or door C. All of these possibilities are worked out in a third layer of the tree:

[Tree diagram: three layers, labeled car location, player’s initial guess, and door revealed, ending in twelve leaves. Each leaf is labeled with its outcome, from (A,A,B) through (C,C,B).]

Now let’s relate this picture to the terms we introduced earlier: the leaves of the tree represent outcomes of the experiment, and the set of all leaves represents the sample space. Thus, for this experiment, the sample space consists of 12 outcomes. For reference, we’ve labeled each outcome with a triple of doors indicating:

    (door concealing prize, door initially chosen, door opened to reveal a goat)

In these terms, the sample space is the set:

    S = { (A, A, B), (A, A, C), (A, B, C), (A, C, B), (B, A, C), (B, B, A),
          (B, B, C), (B, C, A), (C, A, B), (C, B, A), (C, C, A), (C, C, B) }

The tree diagram has a broader interpretation as well: we can regard the whole experiment as a “walk” from the root down to a leaf, where the branch taken at each stage is randomly determined. Keep this interpretation in mind; we’ll use it again later.

1.4 Step 2: Define Events of Interest

Our objective is to answer questions of the form “What is the probability that _____?”, where the horizontal line stands for some phrase such as “the player wins by switching”, “the player initially picked the door concealing the prize”, or “the prize is behind door C”. Almost any such phrase can be modeled mathematically as an event, which is defined to be a subset of the sample space.
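
Both the tree walk and the definition of an event can be mirrored in a few lines of code. This is a minimal Python sketch, not part of the original notes; the names `sample_space`, `prize_behind_C`, and `picked_prize_door` are illustrative. It enumerates the leaves of the tree layer by layer, using the rule that the host opens a door that is neither the car’s door nor the player’s pick, and then expresses two of the phrases above as subsets of the sample space.

```python
DOORS = ("A", "B", "C")


def sample_space():
    """Walk the tree: car location, then initial guess, then door opened."""
    outcomes = []
    for car in DOORS:                # first layer: door concealing the prize
        for pick in DOORS:           # second layer: door initially chosen
            for opened in DOORS:     # third layer: door the host may open
                if opened != car and opened != pick:
                    outcomes.append((car, pick, opened))
    return outcomes


S = sample_space()

# An event is a subset of the sample space.
prize_behind_C = {o for o in S if o[0] == "C"}       # "the prize is behind door C"
picked_prize_door = {o for o in S if o[0] == o[1]}   # "the player initially picked
                                                     #  the door concealing the prize"

if __name__ == "__main__":
    print(len(S))  # 12
    print(sorted(prize_behind_C))
    print(sorted(picked_prize_door))
```

Representing each outcome as a tuple and each event as a Python set keeps the code close to the mathematical definitions: membership tests and subset checks work directly on the triples.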