Supplementary Exercises

1.25 The following data give the lengths of time to failure for n = 88 radio transmitter-receivers:

 16  224   16   80   96  536  400   80
392  576  128   56  656  224   40   32
358  384  256  246  328  464  448  716
304   16   72    8   80   72   56  608
108  194  136  224   80   16  424  264
156  216  168  184  552   72  184  240
438  120  308   32  272  152  328  480
 60  208  340  104   72  168   40  152
360  232   40  112  112  288  168  352
 56   72   64   40  184  264   96  224
168  168  114  280  152  208  160  176

a Use the range to approximate s for the n = 88 lengths of time to failure.
b Construct a frequency histogram for the data. [Notice the tendency of the distribution to tail outward (skew) to the right.]
c Use a calculator (or computer) to calculate $\bar{y}$ and s. (Hand calculation is much too tedious for this exercise.)
d Calculate the intervals $\bar{y} \pm ks$, k = 1, 2, and 3, and count the number of measurements falling in each interval. Compare your results with the empirical rule results. Note that the empirical rule provides a rather good description of these data, even though the distribution is highly skewed.

1.26 Compare the ratio of the range to s for the three sample sizes (n = 6, 20, and 88) for Exercises 1.12, 1.24, and 1.25. Note that the ratio tends to increase as the amount of data increases. The greater the amount of data, the greater will be their tendency to contain a few extreme values that will inflate the range and have relatively little effect on s. We ignored this phenomenon and suggested that you use 4 as the ratio for finding a guessed value of s in checking calculations.

1.27 A set of 340 examination scores exhibiting a bell-shaped relative frequency distribution has a mean of $\bar{y} = 72$ and a standard deviation of s = 8. Approximately how many of the scores would you expect to fall in the interval from 64 to 80? The interval from 56 to 88?

1.28 The discharge of suspended solids from a phosphate mine is normally distributed with mean daily discharge 27 milligrams per liter (mg/L) and standard deviation 14 mg/L. In what proportion of the days will the daily discharge be less than 13 mg/L?

1.29 A machine produces bearings with mean diameter 3.00 inches and standard deviation 0.01 inch. Bearings with diameters in excess of 3.02 inches or less than 2.98 inches will fail to meet quality specifications.
a Approximately what fraction of this machine’s production will fail to meet specifications?
b What assumptions did you make concerning the distribution of bearing diameters in order to answer this question?

1.30 Compared to their stay-at-home peers, women employed outside the home have higher levels of high-density lipoproteins (HDL), the “good” cholesterol associated with lower risk for heart attacks. A study of cholesterol levels in 2000 women, aged 25–64, living in Augsburg, Germany, was conducted by Ursula Haertel, Ulrigh Keil, and colleagues² at the GSF-Medis Institut in

2. Science News 135 (June 1989): 389.

Copyright 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Munich. Of these 2000 women, the 48% who worked outside the home had HDL levels that were between 2.5 and 3.6 milligrams per deciliter (mg/dL) higher than the HDL levels of their stay-at-home counterparts. Suppose that the difference in HDL levels is normally distributed, with mean 0 (indicating no difference between the two groups of women) and standard deviation 1.2 mg/dL. If you were to select an employed woman and a stay-at-home counterpart at random, what is the probability that the difference in their HDL levels would be between 1.2 and 2.4?

1.31 Over the past year, a fertilizer production process has shown an average daily yield of 60 tons with a variance in daily yields of 100. If the yield should fall to less than 40 tons tomorrow, should this result cause you to suspect an abnormality in the process? (Calculate the probability of obtaining less than 40 tons.) What assumptions did you make concerning the distribution of yields?

*1.32 Let k ≥ 1. Show that, for any set of n measurements, the fraction included in the interval $\bar{y} - ks$ to $\bar{y} + ks$ is at least $(1 - 1/k^2)$. [Hint: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$. In this expression, replace all deviations for which $|y_i - \bar{y}| \ge ks$ with ks. Simplify.] This result is known as Tchebysheff’s theorem.³

1.33 A personnel manager for a certain industry has records of the number of employees absent per day. The average number absent is 5.5, and the standard deviation is 2.5. Because there are many days with zero, one, or two absent and only a few with more than ten absent, the frequency distribution is highly skewed. The manager wants to publish an interval in which at least 75% of these values lie. Use the result in Exercise 1.32 to find such an interval.

1.34 For the data discussed in Exercise 1.33, give an upper bound to the fraction of days when there are more than 13 absentees.

1.35 A pharmaceutical company wants to know whether an experimental drug has an effect on systolic blood pressure. Fifteen randomly selected subjects were given the drug and, after sufficient time for the drug to have an impact, their systolic blood pressures were recorded. The data appear below:

172  140  123  130  115  148  108  129  137  161  123  152  133  128  142

a Approximate the value of s using the range approximation.
b Calculate the values of $\bar{y}$ and s for the 15 blood pressure readings.
c Use Tchebysheff’s theorem (Exercise 1.32) to find values a and b such that at least 75% of the blood pressure measurements lie between a and b.
d Did Tchebysheff’s theorem work? That is, use the data to find the actual percent of blood pressure readings that are between the values a and b you found in part (c). Is this actual percentage greater than 75%?

1.36 A random sample of 100 foxes was examined by a team of veterinarians to determine the prevalence of a specific parasite. Counting the number of parasites of this specific type, the veterinarians found that 69 foxes had no parasites of the type of interest, 17 had one parasite of the

3. Exercises preceded by an asterisk are optional.
type under study, and so on. A summary of their results is given in the following table:

Number of Parasites    0    1    2    3    4    5    6    7    8
Number of Foxes       69   17    6    3    1    2    1    0    1

a Construct the relative frequency histogram for the number of parasites per fox.
b Calculate $\bar{y}$ and s for the data given.
c What fraction of the parasite counts falls within 2 standard deviations of the mean? Within 3 standard deviations? Do your results agree with Tchebysheff’s theorem (Exercise 1.32) and/or the empirical rule?

1.37 Studies indicate that drinking water supplied by some old lead-lined city piping systems may contain harmful levels of lead. Based on data presented by Karalekas and colleagues,⁴ it appears that the distribution of lead content readings for individual water specimens has mean .033 mg/L and standard deviation .10 mg/L. Explain why it is obvious that the lead content readings are not normally distributed.

1.38 In Exercise 1.19, the mean and standard deviation of the amount of chloroform present in water sources were given to be 34 and 53, respectively. You argued that the amounts of chloroform could therefore not be normally distributed. Use Tchebysheff’s theorem (Exercise 1.32) to describe the distribution of chloroform amounts in water sources.

4. P. C. Karalekas, Jr., C. R. Ryan, and F. B. Taylor, “Control of Lead, Copper and Iron Pipe Corrosion in Boston,” American Water Works Journal (February 1983): 92.
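The comparison asked for in Exercise 1.36(c) can be set up along the following lines. This is only a minimal Python sketch, not a method prescribed by the text: the names `parasite_counts`, `fraction_within`, and `tchebysheff_bound` are illustrative, the intervals are treated as closed, and s is taken to be the sample standard deviation with divisor n − 1, as in the hint to Exercise 1.32.

```python
import statistics

# Frequency table from Exercise 1.36: number of parasites -> number of foxes.
parasite_counts = {0: 69, 1: 17, 2: 6, 3: 3, 4: 1, 5: 2, 6: 1, 7: 0, 8: 1}

# Expand the table into the 100 individual measurements.
data = [value for value, freq in parasite_counts.items() for _ in range(freq)]


def fraction_within(measurements, k):
    """Fraction of measurements lying in [mean - k*s, mean + k*s]."""
    mean = statistics.mean(measurements)
    s = statistics.stdev(measurements)  # sample standard deviation (divisor n - 1)
    lower, upper = mean - k * s, mean + k * s
    inside = sum(1 for y in measurements if lower <= y <= upper)
    return inside / len(measurements)


def tchebysheff_bound(k):
    """Lower bound 1 - 1/k^2 from Tchebysheff's theorem (requires k >= 1)."""
    return 1 - 1 / k**2


# Print the observed fraction next to the guaranteed lower bound.
for k in (2, 3):
    print(k, fraction_within(data, k), tchebysheff_bound(k))
```

The same two helper functions could be applied to the blood pressure readings of Exercise 1.35 by replacing `data` with that list. Because the bound holds for any set of measurements, the observed fraction should never fall below it, however skewed the data.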
CHAPTER 2
Probability

2.1 Introduction
2.2 Probability and Inference
2.3 A Review of Set Notation
2.4 A Probabilistic Model for an Experiment: The Discrete Case
2.5 Calculating the Probability of an Event: The Sample-Point Method
2.6 Tools for Counting Sample Points
2.7 Conditional Probability and the Independence of Events
2.8 Two Laws of Probability
2.9 Calculating the Probability of an Event: The Event-Composition Method
2.10 The Law of Total Probability and Bayes’ Rule
2.11 Numerical Events and Random Variables
2.12 Random Sampling
2.13 Summary
References and Further Readings

2.1 Introduction

In everyday conversation, the term probability is a measure of one’s belief in the occurrence of a future event. We accept this as a meaningful and practical interpretation of probability but seek a clearer understanding of its context, how it is measured, and how it assists in making inferences.

The concept of probability is necessary in work with physical, biological, or social mechanisms that generate observations that cannot be predicted with certainty. For example, the blood pressure of a person at a given point in time cannot be predicted with certainty, and we never know the exact load that a bridge will endure before collapsing into a river. Such random events cannot be predicted with certainty, but the relative frequency with which they occur in a long series of trials is often remarkably stable. Events possessing this property are called random, or stochastic, events. This stable long-term relative frequency provides an intuitively meaningful
measure of our belief in the occurrence of a random event if a future observation is to be made. It is impossible, for example, to predict with certainty the occurrence of heads on a single toss of a balanced coin, but we would be willing to state with a fair measure of confidence that the fraction of heads in a long series of trials would be very near .5. That this relative frequency is commonly used as a measure of belief in the outcome for a single toss is evident when we consider chance from a gambler’s perspective. He risks money on the single toss of a coin, not a long series of tosses. The relative frequency of a head in a long series of tosses, which a gambler calls the probability of a head, gives him a measure of the chance of winning on a single toss. If the coin were unbalanced and gave 90% heads in a long series of tosses, the gambler would say that the probability of a head is .9, and he would be fairly confident in the occurrence of a head on a single toss of the coin.

The preceding example possesses some realistic and practical analogies. In many respects all people are gamblers. The research physician gambles time and money on a research project, and she is concerned with her success on a single flip of this symbolic coin. Similarly, the investment of capital in a new manufacturing plant is a gamble that represents a single flip of a coin on which the entrepreneur has high hopes for success. The fraction of similar investments that are successful in a long series of trials is of interest to the entrepreneur only insofar as it provides a measure of belief in the successful outcome of a single individual investment.

The relative frequency concept of probability, although intuitively meaningful, does not provide a rigorous definition of probability. Many other concepts of probability have been proposed, including that of subjective probability, which allows the probability of an event to vary depending upon the person performing the evaluation. Nevertheless, for our purposes we accept an interpretation based on relative frequency as a meaningful measure of our belief in the occurrence of an event. Next, we will examine the link that probability provides between observation and inference.

2.2 Probability and Inference

The role that probability plays in making inferences will be discussed in detail after an adequate foundation has been laid for the theory of probability. At this point we will present an elementary treatment of this theory through an example and an appeal to your intuition.

The example selected is similar to that presented in Section 1.4 but simpler and less practical. It was chosen because of the ease with which we can visualize the population and sample and because it provides an observation-producing mechanism for which a probabilistic model will be constructed in Section 2.3.

Consider a gambler who wishes to make an inference concerning the balance of a die. The conceptual population of interest is the set of numbers that would be generated if the die were rolled over and over again, ad infinitum. If the die were perfectly balanced, one-sixth of the measurements in this population would be 1s, one-sixth, 2s, one-sixth, 3s, and so on. The corresponding frequency distribution is shown in Figure 2.1.

Using the scientific method, the gambler proposes the hypothesis that the die is balanced, and he seeks observations from nature to contradict the theory, if false.
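The idea that relative frequencies settle near one-sixth for a balanced die can be explored with a short simulation. The Python sketch below is an illustration under stated assumptions rather than part of the text: a seeded pseudo-random generator stands in for a physical die, and the function name `relative_frequencies` and the chosen numbers of rolls are hypothetical.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Simulate n_rolls of a balanced six-sided die; return face -> relative frequency."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Compare the observed relative frequencies with 1/6 as the series of rolls grows.
for n in (60, 6_000, 600_000):
    freqs = relative_frequencies(n)
    worst = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"n = {n:>7}: largest deviation from 1/6 = {worst:.4f}")
```

In a typical run the largest deviation shrinks as the number of rolls increases, which is the stability of long-run relative frequency described in Section 2.1. A gambler testing a real die would compare observed frequencies with one-sixth in the same way, although a finite sample can only cast doubt on the hypothesis of balance, not prove it.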