Exercises

c If one of the stocks is selected at random from the 40 for which the preceding data were taken, what is the probability that it will have traded fewer than 5% of its outstanding shares?

1.5 Given here is the relative frequency histogram associated with grade point averages (GPAs) of a sample of 30 students:

[Figure: relative frequency histogram. Horizontal axis: Grade Point Average, with class boundaries 1.85, 2.05, 2.25, 2.45, 2.65, 2.85, 3.05, 3.25, 3.45. Vertical axis: Relative Frequency, with ticks at 0, 3/30, 6/30.]

a Which of the GPA categories identified on the horizontal axis are associated with the largest proportion of students?
b What proportion of students had GPAs in each of the categories that you identified?
c What proportion of the students had GPAs less than 2.65?

1.6 The relative frequency histogram given next was constructed from data obtained from a random sample of 25 families. Each was asked the number of quarts of milk that had been purchased the previous week.

[Figure: relative frequency histogram. Horizontal axis: Quarts, with values 0, 1, 2, 3, 4, 5. Vertical axis: Relative Frequency, with ticks at 0, .1, .2, .3, .4.]

a Use this relative frequency histogram to determine the number of quarts of milk purchased by the largest proportion of the 25 families. The category associated with the largest relative frequency is called the modal category.
b What proportion of the 25 families purchased more than 2 quarts of milk?
c What proportion purchased more than 0 but fewer than 5 quarts?

Copyright 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 1 What Is Statistics?

1.7 The self-reported heights of 105 students in a biostatistics class were used to construct the histogram given below.

[Figure: relative frequency histogram. Horizontal axis: Heights, with ticks at 60, 63, 66, 69, 72, 75. Vertical axis: Relative frequency, with ticks at 0, 5/105, 10/105.]

a Describe the shape of the histogram.
b Does this histogram have an unusual feature?
c Can you think of an explanation for the two peaks in the histogram? Is there some consideration other than height that results in the two separate peaks? What is it?

1.8 An article in Archaeometry presented an analysis of 26 samples of Romano–British pottery, found at four different kiln sites in the United Kingdom. The percentage of aluminum oxide in each of the 26 samples is given below:

Llanederyn       Caldicot   Island Thorns   Ashley Rails
14.4   11.6      11.8       18.3            17.7
13.8   11.1      11.6       15.8            18.3
14.6   13.4                 18.0            16.7
11.5   12.4                 18.0            14.8
13.8   13.1                 20.8            19.1
10.9   12.7
10.1   12.5

Source: A. Tubb, A. J. Parker, and G. Nickless, “The Analysis of Romano–British Pottery by Atomic Absorption Spectrophotometry,” Archaeometry 22 (1980): 153.

a Construct a relative frequency histogram to describe the aluminum oxide content of all 26 pottery samples.
b What unusual feature do you see in this histogram? Looking at the data, can you think of an explanation for this unusual feature?

1.3 Characterizing a Set of Measurements: Numerical Methods

The relative frequency histograms presented in Section 1.2 provide useful information regarding the distribution of sets of measurements, but histograms are usually not adequate for the purpose of making inferences. Indeed, many similar histograms
could be formed from the same set of measurements. To make inferences about a population based on information contained in a sample and to measure the goodness of the inferences, we need rigorously defined quantities for summarizing the information contained in a sample. These sample quantities typically have mathematical properties, to be developed in the following chapters, that allow us to make probability statements regarding the goodness of our inferences.

The quantities we define are numerical descriptive measures of a set of data. We seek some numbers that have meaningful interpretations and that can be used to describe the frequency distribution for any set of measurements. We will confine our attention to two types of descriptive numbers: measures of central tendency and measures of dispersion or variation.

Probably the most common measure of central tendency used in statistics is the arithmetic mean. (Because this is the only type of mean discussed in this text, we will omit the word arithmetic.)

DEFINITION 1.1
The mean of a sample of n measured responses $y_1, y_2, \ldots, y_n$ is given by

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$

The corresponding population mean is denoted μ.

The symbol $\bar{y}$, read “y bar,” refers to a sample mean. We usually cannot measure the value of the population mean, μ; rather, μ is an unknown constant that we may want to estimate using sample information.

The mean of a set of measurements only locates the center of the distribution of data; by itself, it does not provide an adequate description of a set of measurements. Two sets of measurements could have widely different frequency distributions but equal means, as pictured in Figure 1.3. The difference between distributions I and II in the figure lies in the variation or dispersion of measurements on either side of the mean. To describe data adequately, we must also define measures of data variability.

The most common measure of variability used in statistics is the variance, which is a function of the deviations (or distances) of the sample measurements from their mean.

FIGURE 1.3 Frequency distributions with equal means but different amounts of variation
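To make the point about equal means concrete, the following is a minimal Python sketch. The two data sets and the helper name `sample_mean` are invented for illustration and do not come from the text. Both sets have the same mean, yet their deviations from that mean differ greatly in size.

```python
def sample_mean(ys):
    """Sample mean as in Definition 1.1: (1/n) times the sum of the y_i."""
    return sum(ys) / len(ys)

# Hypothetical data sets (not from the text), chosen to share the same mean.
set_a = [4, 5, 5, 5, 6]
set_b = [1, 3, 5, 7, 9]

for name, ys in (("A", set_a), ("B", set_b)):
    ybar = sample_mean(ys)
    # Deviations of each measurement from the sample mean.
    deviations = [y - ybar for y in ys]
    print(name, "mean:", ybar, "deviations:", deviations)
```

The mean alone cannot tell these two sets apart. The deviations can, which is why the measures of variability defined next are built from them.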
DEFINITION 1.2
The variance of a sample of measurements $y_1, y_2, \ldots, y_n$ is the sum of the squares of the differences between the measurements and their mean, divided by n − 1. Symbolically, the sample variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The corresponding population variance is denoted by the symbol $\sigma^2$.

Notice that we divided by n − 1 instead of by n in our definition of $s^2$. The theoretical reason for this choice of divisor is provided in Chapter 8, where we will show that $s^2$ defined this way provides a “better” estimator for the true population variance, $\sigma^2$. Nevertheless, it is useful to think of $s^2$ as “almost” the average of the squared deviations of the observed values from their mean. The larger the variance of a set of measurements, the greater will be the amount of variation within the set. The variance is of value in comparing the relative variation of two sets of measurements, but it gives information about the variation in a single set only when interpreted in terms of the standard deviation.

DEFINITION 1.3
The standard deviation of a sample of measurements is the positive square root of the variance; that is,

$$s = \sqrt{s^2}.$$

The corresponding population standard deviation is denoted by $\sigma = \sqrt{\sigma^2}$.

Although it is closely related to the variance, the standard deviation can be used to give a fairly accurate picture of data variation for a single set of measurements. It can be interpreted using Tchebysheff’s theorem (which is discussed in Exercise 1.32 and will be presented formally in Chapter 3) and by the empirical rule (which we now explain).

Many distributions of data in real life are mound-shaped; that is, they can be approximated by a bell-shaped frequency distribution known as a normal curve. Data possessing mound-shaped distributions have definite characteristics of variation, as expressed in the following statement.

Empirical Rule
For a distribution of measurements that is approximately normal (bell shaped), it follows that the interval with end points
μ ± σ contains approximately 68% of the measurements.
μ ± 2σ contains approximately 95% of the measurements.
μ ± 3σ contains almost all of the measurements.
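As a rough numerical companion to Definitions 1.2 and 1.3 and to the empirical rule, the Python sketch below computes $s^2$ and $s$ for the five Ashley Rails values listed in Exercise 1.8, then evaluates the area under a normal curve within k standard deviations of its mean. The function names are illustrative assumptions. The area formula erf(k/√2) is a standard property of the normal distribution that is not derived in this section.

```python
import math

def sample_variance(ys):
    """Definition 1.2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_sd(ys):
    """Definition 1.3: positive square root of the sample variance."""
    return math.sqrt(sample_variance(ys))

def normal_area_within(k):
    """Area under a normal curve within k standard deviations of its mean.

    Assumes an exactly normal distribution; uses the identity erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

# Aluminum oxide percentages for the Ashley Rails site (Exercise 1.8).
ashley_rails = [17.7, 18.3, 16.7, 14.8, 19.1]

print("s^2 =", round(sample_variance(ashley_rails), 3))
print("s   =", round(sample_sd(ashley_rails), 3))
for k in (1, 2, 3):
    print("within", k, "sd:", round(normal_area_within(k), 4))
```

For an exactly normal curve the three areas are about 0.6827, 0.9545, and 0.9973, which is where the rule's "approximately 68%", "approximately 95%", and "almost all" come from. Python's standard `statistics.variance` and `statistics.stdev` also use the n − 1 divisor, so they can serve as a cross-check on the hand-written functions.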
FIGURE 1.4 Normal curve [the central area under the curve is labeled 68%]

As was mentioned in Section 1.2, once the frequency distribution of a set of measurements is known, probability statements regarding the measurements can be made. These probabilities were shown as areas under a frequency histogram. Analogously, the probabilities specified in the empirical rule are areas under the normal curve shown in Figure 1.4.

Use of the empirical rule is illustrated by the following example. Suppose that the scores on an achievement test given to all high school seniors in a state are known to have, approximately, a normal distribution with mean μ = 64 and standard deviation σ = 10. It can then be deduced that approximately 68% of the scores are between 54 and 74, that approximately 95% of the scores are between 44 and 84, and that almost all of the scores are between 34 and 94. Thus, knowledge of the mean and the standard deviation gives us a fairly good picture of the frequency distribution of scores.

Suppose that a single high school student is randomly selected from those who took the test. What is the probability that his score will be between 54 and 74? Based on the empirical rule, we find that 0.68 is a reasonable answer to this probability question.

The utility and value of the empirical rule are due to the common occurrence of approximately normal distributions of data in nature, more so because the rule applies to distributions that are not exactly normal but just mound-shaped. You will find that approximately 95% of a set of measurements will be within 2σ of μ for a variety of distributions.

Exercises

1.9 Resting breathing rates for college-age students are approximately normally distributed with mean 12 and standard deviation 2.3 breaths per minute. What fraction of all college-age students have breathing rates in the following intervals?

a 9.7 to 14.3 breaths per minute
b 7.4 to 16.6 breaths per minute
c 9.7 to 16.6 breaths per minute
d Less than 5.1 or more than 18.9 breaths per minute

1.10 It has been projected that the average and standard deviation of the amount of time spent online using the Internet are, respectively, 14 and 17 hours per person per year (many do not use the Internet at all!).

a What value is exactly 1 standard deviation below the mean?
b If the amount of time spent online using the Internet is approximately normally distributed, what proportion of the users spend an amount of time online that is less than the value you found in part (a)?