Lecture 3
Relationship Between Variables

Individuals and Variables
• Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
• A variable is any characteristic of an individual. A variable can take different values for different individuals.
• The 1997 survey data set, for example, includes data about a sample of women at childbearing ages.
• The women are the individuals described by the data set. For each individual, the data contain the values of variables such as date of birth, place of residence, and educational level.
• In practice, any set of data is accompanied by background information that helps us understand the data.
• The individuals described are the women. Each row records data on one individual. You will often see each row of data called a case. Each column contains the values of one variable for all the individuals.
• Most data sets follow this format: each row is an individual, and each column is a variable.

Measuring center: the mean
• A description of a distribution almost always includes a measure of its center or average. The most common measure of center is the ordinary arithmetic average, or mean.
• To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Or in more compact notation:

$$\bar{x} = \frac{1}{n} \sum x_i$$

Example: mean age at first marriage
• Q105: When were you married for the first time?

Statistics: age at first marriage
N Valid      4134
N Missing     872
Mean        21.04

Frequency table: age at first marriage

Age             Frequency   Percent   Valid Percent   Cumulative Percent
11                      2       0.0             0.0                  0.0
12                      2       0.0             0.0                  0.1
13                      4       0.1             0.1                  0.2
14                     13       0.3             0.3                  0.5
15                     38       0.8             0.9                  1.4
16                    111       2.2             2.7                  4.1
17                    185       3.7             4.5                  8.6
18                    309       6.2             7.5                 16.1
19                    528      10.5            12.8                 28.8
20                    604      12.1            14.6                 43.4
21                    609      12.2            14.7                 58.2
22                    589      11.8            14.2                 72.4
23                    435       8.7            10.5                 82.9
24                    307       6.1             7.4                 90.4
25                    191       3.8             4.6                 95.0
26                     90       1.8             2.2                 97.2
27                     58       1.2             1.4                 98.6
28                     27       0.5             0.7                 99.2
29                     18       0.4             0.4                 99.7
30                      3       0.1             0.1                 99.7
31                      7       0.1             0.2                 99.9
32                      1       0.0             0.0                 99.9
33                      2       0.0             0.0                100.0
34                      1       0.0             0.0                100.0
Total valid          4134      82.6           100.0
Missing (System)      872      17.4
Total                5006     100.0
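
As a rough check on the reported mean, the formula can be applied to the frequency table by weighting each age by its frequency. The sketch below is only an illustration: the function name `mean_from_frequencies` is hypothetical, and it assumes the whole-number ages 11 to 34 are used as the values.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of grouped data: sum(value * frequency) / number of observations."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    return sum(v * f for v, f in zip(values, frequencies)) / n


# Ages 11..34 and their frequencies, copied from the frequency table.
ages = list(range(11, 35))
frequencies = [2, 2, 4, 13, 38, 111, 185, 309, 528, 604, 609, 589,
               435, 307, 191, 90, 58, 27, 18, 3, 7, 1, 2, 1]

mean_age = mean_from_frequencies(ages, frequencies)
print(f"n = {sum(frequencies)}, mean = {mean_age:.2f}")  # n = 4134, mean = 21.04
```

Under that assumption the result agrees with the mean of 21.04 shown in the statistics output.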

[Figure: histogram of age at first marriage. Horizontal axis: age at first marriage (11.5 to 34.5); vertical axis: frequency (0 to 700). Std. Dev = 2.71, Mean = 21.0, N = 4134]

An important point
• Since a single age refers to a 12-month age range, accurate calculation requires that the mid-point of the range be used to represent the average age of all members of the group. Use 11.5, 12.5, 13.5, ..., 34.5 instead of 11, 12, 13, ..., 34.

Measuring center: the median
• The median is the mid-point of a distribution, the number such that half the observations are smaller and the other half are larger. To find the median of a distribution:
1. Arrange all observations in order of size, from smallest to largest.
2. If the number of observations n is odd, the median is the center observation in the ordered list. Find the location of the median by counting (n+1)/2 observations up (down) from the bottom (top) of the list.
3. If the number of observations n is even, the median is the mean of the two center observations in the ordered list. The location of the median is again (n+1)/2 from the bottom (top) of the list, which falls halfway between the two center observations.
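
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `median_by_location` and the small data lists are made up for the example.

```python
def median_by_location(observations):
    """Median found with the (n + 1) / 2 location rule."""
    ordered = sorted(observations)      # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    if n % 2 == 1:
        # Step 2: n is odd, so (n + 1) / 2 is a whole 1-based position.
        location = (n + 1) // 2
        return ordered[location - 1]
    # Step 3: n is even, so (n + 1) / 2 falls halfway between the
    # positions n / 2 and n / 2 + 1; average those two observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Illustrative values only (not from the survey).
print(median_by_location([5, 1, 3]))       # 3
print(median_by_location([7, 1, 3, 5]))    # 4.0
```

Sorting inside the function means the caller does not have to supply an ordered list.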

Examples
22 25 34 35 41 41 46 46 46 47 49
There is an odd number of observations, so there is one center observation. This is the median. It is 41.
n = 11, location of the median = (11+1)/2 = 6

9 22 32 33 39 39 42 46 49 52
The count of observations n = 10 is even. There is no center observation, but there is a center pair. These are the two 39s. The median is the average of these two observations, which is 39.
n = 10, location of the median = (10+1)/2 = 5.5

The median age at first marriage

Statistics: age at first marriage
N Valid      4134
N Missing     872
Median      21.00

The formula for the median age at first marriage:

$$\text{Median} = l + \frac{\frac{N}{2} - F}{f} \times i$$

• l = lower limit of the age group containing the median
• N = total population
• F = cumulative frequency up to the age group containing the median
• f = frequency of the age group containing the median
• i = the size of the interval of the age group containing the median
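
To see how the formula behaves, it can be applied to the frequency table of age at first marriage. The sketch below is a hedged illustration: the function name `grouped_median` is hypothetical, and it assumes that each single age a covers the interval from a up to a + 1 (so l is the whole-number age and i = 1), and that F counts the observations below the median group.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median = l + (N/2 - F) / f * i for data grouped into equal intervals."""
    total = sum(frequencies)            # N
    if total <= 0:
        raise ValueError("at least one observation is required")
    half = total / 2
    cumulative = 0                      # F: observations below the current group
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= half:   # this group contains the median
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median group not found")


# Ages 11..34 and their frequencies, copied from the frequency table.
ages = list(range(11, 35))
frequencies = [2, 2, 4, 13, 38, 111, 185, 309, 528, 604, 609, 589,
               435, 307, 191, 90, 58, 27, 18, 3, 7, 1, 2, 1]

# Here N/2 = 2067, the median group is age 21 (F = 1796, f = 609).
print(f"{grouped_median(ages, frequencies, width=1):.2f}")  # 21.44
```

Under these assumptions the interpolated median is 21 + 271/609, about 21.44. The value 21.00 in the statistics output is the ordinary median of the whole-number ages, since the center observations both fall in the age-21 group.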

• M = A + AM

[Diagram: the points A, M, and B marked in that order along a line]

$$A + AM = l + \frac{\frac{N}{2} - F}{f} \times i$$

Comparing the mean and the median
4 8 12
Mean = (4+8+12)/3 = 8
Median = 8

4 8 120
Mean = (4+8+120)/3 = 44
Median = 8

The median, unlike the mean, is resistant.
• The mean and median of a symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is farther out in the long tail than is the median.

Measuring spread: the standard deviation
• The mean is used to measure center, and the standard deviation is used to measure spread.
• The standard deviation measures spread by looking at how far the observations are from their mean.
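
The resistance of the median can be checked directly with Python's standard `statistics` module. This is a small sketch that uses the two data sets from the comparison above; the variable names are illustrative.

```python
from statistics import mean, median

original = [4, 8, 12]
with_outlier = [4, 8, 120]   # the largest value replaced by an extreme one

# The mean follows the extreme value; the median does not move.
print(mean(original), median(original))          # 8 8
print(mean(with_outlier), median(with_outlier))  # 44 8
```

Changing one observation from 12 to 120 shifts the mean from 8 to 44, while the median stays at 8.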