expected value of $y$ by the amount $\beta_1$. For any given value of $x$, the distribution of $y$ is centered about $E(y|x)$, as illustrated in Figure 2.1.

When (2.6) is true, it is useful to break $y$ into two components. The piece $\beta_0 + \beta_1 x$ is sometimes called the systematic part of $y$, that is, the part of $y$ explained by $x$, and $u$ is called the unsystematic part, or the part of $y$ not explained by $x$. We will use assumption (2.6) in the next section for motivating estimates of $\beta_0$ and $\beta_1$. This assumption is also crucial for the statistical analysis in Section 2.5.

2.2 DERIVING THE ORDINARY LEAST SQUARES ESTIMATES

Now that we have discussed the basic ingredients of the simple regression model, we will address the important issue of how to estimate the parameters $\beta_0$ and $\beta_1$ in equation (2.1). To do this, we need a sample from the population. Let $\{(x_i, y_i): i = 1, \ldots, n\}$ denote a random sample of size $n$ from the population. Since these data come from (2.1), we can write

$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (2.9)$$

for each $i$. Here, $u_i$ is the error term for observation $i$, since it contains all factors affecting $y_i$ other than $x_i$.

As an example, $x_i$ might be the annual income and $y_i$ the annual savings for family $i$ during a particular year. If we have collected data on 15 families, then $n = 15$. A scatterplot of such a data set is given in Figure 2.2, along with the (necessarily fictitious) population regression function. We must decide how to use these data to obtain estimates of the intercept and slope in the population regression of savings on income.

[Figure 2.2: Scatterplot of savings and income for 15 families, and the population regression $E(savings|income) = \beta_0 + \beta_1 \, income$.]

There are several ways to motivate the following estimation procedure. We will use (2.5) and an important implication of assumption (2.6): in the population, $u$ has a zero mean and is uncorrelated with $x$. Therefore, we see that $u$ has zero expected value and that the covariance between $x$ and $u$ is zero:

$$E(u) = 0 \qquad (2.10)$$

and

$$\mathrm{Cov}(x, u) = E(xu) = 0, \qquad (2.11)$$

where the first equality in (2.11) follows from (2.10). (See Section B.4 for the definition and properties of covariance.) In terms of the observable variables $x$ and $y$ and the unknown parameters $\beta_0$ and $\beta_1$, equations (2.10) and (2.11) can be written as

$$E(y - \beta_0 - \beta_1 x) = 0 \qquad (2.12)$$

and

$$E[x(y - \beta_0 - \beta_1 x)] = 0, \qquad (2.13)$$

respectively. Equations (2.12) and (2.13) imply two restrictions on the joint probability distribution of $(x, y)$ in the population.
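These two restrictions can be made concrete with a short simulation. Below is a minimal sketch, assuming a hypothetical population model whose parameter values and distributions for $x$ and $u$ are arbitrary choices for illustration; in a very large sample, the sample analogs of (2.12) and (2.13) should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                      # large sample, so sample moments approximate population moments

beta0, beta1 = 1.0, 0.5            # hypothetical population parameters
x = rng.normal(5.0, 2.0, n)        # x drawn independently of u, so Cov(x, u) = 0
u = rng.normal(0.0, 1.0, n)        # error term with E(u) = 0
y = beta0 + beta1 * x + u          # the population model (2.1)

# Sample analogs of the population restrictions (2.12) and (2.13):
print(np.mean(y - beta0 - beta1 * x))        # ~ 0: analog of E(y - b0 - b1*x) = 0
print(np.mean(x * (y - beta0 - beta1 * x)))  # ~ 0: analog of E[x(y - b0 - b1*x)] = 0
```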
Since there are two unknown parameters to estimate, we might hope that equations (2.12) and (2.13) can be used to obtain good estimators of $\beta_0$ and $\beta_1$. In fact, they can be. Given a sample of data, we choose the estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ to solve the sample counterparts of (2.12) and (2.13):

$$n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2.14)$$

and

$$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0. \qquad (2.15)$$

This is an example of the method of moments approach to estimation. (See Section C.4 for a discussion of different estimation approaches.) These equations can be solved for $\hat{\beta}_0$ and $\hat{\beta}_1$.
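Because (2.14) and (2.15) are linear in $\hat{\beta}_0$ and $\hat{\beta}_1$, they can be solved numerically as a two-equation linear system before doing any algebra. A minimal sketch (the five data points are made up purely for illustration):

```python
import numpy as np

# Any small sample will do; these five points are purely illustrative.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# Rearranging (2.14) and (2.15) gives a 2x2 linear system in (b0_hat, b1_hat):
#   n * b0          + (sum x_i)   * b1 = sum y_i
#   (sum x_i) * b0  + (sum x_i^2) * b1 = sum x_i * y_i
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

b0_hat, b1_hat = np.linalg.solve(A, b)
print(b0_hat, b1_hat)
```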
Using basic properties of the summation operator from Appendix A, equation (2.14) can be rewritten as

$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}, \qquad (2.16)$$

where $\bar{y} = n^{-1} \sum_{i=1}^{n} y_i$ is the sample average of the $y_i$, and likewise for $\bar{x}$. This equation allows us to write $\hat{\beta}_0$ in terms of $\hat{\beta}_1$, $\bar{y}$, and $\bar{x}$:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \qquad (2.17)$$

Therefore, once we have the slope estimate $\hat{\beta}_1$, it is straightforward to obtain the intercept estimate $\hat{\beta}_0$, given $\bar{y}$ and $\bar{x}$.

Dropping the $n^{-1}$ in (2.15) (since it does not affect the solution) and plugging (2.17) into (2.15) yields

$$\sum_{i=1}^{n} x_i \left( y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i \right) = 0,$$

which, upon rearrangement, gives

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}).$$

From basic properties of the summation operator [see (A.7) and (A.8)],

$$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

Therefore, provided that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0, \qquad (2.18)$$

the estimated slope is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \qquad (2.19)$$

Equation (2.19) is simply the sample covariance between $x$ and $y$ divided by the sample variance of $x$. (See Appendix C. Dividing both the numerator and the denominator by $n - 1$ changes nothing.) This makes sense because $\beta_1$ equals the population covariance divided by the variance of $x$ when $E(u) = 0$ and $\mathrm{Cov}(x, u) = 0$. An immediate implication is that if $x$ and $y$ are positively correlated in the sample, then $\hat{\beta}_1$ is positive; if $x$ and $y$ are negatively correlated, then $\hat{\beta}_1$ is negative.

Although the method for obtaining (2.17) and (2.19) is motivated by (2.6), the only assumption needed to compute the estimates for a particular sample is (2.18). This is hardly an assumption at all: (2.18) is true provided the $x_i$ in the sample are not all equal to the same value. If (2.18) fails, then we have either been unlucky in obtaining our sample from the population or we have not specified an interesting problem ($x$ does not vary in the population). For example, if $y = wage$ and $x = educ$, then (2.18) fails only if everyone in the sample has the same amount of education (for example, if everyone is a high school graduate; see Figure 2.3). If just one person has a different amount of education, then (2.18) holds, and the OLS estimates can be computed.
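Equations (2.17) and (2.19) translate directly into code. The sketch below wraps them in a hypothetical helper named `simple_ols` (the name and the data are illustrative, not from the text) and guards against a failure of (2.18):

```python
import numpy as np

def simple_ols(x, y):
    """OLS slope and intercept via equations (2.19) and (2.17)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    if sxx == 0:                      # assumption (2.18) fails: x does not vary
        raise ValueError("x has no sample variation; the slope is not identified")
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # equation (2.19)
    b0 = y.mean() - b1 * x.mean()                        # equation (2.17)
    return b0, b1

b0_hat, b1_hat = simple_ols([1, 2, 3, 4, 5], [2.1, 2.9, 4.2, 4.8, 6.1])
print(b0_hat, b1_hat)   # matches the method-of-moments solution above
```

Equivalently, the slope could be computed as `np.cov(x, y)[0, 1] / np.var(x, ddof=1)`, since dividing the numerator and denominator of (2.19) by $n - 1$ changes nothing.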
[Figure 2.3: A scatterplot of $wage$ against $educ$ when $educ_i = 12$ for all $i$.]

The estimates given in (2.17) and (2.19) are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$. To justify this name, for any $\hat{\beta}_0$ and $\hat{\beta}_1$, define a fitted value for $y$ when $x = x_i$ as

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \qquad (2.20)$$

for the given intercept and slope. This is the value we predict for $y$ when $x = x_i$. There is a fitted value for each observation in the sample. The residual for observation $i$ is the difference between the actual $y_i$ and its fitted value:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i. \qquad (2.21)$$

Again, there are $n$ such residuals. (These are not the same as the errors in (2.9), a point we return to in Section 2.5.) The fitted values and residuals are indicated in Figure 2.4.

[Figure 2.4: Fitted values and residuals. Each fitted value $\hat{y}_i$ lies on the OLS line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, and the residual $\hat{u}_i$ is the vertical distance between $y_i$ and $\hat{y}_i$.]

Now, suppose we choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to make the sum of squared residuals,

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2, \qquad (2.22)$$

as small as possible.
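For concreteness, here is how fitted values, residuals, and the sum of squared residuals look in code, reusing the same made-up five-point sample as in the earlier sketches:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data, as before
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# OLS estimates from equations (2.19) and (2.17):
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

y_hat = b0_hat + b1_hat * x   # fitted values, equation (2.20)
u_hat = y - y_hat             # residuals, equation (2.21)
ssr = np.sum(u_hat ** 2)      # sum of squared residuals, equation (2.22)
print(ssr)
```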
The appendix to this chapter shows that the conditions necessary for $(\hat{\beta}_0, \hat{\beta}_1)$ to minimize (2.22) are given exactly by equations (2.14) and (2.15), without $n^{-1}$. Equations (2.14) and (2.15) are often called the first order conditions for the OLS estimates, a term that comes from optimization using calculus (see Appendix A). From our previous calculations, we know that the solutions to the OLS first order conditions are given by (2.17) and (2.19). The name "ordinary least squares" comes from the fact that these estimates minimize the sum of squared residuals.
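A quick numerical sanity check of this minimizing property, again on the illustrative data and with arbitrarily chosen perturbation directions: moving $(\hat{\beta}_0, \hat{\beta}_1)$ away from the OLS solution never decreases the SSR.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data, as before
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

def ssr(b0, b1):
    """Sum of squared residuals (2.22) at a candidate intercept and slope."""
    return np.sum((y - b0 - b1 * x) ** 2)

base = ssr(b0_hat, b1_hat)
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.05, -0.05)]:
    assert ssr(b0_hat + d0, b1_hat + d1) >= base   # OLS attains the minimum
```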
Once we have determined the OLS intercept and slope estimates, we form the OLS regression line:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \qquad (2.23)$$

where it is understood that $\hat{\beta}_0$ and $\hat{\beta}_1$ have been obtained using equations (2.17) and (2.19). The notation $\hat{y}$, read as "y hat," emphasizes that the predicted values from equation (2.23) are estimates. The intercept, $\hat{\beta}_0$, is the predicted value of $y$ when $x = 0$, although in some cases it will not make sense to set $x = 0$. In those situations, $\hat{\beta}_0$ is not, in itself, very interesting. When using (2.23) to compute predicted values of $y$ for various values of $x$, we must account for the intercept in the calculations. Equation (2.23) is also called the sample regression function (SRF) because it is the estimated version of the population regression function $E(y|x) = \beta_0 + \beta_1 x$. It is important to remember that the PRF is something fixed, but unknown, in the population. Since the SRF is