Table 2.2 (concluded)

obsno    roe     salary    salaryhat     uhat
  13     14.8      1339     1237.009     101.9911
  14     22.3       937     1375.768    -438.7678
  15     56.3      2011     2004.808       6.191895

The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm's roe, these CEOs make less than what we predicted. As can be seen from the positive uhat, the fifth CEO makes more than predicted from the OLS regression line.

Algebraic Properties of OLS Statistics

There are several useful algebraic properties of OLS estimates and their associated statistics. We now cover the three most important of these.

(1) The sum, and therefore the sample average, of the OLS residuals is zero. Mathematically,

$$\sum_{i=1}^{n} \hat{u}_i = 0. \quad (2.30)$$

This property needs no proof; it follows immediately from the OLS first order condition (2.14), when we remember that the residuals are defined by $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$. In other words, the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to make the residuals add up to zero (for any data set). This says nothing about the residual for any particular observation i.

(2) The sample covariance between the regressors and the OLS residuals is zero. This follows from the first order condition (2.15), which can be written in terms of the residuals as

$$\sum_{i=1}^{n} x_i \hat{u}_i = 0. \quad (2.31)$$

The sample average of the OLS residuals is zero, so the left hand side of (2.31) is proportional to the sample covariance between $x_i$ and $\hat{u}_i$.

(3) The point $(\bar{x}, \bar{y})$ is always on the OLS regression line. In other words, if we take equation (2.23) and plug in $\bar{x}$ for x, then the predicted value is $\bar{y}$. This is exactly what equation (2.16) shows us.
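These three properties are easy to verify numerically. The sketch below is a minimal illustration in Python using simulated data rather than the CEO salary sample; the variable names, coefficients, and sample size are assumptions made purely for the check, not part of the text's example.

```python
# Minimal numerical check of properties (1)-(3), using simulated data
# (the regressor, coefficients, and sample size here are hypothetical).
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(10.0, 3.0, size=n)            # hypothetical regressor
y = 5.0 + 2.0 * x + rng.normal(0.0, 4.0, n)  # hypothetical dependent variable

# OLS slope = sample covariance(x, y) / sample variance(x); intercept from the means
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

yhat = beta0_hat + beta1_hat * x             # fitted values
uhat = y - yhat                              # residuals

print(uhat.sum())                            # property (1): essentially zero
print(np.sum(x * uhat))                      # property (2): essentially zero
print(beta0_hat + beta1_hat * x.mean() - y.mean())  # property (3): essentially zero
```

The printed values are zero up to floating-point rounding error; algebraically, they are exact identities.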
EXAMPLE 2.7 (Wage and Education)

For the data in WAGE1.RAW, the average hourly wage in the sample is 5.90, rounded to two decimal places, and the average education is 12.56. If we plug educ = 12.56 into the OLS regression line (2.27), we get

$$\widehat{wage} = -0.90 + 0.54(12.56) = 5.8824,$$

which equals 5.9 when rounded to the first decimal place. The reason these figures do not exactly agree is that we have rounded the average wage and education, as well as the intercept and slope estimates. If we did not initially round any of the values, we would get the answers to agree more closely, but this practice has little useful effect.

Writing each $y_i$ as its fitted value plus its residual provides another way to interpret an OLS regression. For each i, write

$$y_i = \hat{y}_i + \hat{u}_i. \quad (2.32)$$

From property (1) above, the average of the residuals is zero; equivalently, the sample average of the fitted values, $\hat{y}_i$, is the same as the sample average of the $y_i$, or $\bar{\hat{y}} = \bar{y}$. Further, properties (1) and (2) can be used to show that the sample covariance between $\hat{y}_i$ and $\hat{u}_i$ is zero. Thus, we can view OLS as decomposing each $y_i$ into two parts, a fitted value and a residual. The fitted values and residuals are uncorrelated in the sample.

Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR), also known as the sum of squared residuals, as follows:

$$\text{SST} \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2. \quad (2.33)$$

$$\text{SSE} \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2. \quad (2.34)$$

$$\text{SSR} \equiv \sum_{i=1}^{n} \hat{u}_i^2. \quad (2.35)$$

SST is a measure of the total sample variation in the $y_i$; that is, it measures how spread out the $y_i$ are in the sample. If we divide SST by n − 1, we obtain the sample variance of y, as discussed in Appendix C. Similarly, SSE measures the sample variation in the $\hat{y}_i$ (where we use the fact that $\bar{\hat{y}} = \bar{y}$), and SSR measures the sample variation in the $\hat{u}_i$. The total variation in y can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR. Thus,

$$\text{SST} = \text{SSE} + \text{SSR}. \quad (2.36)$$
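Continuing the simulated sketch from above (reusing x, y, yhat, and uhat), the lines below compute the three sums of squares and check the decomposition in (2.36) numerically; this is an illustration on made-up data, not part of the wage example.

```python
# Sums of squares for the simulated fit; SST = SSE + SSR holds up to rounding.
SST = np.sum((y - y.mean()) ** 2)        # total sample variation in y
SSE = np.sum((yhat - y.mean()) ** 2)     # variation in the fitted values
SSR = np.sum(uhat ** 2)                  # variation in the residuals

print(SST, SSE + SSR)                    # the two numbers agree
print(np.sum(uhat * (yhat - y.mean())))  # the cross term is essentially zero
```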
Proving (2.36) is not difficult, but it requires us to use all of the properties of the summation operator covered in Appendix A. Write

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$$
$$= \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$$
$$= \sum_{i=1}^{n} \hat{u}_i^2 + 2\sum_{i=1}^{n} \hat{u}_i(\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
$$= \text{SSR} + 2\sum_{i=1}^{n} \hat{u}_i(\hat{y}_i - \bar{y}) + \text{SSE}.$$

Now (2.36) holds if we show that

$$\sum_{i=1}^{n} \hat{u}_i(\hat{y}_i - \bar{y}) = 0. \quad (2.37)$$

But we have already claimed that the sample covariance between the residuals and the fitted values is zero, and this covariance is just (2.37) divided by n − 1. Thus, we have established (2.36).

Some words of caution about SST, SSE, and SSR are in order. There is no uniform agreement on the names or abbreviations for the three quantities defined in equations (2.33), (2.34), and (2.35). The total sum of squares is called either SST or TSS, so there is little confusion here. Unfortunately, the explained sum of squares is sometimes called the "regression sum of squares." If this term is given its natural abbreviation, it can easily be confused with the term residual sum of squares. Some regression packages refer to the explained sum of squares as the "model sum of squares."

To make matters even worse, the residual sum of squares is often called the "error sum of squares." This is especially unfortunate because, as we will see in Section 2.5, the errors and the residuals are different quantities. Thus, we will always call (2.35) the residual sum of squares or the sum of squared residuals. We prefer to use the abbreviation SSR to denote the sum of squared residuals, because it is more common in econometric packages.

Goodness-of-Fit

So far, we have no way of measuring how well the explanatory or independent variable, x, explains the dependent variable, y. It is often useful to compute a number that summarizes how well the OLS regression line fits the data. In the following discussion, be sure to remember that we assume that an intercept is estimated along with the slope.

Assuming that the total sum of squares, SST, is not equal to zero (which is true except in the very unlikely event that all the $y_i$ equal the same value), we can divide (2.36) by SST to get $1 = \text{SSE}/\text{SST} + \text{SSR}/\text{SST}$.

The R-squared of the regression, sometimes called the coefficient of determination, is defined as
$$R^2 \equiv \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}. \quad (2.38)$$

$R^2$ is the ratio of the explained variation to the total variation, and thus it is interpreted as the fraction of the sample variation in y that is explained by x. The second equality in (2.38) provides another way of computing $R^2$.

From (2.36), the value of $R^2$ is always between zero and one, since SSE can be no greater than SST. When interpreting $R^2$, we usually multiply it by 100 to change it into a percent: $100 \cdot R^2$ is the percentage of the sample variation in y that is explained by x.

If the data points all lie on the same line, OLS provides a perfect fit to the data. In this case, $R^2 = 1$. A value of $R^2$ that is nearly equal to zero indicates a poor fit of the OLS line: very little of the variation in the $y_i$ is captured by the variation in the $\hat{y}_i$ (which all lie on the OLS regression line). In fact, it can be shown that $R^2$ is equal to the square of the sample correlation coefficient between $y_i$ and $\hat{y}_i$. This is where the term "R-squared" came from. (The letter R was traditionally used to denote an estimate of a population correlation coefficient, and its usage has survived in regression analysis.)

EXAMPLE 2.8 (CEO Salary and Return on Equity)

In the CEO salary regression, we obtain the following:

$$\widehat{salary} = 963.191 + 18.501\, roe \quad (2.39)$$
$$n = 209, \quad R^2 = 0.0132.$$

We have reproduced the OLS regression line and the number of observations for clarity. Using the R-squared (rounded to four decimal places) reported for this equation, we can see how much of the variation in salary is actually explained by the return on equity. The answer is: not much. The firm's return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs. That means that 98.7% of the salary variation for these CEOs is left unexplained! This lack of explanatory power may not be too surprising since there are many other characteristics of both the firm and the individual CEO that should influence salary; these factors are necessarily included in the errors in a simple regression analysis.

In the social sciences, low R-squareds in regression equations are not uncommon, especially for cross-sectional analysis. We will discuss this issue more generally under multiple regression analysis, but it is worth emphasizing now that a seemingly low R-squared does not necessarily mean that an OLS regression equation is useless. It is still possible that (2.39) is a good estimate of the ceteris paribus relationship between salary and roe; whether or not this is true does not depend directly on the size of R-squared. Students who are first learning econometrics tend to put too much weight on the size of the R-squared in evaluating regression equations. For now, be aware that using R-squared as the main gauge of success for an econometric analysis can lead to trouble.

Sometimes the explanatory variable explains a substantial part of the sample variation in the dependent variable.
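Before turning to such an example, here is a minimal sketch (again reusing the simulated x, y, yhat, uhat, and the sums of squares computed earlier) that computes R-squared both ways in (2.38) and confirms it equals the squared sample correlation between y and the fitted values.

```python
# R-squared for the simulated fit, computed three equivalent ways.
R2_a = SSE / SST
R2_b = 1.0 - SSR / SST
R2_c = np.corrcoef(y, yhat)[0, 1] ** 2   # squared correlation between y and yhat

print(R2_a, R2_b, R2_c)                  # all three agree up to rounding
```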
EXAMPLE 2.9 (Voting Outcomes and Campaign Expenditures)

In the voting outcome equation in (2.28), $R^2 = 0.505$. Thus, the share of campaign expenditures explains just over 50 percent of the variation in the election outcomes for this sample. This is a fairly sizable portion.

2.4 UNITS OF MEASUREMENT AND FUNCTIONAL FORM

Two important issues in applied economics are (1) understanding how changing the units of measurement of the dependent and/or independent variables affects OLS estimates and (2) knowing how to incorporate popular functional forms used in economics into regression analysis. The mathematics needed for a full understanding of functional form issues is reviewed in Appendix A.

The Effects of Changing Units of Measurement on OLS Statistics

In Example 2.3, we chose to measure annual salary in thousands of dollars, and the return on equity was measured as a percent (rather than as a decimal). It is crucial to know how salary and roe are measured in this example in order to make sense of the estimates in equation (2.39). We must also know that OLS estimates change in entirely expected ways when the units of measurement of the dependent and independent variables change.

In Example 2.3, suppose that, rather than measuring salary in thousands of dollars, we measure it in dollars. Let salardol be salary in dollars (salardol = 845,761 would be interpreted as $845,761). Of course, salardol has a simple relationship to the salary measured in thousands of dollars: salardol = 1,000·salary. We do not need to actually run the regression of salardol on roe to know that the estimated equation is:

$$\widehat{salardol} = 963{,}191 + 18{,}501\, roe. \quad (2.40)$$

We obtain the intercept and slope in (2.40) simply by multiplying the intercept and the slope in (2.39) by 1,000. This gives equations (2.39) and (2.40) the same interpretation. Looking at (2.40), if roe = 0, then the predicted salardol is 963,191, so the predicted salary is $963,191 [the same value we obtained from equation (2.39)]. Furthermore, if roe increases by one, then the predicted salary increases by $18,501; again, this is what we concluded from our earlier analysis of equation (2.39).

Generally, it is easy to figure out what happens to the intercept and slope estimates when the dependent variable changes units of measurement. If the dependent variable is multiplied by the constant c (which means each value in the sample is multiplied by c), then the OLS intercept and slope estimates are also multiplied by c. (This assumes nothing has changed about the independent variable.) In the CEO salary example, c = 1,000 in moving from salary to salardol.
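A brief sketch, again on the simulated data rather than the CEO sample, confirms that multiplying the dependent variable by a constant c multiplies both OLS estimates by c.

```python
# Rescale the simulated dependent variable by c = 1,000 (analogous to salardol)
# and re-estimate by OLS; the intercept and slope scale by the same factor.
c = 1000.0
y_scaled = c * y

b1_scaled = np.cov(x, y_scaled, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_scaled = y_scaled.mean() - b1_scaled * x.mean()

print(b0_scaled, c * beta0_hat)          # intercept multiplied by c
print(b1_scaled, c * beta1_hat)          # slope multiplied by c
```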