Ch. 7 Violations of the Ideal Conditions

1 Specification

1.1 Selection of Variables

Consider an initial model in which we assume that
\[ Y = X_1\beta_1 + \varepsilon. \]
It is not unusual to begin with some such formulation and then contemplate adding more variables (regressors) to the model:
\[ Y = X_1\beta_1 + X_2\beta_2 + \varepsilon. \]
Let $R_1^2$ be the R-square of the model with fewer regressors and $R_{12}^2$ be the R-square of the model with more regressors. As we have shown earlier, $R_{12}^2 \ge R_1^2$. Clearly, it would be possible to push $R^2$ as high as desired simply by adding regressors. This problem motivates the use of the adjusted R-square,
\[ \bar{R}^2 = 1 - \frac{T-1}{T-k}\,(1 - R^2). \]
It has been suggested that the adjusted R-square does not penalize the loss of degrees of freedom heavily enough. Two alternatives that have been proposed for comparing models are
\[ \tilde{R}_j^2 = 1 - \frac{T+k_j}{T-k_j}\,(1 - R_j^2) \]
and Akaike's information criterion,
\[ AIC_j = \ln\frac{e_j'e_j}{T} + \frac{2k_j}{T} = \ln\hat{\sigma}_j^2 + \frac{2k_j}{T}. \]
Although intuitively appealing, these measures are a bit unorthodox in that they have no firm basis in theory (unless they are used in time series models). Perhaps a somewhat more palatable alternative is the method of stepwise regression; however, economists have tended to avoid the stepwise regression method because its inference procedures break down.
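To make these criteria concrete, the following sketch (a minimal illustration, not part of the original notes; the data-generating process, sample size, and all names are assumptions) fits a short and a long OLS model on simulated data, where the added regressor is pure noise, and computes $R^2$, $\bar{R}^2$, $\tilde{R}^2$, and AIC as defined above. $R^2$ necessarily rises for the longer model, while the penalized measures typically do not.

```python
import numpy as np

def fit_criteria(y, X):
    """R^2, adjusted R^2, R-tilde^2, and AIC for an OLS fit,
    as defined above (T observations, k regressors)."""
    T, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
    e = y - X @ beta                           # residuals
    ss_res = e @ e
    ss_tot = ((y - y.mean()) ** 2).sum()
    R2 = 1 - ss_res / ss_tot
    R2_bar = 1 - (T - 1) / (T - k) * (1 - R2)      # adjusted R^2
    R2_tilde = 1 - (T + k) / (T - k) * (1 - R2)    # heavier df penalty
    aic = np.log(ss_res / T) + 2 * k / T           # ln(sigma_hat^2) + 2k/T
    return R2, R2_bar, R2_tilde, aic

rng = np.random.default_rng(0)
T = 50
x1 = rng.normal(size=T)
y = 1.0 + 2.0 * x1 + rng.normal(size=T)            # true model uses x1 only
X1 = np.column_stack([np.ones(T), x1])
X12 = np.column_stack([X1, rng.normal(size=T)])    # add a pure-noise regressor

for name, X in [("short", X1), ("long ", X12)]:
    print(name, ["%.4f" % v for v in fit_criteria(y, X)])
```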
1.2 Omission of Relevant Variables

Suppose that a correctly specified regression model would be
\[ Y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \]
where the two parts of $X$ have $k_1$ and $k_2$ columns, respectively. If we regress $Y$ on $X_1$ without including $X_2$, that is, if we estimate the model
\[ Y = X_1\beta_1 + \varepsilon, \]
we obtain the estimator
\[ \hat{\beta}_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon. \]
Taking the expectation, we see that unless $X_1'X_2 = 0$ or $\beta_2 = 0$, $\hat{\beta}_1$ is biased:
\[ E(\hat{\beta}_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2. \]
The variance of $\hat{\beta}_1$ is
\[ Var(\hat{\beta}_1) = \sigma^2 (X_1'X_1)^{-1}. \]
If we had computed the correct regression, including $X_2$, then the slope estimator on $X_1$, denoted by $\hat{\beta}_{12}$, would have a covariance matrix equal to the upper left block of $\sigma^2(X'X)^{-1}$, i.e.
\[ Var(\hat{\beta}) = \begin{bmatrix} Var(\hat{\beta}_{12}) & * \\ * & Var(\hat{\beta}_{22}) \end{bmatrix} = \sigma^2 \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} = \sigma^2 \begin{bmatrix} [X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]^{-1} & * \\ * & * \end{bmatrix}, \]
or
\[ Var(\hat{\beta}_{12}) = \sigma^2 [X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1]^{-1}. \]
We can compare the covariance matrices of $\hat{\beta}_1$ and $\hat{\beta}_{12}$ more easily by comparing their inverses:
\[ Var(\hat{\beta}_1)^{-1} - Var(\hat{\beta}_{12})^{-1} = (1/\sigma^2)\, X_1'X_2(X_2'X_2)^{-1}X_2'X_1, \]
which is nonnegative definite. We conclude that although $\hat{\beta}_1$ is biased, it has a smaller variance than $\hat{\beta}_{12}$.
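A small simulation (illustrative only; the design and parameter values are assumptions) checks both results numerically: the short-regression estimator is biased by $(X_1'X_1)^{-1}X_1'X_2\beta_2$, yet its sampling variance is smaller than that of the long-regression estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta1, beta2 = 200, 1.0, 0.8

# Correlated regressors, so X1'X2 != 0 and omitting x2 biases beta1-hat.
x1 = rng.normal(size=T)
x2 = 0.6 * x1 + rng.normal(size=T)

short_est, long_est = [], []
for _ in range(5000):
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=T)
    short_est.append((x1 @ y) / (x1 @ x1))         # regress y on x1 only
    X = np.column_stack([x1, x2])
    long_est.append(np.linalg.solve(X.T @ X, X.T @ y)[0])

bias = (x1 @ x2) / (x1 @ x1) * beta2               # (X1'X1)^{-1} X1'X2 beta2
print("short mean:", np.mean(short_est), " theory:", beta1 + bias)
print("variances : short", np.var(short_est), "< long", np.var(long_est))
```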
Lemma: Let $A$ be a positive definite $(n \times n)$ matrix and let $B$ denote any nonzero $(n \times m)$ matrix. Then $B'AB$ is nonnegative definite.

Proof: Let $x$ be any nonzero vector and define $\tilde{x} \equiv Bx$. Then $\tilde{x}$ can be any vector, including the zero vector, and
\[ x'B'ABx = \tilde{x}'A\tilde{x} \ge 0 \]
by the positive definiteness of the matrix $A$.

For statistical inference, it would be necessary to estimate $\sigma^2$. Proceeding as usual, we would use
\[ s^2 = \frac{e_1'e_1}{T - k_1}. \]
But with $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$,
\[ e_1 = M_1 Y = M_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = M_1 X_2 \beta_2 + M_1 \varepsilon. \]
Thus,
\[ E[e_1'e_1] = \beta_2' X_2' M_1 X_2 \beta_2 + \sigma^2\, \mathrm{tr}(M_1) = \beta_2' X_2' M_1 X_2 \beta_2 + \sigma^2 (T - k_1). \]
It is simple to see that $\beta_2' X_2' M_1 X_2 \beta_2$ is positive (how?), so $s^2$ is biased upward. The conclusion is that if we omit relevant variables from the regression, then our estimates of both $\beta_1$ and $\sigma^2$ are biased, although it is possible that $\hat{\beta}_1$ is more precise than $\hat{\beta}_{12}$.
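The upward bias in $s^2$ can be verified the same way; in this sketch (again with assumed values, not from the original notes) the Monte Carlo mean of $s^2$ matches $\sigma^2 + \beta_2'X_2'M_1X_2\beta_2/(T-k_1)$ rather than $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, k1, beta2, sigma2 = 100, 1, 0.8, 1.0
x1 = rng.normal(size=T)
x2 = 0.6 * x1 + rng.normal(size=T)

# M1 = I - x1 (x1'x1)^{-1} x1' annihilates x1, so e1 = M1 y are the
# residuals from the short regression of y on x1 alone.
M1 = np.eye(T) - np.outer(x1, x1) / (x1 @ x1)
theory = sigma2 + beta2**2 * (x2 @ M1 @ x2) / (T - k1)

s2_draws = []
for _ in range(5000):
    y = 1.0 * x1 + beta2 * x2 + np.sqrt(sigma2) * rng.normal(size=T)
    e1 = M1 @ y
    s2_draws.append((e1 @ e1) / (T - k1))

print("mean s^2 =", np.mean(s2_draws))
print("theory   =", theory, " vs true sigma^2 =", sigma2)
```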
1.3 Inclusion of Irrelevant Variables

If the regression model is correctly given by
\[ Y = X_1\beta_1 + \varepsilon, \]
and we estimate it by
\[ Y = X_1\beta_1 + X_2\beta_2 + \varepsilon, \]
then from the partitioned regression estimator we obtain
\[ \hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2(X_1\beta_1 + \varepsilon) = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2\varepsilon \]
and
\[ \hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y = (X_2'M_1X_2)^{-1}X_2'M_1(X_1\beta_1 + \varepsilon) = 0 + (X_2'M_1X_2)^{-1}X_2'M_1\varepsilon. \]
Therefore, $E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = 0$.

Exercise: Show that $s^2$ is unbiased:
\[ E\left[\frac{e'e}{T - k_1 - k_2}\right] = \sigma^2. \]

Then what's the problem? It would seem that one would generally want to "overfit" the model. However, the cost is a reduction in the precision of the estimates. As we have seen, the covariance matrix of the estimator from the shorter regression is never larger than the covariance matrix of the estimator obtained in the presence of the superfluous variables.
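A companion simulation (same hedged spirit as the earlier sketches; all values are assumptions) illustrates the trade-off: including the irrelevant $X_2$ leaves $\hat{\beta}_1$ unbiased but inflates its variance whenever $X_1'X_2 \ne 0$.

```python
import numpy as np

rng = np.random.default_rng(3)
T, beta1 = 100, 1.0
x1 = rng.normal(size=T)
x2 = 0.6 * x1 + rng.normal(size=T)   # superfluous, but correlated with x1

short_est, long_est, b2_est = [], [], []
for _ in range(5000):
    y = beta1 * x1 + rng.normal(size=T)        # true model uses x1 only
    short_est.append((x1 @ y) / (x1 @ x1))     # correct short regression
    X = np.column_stack([x1, x2])
    b = np.linalg.solve(X.T @ X, X.T @ y)      # overfitted long regression
    long_est.append(b[0])
    b2_est.append(b[1])

print("E(beta1-hat):", np.mean(short_est), np.mean(long_est))  # both near 1.0
print("E(beta2-hat):", np.mean(b2_est))                        # near 0
print("variances   :", np.var(short_est), "<", np.var(long_est))
```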
2 Functional Form

2.1 Dummy Variables

One of the most useful devices in regression analysis is the binary, or dummy, variable, which takes values of only 0 and 1.

2.1.1 Comparing Two Means

Suppose a model describes the salary function by
\[ y = \mu + x'\beta + \varepsilon, \]
where $\mu$ can be regarded as the "initial pay" offered to anyone, even individuals with different academic degrees. This model can be made more realistic by dividing the "initial pay" into two categories: individuals attending college and those not attending college. Formally,
\[ y = \mu + \delta d_i + x'\beta + \varepsilon, \]
where
\[ d_i = \begin{cases} 1, & \text{if attending college} \\ 0, & \text{if not attending college.} \end{cases} \]
Logically, $\delta > 0$, and $d_i$ is the dummy variable. The above model can also be written equivalently as
\[ y = \delta d_{1i} + \eta d_{2i} + x'\beta + \varepsilon, \]
where
\[ d_{1i} = \begin{cases} 1, & \text{if attending college} \\ 0, & \text{if not attending college} \end{cases} \qquad d_{2i} = \begin{cases} 0, & \text{if attending college} \\ 1, & \text{if not attending college,} \end{cases} \]
but not as
\[ y = \mu + \delta d_{1i} + \eta d_{2i} + x'\beta + \varepsilon, \]
which must be avoided because of the dummy variable trap: $d_{1i} + d_{2i} = 1$ for every observation, so the two dummies and the constant are perfectly collinear. By the same logic, to remove a seasonal effect we can use 4 dummies without a common mean or 3 dummies with a common mean (see eq. 7-1 at p. 118). A numerical check of the trap follows below.
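This quick check (a minimal sketch; the data are made up for illustration) shows that with both dummies and a constant the design matrix loses full column rank, so $(X'X)^{-1}$ does not exist.

```python
import numpy as np

T = 8
college = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # d_i for 8 hypothetical workers

d1, d2 = college, 1 - college
const = np.ones(T)

# Valid: intercept plus one dummy (or both dummies without an intercept).
X_ok = np.column_stack([const, d1])
# Dummy trap: const = d1 + d2, so the columns are perfectly collinear.
X_trap = np.column_stack([const, d1, d2])

print("rank with one dummy:", np.linalg.matrix_rank(X_ok))    # 2 = full rank
print("rank in dummy trap :", np.linalg.matrix_rank(X_trap))  # 2 < 3 columns
```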
2.2 Nonlinearity in the Variables

The linear model we proposed is not as "limited" as it appears at first glance. By using logarithms, exponentials, reciprocals, transcendental functions, polynomials, and so on, this "linear" model accommodates the general form
\[ g(y) = \beta_1 f_1(z) + \beta_2 f_2(z) + \dots + \beta_k f_k(z) + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon = x'\beta + \varepsilon, \]
which can be tailored to any number of situations.
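As a closing illustration (the functional forms and coefficients are assumptions, not from the text), a model that is nonlinear in $z$ but linear in the parameters can be estimated by ordinary least squares after transforming the regressors:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
z = rng.uniform(0.5, 3.0, size=T)

# y = beta1*1 + beta2*ln(z) + beta3*z^2 + eps is nonlinear in z but
# linear in the parameters, so OLS applies to the transformed regressors.
y = 1.0 + 2.0 * np.log(z) - 0.5 * z**2 + 0.1 * rng.normal(size=T)

X = np.column_stack([np.ones(T), np.log(z), z**2])  # x_j = f_j(z)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat =", beta_hat)   # close to (1.0, 2.0, -0.5)
```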