5.3 Regression When X Is a Binary Variable 186
    Interpretation of the Regression Coefficients 186
5.4 Heteroskedasticity and Homoskedasticity 188
    What Are Heteroskedasticity and Homoskedasticity? 188
    Mathematical Implications of Homoskedasticity 190
    What Does This Mean in Practice? 192
5.5 The Theoretical Foundations of Ordinary Least Squares 194
    Linear Conditionally Unbiased Estimators and the Gauss–Markov Theorem 194
    Regression Estimators Other Than OLS 195
5.6 Using the t-Statistic in Regression When the Sample Size Is Small 196
    The t-Statistic and the Student t Distribution 196
    Use of the Student t Distribution in Practice 197
5.7 Conclusion 197
APPENDIX 5.1 Formulas for OLS Standard Errors 206
APPENDIX 5.2 The Gauss–Markov Conditions and a Proof of the Gauss–Markov Theorem 207

CHAPTER 6 Linear Regression with Multiple Regressors 211
6.1 Omitted Variable Bias 211
    Definition of Omitted Variable Bias 212
    A Formula for Omitted Variable Bias 214
    Addressing Omitted Variable Bias by Dividing the Data into Groups 215
6.2 The Multiple Regression Model 217
    The Population Regression Line 217
    The Population Multiple Regression Model 218
6.3 The OLS Estimator in Multiple Regression 220
    The OLS Estimator 220
    Application to Test Scores and the Student–Teacher Ratio 221
6.4 Measures of Fit in Multiple Regression 222
    The Standard Error of the Regression (SER) 222
    The R² 223
    The Adjusted R² 223
    Application to Test Scores 224
6.5 The Least Squares Assumptions for Causal Inference in Multiple Regression 225
    Assumption 1: The Conditional Distribution of uᵢ Given X₁ᵢ, X₂ᵢ, ..., Xₖᵢ Has a Mean of 0 225
    Assumption 2: (X₁ᵢ, X₂ᵢ, ..., Xₖᵢ, Yᵢ), i = 1, ..., n, Are i.i.d. 225
    Assumption 3: Large Outliers Are Unlikely 225
    Assumption 4: No Perfect Multicollinearity 226
6.6 The Distribution of the OLS Estimators in Multiple Regression 227
6.7 Multicollinearity 228
    Examples of Perfect Multicollinearity 228
    Imperfect Multicollinearity 230
6.8 Control Variables and Conditional Mean Independence 231
    Control Variables and Conditional Mean Independence 232
6.9 Conclusion 234
APPENDIX 6.1 Derivation of Equation (6.1) 242
APPENDIX 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors 243
APPENDIX 6.3 The Frisch–Waugh Theorem 243
APPENDIX 6.4 The Least Squares Assumptions for Prediction with Multiple Regressors 244
APPENDIX 6.5 Distribution of OLS Estimators in Multiple Regression with Control Variables 245

CHAPTER 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 247
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient 247
    Standard Errors for the OLS Estimators 247
    Hypothesis Tests for a Single Coefficient 248
    Confidence Intervals for a Single Coefficient 249
    Application to Test Scores and the Student–Teacher Ratio 249
7.2 Tests of Joint Hypotheses 251
    Testing Hypotheses on Two or More Coefficients 252
    The F-Statistic 253
    Application to Test Scores and the Student–Teacher Ratio 255
    The Homoskedasticity-Only F-Statistic 256
7.3 Testing Single Restrictions Involving Multiple Coefficients 258
7.4 Confidence Sets for Multiple Coefficients 259
7.5 Model Specification for Multiple Regression 260
    Model Specification and Choosing Control Variables 261
    Interpreting the R² and the Adjusted R² in Practice 262
7.6 Analysis of the Test Score Data Set 262
7.7 Conclusion 268
APPENDIX 7.1 The Bonferroni Test of a Joint Hypothesis 274
CHAPTER 8 Nonlinear Regression Functions 277
8.1 A General Strategy for Modeling Nonlinear Regression Functions 279
    Test Scores and District Income 279
    The Effect on Y of a Change in X in Nonlinear Specifications 282
    A General Approach to Modeling Nonlinearities Using Multiple Regression 285
8.2 Nonlinear Functions of a Single Independent Variable 286
    Polynomials 286
    Logarithms 288
    Polynomial and Logarithmic Models of Test Scores and District Income 296
8.3 Interactions Between Independent Variables 297
    Interactions Between Two Binary Variables 298
    Interactions Between a Continuous and a Binary Variable 300
    Interactions Between Two Continuous Variables 305
8.4 Nonlinear Effects on Test Scores of the Student–Teacher Ratio 310
    Discussion of Regression Results 310
    Summary of Findings 314
8.5 Conclusion 315
APPENDIX 8.1 Regression Functions That Are Nonlinear in the Parameters 325
APPENDIX 8.2 Slopes and Elasticities for Nonlinear Regression Functions 328

CHAPTER 9 Assessing Studies Based on Multiple Regression 330
9.1 Internal and External Validity 330
    Threats to Internal Validity 331
    Threats to External Validity 332
9.2 Threats to Internal Validity of Multiple Regression Analysis 333
    Omitted Variable Bias 334
    Misspecification of the Functional Form of the Regression Function 336
    Measurement Error and Errors-in-Variables Bias 336
    Missing Data and Sample Selection 339
    Simultaneous Causality 341
    Sources of Inconsistency of OLS Standard Errors 343
9.3 Internal and External Validity When the Regression Is Used for Prediction 344
9.4 Example: Test Scores and Class Size 345
    External Validity 346
    Internal Validity 352
    Discussion and Implications 353
9.5 Conclusion 354
APPENDIX 9.1 The Massachusetts Elementary School Testing Data 360
PART THREE Further Topics in Regression Analysis

CHAPTER 10 Regression with Panel Data 361
10.1 Panel Data 362
    Example: Traffic Deaths and Alcohol Taxes 362
10.2 Panel Data with Two Time Periods: “Before and After” Comparisons 365
10.3 Fixed Effects Regression 367
    The Fixed Effects Regression Model 367
    Estimation and Inference 369
    Application to Traffic Deaths 370
10.4 Regression with Time Fixed Effects 371
    Time Effects Only 371
    Both Entity and Time Fixed Effects 372
10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression 374
    The Fixed Effects Regression Assumptions 374
    Standard Errors for Fixed Effects Regression 376
10.6 Drunk Driving Laws and Traffic Deaths 377
10.7 Conclusion 381
APPENDIX 10.1 The State Traffic Fatality Data Set 387
APPENDIX 10.2 Standard Errors for Fixed Effects Regression 388

CHAPTER 11 Regression with a Binary Dependent Variable 392
11.1 Binary Dependent Variables and the Linear Probability Model 393
    Binary Dependent Variables 393
    The Linear Probability Model 395
11.2 Probit and Logit Regression 397
    Probit Regression 397
    Logit Regression 401
    Comparing the Linear Probability, Probit, and Logit Models 403
11.3 Estimation and Inference in the Logit and Probit Models 404
    Nonlinear Least Squares Estimation 404
    Maximum Likelihood Estimation 405
    Measures of Fit 406
11.4 Application to the Boston HMDA Data 407
11.5 Conclusion 413
APPENDIX 11.1 The Boston HMDA Data Set 421
APPENDIX 11.2 Maximum Likelihood Estimation 421
APPENDIX 11.3 Other Limited Dependent Variable Models 424
CHAPTER 12 Instrumental Variables Regression 427
12.1 The IV Estimator with a Single Regressor and a Single Instrument 428
    The IV Model and Assumptions 428
    The Two Stage Least Squares Estimator 429
    Why Does IV Regression Work? 429
    The Sampling Distribution of the TSLS Estimator 434
    Application to the Demand for Cigarettes 435
12.2 The General IV Regression Model 437
    TSLS in the General IV Model 439
    Instrument Relevance and Exogeneity in the General IV Model 440
    The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator 441
    Inference Using the TSLS Estimator 442
    Application to the Demand for Cigarettes 443
12.3 Checking Instrument Validity 444
    Assumption 1: Instrument Relevance 444
    Assumption 2: Instrument Exogeneity 446
12.4 Application to the Demand for Cigarettes 450
12.5 Where Do Valid Instruments Come From? 454
    Three Examples 455
12.6 Conclusion 459
APPENDIX 12.1 The Cigarette Consumption Panel Data Set 467
APPENDIX 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4) 467
APPENDIX 12.3 Large-Sample Distribution of the TSLS Estimator 468
APPENDIX 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid 469
APPENDIX 12.5 Instrumental Variables Analysis with Weak Instruments 470
APPENDIX 12.6 TSLS with Control Variables 472

CHAPTER 13 Experiments and Quasi-Experiments 474
13.1 Potential Outcomes, Causal Effects, and Idealized Experiments 475
    Potential Outcomes and the Average Causal Effect 475
    Econometric Methods for Analyzing Experimental Data 476
13.2 Threats to Validity of Experiments 478
    Threats to Internal Validity 478
    Threats to External Validity 481
13.3 Experimental Estimates of the Effect of Class Size Reductions 482
    Experimental Design 482
    Analysis of the STAR Data 483
    Comparison of the Observational and Experimental Estimates of Class Size Effects 488