Lecture 3 Regression Analysis
Dr. 李晓瑜 Xiaoyu Li
Email: xiaoyuuestc@uestc.edu.cn
http://blog.sciencenet.cn/u/uestc2014xiaoyu
2019-Spring
SunData Group http://www.sundatagroup.org/
School of Information and Software Engineering, UESTC
Copyright © 2019 by Xiaoyu Li.
Review
1. Simple Linear Regression
2. Multiple Regression
3. Understanding the Regression Output
4. Coefficient of Determination R²
5. Validating the Regression Model
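As a compact reminder of the objects reviewed above, the standard model forms and the usual definition of R² can be written as follows. The notation (β for coefficients, ε for the error term, ŷ for fitted values, ȳ for the sample mean, k predictors) is assumed here rather than taken from the earlier lectures.

```latex
\begin{aligned}
&\text{Simple linear regression:} && y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n \\
&\text{Multiple regression:} && y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i \\
&\text{Coefficient of determination:} && R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```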
Practice of Regression
- Choose which independent variables to include in the model, based on common sense and context-specific knowledge.
- Collect data (create dummy variables if necessary).
- Run the regression: this is the easy part.
- Analyze the output and make changes to the model: this is where the action is.
- Test the regression result on "out-of-sample" data.
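The workflow above can be sketched end to end in Python with NumPy. This is a minimal illustration under stated assumptions. The data are synthetic, and the categorical variable `region` and its levels are hypothetical. "Out-of-sample" is approximated by holding out the last 25% of rows.

```python
import numpy as np

# Hypothetical synthetic data: one numeric predictor x and one
# categorical predictor "region" that needs dummy variables.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
region = rng.choice(["north", "south", "west"], n)
# Assumed "true" model, used only to generate the example data
y = (3.0 + 2.0 * x + 1.5 * (region == "south") - 1.0 * (region == "west")
     + rng.normal(0, 1, n))


def design_matrix(x, region):
    """Intercept, x, and 0/1 dummies for 'south' and 'west'.
    'north' is the omitted baseline category (avoids the dummy-variable trap)."""
    return np.column_stack([
        np.ones_like(x),
        x,
        (region == "south").astype(float),
        (region == "west").astype(float),
    ])


def rmse(X, y, beta):
    """Root-mean-square prediction error of coefficients beta on (X, y)."""
    return float(np.sqrt(np.mean((y - X @ beta) ** 2)))


# Hold out the last 25% of rows as "out-of-sample" data
split = int(0.75 * n)
X = design_matrix(x, region)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Run the regression (ordinary least squares) on the training rows only
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

in_rmse = rmse(X_train, y_train, beta)
out_rmse = rmse(X_test, y_test, beta)
print("coefficients [const, x, south, west]:", np.round(beta, 2))
print(f"in-sample RMSE:     {in_rmse:.3f}")
print(f"out-of-sample RMSE: {out_rmse:.3f}")
```

Omitting one dummy ('north') keeps the design matrix full rank. Including a dummy for every level alongside the intercept would make the columns exactly collinear. An out-of-sample RMSE close to the in-sample RMSE is some evidence that the model is not overfitting.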
The Post-Regression Checklist
1) Statistics checklist:
- Calculate the correlation between pairs of x variables; watch for evidence of multicollinearity.
- Check the signs of the coefficients: do they make sense?
- Check the 95% C.I. (use t-statistics as a quick scan): are the coefficients significantly different from zero?
- R²: overall quality of the regression, but not the only measure.
2) Residual checklist:
- Normality: look at a histogram of the residuals.
- Heteroscedasticity: plot the residuals against each x variable.
- Autocorrelation: if the data have a natural order, plot the residuals in that order and check for a pattern.
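The statistics checklist can be approximated numerically, as in the sketch below. It assumes NumPy only, and the helper `ols_summary` is hypothetical. The 95% intervals use the normal value 1.96 rather than the exact t quantile, which is reasonable only for large samples. The data are synthetic, with `x2` deliberately built to be almost a copy of `x1`.

```python
import numpy as np


def ols_summary(X, y):
    """OLS with an intercept. Returns coefficients, standard errors,
    t-statistics, approximate 95% confidence bounds, and R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # prepend the intercept column
    p = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased estimate of error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)     # estimated covariance of beta-hat
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # Large-sample approximation: 1.96 in place of the exact t quantile
    # (scipy.stats.t.ppf(0.975, n - p) gives the exact value).
    lower, upper = beta - 1.96 * se, beta + 1.96 * se
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, t, lower, upper, r2


# Hypothetical data: x2 is built to be nearly collinear with x1
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlations between pairs of x variables (multicollinearity check)
corr = np.corrcoef(X, rowvar=False)
print("corr(x):\n", np.round(corr, 2))

beta, se, t, lower, upper, r2 = ols_summary(X, y)
for name, b, s, tv, lo, hi in zip(["const", "x1", "x2", "x3"], beta, se, t, lower, upper):
    print(f"{name:>5}: b={b:6.2f}  se={s:5.2f}  t={tv:6.2f}  95% CI=[{lo:6.2f}, {hi:6.2f}]")
print(f"R^2 = {r2:.3f}")
```

In this setup, the high correlation between `x1` and `x2` shows up as inflated standard errors and wide intervals for both coefficients. That is why their signs and t-statistics can be unreliable even when R² is high.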
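The residual checklist calls for plots. The sketch below computes rough numeric counterparts instead, so that it runs without a plotting library. The choice of summaries is an assumption of this example, not something the slide prescribes: skewness and excess kurtosis for normality, the correlation between |residual| and each x for heteroscedasticity, and the Durbin-Watson statistic for autocorrelation. The helpers `ols_residuals` and `residual_checks` and the two synthetic datasets are hypothetical.

```python
import numpy as np


def ols_residuals(X, y):
    """Residuals from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta


def residual_checks(X, e):
    """Numeric stand-ins for the three residual plots."""
    # Normality: histogram counts plus skewness and excess kurtosis (both near 0 if normal)
    counts, _ = np.histogram(e, bins=8)
    z = (e - e.mean()) / e.std()
    skew = float(np.mean(z ** 3))
    excess_kurtosis = float(np.mean(z ** 4) - 3.0)
    # Heteroscedasticity: correlation of |residual| with each x (near 0 if spread is constant)
    spread_corr = [float(np.corrcoef(np.abs(e), X[:, j])[0, 1]) for j in range(X.shape[1])]
    # Autocorrelation (rows in natural order): Durbin-Watson, near 2 means no lag-1 correlation
    dw = float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
    return counts, skew, excess_kurtosis, spread_corr, dw


rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
X = x.reshape(-1, 1)

# Case 1 (hypothetical): error spread grows with x, i.e. heteroscedastic
y_het = 1.0 + 0.5 * x + rng.normal(0, 0.5 * x)
counts, skew, kurt, spread_het, dw_het = residual_checks(X, ols_residuals(X, y_het))
print("histogram counts:", counts)
print(f"skew={skew:.2f}  excess kurtosis={kurt:.2f}")
print(f"corr(|e|, x)={spread_het[0]:.2f}  DW={dw_het:.2f}")

# Case 2 (hypothetical): AR(1) errors in row order, i.e. autocorrelated
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.8 * u[i - 1] + rng.normal()
y_ar = 1.0 + 0.5 * x + u
_, _, _, spread_ar, dw_ar = residual_checks(X, ols_residuals(X, y_ar))
print(f"AR(1) errors: DW={dw_ar:.2f}")
```

A Durbin-Watson value far below 2 suggests positive lag-1 autocorrelation. It is only meaningful when the row order is natural, such as time. These numbers complement the plots rather than replace them.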
The Grand Checklist
- Linearity: use scatter plots, common sense, and knowledge of your problem; transformations, including interactions, are useful.
- t-statistics: are the coefficients significantly different from zero? Look at the width of the confidence intervals.
- F-tests for subsets of coefficients and for equality of coefficients.
- R²: is it reasonably high in this context?
- Influential observations: outliers in predictor space and in dependent-variable space.
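The F-tests mentioned above compare a restricted model with a full one through their residual sums of squares:

F = [(RSS_r - RSS_f) / q] / [RSS_f / (n - p)]

Here q is the number of restrictions and p is the number of columns in the full model. The sketch below assumes NumPy and synthetic data. It uses one hypothetical helper, `partial_f_test`, for two tests. The subset test asks whether an interaction term x1·x2 adds anything. The equality test asks whether β1 = β2, imposed by replacing x1 and x2 with their sum.

```python
import numpy as np


def rss(A, y):
    """Residual sum of squares of the OLS fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)


def partial_f_test(A_reduced, A_full, y):
    """F statistic comparing a restricted model nested inside a full model
    (H0: the restrictions that turn A_full into A_reduced hold)."""
    n, p_full = A_full.shape
    q = p_full - A_reduced.shape[1]           # number of restrictions
    rss_r, rss_f = rss(A_reduced, y), rss(A_full, y)
    F = ((rss_r - rss_f) / q) / (rss_f / (n - p_full))
    return F, q, n - p_full


rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# Assumed true model with equal main effects and an interaction term
y = 1.0 + 1.0 * x1 + 1.0 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

one = np.ones(n)
full = np.column_stack([one, x1, x2, x1 * x2])

# Subset test: does the interaction column x1*x2 matter?
F_int, q_int, df_int = partial_f_test(np.column_stack([one, x1, x2]), full, y)
# Equality test: H0 beta_1 == beta_2, imposed by replacing x1, x2 with x1 + x2
F_eq, q_eq, df_eq = partial_f_test(np.column_stack([one, x1 + x2, x1 * x2]), full, y)

print(f"interaction:    F={F_int:.1f} on ({q_int}, {df_int}) df")
print(f"beta1 == beta2: F={F_eq:.2f} on ({q_eq}, {df_eq}) df")
```

The p-value is the upper tail of the F(q, n - p) distribution. If SciPy is available, `scipy.stats.f.sf(F, q, df)` computes it.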
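Influential observations can be screened with the usual OLS diagnostics. Leverage h_ii (the diagonal of the hat matrix) flags outliers in predictor space. Studentized residuals flag outliers in the dependent variable. Cook's distance combines both. The sketch below assumes NumPy and synthetic data with two planted problem points. It uses common rule-of-thumb cutoffs (2p/n, 3, and 4/n) that are conventions rather than strict tests. `influence_measures` is a hypothetical helper.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage, internally studentized residuals, and Cook's distance
    for an OLS fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    H = A @ np.linalg.pinv(A.T @ A) @ A.T    # hat matrix
    h = np.diag(H)                           # leverage: outlyingness in predictor space
    e = y - H @ y                            # OLS residuals
    s2 = e @ e / (n - p)
    r = e / np.sqrt(s2 * (1 - h))            # studentized residuals: outlyingness in y
    cooks = r ** 2 * h / (p * (1 - h))       # Cook's distance: overall influence on the fit
    return h, r, cooks


rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)
# Hypothetical problem points appended at the end:
#   index 50: far out in x and off the line (high leverage, influential)
#   index 51: ordinary x but a large error in y (outlier in y only)
x = np.append(x, [6.0, 0.0])
y = np.append(y, [5.0, 12.0])

h, r, cooks = influence_measures(x.reshape(-1, 1), y)
n, p = len(y), 2
# Common rules of thumb (conventions, not strict cutoffs)
print("high leverage (h > 2p/n):", np.where(h > 2 * p / n)[0])
print("large |studentized residual| (> 3):", np.where(np.abs(r) > 3)[0])
print("large Cook's distance (> 4/n):", np.where(cooks > 4 / n)[0])
```

Point 50 stands out on both leverage and Cook's distance. Point 51 has an ordinary x value, so it stands out mainly through its residual.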