Lecture 8: Simple Linear Regression

Simple Linear Regression (Bivariate Regression)
We already looked at measuring relationships between two interval variables using correlation. Now we continue to look at the bivariate analysis of the two variables using regression analysis. However, the purpose of doing regression rather than correlation is that we can predict results in one variable based on another variable. So, rather than simply seeing whether the variables are related, we can interpret their effect.

Simple Linear Regression
Like correlation, there are two major assumptions:
• The relationship should be linear; and
• The level of data must be continuous.

The regression equation
The purpose of simple linear regression is to fit a line to the two variables. This line is called the line of best fit, or the regression line. When we do a scatterplot of two variables, it is possible to fit a line which best represents the data.
The regression equation
A regression equation is used to define the relationship between two variables. It takes the form:

Y = a + bX    or    Y = β0 + β1X1 + ε

They are essentially the same, except that the second includes an error term (ε) at the end. This error term indicates that what we have is in fact a model, and hence it won't fit the data perfectly.

The regression equation
The terms represent the following:
• a (or β0) = the constant. It is the value at which the line intersects the Y axis (the intercept).
• b (or β1) = the slope (or gradient) of the line. It represents the change in Y for each unit increase or decrease in X.
• X = the value of the X variable for each case.

Scatterplot and regression line
[Figure: scatterplot with a fitted regression line. The intercept is 20; for a change in X of 1, the change in Y is 10.]
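To make the role of the error term concrete, the following Python sketch simulates observations from Y = β0 + β1X + ε, borrowing the intercept (20) and slope (10) from the scatterplot example above; the number of points and the spread of the error are arbitrary assumptions, not values from the lecture.

```python
# Illustrative sketch: data generated from Y = b0 + b1*X + error scatter around the exact line.
import numpy as np

rng = np.random.default_rng(seed=1)

b0, b1 = 20.0, 10.0                     # intercept and slope taken from the scatterplot example
x = np.arange(1, 6)                     # five hypothetical X values (assumption)
epsilon = rng.normal(0.0, 3.0, x.size)  # random error term; a spread of 3 is an assumption

y = b0 + b1 * x + epsilon               # observed Y values do not sit exactly on the line

for xi, yi in zip(x, y):
    print(f"X={xi}  value on line={b0 + b1 * xi:5.1f}  observed Y={yi:5.1f}")
```

The gap between "value on line" and "observed Y" in the output is exactly the error term the second equation makes explicit.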
How do we fit a line to data?
In order to fit a line of best fit we use a method called the Method of Least Squares. This method allows us to determine which line, out of all the lines that could be drawn, best represents the least amount of difference between the actual values (the data points) and the predicted line.

In the figure above, three data points fall on the line, while the remaining six are slightly above or below the line. The differences between these points and the line are called residuals. Some of these differences will be positive (above the line), while others will be negative (below the line). If we add up all these differences, some of the positive and negative values will cancel each other out, which has the effect of overestimating how well the line represents the data. Instead, if we square the differences and then add them up, we can work out which line has the smallest sum of squares (that is, the one with the least error).

The method of least squares
We do not have to test every possible line to see which fits the data best. The method of least squares provides the optimal values of a (or β0) and b (or β1). Once we have established them, we can use them in the regression equation. The formulas for calculating b and a are:

b = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]
a = (ΣY - bΣX) / n

Example 1
Ages (X) of five women and their number of children ever born, CEB (Y). The column totals are: ΣX = 149, ΣY = 9, ΣXY = 299, ΣX² = 4803, ΣY² = 19. So n = 5 and (ΣX)² = 149 × 149 = 22201.
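The formulas can be checked with a short Python sketch using the Example 1 totals quoted above; only the column totals are used, since the individual ages and CEB values are not reproduced in these notes.

```python
# Least-squares slope (b) and intercept (a) from summary totals, as in Example 1.
n = 5
sum_x, sum_y = 149, 9        # ΣX (ages), ΣY (children ever born)
sum_xy, sum_x2 = 299, 4803   # ΣXY, ΣX²

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a = (sum_y - b * sum_x) / n                                   # intercept

print(round(b, 3), round(a, 3))  # prints 0.085 and -0.73, matching the worked example below
```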
Calculating the coefficients
b = [5(299) - (149)(9)] / [5(4803) - 22201] = 154 / 1814 ≈ 0.085
a = [9 - 0.085(149)] / 5 ≈ -0.73

So the fitted regression equation is: Ŷ = -0.73 + 0.085X

Prediction
We can now predict the number of children ever born based on age. So, if some women in the community are aged 27, we could predict that their CEB number is:
Ŷ = -0.73 + 0.085 × 27 = 1.56

We could now draw a line of best fit through the observed data points.
[Figure: scatterplot of age (15-50) against number of children (0.0-3.5) with the fitted regression line.]

[SPSS output for Example 1: Variables Entered/Removed and Model Summary tables. Predictors: (Constant), AGE.]
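A small Python sketch of the prediction step, reusing the unrounded least-squares coefficients from Example 1 (the helper name predict_ceb is introduced here for illustration, not taken from the lecture):

```python
# Predict CEB from age using the fitted line from Example 1.
b = 154 / 1814            # slope, before rounding to 0.085
a = (9 - b * 149) / 5     # intercept, before rounding to -0.73

def predict_ceb(age):
    """Predicted number of children ever born for a woman of the given age."""
    return a + b * age

print(round(predict_ceb(27), 2))  # 1.56, matching the value quoted in the lecture
```

Using the unrounded coefficients reproduces the slide's 1.56; rounding to two decimal places first gives a slightly different answer (about 1.57).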
[SPSS output for Example 1: ANOVA and Coefficients tables. Dependent variable: CEB.]

Inference for Regression
• When a scatterplot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least-squares line fitted to the data to predict y for a given value of x. Now we want to do tests and confidence intervals in this setting.

Example 2: Crying and IQ
• Crying easily in infancy may be a sign of higher IQ. Crying intensity and IQ (intelligence quotient) data were recorded for 38 infants.
[Data table of crying intensity and IQ for the 38 infants; the individual values are not recoverable here.]
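As a sketch of how such a test and confidence interval for the slope might be computed in Python with scipy.stats.linregress: the crying and IQ arrays below are placeholder numbers for illustration only, not the 38-infant dataset from the lecture.

```python
# Test of H0: slope = 0 and a 95% confidence interval for the slope.
import numpy as np
from scipy import stats

crying = np.array([10, 12, 9, 16, 18, 15, 12, 20, 17, 13])      # illustrative values only
iq = np.array([87, 97, 103, 106, 109, 112, 100, 121, 115, 102])  # illustrative values only

res = stats.linregress(crying, iq)          # least-squares fit with slope, p-value, std. error
n = len(crying)
t_crit = stats.t.ppf(0.975, df=n - 2)       # two-sided 95% critical value, n - 2 df

print(f"slope = {res.slope:.2f}, p-value for H0: slope = 0 is {res.pvalue:.4f}")
print(f"95% CI for slope: ({res.slope - t_crit * res.stderr:.2f}, "
      f"{res.slope + t_crit * res.stderr:.2f})")
```

The same quantities (slope, standard error, t statistic, p-value) appear in the Coefficients table of SPSS regression output.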