• Plot and interpret. As always, we first examine the data. Figure 3 is a scatterplot of the crying data. Plot the explanatory variable (crying intensity at birth) horizontally and the response variable (IQ at age 3) vertically. Look for the form, direction, and strength of the relationship as well as for outliers or other deviations. There is a moderate positive linear relationship, with no extreme outliers or potentially influential observations.
• Numerical summary. Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation describes the direction and strength of the relationship. The correlation between crying and IQ is $r = 0.455$.
• Mathematical model. We are interested in predicting the response from information about the explanatory variable. So we find the least-squares regression line for predicting IQ from crying.

This line lies as close as possible to the points (in the sense of least squares) in the vertical (y) direction. The equation of the least-squares regression line is

$\hat{y} = a + bx = 91.27 + 1.493x$

Because $r^2 = 0.207$, about 21% of the variation in IQ scores is explained by crying intensity. See SPSS output.
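As a small illustration of how the fitted line and the correlation are used, the sketch below plugs the reported values (a = 91.27, b = 1.493, r = 0.455) into a prediction and into r². The function names and the crying intensity x = 20 are hypothetical, chosen only for the example; x = 20 is not a data point from the study.

```python
# Values reported for the crying data: fitted line y-hat = a + b*x and correlation r.
INTERCEPT_A = 91.27  # a, intercept of the least-squares line
SLOPE_B = 1.493      # b, slope of the least-squares line
R = 0.455            # correlation between crying intensity and IQ


def predict_iq(crying_intensity: float) -> float:
    """Predicted IQ at age 3 from the fitted line y-hat = a + b*x."""
    return INTERCEPT_A + SLOPE_B * crying_intensity


def r_squared(r: float) -> float:
    """Fraction of the variation in y explained by the regression on x."""
    return r * r


# Hypothetical crying intensity, used only to show the arithmetic.
example_x = 20
print(f"predicted IQ at x = {example_x}: {predict_iq(example_x):.2f}")
print(f"r^2 = {r_squared(R):.3f}")
```

Squaring the correlation recovers the 0.207 quoted above, which is where the "about 21%" figure comes from.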
The regression model
• The slope b and intercept a of the least-squares line are statistics. That is, we calculated them from the sample data. These statistics would take somewhat different values if we repeated the study with different infants. To do formal inference, we think of a and b as estimates of unknown parameters.

Assumptions for regression inference
We have n observations on an explanatory variable x and a response variable y. Our goal is to study or predict the behavior of y for given values of x.
• For any fixed value of x, the response y varies according to a normal distribution. Repeated responses y are independent of each other.
• The mean response $\mu_y$ has a straight-line relationship with x: $\mu_y = \alpha + \beta x$. The slope $\beta$ and intercept $\alpha$ are unknown parameters.
• The standard deviation of y (call it $\sigma$) is the same for all values of x. The value of $\sigma$ is unknown.

The heart of this model is that there is an "on the average" straight-line relationship between y and x. The true regression line $\mu_y = \alpha + \beta x$ says that the mean response $\mu_y$ moves along a straight line as the explanatory variable x changes. We can't observe the true regression line. The values of y that we do observe vary about their means according to a normal distribution. If we hold x fixed and take many observations on y, the normal pattern will eventually appear in a histogram.
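The model can be made concrete with a short simulation. The sketch below is only an illustration: the values of ALPHA, BETA, and SIGMA are hypothetical (in practice these parameters are unknown), and the helper names are invented for the example. It holds x fixed, draws many independent responses y from a normal distribution centered at the mean response, and checks that their average and spread settle near the mean response and σ.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
# In a real study alpha, beta, and sigma are unknown.
ALPHA = 90.0   # intercept of the true regression line
BETA = 1.5     # slope of the true regression line
SIGMA = 15.0   # standard deviation of y, the same for every x


def mean_response(x: float) -> float:
    """Mean response mu_y = alpha + beta * x on the true regression line."""
    return ALPHA + BETA * x


def simulate_responses(x: float, n: int, rng: random.Random) -> list[float]:
    """Draw n independent responses y at a fixed x, each Normal(mu_y, sigma)."""
    mu = mean_response(x)
    return [rng.gauss(mu, SIGMA) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is reproducible
x_fixed = 10
ys = simulate_responses(x_fixed, 10_000, rng)

print(f"mean response at x = {x_fixed}: {mean_response(x_fixed):.1f}")
print(f"sample mean of simulated y:   {statistics.mean(ys):.1f}")
print(f"sample std dev of simulated y: {statistics.stdev(ys):.1f}")
```

Repeating the simulation at a different x shifts the center of the simulated responses along the line while leaving their spread unchanged, which is what the equal-σ assumption says.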