16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 5

Last time: Characterizing groups of random variables.

Names for groups of random variables:

$$S = \sum_{i=1}^{n} X_i, \qquad S^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j$$

Characterize by pairs to compute

$$E[XY] = \overline{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x,y)\, dx\, dy,$$

which we define as the correlation.

Often we do not know the complete distribution, but only simple statistics. The most common of the moments of higher-order distribution functions is the covariance,

$$\mu_{xy} = E\big[(X-\overline{X})(Y-\overline{Y})\big] = \overline{(X-\overline{X})(Y-\overline{Y})} = \overline{XY} - \overline{X}\,\overline{Y} - \overline{X}\,\overline{Y} + \overline{X}\,\overline{Y} = \overline{XY} - \overline{X}\,\overline{Y} = (\text{correlation}) - (\text{product of means}).$$

Even more significant is the normalized covariance, or correlation coefficient:

$$\rho = \frac{\mu_{xy}}{\sigma_x \sigma_y}, \qquad -1 \le \rho \le 1$$

This correlation coefficient may be thought of as measuring the degree of linear dependence between the random variables: $\rho = 0$ if the two are independent and $\rho = \pm 1$ if one is a linear function of the other. First note that $\rho = 0$ if $X$ and $Y$ are independent.
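As a quick numerical illustration, a minimal NumPy sketch can estimate all three quantities from samples; the joint distribution and sample size below are arbitrary assumptions chosen only to make the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative joint sample: Y depends partly on X (assumed model).
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 0.5 * x + rng.normal(scale=1.0, size=n)

correlation = np.mean(x * y)                # E[XY], the correlation
mu_xy = correlation - x.mean() * y.mean()   # covariance: correlation - product of means
rho = mu_xy / (x.std() * y.std())           # normalized covariance, always in [-1, 1]

print(f"E[XY] = {correlation:.3f}, mu_xy = {mu_xy:.3f}, rho = {rho:.3f}")
```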
Next, calculate $\rho$ for $Y = a + bX$. If linearly related:

$$Y = a + bX$$
$$\overline{XY} = a\overline{X} + b\overline{X^2}$$
$$\overline{Y^2} = a^2 + 2ab\overline{X} + b^2\overline{X^2}$$

$$\rho = \frac{a\overline{X} + b\overline{X^2} - \overline{X}\big(a + b\overline{X}\big)}{\sigma_x\sqrt{a^2 + 2ab\overline{X} + b^2\overline{X^2} - a^2 - 2ab\overline{X} - b^2\overline{X}^2}} = \frac{b\big(\overline{X^2} - \overline{X}^2\big)}{\sigma_x\sqrt{b^2\sigma_x^2}} = \frac{b\sigma_x^2}{|b|\,\sigma_x^2} = \pm 1 = \operatorname{sgn}(b)$$
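A numerical check of this result is straightforward; the $(a, b)$ pairs below are hypothetical values, and any $b \neq 0$ should reproduce $\rho = \operatorname{sgn}(b)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)

# Hypothetical (a, b) pairs; the sign of b determines rho entirely.
for a, b in [(3.0, 2.0), (-1.0, -0.5)]:
    y = a + b * x                                   # Y exactly linear in X
    rho = (np.mean(x * y) - x.mean() * y.mean()) / (x.std() * y.std())
    print(f"b = {b:+.1f}:  rho = {rho:+.4f},  sgn(b) = {np.sign(b):+.0f}")
```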
Degree of Linear Dependence

At every observation, or trial of the experiment, we observe a pair $x, y$. We ask: how well can we approximate $Y$ as a linear function of $X$?

$$Y_{\text{approx}} = a + bX$$

Choose $a$ and $b$ to minimize the mean squared error, $\overline{\varepsilon^2}$, in the approximation.

$$\varepsilon = Y_{\text{approx}} - Y = a + bX - Y$$
$$\varepsilon^2 = a^2 + b^2X^2 + Y^2 + 2abX - 2bXY - 2aY$$
$$\overline{\varepsilon^2} = a^2 + b^2\overline{X^2} + \overline{Y^2} + 2ab\overline{X} - 2b\overline{XY} - 2a\overline{Y}$$

Setting the partial derivatives to zero:

$$\frac{\partial\overline{\varepsilon^2}}{\partial a} = 2a + 2b\overline{X} - 2\overline{Y} = 0$$
$$\frac{\partial\overline{\varepsilon^2}}{\partial b} = 2b\overline{X^2} + 2a\overline{X} - 2\overline{XY} = 0$$
$$\frac{\partial\overline{\varepsilon^2}}{\partial b} - \overline{X}\,\frac{\partial\overline{\varepsilon^2}}{\partial a} = \big(2b\overline{X^2} - 2\overline{XY}\big) - \big(2b\overline{X}^2 - 2\overline{X}\,\overline{Y}\big) = 0$$

$$b = \frac{\overline{XY} - \overline{X}\,\overline{Y}}{\overline{X^2} - \overline{X}^2} = \frac{\mu_{xy}}{\sigma_x^2} = \rho\,\frac{\sigma_y}{\sigma_x}$$
$$a = \overline{Y} - \frac{\mu_{xy}}{\sigma_x^2}\overline{X} = \overline{Y} - \rho\,\frac{\sigma_y}{\sigma_x}\overline{X}$$

$$Y_{\text{approx}} = \overline{Y} - \frac{\mu_{xy}}{\sigma_x^2}\overline{X} + \frac{\mu_{xy}}{\sigma_x^2}X = \overline{Y} + \frac{\mu_{xy}}{\sigma_x^2}\big(X - \overline{X}\big) = \overline{Y} + \rho\,\frac{\sigma_y}{\sigma_x}\big(X - \overline{X}\big)$$

$$\varepsilon = Y_{\text{approx}} - Y = \overline{Y} + \rho\,\frac{\sigma_y}{\sigma_x}\big(X - \overline{X}\big) - Y, \qquad \overline{\varepsilon} = 0$$

The minimized mean squared error, substituting the optimal $a$ and $b$:

$$\overline{\varepsilon^2} = a\big(a + 2b\overline{X} - 2\overline{Y}\big) + b^2\overline{X^2} + \overline{Y^2} - 2b\overline{XY}$$
$$= \Big(\overline{Y} - \frac{\mu_{xy}}{\sigma_x^2}\overline{X}\Big)\Big(\overline{Y} - \frac{\mu_{xy}}{\sigma_x^2}\overline{X} + 2\frac{\mu_{xy}}{\sigma_x^2}\overline{X} - 2\overline{Y}\Big) + \frac{\mu_{xy}^2}{\sigma_x^4}\overline{X^2} + \overline{Y^2} - 2\frac{\mu_{xy}}{\sigma_x^2}\overline{XY}$$
$$= \big(\overline{Y^2} - \overline{Y}^2\big) + \frac{\mu_{xy}^2}{\sigma_x^4}\big(\overline{X^2} - \overline{X}^2\big) - 2\frac{\mu_{xy}}{\sigma_x^2}\big(\overline{XY} - \overline{X}\,\overline{Y}\big)$$
$$= \sigma_y^2 + \frac{\mu_{xy}^2}{\sigma_x^2} - 2\frac{\mu_{xy}^2}{\sigma_x^2} = \sigma_y^2 - \frac{\mu_{xy}^2}{\sigma_x^2} = \sigma_y^2\Big(1 - \frac{\mu_{xy}^2}{\sigma_x^2\sigma_y^2}\Big) = \sigma_y^2\big(1 - \rho^2\big)$$
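A numerical sketch of the whole fit, under an assumed noisy-linear data model: estimate $a$ and $b$ from sample moments, then confirm that the error has zero mean and that its mean square matches $\sigma_y^2(1-\rho^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Assumed data model: intercept 1.0, slope 0.7, noise sigma 0.5.
x = rng.normal(size=n)
y = 1.0 + 0.7 * x + rng.normal(scale=0.5, size=n)

mu_xy = np.mean(x * y) - x.mean() * y.mean()
b = mu_xy / x.var()              # b = mu_xy / sigma_x^2
a = y.mean() - b * x.mean()      # a = Ybar - b*Xbar

eps = (a + b * x) - y            # approximation error
rho = mu_xy / (x.std() * y.std())

print(np.mean(eps))                                # ~ 0: error has zero mean
print(np.mean(eps**2), y.var() * (1 - rho**2))     # both equal sigma_y^2 (1 - rho^2)
```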
[Figure: two scatter plots of observed $(x, y)$ pairs, contrasting the linearly related and independent cases.]

If $X$ and $Y$ were actually linearly related, the points would appear on one straight line, $\rho$ would be $\pm 1$, and the mean squared error in the approximation would be zero. If $X$ and $Y$ were independent, the points would scatter all over the $x, y$ plane, $\mu_{xy}$ would be zero, so $Y_{\text{approx}} = \overline{Y}$ and $\overline{\varepsilon^2} = \sigma_y^2$.

Note that dependence other than linear is not necessarily measured by $\rho$.

Example: $Y = X^2$ with $\overline{X} = \overline{X^3} = 0$. Then

$$\mu_{xy} = \overline{XY} - \overline{X}\,\overline{Y} = \overline{X^3} - \overline{X}\,\overline{X^2} = 0 \;\Rightarrow\; \rho = 0,$$

but $X$ and $Y$ are dependent!
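A sketch of this example, taking $X$ standard normal (an assumed choice; any distribution symmetric about zero works, since then $\overline{X} = \overline{X^3} = 0$):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)   # symmetric about 0: Xbar ~ 0 and mean(X^3) ~ 0
y = x**2                         # Y is completely determined by X

mu_xy = np.mean(x * y) - x.mean() * y.mean()
rho = mu_xy / (x.std() * y.std())
print(rho)   # ~ 0: no linear dependence, despite total dependence
```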
Also, high correlation does not imply cause and effect.

Example: Dying in the hospital. A survey reports that two events, "entering the hospital" and "dying within 1 week," have a high correlation. This relationship, however, is not causal. There exists a third, unreported event, "disease," which causes each of the other events.

Vector-Matrix Notation

Define the vectors $X$ and $x$ and the mean $E[X]$:

$$E[X] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x\,f(x)\,dx_1\cdots dx_n = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} f(x)\,dx_1\cdots dx_n = \overline{X} = \begin{bmatrix} \overline{X}_1 \\ \vdots \\ \overline{X}_n \end{bmatrix}$$

Correlation Matrix

$$E\big[XX^T\big] = \overline{XX^T} = M, \qquad M_{ij} = \overline{X_i X_j}$$

So the correlation matrix arrays all the correlations among the $X_i$, with the mean squared values on the diagonal.

Note that:
• The correlation matrix is symmetric.
• If $X_i$ and $X_j$ are independent, the correlation is the product of the means (from the product rule for the pdf): $\overline{X_i X_j} = \overline{X}_i\,\overline{X}_j$.

Covariance Matrix

$$E\big[(X - \overline{X})(X - \overline{X})^T\big] = E\big[XX^T\big] - E[X]\,\overline{X}^T - \overline{X}\,E\big[X^T\big] + \overline{X}\,\overline{X}^T = \overline{XX^T} - \overline{X}\,\overline{X}^T \equiv C$$

$$C_{ij} = \Big[\overline{(X - \overline{X})(X - \overline{X})^T}\Big]_{ij} = \overline{X_i X_j} - \overline{X}_i\,\overline{X}_j$$

This is by definition the covariance between $X_i$ and $X_j$. So the covariance matrix arrays all the covariances among the $X_i$, with the variances along the diagonal.
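A minimal sketch forming both matrices from samples; the dimension, mixing matrix, and mean vector below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, samples = 3, 100_000

# Correlated components with nonzero means (assumed construction).
A = rng.normal(size=(n, n))
X = A @ rng.normal(size=(n, samples)) + np.array([[1.0], [2.0], [3.0]])

Xbar = X.mean(axis=1, keepdims=True)   # mean vector (n x 1)
M = (X @ X.T) / samples                # correlation matrix: M_ij ~ E[X_i X_j]
C = M - Xbar @ Xbar.T                  # covariance matrix: C = M - Xbar Xbar^T

print(np.allclose(M, M.T), np.allclose(C, C.T))   # both matrices are symmetric
print(np.allclose(np.diag(C), X.var(axis=1)))     # variances sit on the diagonal
```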