Correlation Analysis (Numeric Data)
◼ Correlation coefficient (also called Pearson's product moment coefficient):
$$r_{A,B} = \frac{\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})}{(n-1)\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} a_i b_i - n\bar{A}\bar{B}}{(n-1)\,\sigma_A \sigma_B}$$
where n is the number of tuples, $\bar{A}$ and $\bar{B}$ are the respective means of A and B, $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B (computed with the n − 1 denominator), and $\sum_{i=1}^{n} a_i b_i$ is the sum of the AB cross-product.
◼ If $r_{A,B} > 0$, A and B are positively correlated (A's values increase as B's do). The higher the value, the stronger the correlation.
◼ $r_{A,B} = 0$: no linear correlation (this alone does not imply independence); $r_{A,B} < 0$: negatively correlated.
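A minimal Python sketch of both forms of the correlation coefficient, assuming the sample standard deviation (`statistics.stdev`, n − 1 denominator) so that it matches the (n − 1) factor in the formula. The helper names `pearson_r` and `pearson_r_shortcut` are hypothetical, and the data reuse the stock values from the covariance example in this section.

```python
from statistics import mean, stdev  # stdev uses the n - 1 denominator


def pearson_r(a, b):
    """r_{A,B} from the deviation-product form of the formula."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("A and B must have the same length, at least 2")
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cross / ((n - 1) * stdev(a) * stdev(b))


def pearson_r_shortcut(a, b):
    """Same value from sum(a_i * b_i) - n * mean(A) * mean(B)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("A and B must have the same length, at least 2")
    cross = sum(x * y for x, y in zip(a, b)) - n * mean(a) * mean(b)
    return cross / ((n - 1) * stdev(a) * stdev(b))


A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]
r1 = pearson_r(A, B)
r2 = pearson_r_shortcut(A, B)
print(round(r1, 4), round(r2, 4))  # 0.9407 0.9407
```

Both forms agree because $\sum (a_i - \bar{A})(b_i - \bar{B})$ expands to $\sum a_i b_i - n\bar{A}\bar{B}$; the shortcut avoids a second pass over the deviations.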
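Correlation (viewed as linear relationship)
◼ Correlation measures the linear relationship between objects.
◼ To compute correlation, we standardize data objects A and B and then take their dot product, scaled by 1/(n − 1):
$$a'_k = \frac{a_k - \mathrm{mean}(A)}{\mathrm{std}(A)}, \qquad b'_k = \frac{b_k - \mathrm{mean}(B)}{\mathrm{std}(B)}$$
$$\mathrm{correlation}(A,B) = \frac{A' \cdot B'}{n-1}$$
where n is the number of tuples and std is the sample standard deviation (n − 1 denominator).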
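A small Python sketch of the standardize-then-dot-product view, assuming sample standard deviations as in the formula above. `standardize` and `correlation_via_dot` are illustrative names, not from the slides; the data are the stock values used elsewhere in this section.

```python
from statistics import mean, stdev  # stdev uses the n - 1 denominator


def standardize(values):
    """Return z-scores (x - mean) / std, using the sample std."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def correlation_via_dot(a, b):
    """Dot product of standardized vectors, divided by n - 1."""
    a_std, b_std = standardize(a), standardize(b)
    dot = sum(x * y for x, y in zip(a_std, b_std))
    # Dividing by n - 1 matches the (n - 1) sigma_A sigma_B denominator
    return dot / (len(a) - 1)


A = [2, 3, 5, 4, 6]
B = [5, 8, 10, 11, 14]
print(round(correlation_via_dot(A, B), 4))  # 0.9407
```

The result equals the Pearson coefficient computed directly, since $\sum a'_k b'_k / (n-1)$ is the same sum of deviation products divided by $(n-1)\sigma_A\sigma_B$.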
Covariance (Numeric Data)
◼ Covariance is similar to correlation:
$$\mathrm{Cov}(A,B) = E\big((A-\bar{A})(B-\bar{B})\big) = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{n}$$
Correlation coefficient:
$$r_{A,B} = \frac{\mathrm{Cov}(A,B)}{\sigma_A \sigma_B}$$
where n is the number of tuples, $\bar{A}$ and $\bar{B}$ are the respective means (expected values) of A and B, and $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B, computed with the same denominator (n) as the covariance.
◼ Positive covariance: if $\mathrm{Cov}(A,B) > 0$, then when A is larger than its expected value, B also tends to be larger than its expected value.
◼ Negative covariance: if $\mathrm{Cov}(A,B) < 0$, then when A is larger than its expected value, B is likely to be smaller than its expected value.
◼ Independence: if A and B are independent, then $\mathrm{Cov}(A,B) = 0$, but the converse is not true:
  ◼ Some pairs of random variables have a covariance of 0 but are not independent. Only under additional assumptions (e.g., the data follow a multivariate normal distribution) does a covariance of 0 imply independence.
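A minimal Python sketch, assuming population statistics (n denominator) for both the covariance and the standard deviations. `covariance` and `correlation_from_cov` are hypothetical helper names. The second dataset (B = A²) is an illustrative case chosen to show zero covariance without independence.

```python
from statistics import mean, pstdev  # pstdev uses the n denominator


def covariance(a, b):
    """Population covariance: average of (a_i - mean(A)) * (b_i - mean(B))."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def correlation_from_cov(a, b):
    # Same denominator (n) in the covariance and the standard deviations
    return covariance(a, b) / (pstdev(a) * pstdev(b))


# Stock values from the example: same r as the (n - 1) formula gives
print(round(correlation_from_cov([2, 3, 5, 4, 6], [5, 8, 10, 11, 14]), 4))  # 0.9407

# Zero covariance without independence: B is fully determined by A (B = A**2)
A = [-1, 0, 1]
B = [x * x for x in A]
print(covariance(A, B))  # 0.0
```

Using n in both places (or n − 1 in both) leaves $r_{A,B}$ unchanged, because the denominators cancel.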
Covariance: An Example
$$\mathrm{Cov}(A,B) = E\big((A-\bar{A})(B-\bar{B})\big) = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{n}$$
◼ It can be simplified in computation as
$$\mathrm{Cov}(A,B) = E(A \cdot B) - \bar{A}\bar{B}$$
◼ Suppose two stocks A and B have the following values in one week: (2, 5), (3, 8), (5, 10), (4, 11), (6, 14).
◼ Question: If the stocks are affected by the same industry trends, will their prices rise or fall together?
◼ E(A) = (2 + 3 + 5 + 4 + 6)/5 = 20/5 = 4
◼ E(B) = (5 + 8 + 10 + 11 + 14)/5 = 48/5 = 9.6
◼ Cov(A, B) = (2×5 + 3×8 + 5×10 + 4×11 + 6×14)/5 − 4 × 9.6 = 212/5 − 38.4 = 42.4 − 38.4 = 4
◼ Thus, A and B tend to rise together, since Cov(A, B) > 0.
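The hand calculation can be checked with a short Python sketch. It uses `fractions.Fraction` (a choice made here, not in the slides) so the arithmetic is exact, and it computes the covariance both from the definition and from the simplified form $E(A \cdot B) - \bar{A}\bar{B}$.

```python
from fractions import Fraction

# Values of stocks A and B over one week, as (a_i, b_i) pairs
prices = [(2, 5), (3, 8), (5, 10), (4, 11), (6, 14)]
n = len(prices)

# Exact rational arithmetic avoids floating-point noise such as 9.6
e_a = Fraction(sum(a for a, _ in prices), n)       # E(A) = 20/5 = 4
e_b = Fraction(sum(b for _, b in prices), n)       # E(B) = 48/5 = 9.6
e_ab = Fraction(sum(a * b for a, b in prices), n)  # E(A·B) = 212/5 = 42.4

cov_shortcut = e_ab - e_a * e_b                                   # simplified form
cov_definition = sum((a - e_a) * (b - e_b) for a, b in prices) / n  # definition

print(e_a, e_b, cov_shortcut, cov_definition)  # 4 48/5 4 4
```

Both routes give Cov(A, B) = 4; the simplified form needs only the three sums, which is why it is preferred when computing by hand.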