Correlation Analysis (Numeric Data)
◼ Correlation coefficient (also called Pearson's product moment coefficient):
$$r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{(n-1)\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} a_i b_i - n\,\bar{A}\,\bar{B}}{(n-1)\,\sigma_A \sigma_B}$$
where $n$ is the number of tuples, $\bar{A}$ and $\bar{B}$ are the respective means of A and B, $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B, and $\sum a_i b_i$ is the sum of the AB cross-product.
◼ If $r_{A,B} > 0$, A and B are positively correlated (A's values increase as B's do). The higher the value, the stronger the correlation.
◼ $r_{A,B} = 0$: uncorrelated (no linear relationship, which alone does not guarantee independence); $r_{A,B} < 0$: negatively correlated.
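As a minimal sketch of the coefficient defined above, the following Python function evaluates the cross-product form with sample standard deviations (divisor n - 1). The function name `pearson_r` and the input lists are illustrative assumptions, not part of the slides.

```python
import math


def pearson_r(a, b):
    """Pearson correlation, (n - 1) form with sample standard deviations."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sample standard deviations (divide by n - 1).
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / (n - 1))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / (n - 1))
    if std_a == 0 or std_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    # Sum of the AB cross-product, as in the second form of the formula.
    cross = sum(x * y for x, y in zip(a, b))
    return (cross - n * mean_a * mean_b) / ((n - 1) * std_a * std_b)


# Illustrative data (assumed): b rises or falls exactly linearly with a.
a = [1, 2, 3, 4, 5]
r_up = pearson_r(a, [2, 4, 6, 8, 10])
r_down = pearson_r(a, [10, 8, 6, 4, 2])
print(r_up, r_down)  # close to 1 and -1, up to floating-point rounding
```

A perfectly increasing linear relationship gives a value near 1 and a perfectly decreasing one gives a value near -1, matching the sign interpretation in the bullets.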
Visually Evaluating Correlation
[Figure: scatter plots showing the similarity from -1 to 1]
Correlation (viewed as linear relationship)
◼ Correlation measures the linear relationship between objects.
◼ To compute correlation, we standardize the data objects A and B, and then take their dot product:
$$a'_k = \frac{a_k - \mathrm{mean}(A)}{\mathrm{std}(A)}, \qquad b'_k = \frac{b_k - \mathrm{mean}(B)}{\mathrm{std}(B)}$$
$$\mathrm{correlation}(A, B) = A' \cdot B'$$
◼ For the result to lie in $[-1, 1]$ and agree with $r_{A,B}$, the dot product is divided by $n - 1$ when std is the sample standard deviation (or by $n$ when it is the population standard deviation).
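The standardize-then-dot-product view can be sketched as follows. This is a hedged illustration: the helper names `standardize` and `correlation_via_dot` and the sample data are assumptions, and the code uses the sample standard deviation, so the dot product is scaled by 1/(n - 1).

```python
import math


def standardize(values):
    """Return (v - mean) / std for each value, using the sample std (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def correlation_via_dot(a, b):
    """Correlation as the scaled dot product of the standardized vectors."""
    a_std = standardize(a)
    b_std = standardize(b)
    dot = sum(x * y for x, y in zip(a_std, b_std))
    # Scale by 1 / (n - 1) because standardize() used the sample std.
    return dot / (len(a) - 1)


# Illustrative data (assumed): nearly, but not exactly, linear.
r = correlation_via_dot([1, 2, 3], [1, 2, 4])
print(round(r, 4))
```

Without the final scaling step the dot product would grow with the number of tuples instead of staying between -1 and 1.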
Covariance (Numeric Data)
◼ Covariance is similar to correlation:
$$Cov(A, B) = E\big((A - \bar{A})(B - \bar{B})\big) = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n}$$
Correlation coefficient:
$$r_{A,B} = \frac{Cov(A, B)}{\sigma_A \sigma_B}$$
where $n$ is the number of tuples, $\bar{A}$ and $\bar{B}$ are the respective means or expected values of A and B, and $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B.
◼ Positive covariance: If $Cov(A, B) > 0$, then A and B both tend to be larger than their expected values together.
◼ Negative covariance: If $Cov(A, B) < 0$, then if A is larger than its expected value, B is likely to be smaller than its expected value.
◼ Independence: If A and B are independent, then $Cov(A, B) = 0$, but the converse is not true:
◼ Some pairs of random variables may have a covariance of 0 but are not independent. Only under some additional assumptions (e.g., the data follow multivariate normal distributions) does a covariance of 0 imply independence.
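The point that zero covariance does not imply independence can be illustrated with a small sketch. The data below is a made-up example (an assumption, not from the slides) in which B is completely determined by A, yet the covariance, computed with the divide-by-n definition above, is zero.

```python
def covariance(a, b):
    """Population covariance: mean of (a_i - mean_a) * (b_i - mean_b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


# Hypothetical data: b is a deterministic function of a (b = a squared),
# so the two attributes are clearly dependent.
a = [-2, -1, 0, 1, 2]
b = [x * x for x in a]

cov_ab = covariance(a, b)
print(cov_ab)  # the positive and negative cross-products cancel out
```

The relationship here is symmetric around the mean of A, so it has no linear component for covariance to pick up, even though knowing A fixes B exactly.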
Co-Variance: An Example
$$Cov(A, B) = E\big((A - \bar{A})(B - \bar{B})\big) = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n}$$
◼ It can be simplified in computation as
$$Cov(A, B) = E(A \cdot B) - \bar{A}\,\bar{B}$$
◼ Suppose two stocks A and B have the following values in one week: (2, 5), (3, 8), (5, 10), (4, 11), (6, 14).
◼ Question: If the stocks are affected by the same industry trends, will their prices rise or fall together?
◼ E(A) = (2 + 3 + 5 + 4 + 6) / 5 = 20/5 = 4
◼ E(B) = (5 + 8 + 10 + 11 + 14) / 5 = 48/5 = 9.6
◼ Cov(A, B) = (2×5 + 3×8 + 5×10 + 4×11 + 6×14) / 5 − 4 × 9.6 = 4
◼ Thus, A and B rise together since Cov(A, B) > 0.
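The arithmetic in this example can be checked with a short sketch that computes the covariance both from the definition and from the simplified form. The variable names are illustrative assumptions; the data is the stock series given above.

```python
# Weekly (A, B) stock values from the example.
pairs = [(2, 5), (3, 8), (5, 10), (4, 11), (6, 14)]
n = len(pairs)

mean_a = sum(a for a, _ in pairs) / n  # E(A)
mean_b = sum(b for _, b in pairs) / n  # E(B)

# Definition: average product of deviations from the means.
cov_definition = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n

# Simplified form: E(A * B) - E(A) * E(B).
cov_shortcut = sum(a * b for a, b in pairs) / n - mean_a * mean_b

print(mean_a, mean_b, cov_definition, cov_shortcut)
```

Both forms should agree with the hand calculation (a covariance of about 4, up to floating-point rounding), and the positive sign supports the conclusion that the two stocks tend to rise together.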