we see,
\[
\begin{aligned}
\tfrac{1}{2}\Big[&\,x^{T}\Sigma^{-1}x - x^{T}\Sigma^{-1}\mu_{2} - \mu_{2}^{T}\Sigma^{-1}x + \mu_{2}^{T}\Sigma^{-1}\mu_{2} - x^{T}\Sigma^{-1}x + x^{T}\Sigma^{-1}\mu_{1} + \mu_{1}^{T}\Sigma^{-1}x - \mu_{1}^{T}\Sigma^{-1}\mu_{1}\Big]\\
&= \tfrac{1}{2}\Big[\,2\mu_{1}^{T}\Sigma^{-1}x - 2\mu_{2}^{T}\Sigma^{-1}x + \mu_{2}^{T}\Sigma^{-1}\mu_{2} - \mu_{1}^{T}\Sigma^{-1}\mu_{1}\Big] \qquad (2.25)\\
&= (\mu_{1}-\mu_{2})^{T}\Sigma^{-1}x - \tfrac{1}{2}(\mu_{1}-\mu_{2})^{T}\Sigma^{-1}(\mu_{1}+\mu_{2}). \qquad (2.26)
\end{aligned}
\]
Consequently, after taking the natural log of (2.18), the desired allocation rule follows. □

However, Theorem (2.1.2) supposes that the values of $\mu_1$, $\mu_2$, $\Sigma$ are known, which is almost never the case. Thus, we need to use estimates from our data. Suppose, then, that we have a set of training data $\{x_{ij}\}$, where $i$ indexes the population ($i = 1$ or $2$) and $j$ indexes the data value ($j = 1, \dots, n_i$). Then the sample mean vector is calculated by averaging the data vectors from each class, i.e.
\[
\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}.
\]
Moreover, we calculate the sample covariance matrix for each population by
\[
S_i = \frac{1}{n_i-1}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)(x_{ij}-\bar{x}_i)^T.
\]
(Note that $\bar{x}_i$ is an $(m\times 1)$ vector and $S_i$ is an $(m\times m)$ matrix.)

Since we assume that the two covariance matrices are equal, we pool the two sample covariance matrices to derive an unbiased estimate of $\Sigma$:
\[
S_{\mathrm{pooled}} = \left[\frac{n_1-1}{(n_1-1)+(n_2-1)}\right]S_1 + \left[\frac{n_2-1}{(n_1-1)+(n_2-1)}\right]S_2 = \frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2}.
\]
Substituting our estimates into our allocation rule yields:

Theorem 2.1.3. [4] Allocate $x$ to $\pi_1$ if
\[
(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x - \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2) > \ln\!\left(\frac{c(1|2)}{c(2|1)}\times\frac{p_2}{p_1}\right). \qquad (2.27)
\]

Note that once we use parameter estimates rather than the known population quantities, the derived allocation rule is itself only an estimate of the optimal allocation rule. However, if the sample sizes are large, it is reasonable to expect this estimate to be close to the optimal rule.
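To make the estimation step concrete, here is a minimal sketch of Theorem 2.1.3 in NumPy. The synthetic data and the helper names (pooled_covariance, lda_allocate, and the cost/prior arguments) are illustrative assumptions introduced here, not part of [4].

```python
import numpy as np

def pooled_covariance(X1, X2):
    """Unbiased pooled covariance estimate; rows of X1, X2 are observations."""
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False, ddof=1)   # S_1
    S2 = np.cov(X2, rowvar=False, ddof=1)   # S_2
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

def lda_allocate(x, X1, X2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """Sample allocation rule (2.27): return 1 for pi_1, otherwise 2."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    Sp_inv = np.linalg.inv(pooled_covariance(X1, X2))
    lhs = (xbar1 - xbar2) @ Sp_inv @ x - 0.5 * (xbar1 - xbar2) @ Sp_inv @ (xbar1 + xbar2)
    return 1 if lhs > np.log((c12 / c21) * (p2 / p1)) else 2

# Illustrative data: two Gaussian samples sharing the covariance matrix Sigma.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.4], [0.4, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=50)
X2 = rng.multivariate_normal([2.0, 1.0], Sigma, size=50)
print(lda_allocate(np.array([0.2, 0.1]), X1, X2))   # a point near xbar1; typically prints 1
```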
Lastly, let us consider the case in which the product of the cost ratio and the prior probability ratio equals one, i.e.
\[
\frac{c(1|2)}{c(2|1)}\times\frac{p_2}{p_1} = 1,
\]
which changes the inequality in Theorem (2.1.3) to
\[
(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x - \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2) > \ln(1), \qquad (2.28)
\]
that is, since $\ln(1)=0$,
\[
(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x - \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2) > 0, \qquad (2.29)
\]
\[
\Rightarrow\quad (\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x > \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2). \qquad (2.30)
\]
If we denote the row vector $(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}$ by $\hat{l}$, then we can write (2.30) as
\[
\hat{l}\,x > \tfrac{1}{2}\hat{l}\,(\bar{x}_1+\bar{x}_2). \qquad (2.31)
\]
In essence, we can think of the decision rule as reducing the data to one dimension by projecting it onto the vector $\hat{l}$, and allocating a new point $x$ according to where its projection lies relative to the midpoint of the two projected class means, $\tfrac{1}{2}\hat{l}(\bar{x}_1+\bar{x}_2)$.
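Seen this way, rule (2.31) is simply a one-dimensional threshold test. The sketch below (under the same assumed NumPy setup as the previous sketch; the names discriminant_direction and classify_by_midpoint are introduced here for illustration) obtains $\hat{l}$ by solving $S_{\mathrm{pooled}}\,l = \bar{x}_1-\bar{x}_2$, which gives the same vector as the row vector $(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}$ because $S_{\mathrm{pooled}}$ is symmetric, and then compares the projection of a new point with the midpoint $\tfrac{1}{2}\hat{l}(\bar{x}_1+\bar{x}_2)$.

```python
import numpy as np

def discriminant_direction(X1, X2):
    """l_hat: solve S_pooled l = xbar1 - xbar2 (equivalent since S_pooled is symmetric)."""
    n1, n2 = len(X1), len(X2)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return np.linalg.solve(S_pooled, X1.mean(axis=0) - X2.mean(axis=0))

def classify_by_midpoint(x, X1, X2):
    """Rule (2.31): project x onto l_hat and compare with the projected midpoint m_hat."""
    l_hat = discriminant_direction(X1, X2)
    m_hat = 0.5 * l_hat @ (X1.mean(axis=0) + X2.mean(axis=0))
    return 1 if l_hat @ x > m_hat else 2

# With the synthetic X1, X2 of the previous sketch, classify_by_midpoint agrees with
# lda_allocate whenever the cost and prior ratios cancel, as in (2.28)-(2.31).
```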
2.2 Fisher's Discriminant Analysis

Figure 2.1: Fisher's Discriminant Analysis: Striking a balance between class separation and in-class variance, [2].

R.A. Fisher proposed another method for linear discriminant analysis that does not presuppose any known distribution of the training data. He does, however, begin with the idea of dimensionality reduction shown above. Again, suppose that we have our training data set $\{x_{ij}\}$ as defined above, where the populations $\pi_1$ and $\pi_2$ do not necessarily have the same distribution but have equal covariance matrices $\Sigma_1 = \Sigma_2 = \Sigma$. Then, given a vector $w$ with unit length, we can project our data in the direction of $w$ by taking the dot product of each data vector with $w$: $\{w \cdot x_{ij}\}$.

Onto what kind of vector $w$ do we want to project our data? To make a good classifier, the two projected class means ought to be far apart, i.e. we want to make $w^T(\bar{x}_2 - \bar{x}_1)$ large. (To obtain a bounded solution, we require $\|w\| = 1$.) That way the differences between the two classes are maximally highlighted, and evaluating the projection of a new point against the projected sample means can accurately classify the largest number of points. The goal of classifying the largest number of unknown points accurately on the basis of the training data is called generalization ability, and it motivates Support Vector Machines, the subject of the next chapter.

But separating the projected class means is only one concern: if the classes have a large variance along the projection vector, then class overlap can still make classification inaccurate. Refer to Figure 2.1. The picture on the left projects the data onto the vector connecting the two sample means, which guarantees that the distance between the two projected means is maximized. However, since the two projected classes are not partitioned along the projection vector, the problem of classifying new points is no easier. In the picture on the right, by contrast, the chosen projection vector both separates the projected sample means and partitions the two classes along the projection vector. Classification of a new point $x_0$ is then based on comparing the projection of $x_0$ to the midpoint between the two projected sample means. To tend toward a vector that partitions our classes, we want to minimize the within-class variance, i.e. we want to minimize
\[
w^T S_{\mathrm{pooled}}\, w = \frac{w^T\Big[\sum_{j=1}^{n_1}(x_{1j}-\bar{x}_1)(x_{1j}-\bar{x}_1)^T + \sum_{j=1}^{n_2}(x_{2j}-\bar{x}_2)(x_{2j}-\bar{x}_2)^T\Big]w}{n_1+n_2-2}.
\]
(Recall that $w^T S_{\mathrm{pooled}}\, w$ is the weighted sum of the variances of the two projected populations.)

We can balance the separation of the projected means against the within-class variance in a single criterion in order to find our desired projection vector:
\[
\text{Maximize } J(w) = \frac{\big(w^T(\bar{x}_2-\bar{x}_1)\big)^2}{w^T S_{\mathrm{pooled}}\, w} \quad \text{such that } \|w\| = 1. \qquad (2.32)
\]
We now prove a maximization lemma which shows that Fisher's Discriminant Analysis gives $w \propto S_{\mathrm{pooled}}^{-1}(\bar{x}_1-\bar{x}_2)$, i.e. $w^T \propto (\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}$.

Lemma 2.2.1 (Maximization Lemma, [4]). For $B$ a positive definite matrix, $d$ a given vector, and $x$ an arbitrary nonzero vector,
\[
\max_{x \neq 0} \frac{(x^T d)^2}{x^T B x} = d^T B^{-1} d, \qquad (2.33)
\]
with the maximum attained when $x = cB^{-1}d$ for some constant $c \neq 0$.

Proof. By the extended Cauchy-Schwarz inequality, $(x^T d)^2 \le (x^T B x)(d^T B^{-1} d)$. Dividing both sides by $x^T B x$ yields $\frac{(x^T d)^2}{x^T B x} \le d^T B^{-1} d$, so $d^T B^{-1} d$ is an upper bound. Equality in the extended Cauchy-Schwarz inequality holds precisely when $x = cB^{-1}d$ for some $c \neq 0$, so the bound is attained and the maximum is $d^T B^{-1} d$, as desired. □

Applying the lemma with $d = \bar{x}_2 - \bar{x}_1$ and $B = S_{\mathrm{pooled}}$ shows that the ratio in (2.32) is maximized by any $w = cS_{\mathrm{pooled}}^{-1}(\bar{x}_2-\bar{x}_1)$ with $c \neq 0$; since the criterion is unchanged by a sign flip, we may take $w \propto S_{\mathrm{pooled}}^{-1}(\bar{x}_1-\bar{x}_2)$, with the constraint $\|w\| = 1$ fixing the scale.
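As a quick numerical illustration of the lemma (a sketch on synthetic data chosen here, not an example from [4]), the Fisher direction $S_{\mathrm{pooled}}^{-1}(\bar{x}_2-\bar{x}_1)$ should attain a value of $J$ in (2.32) at least as large as that of the naive choice $w \propto \bar{x}_2-\bar{x}_1$, which is precisely the contrast drawn in Figure 2.1.

```python
import numpy as np

def J(w, X1, X2):
    """Fisher's criterion (2.32): squared projected mean gap over pooled projected variance."""
    w = w / np.linalg.norm(w)                       # enforce ||w|| = 1
    n1, n2 = len(X1), len(X2)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    gap = w @ (X2.mean(axis=0) - X1.mean(axis=0))
    return gap**2 / (w @ S_pooled @ w)

rng = np.random.default_rng(1)
Sigma = np.array([[3.0, 2.5], [2.5, 3.0]])          # elongated shared covariance
X1 = rng.multivariate_normal([0.0, 0.0], Sigma, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], Sigma, size=200)

d = X2.mean(axis=0) - X1.mean(axis=0)
S_pooled = (199 * np.cov(X1, rowvar=False) + 199 * np.cov(X2, rowvar=False)) / 398  # n1 = n2 = 200
w_fisher = np.linalg.solve(S_pooled, d)             # proportional to S_pooled^{-1} d

print(J(d, X1, X2), J(w_fisher, X1, X2))            # the second value is at least as large
```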
Note that this is not a classification method in and of itself, because Fisher's procedure only finds the optimal vector onto which to project the data. However, as mentioned above, classification should compare the projection of a new point to the midpoint between the two projected sample means:

Theorem 2.2.2 (Allocation Rule Based on FDA). Classify an unclassified point $x_0$ to $\pi_1$ if
\[
y_0 = (\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x_0 > \hat{m} = \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2), \qquad (2.34)
\]
to $\pi_2$ if
\[
y_0 = (\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}x_0 < \hat{m} = \tfrac{1}{2}(\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}(\bar{x}_1+\bar{x}_2), \qquad (2.35)
\]
and treat $x_0$ as unclassifiable if equality holds.

Recall that in the previous section we defined the vector $\hat{l} = (\bar{x}_1-\bar{x}_2)^T S_{\mathrm{pooled}}^{-1}$ and showed that, given no cost or prior probability concerns, Linear Discriminant Analysis can be interpreted as projecting an unknown datum onto a vector and then comparing that projection to the midpoint between the two projected sample means. Thus Fisher's method produces exactly the allocation rule obtained by Linear Discriminant Analysis under the assumption that the two populations are similarly distributed with equal covariance matrices, provided there are no cost or prior probability concerns.

We should mention that the assumption of equal covariance matrices is important, because it justifies our choice of the midpoint as the classification critical value. Consider the trivial example shown in Figure 2.2. Here the two populations clearly have different variances, and thus one population occupies a greater portion of the projection vector. The midpoint is then a poor choice of critical value, and an accurate decision function will have its critical value closer to $\mu_1$.

2.3 Further Concerns

In this chapter we have looked at two basic methods of linear discrimination, one parametric and the other non-parametric. However, we only covered cases where the two within-class covariance matrices were equal ($\Sigma_1 = \Sigma_2 = \Sigma$). We have just seen that Fisher's Discriminant Analysis fails when the two class covariances differ, but Linear Discriminant Analysis can account for this by plugging the two covariance matrix estimates separately into the pdfs $f_1(x)$ and $f_2(x)$. Another issue not covered in this section is the difference between the estimated allocation rule and the optimal allocation rule when using LDA. The interested reader can find plenty of literature outlining this difference; a good place to start is Section 11.4 in [4].
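Returning to the covariance remedy mentioned above, the sketch below (an illustration only; it assumes SciPy, and the name quadratic_allocate is introduced here) estimates a separate covariance matrix for each class and plugs the resulting Gaussian density estimates directly into the density-ratio comparison, which is the usual route to quadratic discriminant analysis.

```python
import numpy as np
from scipy.stats import multivariate_normal

def quadratic_allocate(x, X1, X2, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """Fit a separate Gaussian to each class and allocate x to pi_1 when
    f1(x)/f2(x) >= (c(1|2)/c(2|1)) * (p2/p1), otherwise to pi_2."""
    f1 = multivariate_normal(mean=X1.mean(axis=0), cov=np.cov(X1, rowvar=False))
    f2 = multivariate_normal(mean=X2.mean(axis=0), cov=np.cov(X2, rowvar=False))
    return 1 if f1.pdf(x) / f2.pdf(x) >= (c12 / c21) * (p2 / p1) else 2

# Illustrative data with very different class variances, in the spirit of Figure 2.2.
rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([0.0, 0.0], 0.25 * np.eye(2), size=100)   # tight class
X2 = rng.multivariate_normal([3.0, 0.0], 4.00 * np.eye(2), size=100)   # diffuse class
print(quadratic_allocate(np.array([1.5, 0.0]), X1, X2))   # the midpoint of the means is
# typically assigned to pi_2, showing why the midpoint is a poor cut-off here.
```

Because the two fitted densities now have different spreads, the implied critical value moves toward the tighter class, consistent with the discussion around Figure 2.2.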
Figure 2.2: A trivial example of two unequal class variances.