Variance Reduction with Monte Carlo Estimates of Error Rates in Multivariate Classification

C. Weihs*
Fachbereich Statistik
Universität Dortmund

G. Calzolari
Dipartimento di Statistica
Università degli Studi di Firenze

M. C. Röhl
Königstein/Ts.

August 1999

Abstract

In this paper, control variates are proposed to speed up Monte Carlo simulations to estimate expected error rates in multivariate classification.

KEY WORDS: classification, control variates, error rate, Monte Carlo simulation, variance reduction

*D-44221 Dortmund, Germany, Tel. ++49-231-755-4363, e-mail: weihs@amadeus.statistik.uni-dortmund.de

1 Introduction

The aim of this paper is to speed up Monte Carlo simulations applied to multivariate classification. The most interesting performance measure in classification is the misclassification error.

In the case of given group densities, there are two possibilities to calculate the error rate: numerical integration or Monte Carlo simulation, the latter being the only feasible method in higher dimensions. In this paper we focus on the Monte Carlo error estimate. This approach suffers from the variability of the error rates, because the estimated error rate is itself a random variable. Therefore, every principle to reduce this variance is welcome. In the literature various variance reduction techniques have been proposed, among them antithetic variables and control variates (see, e.g., [1]). Here,
we will concentrate on control variates and demonstrate their variance reduction potential in our special problem.

The paper is organized as follows: In section 2 we give a brief introduction to multivariate classification. In section 3 we propose two different control variates, which are studied and compared in section 4 by means of some examples. The paper closes with a conclusion in section 5.

2 Multivariate Classification

Classification deals with the allocation of objects to g predetermined groups 1, 2, ..., g, say. The aim is to minimize the misclassification error (rate) over all possible future allocations, given the group densities p_i(x) (i = 1, 2, ..., g). The minimal error rate is the so-called Bayes error.

We measure d features (variables) of the objects that we consider important for discriminating between the objects. These can be continuous features (GNP, consumption, etc.) or discrete ones (number of firms, number of inhabitants, etc.).

Once the group densities are specified, in order to minimize the error rate we allocate an object with feature vector x to group i if

\[ p_i(x) > p_j(x) \quad \text{for all } j \neq i. \tag{1} \]

Classification methods often assume the group densities p_i(x) to be normal. Then there are at least two modelling possibilities (see, e.g., [2]): estimate the same covariance matrix for all groups (LDA, linear discriminant analysis) or estimate a different covariance matrix for each group (QDA, quadratic discriminant analysis). Of course, both methods also estimate different mean vectors for each group. In this paper we take QDA as the adequate, and thus standard, classification procedure (a code sketch of the corresponding allocation rule is given at the end of this section).

Often we additionally want to reduce the dimension from d to d' = 1 or 2 to enhance human perception (dimension reduction). The construction of a d'-space with minimal error rate, given the group densities p_i(x) in d-space, can be done by modern optimization techniques, for example Simulated Annealing [3]. In each optimization step, a projection space is proposed. Then we determine the group densities (either estimated by means of the projected data or directly derived from the projected densities of the original space) [4], and calculate the error rate in the projection space.
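To make rule (1) concrete, the following minimal sketch allocates a feature vector to the group with the largest normal density, with one mean vector and one covariance matrix per group as in QDA. The function name and the two-group example values are hypothetical illustrations, and SciPy's multivariate_normal is assumed to be available.

```python
import numpy as np
from scipy.stats import multivariate_normal

def allocate(x, means, covs):
    """Allocate feature vector x to the group with the largest
    normal density p_i(x), following rule (1)."""
    densities = [multivariate_normal.pdf(x, mean=m, cov=S)
                 for m, S in zip(means, covs)]
    return int(np.argmax(densities))  # group index i with p_i(x) maximal

# Hypothetical two-group example in d = 2 dimensions (QDA style:
# a separate covariance matrix for each group).
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
print(allocate(np.array([1.0, 0.5]), means, covs))
```

If all groups share one covariance matrix, the same rule reduces to LDA, which is exactly the simplification exploited by the first control variate in section 3.2.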
In this paper we suppose that the projection space is fixed, so that we already have the group densities available. Of course, the following approach can also be applied during optimization, at each optimization step.

3 Variance Reduction by Control Variates

3.1 General Ideas

What we have to calculate is the error rate given the group densities. In one dimension this can easily be done by numerical integration, because we only have to find the intersection points of the different group densities (determined by p_i(x) = p_j(x)) and then calculate integrals like

\[ \int_a^b p_i(x)\,dx, \quad a, b \in \mathbb{R}, \tag{2} \]

where p_i(x) denotes an arbitrary known group density; a short numerical sketch of such an integration is given at the end of this passage.

But in two or more dimensions the borderlines between the group densities no longer have such simple shapes, even when we assume equal group covariance matrices. Therefore, integration in two or more dimensions can only be done by means of a grid.

Another possibility to calculate the error rate is Monte Carlo simulation. We generate random realizations from the group densities and allocate them according to our classification rule (1). This approach suffers from the variability of the error rates, because the estimated error rate is itself a random variable.

In order to reduce the Monte Carlo variance of the error rate we introduce control variates (cv). The object of interest is the error rate error. We write this in a more complicated but helpful way as

\[ error = error_{cv} + (error - error_{cv}) \tag{3} \]

with a new random variable error_cv. We want to compute the expectation of these error rates:

\[ E(error) = E(error_{cv}) + E(error - error_{cv}). \tag{4} \]

The idea behind control variates is to choose a random variable error_cv such that E(error_cv) can be calculated exactly (no variance) and error and error_cv are positively correlated.
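As an illustration of the one-dimensional case mentioned above, the following sketch computes the error rate of two univariate normal groups with equal priors (as implicit in rule (1)) by integrating the pointwise minimum of the two densities, which is equivalent to splitting the integral (2) at the intersection points. The function name, the integration bounds, and the example parameters are hypothetical; SciPy's quad and norm are assumed.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bayes_error_1d(mu1, sd1, mu2, sd2):
    """Bayes error for two univariate normal groups with equal priors:
    0.5 * integral of min(p1(x), p2(x)) dx.  Integrating the pointwise
    minimum equals splitting the integral (2) at the intersection
    points p1(x) = p2(x)."""
    f = lambda x: min(norm.pdf(x, mu1, sd1), norm.pdf(x, mu2, sd2))
    lo = min(mu1 - 10 * sd1, mu2 - 10 * sd2)  # effectively -infinity
    hi = max(mu1 + 10 * sd1, mu2 + 10 * sd2)  # effectively +infinity
    value, _ = quad(f, lo, hi, limit=200)
    return 0.5 * value

# Hypothetical example: two unit-variance groups two means apart.
print(bayes_error_1d(0.0, 1.0, 2.0, 1.0))  # approx. norm.cdf(-1) = 0.1587
```

Essentially this kind of one-dimensional integration is also what the second control variate of section 3.2 needs for the exact computation of E(error_cv).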
So a sensible way of estimating E(error) would be

\[ \hat{E}_{cv}(error) = E(error_{cv}) + \hat{E}(error - error_{cv}), \tag{5} \]

where the first term on the right-hand side has no variance and the second term is computed as the sample mean of Monte Carlo replicates. Then the variance of the right-hand side of (5) is

\[ \mathrm{Var}(error - error_{cv})/N, \tag{6} \]

where N is the sample size used to determine error and error_cv, and

\[ \mathrm{Var}(error - error_{cv}) = \mathrm{Var}(error) + \mathrm{Var}(error_{cv}) - 2\,\mathrm{Cov}(error, error_{cv}). \tag{7} \]

Now it becomes clear that a large positive correlation between error and error_cv can reduce the variance compared to the "naive" estimator Ê_MC(error), i.e. the sample mean of Monte Carlo replicates of error, with variance Var(Ê_MC(error)) = Var(error)/N. We can even do better. We can use the equation

\[ E(error) = \alpha\, E(error_{cv}) + E(error - \alpha\, error_{cv}) \tag{8} \]

to select the parameter α that minimizes the variance

\[ \mathrm{Var}(error - \alpha\, error_{cv}), \tag{9} \]

leading to

\[ \alpha = \frac{\mathrm{Cov}(error, error_{cv})}{\mathrm{Var}(error_{cv})}, \tag{10} \]

which is almost equal to the correlation coefficient ρ when Var(error) ≈ Var(error_cv) holds. The final result is then

\[ \min_{\alpha} \mathrm{Var}(error - \alpha\, error_{cv}) = (1 - \rho^2)\,\mathrm{Var}(error), \tag{11} \]

i.e. there can always be a gain when ρ ≠ 0.
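The following minimal sketch combines the estimator (5) with the variance-optimal coefficient (10), assuming W paired Monte Carlo replicates of error and error_cv are already available together with the exactly computed E(error_cv); the function name and the simulated replicates below are hypothetical placeholders.

```python
import numpy as np

def cv_estimate(err, err_cv, exact_cv):
    """Control variate estimate of E(error) following (8) and (10):
    alpha is estimated from the W paired replicates, and the exactly
    known expectation exact_cv = E(error_cv) replaces its noisy mean."""
    err, err_cv = np.asarray(err), np.asarray(err_cv)
    alpha = np.cov(err, err_cv, ddof=1)[0, 1] / np.var(err_cv, ddof=1)  # eq. (10)
    estimate = alpha * exact_cv + np.mean(err - alpha * err_cv)         # eq. (8)
    rho = np.corrcoef(err, err_cv)[0, 1]
    reduction = 1.0 - rho**2  # variance factor vs. naive MC, eq. (11)
    return estimate, reduction

# Hypothetical paired replicates of QDA and control-variate error rates.
rng = np.random.default_rng(0)
err_cv = 0.15 + 0.02 * rng.standard_normal(100)
err = err_cv + 0.01 * rng.standard_normal(100)  # strongly correlated
print(cv_estimate(err, err_cv, exact_cv=0.15))
```

Setting alpha = 1 recovers the plain difference estimator (5).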
Considering the above arguments, what we look for as a control variate procedure is any classification method which gives results as highly correlated with QDA as possible, and for which the exact expected error rate is easily computable. Moreover, one should avoid control variates for which the additional computational effort is so high that the overall computation time increases even in the case of variance reduction.

3.2 Two Specific Control Variates

What is, then, a suitable control variate in our context? We will discuss two possibilities. In both cases the control variate procedure assumes a somewhat simplified problem situation to be true in order to simplify the Monte Carlo procedure. In the first procedure the covariance matrices of the different groups are assumed to be identical. In the second procedure the possibly high-dimensional problem is optimally projected to one dimension. Note that we assumed problems with normal group densities and individual covariance matrices; thus, QDA was assumed to be the standard classification method.

1. The first idea is to utilize the error rate computed by LDA as error_cv, based on the assumption of equal covariance matrices for all groups. The error rate error is calculated by QDA from N random realizations drawn from the group densities. To get Ê_MC(error) we generate W such error rates and average; we therefore use N · W random vectors. Now we take the same random vectors and apply LDA with the same, so-called pooled, covariance matrix for all groups to calculate error rates error_cv. If Σ_i is the assumed covariance matrix for group i, then

\[ \Sigma_{pooled} = \frac{\sum_{i=1}^{g} (N_i - 1)\,\Sigma_i}{N - g} \]

is the pooled covariance matrix, where N_i is the number of realizations in group i. The differences of the W corresponding estimates error and error_cv are used to calculate Ê(error − error_cv). Finally, we calculate E(error_cv) exactly (so that it has no variance) by numerical integration based on the densities with pooled covariance matrices. We now have all the ingredients we need for an efficiency comparison with the naive Monte Carlo estimator: the variance of the naive estimator is calculated from the sample of size W of the estimated error rates error, and the variance of the control variate estimator from the sample of size W of (error − error_cv). This approach has the drawback that we have to calculate an exact integral in a projection space which might be two-dimensional or of even higher dimension, with rather ugly borderlines.

2. A second possibility is to use another control variate: the error rate of an "optimal" one-dimensional projection. This can be obtained from the largest eigenvalue and the corresponding eigenvector of QDA in the original space, or by direct minimization of the error rate. We do the same as in 1 to obtain Ê_MC(error). But then we project the random vectors onto the optimally discriminating direction, taking into account the different covariance structures, and build the differences of corresponding error estimates to compute Ê(error − error_cv). Now the exact calculation of E(error_cv) is simply a one-dimensional integration with clear-cut intersection points. This speeds up the procedure compared to 1. To construct the optimally discriminating one-dimensional projection we follow an idea in [5], where it was proposed to project on the first eigenvector v_1 of M M^T, where

\[ M = (\mu_g - \mu_1, \ldots, \mu_2 - \mu_1,\; \Sigma_g - \Sigma_1, \ldots, \Sigma_2 - \Sigma_1), \tag{12} \]

the μ_i are the group means, and the Σ_i are (again) the group covariance matrices, i = 1, ..., g. The projected means, variances and feature vectors then have the form μ_i^* = v_1^T μ_i, σ_i^{*2} = v_1^T Σ_i v_1 and x^* = v_1^T x (a code sketch of this construction follows below).

In order to represent adequate control variates, the additional computation time of procedures 1 and 2 has to be small relative to the computation time of naive Monte Carlo. That this is the case, leaving aside the computation of the exact expected error rates, should be clear from the following arguments: naive Monte Carlo estimates the means and the covariance matrices of the groups, and evaluates the corresponding estimated group densities for each observation.
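A minimal sketch of the projection direction (12) used by the second control variate, assuming the group means and covariance matrices are given; the function names and the example values are hypothetical, and NumPy's symmetric eigensolver is used for M M^T.

```python
import numpy as np

def projection_direction(means, covs):
    """First eigenvector v1 of M M^T, where M collects the mean
    differences and covariance differences of equation (12)."""
    cols = [(m - means[0]).reshape(-1, 1) for m in means[1:]]  # mu_i - mu_1
    cols += [S - covs[0] for S in covs[1:]]                    # Sigma_i - Sigma_1
    M = np.hstack(cols)
    eigvals, eigvecs = np.linalg.eigh(M @ M.T)  # eigenvalues in ascending order
    return eigvecs[:, -1]                       # eigenvector of the largest one

def project_params(v1, means, covs):
    """Projected group parameters: mu_i* = v1' mu_i, sigma_i*^2 = v1' Sigma_i v1."""
    return [v1 @ m for m in means], [v1 @ S @ v1 for S in covs]

# Hypothetical two-group example in d = 3 dimensions.
means = [np.zeros(3), np.array([1.0, 2.0, 0.5])]
covs = [np.eye(3), np.diag([2.0, 0.5, 1.0])]
v1 = projection_direction(means, covs)
mus, sig2s = project_params(v1, means, covs)
print(v1, mus, sig2s)  # a feature vector projects the same way: x* = v1 @ x
```

Only one eigendecomposition of a d × d matrix is needed, so the extra cost of constructing the direction is negligible compared to the naive Monte Carlo runs themselves.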