Fig. 1 A centered tree at level 2

using forests in place of individual trees and is too simple to explain the mathematical forces driving Breiman's forests.

The rates of convergence of centered forests are discussed in Breiman (2004) and Biau (2012). In their approach, the covariates X^{(j)} are independent and the target regression function m(x) = E[Y | X = x], which is originally a function of x = (x^{(1)}, ..., x^{(p)}), is assumed to depend only on a nonempty subset S (for Strong) of the p features. Thus, letting X_S = (X^{(j)} : j ∈ S), we have m(x) = E[Y | X_S = x_S]. The variables of the remaining set {1, ..., p}\S have no influence on the function m and can be safely removed. The ambient dimension p can be large, much larger than the sample size n, but we believe that the representation is sparse, i.e., that a potentially small number of arguments of m are active, namely the ones with indices matching the set S. Letting |S| be the cardinality of S, the value |S| characterizes the sparsity of the model: the smaller |S|, the sparser m. In this dimension-reduction scenario, Breiman (2004) and Biau (2012) proved that if the probability p_{j,n} of splitting along the jth direction tends to 1/|S| for j ∈ S and m satisfies a Lipschitz-type smoothness condition, then
\[
\mathbb{E}\big[m_{\infty,n}(X) - m(X)\big]^2 = O\big(n^{-0.75/(|S|\log 2 + 0.75)}\big).
\]

This equality shows that the rate of convergence of m_{∞,n} to m depends only on the number |S| of strong variables, not on the dimension p. This rate is strictly faster than the usual rate n^{-2/(p+2)} as soon as |S| ≤ ⌊0.54p⌋ (⌊·⌋ is the floor function). In effect, the intrinsic dimension of the regression problem is |S|, not p, and we see that the random forest estimate adapts itself to the sparse framework. Of course, this is achieved by assuming that the procedure succeeds in selecting the informative variables for splitting, which is indeed a strong assumption.
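The threshold ⌊0.54p⌋ can be checked directly by comparing the two exponents (a short algebraic verification, with log denoting the natural logarithm):
\[
\frac{0.75}{|S|\log 2 + 0.75} > \frac{2}{p+2}
\;\Longleftrightarrow\;
0.75\,(p+2) > 2\,|S|\log 2 + 1.5
\;\Longleftrightarrow\;
|S| < \frac{0.375\,p}{\log 2} \approx 0.541\,p,
\]
so the centered-forest rate improves on n^{-2/(p+2)} precisely when |S| ≤ ⌊0.54p⌋, up to rounding of the constant.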
An alternative model for pure forests, called Purely Uniform Random Forests (PURF), is discussed in Genuer (2012). For p = 1, a PURF is obtained by drawing k random variables uniformly on [0, 1], and subsequently dividing [0, 1] into random sub-intervals. (Note that as such, the PURF can only be defined for p = 1.) Although this construction is not exactly recursive, it is equivalent to growing a decision tree by deciding at each level which node to split with a probability equal to its length. Genuer (2012) proves that PURF are consistent and, under a Lipschitz assumption, that the estimate satisfies
\[
\mathbb{E}\big[m_{\infty,n}(X) - m(X)\big]^2 = O\big(n^{-2/3}\big).
\]
This rate is minimax over the class of Lipschitz functions (Stone 1980, 1982).

It is often acknowledged that random forests reduce the estimation error of a single tree, while maintaining the same approximation error. In this respect, Biau (2012) argues that the estimation error of centered forests tends to zero (at the slow rate 1/log n) even if each tree is fully grown (i.e., k ≈ log n). This result is a consequence of the tree-averaging process, since the estimation error of an individual fully grown tree does not tend to zero. Unfortunately, the choice k ≈ log n is too large to ensure consistency of the corresponding forest, whose approximation error remains constant. Similarly, Genuer (2012) shows that the estimation error of PURF is reduced by a factor of 0.75 compared to the estimation error of individual trees. The most recent attempt to assess the gain of forests in terms of estimation and approximation errors is by Arlot and Genuer (2014), who claim that the rate of the approximation error of certain models is faster than that of the individual trees.

3.2 Forests, neighbors and kernels

Let us consider a sequence of independent and identically distributed random variables X_1, ..., X_n. In random geometry, an observation X_i is said to be a layered nearest neighbor (LNN) of a point x (from X_1, ..., X_n) if the hyperrectangle defined by x and X_i contains no other data points (Barndorff-Nielsen and Sobel 1966; Bai et al. 2005; see also Devroye et al. 1996, Chapter 11, Problem 6). As illustrated in Fig. 2, the number of LNN of x is typically larger than one and depends on the number and configuration of the sample points.

Surprisingly, the LNN concept is intimately connected to random forests that ignore the resampling step. Indeed, if exactly one point is left in the leaves and if there is no resampling, then no matter what splitting strategy is used, the forest estimate at x is a weighted average of the Y_i whose corresponding X_i are LNN of x. In other words,
\[
m_{\infty,n}(x) = \sum_{i=1}^{n} W_{ni}(x)\, Y_i, \qquad (3)
\]
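To make the LNN notion concrete, the following minimal sketch (Python with NumPy; the function name and toy data are illustrative and not taken from the papers cited above) flags the sample points X_i that are layered nearest neighbors of a query point x. In the no-resampling, one-point-per-leaf setting just described, the weights W_{ni}(x) in (3) can be nonzero only at these points.

import numpy as np

def layered_nearest_neighbors(x, X):
    """Boolean mask of the rows X[i] that are LNN of x.

    X[i] is an LNN of x when the axis-aligned hyperrectangle with
    opposite corners x and X[i] contains no other sample point.
    (Closed rectangles; boundary ties have probability zero for
    continuous data.)
    """
    lo, hi = np.minimum(x, X), np.maximum(x, X)   # corners of the n rectangles
    mask = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        inside = np.all((X >= lo[i]) & (X <= hi[i]), axis=1)
        inside[i] = False                         # ignore X[i] itself
        mask[i] = not inside.any()
    return mask

# Toy illustration in dimension p = 2.
rng = np.random.default_rng(0)
X = rng.uniform(size=(10, 2))
x = np.array([0.5, 0.5])
print(np.flatnonzero(layered_nearest_neighbors(x, X)))

The printed indices are those of the LNN of x; as illustrated in Fig. 2, there is typically more than one, and their number depends on the configuration of the sample.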