1. E-step: Compute the probabilities p_ij = P(C = i | x_j), the probability that datum x_j was generated by component i. By Bayes' rule, we have p_ij = α P(x_j | C = i) P(C = i). The term P(x_j | C = i) is just the probability at x_j of the ith Gaussian, and the term P(C = i) is just the weight parameter for the ith Gaussian. Define p_i = ∑_j p_ij.

2. M-step: Compute the new mean, covariance, and component weights as follows:

   μ_i ← ∑_j p_ij x_j / p_i
   Σ_i ← ∑_j p_ij (x_j − μ_i)(x_j − μ_i)^⊤ / p_i
   w_i ← p_i / N

   where N is the total number of data points.

The E-step, or expectation step, can be viewed as computing the expected values p_ij of the hidden indicator variables Z_ij, where Z_ij is 1 if datum x_j was generated by the ith component and 0 otherwise. The M-step, or maximization step, finds the new values of the parameters that maximize the log likelihood of the data, given the expected values of the hidden indicator variables. (A short code sketch of this update loop appears below.)

The final model that EM learns when it is applied to the data in Figure 20.8(a) is shown in Figure 20.8(c); it is virtually indistinguishable from the original model from which the data were generated. Figure 20.9(a) plots the log likelihood of the data according to the current model as EM progresses. There are two points to notice. First, the log likelihood for the final learned model slightly exceeds that of the original model, from which the data were generated. This might seem surprising, but it simply reflects the fact that the data were generated randomly and might not provide an exact reflection of the underlying model. The second point is that EM increases the log likelihood of the data at every iteration. This fact can be proved in general. Furthermore, under certain conditions, EM can be proven to reach a local maximum in likelihood. (In rare cases, it could reach a saddle point or even a local minimum.) In this sense, EM resembles a gradient-based hill-climbing algorithm, but notice that it has no “step size” parameter!

Things do not always go as well as Figure 20.9(a) might suggest. It can happen, for example, that one Gaussian component shrinks so that it covers just a single data point. Then its variance will go to zero and its likelihood will go to infinity! Another problem is that two components can “merge,” acquiring identical means and variances and sharing their data points. These kinds of degenerate local maxima are serious problems, especially in high dimensions. One solution is to place priors on the model parameters and to apply the MAP version of EM. Another is to restart a component with new random parameters if it gets too small or too close to another component. It also helps to initialize the parameters with reasonable values.

Learning Bayesian networks with hidden variables

To learn a Bayesian network with hidden variables, we apply the same insights that worked for mixtures of Gaussians. Figure 20.10 represents a situation in which there are two bags of candies that have been mixed together. Candies are described by three features: in addition to the Flavor and the Wrapper, some candies have a Hole in the middle and some do not.
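Before working through the candy example, here is a minimal sketch, in Python with NumPy and SciPy, of the Gaussian-mixture EM loop described above. The function name em_gaussian_mixture, the random initialization, the fixed iteration count, and the small ridge added to each covariance are illustrative assumptions rather than details from the text; a real implementation would iterate until the log likelihood stops improving and would add the safeguards against degenerate components discussed above.

import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, k, n_iters=100, seed=0):
    # X: (N, d) array of data points; k: number of Gaussian components.
    # Returns component weights w, means mu, and covariances Sigma.
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Illustrative initialization: k distinct data points as means,
    # identity covariances, uniform component weights.
    mu = X[rng.choice(N, size=k, replace=False)].astype(float)
    Sigma = np.array([np.eye(d) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: p[i, j] = P(C = i | x_j) by Bayes' rule (normalizing over i plays the role of alpha).
        p = np.array([w[i] * multivariate_normal.pdf(X, mu[i], Sigma[i])
                      for i in range(k)])        # shape (k, N), unnormalized
        p /= p.sum(axis=0, keepdims=True)
        p_i = p.sum(axis=1)                      # p_i = sum over j of p_ij
        # M-step: re-estimate means, covariances, and component weights.
        for i in range(k):
            mu[i] = p[i] @ X / p_i[i]
            diff = X - mu[i]
            Sigma[i] = (p[i, :, None] * diff).T @ diff / p_i[i]
            Sigma[i] += 1e-6 * np.eye(d)         # tiny ridge guards against collapse
        w = p_i / N
    return w, mu, Sigma

Run on data like that in Figure 20.8(a) with k = 3, a loop of this kind should converge to a model close to the one in Figure 20.8(c), with the log likelihood rising at every iteration as in Figure 20.9(a).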
Figure 20.9 Graphs showing the log-likelihood of the data, L, as a function of the EM iteration. The horizontal line shows the log-likelihood according to the true model. (a) Graph for the Gaussian mixture model in Figure 20.8. (b) Graph for the Bayesian network in Figure 20.10(a).

Figure 20.10 (a) A mixture model for candy. The proportions of different flavors, wrappers, and numbers of holes depend on the bag, which is not observed. (b) Bayesian network for a Gaussian mixture. The mean and covariance of the observable variables X depend on the component C.

The distribution of candies in each bag is described by a naive Bayes model: the features are independent, given the bag, but the conditional probability distribution for each feature depends on the bag. The parameters are as follows: θ is the prior probability that a candy comes from Bag 1; θ_F1 and θ_F2 are the probabilities that the flavor is cherry, given that the candy comes from Bag 1 and Bag 2 respectively; θ_W1 and θ_W2 give the probabilities that the wrapper is red; and θ_H1 and θ_H2 give the probabilities that the candy has a hole. Notice that