Figure 1.11 An illustration of a distribution over two variables, X, which takes 9 possible values, and Y, which takes two possible values. The top left figure shows a sample of 60 points drawn from a joint probability distribution over these variables. The remaining figures show histogram estimates of the marginal distributions p(X) and p(Y), as well as the conditional distribution p(X|Y = 1) corresponding to the bottom row in the top left figure.

Again, note that these probabilities are normalized so that

    p(F = a|B = r) + p(F = o|B = r) = 1    (1.20)

and similarly

    p(F = a|B = b) + p(F = o|B = b) = 1.    (1.21)

We can now use the sum and product rules of probability to evaluate the overall probability of choosing an apple

    p(F = a) = p(F = a|B = r) p(B = r) + p(F = a|B = b) p(B = b)
             = \frac{1}{4} \times \frac{4}{10} + \frac{3}{4} \times \frac{6}{10} = \frac{11}{20}    (1.22)

from which it follows, using the sum rule, that p(F = o) = 1 − 11/20 = 9/20.
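The calculation in (1.22) is easy to reproduce programmatically. The following is a minimal sketch in Python (the dictionary names and the use of exact fractions are my own choices, not from the text), combining the prior p(B) with the conditionals p(F|B) via the sum and product rules:

```python
# Sketch: marginal probability of the fruit via the sum and product rules,
# using the box-and-fruit numbers quoted in the text (p(B=r)=4/10, p(B=b)=6/10).
from fractions import Fraction

p_box = {"r": Fraction(4, 10), "b": Fraction(6, 10)}          # prior p(B)
p_fruit_given_box = {                                         # conditionals p(F|B)
    "r": {"a": Fraction(1, 4), "o": Fraction(3, 4)},
    "b": {"a": Fraction(3, 4), "o": Fraction(1, 4)},
}

# Sum rule over B of the product rule p(F, B) = p(F|B) p(B), cf. (1.22)
p_apple = sum(p_fruit_given_box[b]["a"] * p_box[b] for b in p_box)
p_orange = 1 - p_apple                                        # sum rule over F

print(p_apple, p_orange)   # 11/20 9/20
```

Exact rational arithmetic makes it straightforward to confirm the values 11/20 and 9/20.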
Suppose instead we are told that a piece of fruit has been selected and it is an orange, and we would like to know which box it came from. This requires that we evaluate the probability distribution over boxes conditioned on the identity of the fruit, whereas the probabilities in (1.16)–(1.19) give the probability distribution over the fruit conditioned on the identity of the box. We can solve the problem of reversing the conditional probability by using Bayes' theorem to give

    p(B = r|F = o) = \frac{p(F = o|B = r) p(B = r)}{p(F = o)} = \frac{3}{4} \times \frac{4}{10} \times \frac{20}{9} = \frac{2}{3}.    (1.23)

From the sum rule, it then follows that p(B = b|F = o) = 1 − 2/3 = 1/3.

We can provide an important interpretation of Bayes' theorem as follows. If we had been asked which box had been chosen before being told the identity of the selected item of fruit, then the most complete information we have available is provided by the probability p(B). We call this the prior probability because it is the probability available before we observe the identity of the fruit. Once we are told that the fruit is an orange, we can then use Bayes' theorem to compute the probability p(B|F), which we shall call the posterior probability because it is the probability obtained after we have observed F. Note that in this example, the prior probability of selecting the red box was 4/10, so that we were more likely to select the blue box than the red one. However, once we have observed that the piece of selected fruit is an orange, we find that the posterior probability of the red box is now 2/3, so that it is now more likely that the box we selected was in fact the red one. This result accords with our intuition, as the proportion of oranges is much higher in the red box than it is in the blue box, and so the observation that the fruit was an orange provides significant evidence favouring the red box. In fact, the evidence is sufficiently strong that it outweighs the prior and makes it more likely that the red box was chosen rather than the blue one.

Finally, we note that if the joint distribution of two variables factorizes into the product of the marginals, so that p(X, Y) = p(X)p(Y), then X and Y are said to be independent. From the product rule, we see that p(Y|X) = p(Y), and so the conditional distribution of Y given X is indeed independent of the value of X. For instance, in our boxes of fruit example, if each box contained the same fraction of apples and oranges, then p(F|B) = p(F), so that the probability of selecting, say, an apple is independent of which box is chosen.
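The reversal in (1.23) amounts to multiplying each conditional by the prior and normalizing. A minimal sketch, reusing the hypothetical p_box and p_fruit_given_box dictionaries from the previous snippet:

```python
# Sketch: reversing the conditional with Bayes' theorem, cf. (1.23).
def posterior_over_boxes(fruit):
    """Return p(B | F = fruit) by normalizing p(F|B) p(B) over the boxes."""
    joint = {b: p_fruit_given_box[b][fruit] * p_box[b] for b in p_box}
    evidence = sum(joint.values())            # p(F = fruit), the denominator in Bayes' theorem
    return {b: joint[b] / evidence for b in joint}

print(posterior_over_boxes("o"))   # {'r': Fraction(2, 3), 'b': Fraction(1, 3)}
```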
1.2.1 Probability densities

As well as considering probabilities defined over discrete sets of events, we also wish to consider probabilities with respect to continuous variables. We shall limit ourselves to a relatively informal discussion. If the probability of a real-valued variable x falling in the interval (x, x + δx) is given by p(x)δx for δx → 0, then p(x) is called the probability density over x. This is illustrated in Figure 1.12. The probability that x will lie in an interval (a, b) is then given by

    p(x \in (a, b)) = \int_a^b p(x) \, dx.    (1.24)

Figure 1.12 The concept of probability for discrete variables can be extended to that of a probability density p(x) over a continuous variable x and is such that the probability of x lying in the interval (x, x + δx) is given by p(x)δx for δx → 0. The probability density can be expressed as the derivative of a cumulative distribution function P(x).

Because probabilities are nonnegative, and because the value of x must lie somewhere on the real axis, the probability density p(x) must satisfy the two conditions

    p(x) \geqslant 0    (1.25)

    \int_{-\infty}^{\infty} p(x) \, dx = 1.    (1.26)

Under a nonlinear change of variable, a probability density transforms differently from a simple function, due to the Jacobian factor. For instance, if we consider a change of variables x = g(y), then a function f(x) becomes \tilde{f}(y) = f(g(y)). Now consider a probability density p_x(x) that corresponds to a density p_y(y) with respect to the new variable y, where the suffices denote the fact that p_x(x) and p_y(y) are different densities. Observations falling in the range (x, x + δx) will, for small values of δx, be transformed into the range (y, y + δy) where p_x(x)δx ≃ p_y(y)δy, and hence

    p_y(y) = p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y)) |g'(y)|.    (1.27)

One consequence of this property is that the concept of the maximum of a probability density is dependent on the choice of variable (Exercise 1.4).

The probability that x lies in the interval (−∞, z) is given by the cumulative distribution function defined by

    P(z) = \int_{-\infty}^{z} p(x) \, dx    (1.28)

which satisfies P'(x) = p(x), as shown in Figure 1.12.
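As a way of making the change-of-variables formula (1.27) concrete, here is a minimal numerical sketch (my own illustration, not an example from the text): taking p_x(x) to be a standard normal and the change of variables x = g(y) = y^3, samples of y = g^{-1}(x) should be distributed according to p_x(g(y)) |g'(y)|.

```python
# Sketch: numerically checking (1.27) for x = g(y) = y**3 with p_x a standard normal.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # samples from p_x(x) = N(x | 0, 1)
y = np.cbrt(x)                              # y = g^{-1}(x), real cube root

def p_x(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def p_y(y):
    # p_y(y) = p_x(g(y)) |g'(y)| with g(y) = y**3, so g'(y) = 3 y**2
    return p_x(y**3) * np.abs(3 * y**2)

hist, edges = np.histogram(y, bins=50, range=(-2, 2), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
# Discrepancy shrinks with more samples and finer bins
print(np.max(np.abs(hist - p_y(centres))))
```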
If we have several continuous variables x_1, ..., x_D, denoted collectively by the vector x, then we can define a joint probability density p(x) = p(x_1, ..., x_D) such that the probability of x falling in an infinitesimal volume δx containing the point x is given by p(x)δx. This multivariate probability density must satisfy

    p(x) \geqslant 0    (1.29)

    \int p(x) \, dx = 1    (1.30)

in which the integral is taken over the whole of x space. We can also consider joint probability distributions over a combination of discrete and continuous variables. Note that if x is a discrete variable, then p(x) is sometimes called a probability mass function because it can be regarded as a set of 'probability masses' concentrated at the allowed values of x.

The sum and product rules of probability, as well as Bayes' theorem, apply equally to the case of probability densities, or to combinations of discrete and continuous variables. For instance, if x and y are two real variables, then the sum and product rules take the form

    p(x) = \int p(x, y) \, dy    (1.31)

    p(x, y) = p(y|x) p(x).    (1.32)

A formal justification of the sum and product rules for continuous variables (Feller, 1966) requires a branch of mathematics called measure theory and lies outside the scope of this book. Its validity can be seen informally, however, by dividing each real variable into intervals of width ∆ and considering the discrete probability distribution over these intervals. Taking the limit ∆ → 0 then turns sums into integrals and gives the desired result.

1.2.2 Expectations and covariances

One of the most important operations involving probabilities is that of finding weighted averages of functions. The average value of some function f(x) under a probability distribution p(x) is called the expectation of f(x) and will be denoted by E[f]. For a discrete distribution, it is given by

    E[f] = \sum_x p(x) f(x)    (1.33)

so that the average is weighted by the relative probabilities of the different values of x. In the case of continuous variables, expectations are expressed in terms of an integration with respect to the corresponding probability density

    E[f] = \int p(x) f(x) \, dx.    (1.34)
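As a minimal sketch of (1.34) (my own illustration: NumPy, a standard normal density, and f(x) = x^2, none of which come from the text), the integral can be approximated on a fine grid over a truncated range:

```python
# Sketch: evaluating the expectation (1.34) by simple numerical quadrature,
# here E[f] with f(x) = x**2 under a standard normal density (the exact answer is 1).
import numpy as np

def p(x):                                   # density p(x) = N(x | 0, 1)
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

def f(x):
    return x**2

grid = np.linspace(-10, 10, 100_001)        # truncate the infinite integration range
dx = grid[1] - grid[0]
expectation = np.sum(p(grid) * f(grid)) * dx
print(expectation)                          # approximately 1.0
```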
In either case, if we are given a finite number N of points drawn from the probability distribution or probability density, then the expectation can be approximated as a finite sum over these points

    E[f] ≃ \frac{1}{N} \sum_{n=1}^{N} f(x_n).    (1.35)

We shall make extensive use of this result when we discuss sampling methods in Chapter 11. The approximation in (1.35) becomes exact in the limit N → ∞.

Sometimes we will be considering expectations of functions of several variables, in which case we can use a subscript to indicate which variable is being averaged over, so that for instance

    E_x[f(x, y)]    (1.36)

denotes the average of the function f(x, y) with respect to the distribution of x. Note that E_x[f(x, y)] will be a function of y.

We can also consider a conditional expectation with respect to a conditional distribution, so that

    E_x[f|y] = \sum_x p(x|y) f(x)    (1.37)

with an analogous definition for continuous variables.

The variance of f(x) is defined by

    var[f] = E[(f(x) − E[f(x)])^2]    (1.38)

and provides a measure of how much variability there is in f(x) around its mean value E[f(x)]. Expanding out the square, we see that the variance can also be written in terms of the expectations of f(x) and f(x)^2 (Exercise 1.5)

    var[f] = E[f(x)^2] − E[f(x)]^2.    (1.39)

In particular, we can consider the variance of the variable x itself, which is given by

    var[x] = E[x^2] − E[x]^2.    (1.40)

For two random variables x and y, the covariance is defined by

    cov[x, y] = E_{x,y}[{x − E[x]}{y − E[y]}] = E_{x,y}[xy] − E[x]E[y]    (1.41)

which expresses the extent to which x and y vary together. If x and y are independent, then their covariance vanishes (Exercise 1.6).

In the case of two vectors of random variables x and y, the covariance is a matrix

    cov[x, y] = E_{x,y}[{x − E[x]}{y^T − E[y^T]}] = E_{x,y}[x y^T] − E[x]E[y^T].    (1.42)

If we consider the covariance of the components of a vector x with each other, then we use a slightly simpler notation cov[x] ≡ cov[x, x].
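A minimal sketch tying (1.35), (1.39), and (1.41) together (the sampling distribution and the construction of y below are my own illustrative choices, not from the text):

```python
# Sketch: the Monte Carlo approximation (1.35) and the identities (1.39) and (1.41),
# checked on samples from a standard normal.
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
x = rng.standard_normal(N)

f = x**2
mc_expectation = f.mean()                       # (1.35): (1/N) sum_n f(x_n), close to E[x^2] = 1

var_f = np.mean((f - f.mean())**2)              # (1.38), direct definition
var_f_alt = np.mean(f**2) - f.mean()**2         # (1.39), same value up to rounding

y = 2 * x + rng.standard_normal(N)              # y depends on x, so cov[x, y] is nonzero
cov_xy = np.mean(x * y) - x.mean() * y.mean()   # (1.41), close to 2

print(mc_expectation, var_f, var_f_alt, cov_xy)
```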