DEFINITION 1.2.3. The $m \times 1$ random vector $X$ is said to have an $m$-variate normal distribution if, for every $a \in \mathbb{R}^m$, the distribution of $a'X$ is univariate normal.

Proceeding from this definition we will now establish some properties of the multivariate normal distribution.

THEOREM 1.2.4. If $X$ has an $m$-variate normal distribution then both $\mu = E(X)$ and $\Sigma = \operatorname{Cov}(X)$ exist, and the distribution of $X$ is determined by $\mu$ and $\Sigma$.

Proof. If $X = (X_1, \ldots, X_m)'$ then, for each $i = 1, \ldots, m$, $X_i$ is univariate normal (using Definition 1.2.3), so that $E(X_i)$ and $\operatorname{Var}(X_i)$ exist and are finite. Thus $\operatorname{Cov}(X_i, X_j)$ exists. (Why?) Putting $\mu = E(X)$ and $\Sigma = \operatorname{Cov}(X)$, we have, from (1) and (3),

$$E(a'X) = a'\mu \qquad \text{and} \qquad \operatorname{Var}(a'X) = a'\Sigma a,$$

so that the distribution of $a'X$ is $N(a'\mu, a'\Sigma a)$ for each $a \in \mathbb{R}^m$. Since these univariate distributions are determined by $\mu$ and $\Sigma$, so is the distribution of $X$, by Theorem 1.2.2.
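Definition 1.2.3 and Theorem 1.2.4 lend themselves to a direct numerical check: project samples of $X$ onto an arbitrary direction $a$ and test the projection for normality with mean $a'\mu$ and variance $a'\Sigma a$. A minimal sketch in Python (the particular $\mu$, $\Sigma$, and $a$ are arbitrary illustrative choices, with numpy's built-in sampler standing in for the distribution):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.7],
                  [0.7, 2.0]])
x = rng.multivariate_normal(mu, sigma, size=100_000)

a = rng.standard_normal(2)          # an arbitrary direction a in R^m
proj = x @ a                        # samples of a'X
# By Definition 1.2.3 and Theorem 1.2.4, a'X should be N(a'mu, a'Sigma a).
res = stats.kstest(proj, "norm", args=(a @ mu, np.sqrt(a @ sigma @ a)))
print(res.pvalue)                   # typically large: consistent with normality
```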
The $m$-variate normal distribution of the random vector $X$ of Theorem 1.2.4 will be denoted by $N_m(\mu, \Sigma)$ and we will write that $X$ is $N_m(\mu, \Sigma)$.

THEOREM 1.2.5. If $X$ is $N_m(\mu, \Sigma)$ then the characteristic function of $X$ is

(4) $$\phi_X(t) = \exp\left(i\mu't - \tfrac{1}{2}t'\Sigma t\right).$$

Proof. Here

$$\phi_X(t) = E\left[e^{it'X}\right] = \phi_{t'X}(1),$$

where the right side denotes the characteristic function of the random variable $t'X$ evaluated at 1. Since $X$ is $N_m(\mu, \Sigma)$, $t'X$ is $N(t'\mu, t'\Sigma t)$, so that

$$\phi_{t'X}(1) = \exp\left(it'\mu - \tfrac{1}{2}t'\Sigma t\right),$$

completing the proof.
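The formula (4) can be verified numerically: the empirical characteristic function of a large sample from $N_m(\mu, \Sigma)$, that is, the sample average of $\exp(it'X)$, should be close to (4). A minimal sketch (the choices of $\mu$, $\Sigma$, and $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
t = np.array([0.3, -0.7])

# Empirical characteristic function: average of exp(i t'X) over a sample.
x = rng.multivariate_normal(mu, sigma, size=200_000)
ecf = np.mean(np.exp(1j * (x @ t)))

# Theoretical value from (4): exp(i mu't - t'Sigma t / 2).
cf = np.exp(1j * (mu @ t) - 0.5 * t @ sigma @ t)

print(ecf, cf)   # the two should agree closely (Monte Carlo error ~ n^{-1/2})
```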
The alert reader may have noticed that we have not yet established the existence of the multivariate normal distribution. It could be that Definition 1.2.3 is vacuous! To sew things up we will show that the function given by (4) is indeed the characteristic function of a random vector. Let $\Sigma$ be an $m \times m$ covariance matrix (i.e., a non-negative definite matrix) of rank $r$ and let $U_1, \ldots, U_r$ be independent standard normal random variables. The vector $U = (U_1, \ldots, U_r)'$ has characteristic function

$$\phi_U(t) = E[\exp(it'U)] = \prod_{j=1}^{r} E[\exp(it_j U_j)] \qquad \text{(by independence)}$$
$$= \prod_{j=1}^{r} \exp\left(-\tfrac{1}{2}t_j^2\right) \qquad \text{(by normality)}$$
$$= \exp\left(-\tfrac{1}{2}t't\right).$$

Now put

(5) $$X = CU + \mu,$$

where $C$ is an $m \times r$ matrix of rank $r$ such that $\Sigma = CC'$, and $\mu \in \mathbb{R}^m$. Then $X$ has characteristic function (4), for

$$E[\exp(it'X)] = E[\exp(it'CU)]\exp(it'\mu) = \phi_U(C't)\exp(it'\mu) = \exp\left(-\tfrac{1}{2}t'CC't\right)\exp(i\mu't) = \exp\left(i\mu't - \tfrac{1}{2}t'\Sigma t\right).$$

It is worth remarking that we could have defined the multivariate normal distribution $N_m(\mu, \Sigma)$ by means of the linear transformation (5) on independent standard normal variables. Such a representation is often useful; see, for example, the proof of Theorem 1.2.9.
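The representation (5) is also a practical recipe for simulation, including the singular case: factor $\Sigma = CC'$ and set $X = CU + \mu$. Below is a sketch, with an eigendecomposition used to build $C$ so that $\Sigma$ may have rank $r < m$ (a Cholesky factor would do equally well when $\Sigma$ is positive definite); the helper name sample_mvn and the rank-1 $\Sigma$ are illustrative choices:

```python
import numpy as np

def sample_mvn(mu, sigma, n, rng):
    """Draw n samples of X = C U + mu, where Sigma = C C' (representation (5)).

    An eigendecomposition is used, so Sigma may be singular (rank r < m).
    """
    vals, vecs = np.linalg.eigh(sigma)
    keep = vals > 1e-12                       # eigenvectors spanning the rank-r support
    c = vecs[:, keep] * np.sqrt(vals[keep])   # m x r matrix with C C' = Sigma
    u = rng.standard_normal((n, c.shape[1]))  # independent N(0,1) variables
    return u @ c.T + mu

rng = np.random.default_rng(2)
mu = np.array([0.0, 1.0, 2.0])
a = np.array([[1.0], [1.0], [-1.0]])
sigma = a @ a.T                               # rank-1, hence singular, covariance
x = sample_mvn(mu, sigma, 100_000, rng)
print(x.mean(axis=0))                         # close to mu
print(np.cov(x, rowvar=False))                # close to sigma
```

Note that with this rank-deficient $\Sigma$ the samples all lie in a translate of the column space of $C$, a one-dimensional line here; this is the hyperplane concentration discussed before Theorem 1.2.9.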
Getting back to the properties of the multivariate normal distribution, our next result shows that any linear transformation of a normal vector has a normal distribution.

THEOREM 1.2.6. If $X$ is $N_m(\mu, \Sigma)$ and $B$ is $k \times m$, $b$ is $k \times 1$, then $Y = BX + b$ is $N_k(B\mu + b, B\Sigma B')$.

Proof. The fact that $Y$ is $k$-variate normal is a direct consequence of Definition 1.2.3, since all linear functions of the components of $Y$ are linear functions of the components of $X$ and these are all normal. The mean and covariance matrix of $Y$ are clearly those stated.

A very important property of the multivariate normal distribution is that all marginal distributions are normal.

THEOREM 1.2.7. If $X$ is $N_m(\mu, \Sigma)$ then the marginal distribution of any subset of $k$ ($< m$) components of $X$ is $k$-variate normal.

Proof. This follows directly from the definition, or from Theorem 1.2.6. For example, partition $X$, $\mu$, and $\Sigma$ as

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $X_1$ and $\mu_1$ are $k \times 1$ and $\Sigma_{11}$ is $k \times k$. Putting

$$B = [\,I_k : 0\,] \ (k \times m), \qquad b = 0$$

in Theorem 1.2.6 shows immediately that $X_1$ is $N_k(\mu_1, \Sigma_{11})$. Similarly, the marginal distribution of any subvector of $k$ components of $X$ is normal, where the mean and covariance matrix are obtained from $\mu$ and $\Sigma$ by picking out the corresponding subvector and submatrix in an obvious way.
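Theorems 1.2.6 and 1.2.7 are easy to check by simulation: apply $Y = BX + b$ to samples of $X$ and compare sample moments with $B\mu + b$ and $B\Sigma B'$. A minimal sketch (the matrices here are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, 0.0, -1.0])
sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
b_mat = np.array([[1.0, -1.0, 0.0],
                  [0.5, 0.5, 1.0]])       # B is k x m with k = 2, m = 3
b_vec = np.array([10.0, -10.0])

x = rng.multivariate_normal(mu, sigma, size=200_000)
y = x @ b_mat.T + b_vec                   # Y = B X + b, applied row-wise

print(y.mean(axis=0), b_mat @ mu + b_vec)                # agree
print(np.cov(y, rowvar=False), b_mat @ sigma @ b_mat.T)  # agree
```

Taking b_mat to be $[\,I_k : 0\,]$ and b_vec zero reproduces the marginal distribution of Theorem 1.2.7.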
One consequence of Theorem 1.2.7 (or of Definition 1.2.3) is that the marginal distribution of each component of $X$ is univariate normal. The converse is not true in general; that is, the fact that each component of a random vector is (marginally) normal does not imply that the vector has a multivariate normal distribution. [This is one reason why the problem of testing multivariate normality is such a thorny one in practice. See Gnanadesikan (1977), Chapter 5.] As a counterexample, suppose $U_1, U_2, U_3$ are independent $N(0,1)$ random variables and $Z$ is an arbitrary random variable, independent of $U_1$, $U_2$, and $U_3$. Define $X_1$ and $X_2$ by

$$X_1 = \frac{U_1 + ZU_3}{\sqrt{1+Z^2}}, \qquad X_2 = \frac{U_2 + ZU_3}{\sqrt{1+Z^2}}.$$

Conditional on $Z$, $X_1$ is $N(0,1)$, and since this distribution does not depend on $Z$ it is the unconditional distribution of $X_1$. Similarly $X_2$ is $N(0,1)$. Again, conditional on $Z$, the joint distribution of $X_1$ and $X_2$ is bivariate normal, but the unconditional distribution clearly need not be. Other examples are given in Problems 1.7, 1.8, and 1.9. Obviously the converse is true if the components of $X$ are all independent and normal, or if $X$ consists of independent subvectors, each of which is normally distributed. For then linear functions of the components of $X$ are linear functions of independent normal random variables and hence are normal. This fact will be used in the proof of the next theorem.
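The failure of joint normality in this counterexample can be seen concretely by simulation: if $(X_1, X_2)'$ were bivariate normal then $X_1 + X_2$ would be normal by Definition 1.2.3, but for a two-point $Z$ (an illustrative choice; the text allows any $Z$) the sum is a scale mixture of normals with positive excess kurtosis. A sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 500_000
u1, u2, u3 = rng.standard_normal((3, n))
z = rng.choice([0.0, 5.0], size=n)       # an arbitrary Z; two-point for illustration

x1 = (u1 + z * u3) / np.sqrt(1 + z**2)
x2 = (u2 + z * u3) / np.sqrt(1 + z**2)

# Each marginal is standard normal...
print(stats.kstest(x1, "norm").pvalue)   # typically large
print(stats.kstest(x2, "norm").pvalue)   # typically large

# ...but X1 + X2 is a scale mixture of normals with distinct variances,
# hence not normal: its excess kurtosis is visibly positive.
print(stats.kurtosis(x1 + x2))           # clearly > 0
```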
The reader will recall that independence of two random variables implies that the covariance between them, if it exists, is zero, but that the converse is not true in general. It is, however, for the multivariate normal distribution, as the following result shows.

THEOREM 1.2.8. If $X$ is $N_m(\mu, \Sigma)$ and $X$, $\mu$, and $\Sigma$ are partitioned as

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

where $X_1$ and $\mu_1$ are $k \times 1$ and $\Sigma_{11}$ is $k \times k$, then the subvectors $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$.

Proof. $\Sigma_{12}$ is the matrix of covariances between the components of $X_1$ and the components of $X_2$, so independence of $X_1$ and $X_2$ implies that $\Sigma_{12} = 0$. Now suppose that $\Sigma_{12} = 0$. Let $Y_1, Y_2$ be independent random vectors, where $Y_1$ is $N_k(\mu_1, \Sigma_{11})$ and $Y_2$ is $N_{m-k}(\mu_2, \Sigma_{22})$, and put $Y = (Y_1', Y_2')'$. Then both $X$ and $Y$ are $N_m(\mu, \Sigma)$, where

$$\Sigma = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix},$$

so that they are identically distributed. Hence $X_1$ and $X_2$ are independent. Alternatively, this result is also easily established using the fact that the characteristic function (4) of $X$ factors into the product of the characteristic functions of $X_1$ and $X_2$ when $\Sigma_{12} = 0$ (see Problem 1.1).

Theorem 1.2.8 can be extended easily and in an obvious way to the case where $X$ is partitioned into a number of subvectors (see Problem 1.2). The important message here is that in order to determine whether two subvectors of a normally distributed vector are independent it suffices to check that the matrix of covariances between the two subvectors is zero.
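One consequence of Theorem 1.2.8 is that when $\Sigma_{12} = 0$ the joint density, when it exists, factors into the product of the marginal densities. A quick numerical check of this factorization, with scipy's multivariate normal standing in for the densities and an arbitrary block-diagonal $\Sigma$:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = np.array([1.0]), np.array([-1.0, 0.5])
s11 = np.array([[2.0]])
s22 = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Block-diagonal Sigma: Sigma_12 = 0, so X1 and X2 are independent.
sigma = np.block([[s11, np.zeros((1, 2))],
                  [np.zeros((2, 1)), s22]])
mu = np.concatenate([mu1, mu2])

x = np.array([0.2, -0.4, 1.1])            # an arbitrary evaluation point
joint = multivariate_normal(mu, sigma).pdf(x)
product = (multivariate_normal(mu1, s11).pdf(x[:1])
           * multivariate_normal(mu2, s22).pdf(x[1:]))
print(joint, product)                     # equal up to rounding
```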
Let us now address the problem of finding the density function of a random vector $X$ having the $N_m(\mu, \Sigma)$ distribution. We have already noted that if $\Sigma$ is not positive definite, and hence is singular, then $X$ lies in some hyperplane with probability 1, so that a density function for $X$ (with respect to Lebesgue measure on $\mathbb{R}^m$) cannot exist. In this case $X$ is said to have a singular normal distribution. If $\Sigma$ is positive definite, and hence nonsingular, the density function of $X$ does exist and is easily found using the representation (5) of $X$ in terms of independent standard normal random variables.

THEOREM 1.2.9. If $X$ is $N_m(\mu, \Sigma)$ and $\Sigma$ is positive definite then the density function of $X$ is

(6) $$f_X(x) = (2\pi)^{-m/2}(\det \Sigma)^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right].$$

(Here, and throughout the book, det denotes determinant.)

Proof. Write $\Sigma = CC'$, where $C$ is a nonsingular $m \times m$ matrix, and put

$$X = CU + \mu,$$

where $U$ is an $m \times 1$ vector of independent $N(0,1)$ random variables, i.e., $U$ is $N_m(0, I_m)$. The joint density function of $U_1, \ldots, U_m$ is

$$f_U(u) = \prod_{j=1}^{m}(2\pi)^{-1/2}\exp\left(-\tfrac{1}{2}u_j^2\right) = (2\pi)^{-m/2}\exp\left(-\tfrac{1}{2}u'u\right).$$

The inverse transformation is $U = B(X - \mu)$, with $B = C^{-1}$, and the Jacobian of this transformation is

$$\det\begin{pmatrix} \dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_m} \\ \vdots & & \vdots \\ \dfrac{\partial u_m}{\partial x_1} & \cdots & \dfrac{\partial u_m}{\partial x_m} \end{pmatrix} = \det\begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{pmatrix} = \det B = \det C^{-1} = (\det C)^{-1} = \left[\det(CC')\right]^{-1/2} = (\det \Sigma)^{-1/2},$$

so that the density function of $X$ is

$$f_X(x) = (2\pi)^{-m/2}(\det \Sigma)^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu)'(C^{-1})'C^{-1}(x-\mu)\right];$$

and since $\Sigma^{-1} = (C^{-1})'C^{-1}$, we are done.
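The density (6) is straightforward to evaluate directly, and as a sanity check the result should match a library implementation. A sketch (the evaluation point and the parameters are arbitrary illustrative choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])     # positive definite
x = np.array([0.8, -1.5, 1.0])
m = len(mu)

# Density (6) computed directly.
d = x - mu
quad = d @ np.linalg.solve(sigma, d)    # (x - mu)' Sigma^{-1} (x - mu)
f = (2 * np.pi) ** (-m / 2) * np.linalg.det(sigma) ** (-0.5) * np.exp(-0.5 * quad)

print(f)
print(multivariate_normal(mu, sigma).pdf(x))   # same value
```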