The Multivariate Normal and Related Distributions

The density function (6) is constant whenever the quadratic form in the exponent is, so that it is constant on the ellipsoid

$$(\mathbf{x}-\boldsymbol\mu)'\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)=k$$

in $R^m$, for every $k>0$. This ellipsoid has center $\boldsymbol\mu$, while $\Sigma$ determines its shape and orientation.

It is worthwhile looking explicitly at the bivariate normal distribution ($m=2$). In this case

$$\boldsymbol\mu=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}\qquad\text{and}\qquad \Sigma=\begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix},$$

where $\operatorname{Var}(X_1)=\sigma_1^2$, $\operatorname{Var}(X_2)=\sigma_2^2$, and the correlation between $X_1$ and $X_2$ is $\rho$. For the distribution of $\mathbf X$ to be nonsingular normal we need $\sigma_1^2>0$, $\sigma_2^2>0$, and $\det\Sigma=\sigma_1^2\sigma_2^2(1-\rho^2)>0$, so that $-1<\rho<1$. When this holds,

$$\Sigma^{-1}=\frac{1}{1-\rho^2}\begin{bmatrix}\dfrac{1}{\sigma_1^2} & -\dfrac{\rho}{\sigma_1\sigma_2}\\[6pt] -\dfrac{\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2}\end{bmatrix}$$

and the joint density function of $X_1$ and $X_2$ is

(7) $$f(x_1,x_2)=\frac{1}{2\pi\sigma_1\sigma_2(1-\rho^2)^{1/2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right\}.$$
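As a numerical sanity check (not part of the original text), the explicit bivariate density (7) can be compared against a general multivariate normal density built directly from $\boldsymbol\mu$ and $\Sigma$. The particular values of the means, standard deviations, and $\rho$ below are arbitrary illustrative choices.

```python
# Check equation (7): the density written out in terms of sigma_1, sigma_2,
# and rho should agree with the general m-variate normal density built from
# mu and Sigma. All numeric values are arbitrary illustrative choices.
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2 = 1.0, -2.0
s1, s2, rho = 2.0, 0.5, 0.7            # sigma_1, sigma_2, correlation

Sigma = np.array([[s1**2,         rho * s1 * s2],
                  [rho * s1 * s2, s2**2        ]])

def density7(x1, x2):
    """Bivariate normal density, as written out in equation (7)."""
    q = ((x1 - mu1)**2 / s1**2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
         + (x2 - mu2)**2 / s2**2)
    norm = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)
    return np.exp(-q / (2 * (1 - rho**2))) / norm

x = [0.3, -1.4]
print(density7(x[0], x[1]))
print(multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf(x))  # same value
```

The two printed values agree to machine precision, confirming that (7) is just the generic density with $\Sigma^{-1}$ and $\det\Sigma$ written out for $m=2$.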
The "standard" bivariate normal density function is obtained from this by transforming to standardized variables. Putting $Z_i=(X_i-\mu_i)/\sigma_i$ $(i=1,2)$, the joint density function of $Z_1$ and $Z_2$ is

(8) $$f(z_1,z_2)=\frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\left\{-\frac{z_1^2+z_2^2-2\rho z_1 z_2}{2(1-\rho^2)}\right\}.$$

This density is constant on the ellipse

(9) $$\frac{1}{1-\rho^2}\left(z_1^2+z_2^2-2\rho z_1 z_2\right)=k$$

in $R^2$, for every $k>0$. (Some properties of this ellipse are explored in Problem 1.3.)

In order to prove the next theorem we will use the following lemma. In this lemma the notations $R(M)$ and $K(M)$ for an $n\times r$ matrix $M$ denote the range (or column space) and kernel (or null space) respectively:

(10) $$R(M)=\{\mathbf v\in R^n:\mathbf v=M\mathbf u\ \text{for some }\mathbf u\in R^r\}$$

(11) $$K(M)=\{\mathbf u\in R^r:M\mathbf u=\mathbf 0\}.$$

Clearly $R(M)$ is a subspace of $R^n$, and $K(M)$ is a subspace of $R^r$.

LEMMA 1.2.10. If the $m\times m$ matrix $\Sigma$ is non-negative definite and is partitioned as

$$\Sigma=\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},$$

where $\Sigma_{11}$ is $k\times k$ and $\Sigma_{22}$ is $(m-k)\times(m-k)$, then:

(a) $K(\Sigma_{22})\subset K(\Sigma_{12})$
(b) $R(\Sigma_{21})\subset R(\Sigma_{22})$

Proof. (a) Suppose $\mathbf z\in K(\Sigma_{22})$. Then, for all $\mathbf y\in R^k$ and $\alpha\in R^1$ we have

$$\begin{bmatrix}\mathbf y' & \alpha\mathbf z'\end{bmatrix}\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}\begin{bmatrix}\mathbf y\\ \alpha\mathbf z\end{bmatrix}=\mathbf y'\Sigma_{11}\mathbf y+2\alpha\mathbf y'\Sigma_{12}\mathbf z+\alpha^2\mathbf z'\Sigma_{22}\mathbf z$$
$$=\mathbf y'\Sigma_{11}\mathbf y+2\alpha\mathbf y'\Sigma_{12}\mathbf z\qquad(\text{because }\Sigma_{22}\mathbf z=\mathbf 0)$$
$$\geq 0\qquad\qquad(\text{because }\Sigma\text{ is non-negative definite}).$$
Taking $\mathbf y=\Sigma_{12}\mathbf z$ then gives

$$\mathbf z'\Sigma_{12}'\Sigma_{11}\Sigma_{12}\mathbf z+2\alpha(\Sigma_{12}\mathbf z)'(\Sigma_{12}\mathbf z)\geq 0$$

for all $\alpha$; letting $\alpha\to-\infty$ shows that we must have $\Sigma_{12}\mathbf z=\mathbf 0$, i.e., $\mathbf z\in K(\Sigma_{12})$. Hence $K(\Sigma_{22})\subset K(\Sigma_{12})$.

Then part (b) follows immediately on noting that $K(\Sigma_{12})^{\perp}\subset K(\Sigma_{22})^{\perp}$, where $K(M)^{\perp}$ denotes the orthogonal complement of $K(M)$ [i.e., the set of vectors orthogonal to every vector in $K(M)$], and using the easily proved fact that

(12) $$K(M)^{\perp}=R(M').$$

Our next theorem shows that the conditional distribution of a subvector of a normally distributed vector given the remaining components is also normal.

THEOREM 1.2.11. Let $\mathbf X$ be $N_m(\boldsymbol\mu,\Sigma)$ and partition $\mathbf X$, $\boldsymbol\mu$, and $\Sigma$ as

$$\mathbf X=\begin{bmatrix}\mathbf X_1\\ \mathbf X_2\end{bmatrix},\qquad \boldsymbol\mu=\begin{bmatrix}\boldsymbol\mu_1\\ \boldsymbol\mu_2\end{bmatrix},\qquad \Sigma=\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},$$

where $\mathbf X_1$ and $\boldsymbol\mu_1$ are $k\times 1$ and $\Sigma_{11}$ is $k\times k$. Let $\Sigma_{22}^{-}$ be a generalized inverse of $\Sigma_{22}$, i.e., a matrix satisfying

(13) $$\Sigma_{22}\Sigma_{22}^{-}\Sigma_{22}=\Sigma_{22},$$

and let $\Sigma_{11\cdot 2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-}\Sigma_{21}$. Then:

(a) $\mathbf X_1-\Sigma_{12}\Sigma_{22}^{-}\mathbf X_2$ is $N_k(\boldsymbol\mu_1-\Sigma_{12}\Sigma_{22}^{-}\boldsymbol\mu_2,\ \Sigma_{11\cdot 2})$ and is independent of $\mathbf X_2$, and
(b) the conditional distribution of $\mathbf X_1$ given $\mathbf X_2$ is $N_k(\boldsymbol\mu_1+\Sigma_{12}\Sigma_{22}^{-}(\mathbf X_2-\boldsymbol\mu_2),\ \Sigma_{11\cdot 2})$.

Proof. From Lemma 1.2.10 we have $R(\Sigma_{21})\subset R(\Sigma_{22})$, so that there exists a $k\times(m-k)$ matrix $B$ satisfying

(14) $$\Sigma_{12}=B\Sigma_{22}.$$

Now note that

(15) $$\Sigma_{12}\Sigma_{22}^{-}\Sigma_{22}=B\Sigma_{22}\Sigma_{22}^{-}\Sigma_{22}=B\Sigma_{22}=\Sigma_{12},$$
where we have used (13) and (14). Put

$$C=\begin{bmatrix}I_k & -\Sigma_{12}\Sigma_{22}^{-}\\ 0 & I_{m-k}\end{bmatrix};$$

then, by Theorem 1.2.6,

$$C\mathbf X=\begin{bmatrix}\mathbf X_1-\Sigma_{12}\Sigma_{22}^{-}\mathbf X_2\\ \mathbf X_2\end{bmatrix}$$

is $m$-variate normal with mean

$$C\boldsymbol\mu=\begin{bmatrix}\boldsymbol\mu_1-\Sigma_{12}\Sigma_{22}^{-}\boldsymbol\mu_2\\ \boldsymbol\mu_2\end{bmatrix}$$

and covariance matrix

$$C\Sigma C'=\begin{bmatrix}I_k & -\Sigma_{12}\Sigma_{22}^{-}\\ 0 & I_{m-k}\end{bmatrix}\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}\begin{bmatrix}I_k & 0\\ -(\Sigma_{12}\Sigma_{22}^{-})' & I_{m-k}\end{bmatrix}=\begin{bmatrix}\Sigma_{11\cdot 2} & 0\\ 0 & \Sigma_{22}\end{bmatrix},$$

using (15). The first assertion (a) is a direct consequence of Theorems 1.2.7 and 1.2.8, while the second (b) follows immediately from (a) by conditioning on $\mathbf X_2$.

When the matrix $\Sigma_{22}$ is nonsingular, which happens, for example, when $\Sigma$ is nonsingular, then $\Sigma_{22}^{-}=\Sigma_{22}^{-1}$ and $\Sigma_{11\cdot 2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The theorem is somewhat easier to prove in this case.

The mean of the conditional distribution of $\mathbf X_1$ given $\mathbf X_2$, namely

(16) $$E(\mathbf X_1\mid\mathbf X_2)=\boldsymbol\mu_1+\Sigma_{12}\Sigma_{22}^{-}(\mathbf X_2-\boldsymbol\mu_2),$$

is called the regression function of $\mathbf X_1$ on $\mathbf X_2$, with matrix of regression coefficients $\Sigma_{12}\Sigma_{22}^{-}$. It is a linear regression function since it depends linearly on $\mathbf X_2$, the variables being held fixed. The covariance matrix $\Sigma_{11\cdot 2}$ of the conditional distribution of $\mathbf X_1$ given $\mathbf X_2$ does not depend on $\mathbf X_2$, the variables being held fixed.
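The algebra in the proof of Theorem 1.2.11 can be sketched numerically. The block below (an illustration, not part of the original text) takes the Moore-Penrose pseudoinverse as one particular choice of generalized inverse $\Sigma_{22}^{-}$, uses a deliberately singular $\Sigma$ so that an ordinary inverse would not exist, and checks (13), (15), and the block-diagonal form of $C\Sigma C'$.

```python
# Numerical sketch of Theorem 1.2.11 and equations (13)-(15), taking the
# Moore-Penrose pseudoinverse as the generalized inverse Sigma_22^-.
# Sigma is built with rank 2, so Sigma_22 (3x3) is genuinely singular.
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2
A = rng.standard_normal((m, 2))
Sigma = A @ A.T                        # non-negative definite, rank 2
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

S22g = np.linalg.pinv(S22)             # a generalized inverse of Sigma_22

# (13): Sigma_22 Sigma_22^- Sigma_22 = Sigma_22
print(np.allclose(S22 @ S22g @ S22, S22))       # True
# (15): Sigma_12 Sigma_22^- Sigma_22 = Sigma_12
print(np.allclose(S12 @ S22g @ S22, S12))       # True

# C Sigma C' with C = [[I, -Sigma_12 Sigma_22^-], [0, I]] should be
# block-diagonal: diag(Sigma_{11.2}, Sigma_22).  The zero off-diagonal
# block is exactly the independence claim in part (a).
C = np.block([[np.eye(k), -S12 @ S22g],
              [np.zeros((m - k, k)), np.eye(m - k)]])
V = C @ Sigma @ C.T
S11_2 = S11 - S12 @ S22g @ S21         # Sigma_{11.2}

print(np.allclose(V[:k, :k], S11_2))   # True: Cov of X1 - S12 S22^- X2
print(np.allclose(V[:k, k:], 0))       # True: zero cross-covariance
print(np.allclose(V[k:, k:], S22))     # True: Cov of X2 unchanged
```

The pseudoinverse is only one of many matrices satisfying (13); the theorem's conclusions hold for any of them, which is what (15) guarantees.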
There are many characterizations of the multivariate normal distribution. We will look at just one; others may be found in Rao (1973) and Kagan et al. (1972). We will need the following famous result due to Cramér (1937), which characterizes the univariate normal distribution.

LEMMA 1.2.12. If $X$ and $Y$ are independent random variables whose sum $X+Y$ is normally distributed, then both $X$ and $Y$ are normally distributed.

A proof of this lemma is given by Feller (1971), Section XV.8.

THEOREM 1.2.13. If the $m\times 1$ random vectors $\mathbf X$ and $\mathbf Y$ are independent and $\mathbf X+\mathbf Y$ has an $m$-variate normal distribution, then both $\mathbf X$ and $\mathbf Y$ are normal.

Proof. For each $\boldsymbol\alpha\in R^m$, $\boldsymbol\alpha'(\mathbf X+\mathbf Y)=\boldsymbol\alpha'\mathbf X+\boldsymbol\alpha'\mathbf Y$ is normal (by Definition 1.2.3, since $\mathbf X+\mathbf Y$ is normal). Since $\boldsymbol\alpha'\mathbf X$ and $\boldsymbol\alpha'\mathbf Y$ are independent, Lemma 1.2.12 implies that they are both normal, and hence $\mathbf X$ and $\mathbf Y$ are both $m$-variate normal.

This proof looks easy and uses the obvious trick of reducing the problem to a univariate one by using our definition of multivariate normality. We have, however, glossed over the hard part, namely, the proof of Lemma 1.2.12.

A well-known property of the univariate normal distribution is that linear combinations of independent normal random variables are normal. This generalizes to the multivariate situation in an obvious way.

THEOREM 1.2.14. If $\mathbf X_1,\dots,\mathbf X_N$ are all independent, and $\mathbf X_i$ is $N_m(\boldsymbol\mu_i,\Sigma_i)$ for $i=1,\dots,N$, then for any fixed constants $a_1,\dots,a_N$,

$$\sum_{i=1}^{N}a_i\mathbf X_i\ \text{ is }\ N_m\!\left(\sum_{i=1}^{N}a_i\boldsymbol\mu_i,\ \sum_{i=1}^{N}a_i^2\Sigma_i\right).$$

The proof is immediate from Definition 1.2.3, or by inspection of the characteristic function of $\sum_{i=1}^{N}a_i\mathbf X_i$. It is left to the reader to fill in the details (Problem 1.5).

COROLLARY 1.2.15. If $\mathbf X_1,\dots,\mathbf X_N$ are independent, each having the $N_m(\boldsymbol\mu,\Sigma)$ distribution, then the distribution of the sample mean vector

$$\bar{\mathbf X}=\frac{1}{N}\sum_{i=1}^{N}\mathbf X_i$$

is $N_m\!\left(\boldsymbol\mu,\ \tfrac{1}{N}\Sigma\right)$.
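Corollary 1.2.15 is easy to sketch by simulation. The block below (an illustration with arbitrarily chosen $\boldsymbol\mu$, $\Sigma$, and $N$, not part of the original text) draws many independent samples of the mean of $N$ iid $N_m(\boldsymbol\mu,\Sigma)$ vectors and checks that the empirical mean and covariance of $\bar{\mathbf X}$ are close to $\boldsymbol\mu$ and $\Sigma/N$.

```python
# Monte Carlo sketch of Corollary 1.2.15: the sample mean of N iid
# N_m(mu, Sigma) vectors has mean mu and covariance Sigma / N.
# mu, Sigma, N, and the number of replications are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
N, reps = 10, 200_000

# reps independent replicates, each a sample of N iid N_3(mu, Sigma) vectors
X = rng.multivariate_normal(mu, Sigma, size=(reps, N))  # shape (reps, N, 3)
xbar = X.mean(axis=1)                                   # shape (reps, 3)

# Empirical mean and covariance of the sample mean vector
print(np.abs(xbar.mean(axis=0) - mu).max())      # close to 0
print(np.abs(np.cov(xbar.T) - Sigma / N).max())  # close to 0
```

The simulation only checks the first two moments, of course; that $\bar{\mathbf X}$ is exactly normal follows from Theorem 1.2.14 with $a_i = 1/N$.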