Aspects of Multivariate Statistical Theory
ROBB J. MUIRHEAD
Copyright © 1982, 2005 by John Wiley & Sons, Inc.

CHAPTER 1

The Multivariate Normal and Related Distributions

1.1. INTRODUCTION

The basic, central distribution and building block in classical multivariate analysis is the multivariate normal distribution. There are two main reasons why this is so. First, it is often the case that multivariate observations are, at least approximately, normally distributed. This is especially true of sample means and covariance matrices used in formal inferential procedures, due to a central limit theorem effect. This effect is also felt, of course, when the observations themselves can be regarded as sums of independent random vectors or effects, a realistic model in many situations. Secondly, the multivariate normal distribution and the sampling distributions it gives rise to are, in the main, tractable. This is not generally the case with other multivariate distributions, even for ones which appear to be close to the normal.

We will be concerned primarily with classical multivariate analysis, that is, techniques, distributions, and inferences based on the multivariate normal distribution. This distribution is defined in Section 1.2 and various properties are also derived there. This is followed by a review of the noncentral $\chi^2$ and $F$ distributions in Section 1.3 and some results about quadratic forms in normal variables in Section 1.4.

A natural question to ask is what happens to the inferences we make under the assumption of normality if the observations are not normal. This is an important question, leading into the area that has come to be known generally as robustness. In Section 1.5 we introduce the class of elliptical distributions; these distributions have been commonly used as alternative models in robustness studies. Section 1.6 reviews some results about multivariate cumulants. For our purposes, these are important in asymptotic distributions of test statistics which are functions of a sample covariance matrix.
It is expected that the reader is familiar with basic distributions such as the normal, gamma, beta, $t$, and $F$ and with the concepts of jointly distributed random variables, marginal distributions, moments, conditional distributions, independence, and related topics covered in such standard probability and statistics texts as Bickel and Doksum (1977) and Roussas (1973). Characteristic functions and basic limit theorems are also important, and useful references are Cramér (1946), Feller (1971), and Rao (1973). Matrix notation and theory is used extensively; some of this theory appears in the text and some is reviewed in the Appendix.

1.2. THE MULTIVARIATE NORMAL DISTRIBUTION

1.2.1. Definition and Properties

Before proceeding to the multivariate normal distribution we need to define some moments of a random vector, i.e., a vector whose components are jointly distributed. The mean or expectation of a random $m \times 1$ vector $X = (X_1, \dots, X_m)'$ is defined to be the vector of expectations:

$$E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_m) \end{pmatrix}.$$

More generally, if $Z = (z_{ij})$ is a $p \times q$ random matrix then $E(Z)$, the expectation of $Z$, is the matrix whose $i$-$j$th element is $E(z_{ij})$. It is a simple matter to check that if $B$, $C$, and $D$ are $m \times p$, $q \times n$, and $m \times n$ matrices of constants, then

$$E(BZC + D) = BE(Z)C + D. \tag{1}$$
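Property (1) is easy to confirm numerically. The following is a minimal Monte Carlo sketch (an illustration, not from the text) using NumPy; the dimensions, the distribution of $Z$, and the sample size are arbitrary choices.

```python
# Monte Carlo check of (1): E(BZC + D) = B E(Z) C + D.
# Dimensions and distributions below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
m, p, q, n = 2, 3, 4, 5

# Fixed matrices of constants.
B = rng.standard_normal((m, p))
C = rng.standard_normal((q, n))
D = rng.standard_normal((m, n))

# Random p x q matrix Z with known mean M: Z = M + noise, E(noise) = 0.
M = rng.standard_normal((p, q))
Z = M + rng.standard_normal((100_000, p, q))

# Sample average of BZC + D over the draws...
lhs = (B @ Z @ C + D).mean(axis=0)
# ...should be close to B E(Z) C + D = B M C + D.
rhs = B @ M @ C + D
print(np.abs(lhs - rhs).max())  # small, shrinking as the sample grows
```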
If $X$ has mean $\mu$, the covariance matrix of $X$ is defined to be the $m \times m$ matrix

$$\Sigma = \operatorname{Cov}(X) = E[(X - \mu)(X - \mu)'].$$

The $i$-$j$th element of $\Sigma$ is

$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)],$$

the covariance between $X_i$ and $X_j$, and the $i$-$i$th element is

$$\sigma_{ii} = E[(X_i - \mu_i)^2],$$

the variance of $X_i$, so that the diagonal elements of $\Sigma$ must be nonnegative.
Obviously $\Sigma$ is symmetric, i.e., $\Sigma = \Sigma'$. Indeed, the class of covariance matrices coincides with the class of non-negative definite matrices. Recall that an $m \times m$ symmetric matrix $A$ is called non-negative definite if

$$\alpha' A \alpha \ge 0 \quad \text{for all } \alpha \in R^m$$

and positive definite if

$$\alpha' A \alpha > 0 \quad \text{for all } \alpha \in R^m,\ \alpha \ne 0.$$

(Here, and throughout the book, $R^m$ denotes Euclidean space of $m$ dimensions consisting of $m \times 1$ vectors with real components.)

LEMMA 1.2.1. The $m \times m$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.

Proof. Suppose $\Sigma$ is the covariance matrix of a random vector $X$, where $X$ has mean $\mu$. Then for all $\alpha \in R^m$,

$$\operatorname{Var}(\alpha' X) = E[(\alpha' X - \alpha' \mu)^2] = E[(\alpha'(X - \mu))^2] = E[\alpha'(X - \mu)(X - \mu)'\alpha] = \alpha' \Sigma \alpha \ge 0, \tag{2}$$

so that $\Sigma$ is non-negative definite. Now suppose $\Sigma$ is a non-negative definite matrix of rank $r$, say ($r \le m$). Write $\Sigma = CC'$, where $C$ is an $m \times r$ matrix of rank $r$ (see Theorem A9.4). Let $Y$ be an $r \times 1$ vector of independent random variables with mean 0 and $\operatorname{Cov}(Y) = I$ and put $X = CY$. Then $E(X) = 0$ and

$$\operatorname{Cov}(X) = E[XX'] = E[CYY'C'] = C E(YY') C' = CC' = \Sigma,$$

so that $\Sigma$ is a covariance matrix.
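The second half of the proof is constructive and can be mirrored numerically. Below is a hedged sketch (mine, not the book's): the particular singular $\Sigma$ is an arbitrary example, and an eigendecomposition is used to obtain the factor $C$ in place of the factorization guaranteed by Theorem A9.4.

```python
# Construction from the proof of Lemma 1.2.1: factor a non-negative
# definite Sigma of rank r as Sigma = C C' and set X = C Y with
# Cov(Y) = I, so that Cov(X) = Sigma.
import numpy as np

rng = np.random.default_rng(1)

# A 3 x 3 non-negative definite matrix of rank 2 (so it is a covariance
# matrix but not positive definite).
A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])
Sigma = A @ A.T

# Obtain C with Sigma = C C' from an eigendecomposition, keeping only
# the eigenvectors with positive eigenvalues (here r = 2).
vals, vecs = np.linalg.eigh(Sigma)
keep = vals > 1e-10
C = vecs[:, keep] * np.sqrt(vals[keep])       # m x r, rank r

# Y: independent components with mean 0 and Cov(Y) = I (standard
# normals are one convenient choice); then X = C Y.
Y = rng.standard_normal((C.shape[1], 200_000))
X = C @ Y

print(np.round(np.cov(X), 2))   # close to Sigma
print(np.round(Sigma, 2))
```

The sample covariance agrees with $\Sigma$ up to Monte Carlo error even though $\Sigma$ is singular; this is exactly the degenerate situation described next, with $X$ confined to a plane in $R^3$.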
As a direct consequence of the inequality (2) we see that if the covariance matrix $\Sigma$ of a random vector $X$ is not positive definite then, with probability 1, the components $X_i$ of $X$ are linearly related. For then there exists $\alpha \in R^m$, $\alpha \ne 0$, such that

$$\operatorname{Var}(\alpha' X) = \alpha' \Sigma \alpha = 0,$$

so that, with probability 1, $\alpha' X = k$, where $k = \alpha' E(X)$, which means that $X$ lies in a hyperplane.

We will commonly make linear transformations of random vectors and will need to know how covariance matrices are transformed. Suppose $X$ is an $m \times 1$ random vector with mean $\mu_X$ and covariance matrix $\Sigma_X$, and let $Y = BX + b$, where $B$ is $k \times m$ and $b$ is $k \times 1$. The mean of $Y$ is, by (1), $\mu_Y = B\mu_X + b$, and the covariance matrix of $Y$ is

$$\Sigma_Y = E[(Y - \mu_Y)(Y - \mu_Y)'] = E[(BX + b - (B\mu_X + b))(BX + b - (B\mu_X + b))'] = B E[(X - \mu_X)(X - \mu_X)'] B' = B \Sigma_X B'. \tag{3}$$
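As with (1), the rules $\mu_Y = B\mu_X + b$ and (3) are easy to check by simulation. The sketch below is illustrative only; the dimensions and the use of a multivariate normal for $X$ are assumptions of convenience (any distribution with the stated mean and covariance would do).

```python
# Simulation check of mu_Y = B mu_X + b and (3): Sigma_Y = B Sigma_X B'.
import numpy as np

rng = np.random.default_rng(2)
m, k = 3, 2

mu_X = np.array([1.0, -1.0, 0.5])
A = rng.standard_normal((m, m))
Sigma_X = A @ A.T                 # a positive definite covariance matrix

B = rng.standard_normal((k, m))
b = np.array([2.0, -3.0])

# Simulate X and form Y = BX + b row by row.
X = rng.multivariate_normal(mu_X, Sigma_X, size=200_000)   # N x m
Y = X @ B.T + b                                            # N x k

# Both discrepancies shrink toward 0 as the sample size grows.
print(np.abs(Y.mean(axis=0) - (B @ mu_X + b)).max())
print(np.abs(np.cov(Y, rowvar=False) - B @ Sigma_X @ B.T).max())
```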
In order to define the multivariate normal distribution we will use the following result.

THEOREM 1.2.2. If $X$ is an $m \times 1$ random vector then its distribution is uniquely determined by the distributions of linear functions $a'X$, for every $a \in R^m$.

Proof. The characteristic function of $a'X$ is

$$\phi(t, a) = E[e^{it a'X}],$$

so that

$$\phi(1, a) = E[e^{i a'X}],$$

which, considered as a function of $a$, is the characteristic function of $X$ (i.e., the joint characteristic function of the components of $X$). The required result then follows by invoking the fact that a distribution in $R^m$ is uniquely determined by its characteristic function [see, e.g., Cramér (1946), Section 10.6, or Feller (1971), Section XV.7].