“eigenvalues.” Accordingly, we may always “normalize” any $x$ with $\Sigma > 0$ by letting $z = D^{-1/2}H'(x - \mu)$, which represents a three-stage transformation of $x$ in which we first relocate by $\mu$, then rotate by $H'$, and, finally, rescale by $\lambda_i^{-1/2}$ independently along each axis. We find, of course, that $E\,z = 0$ and $\mathrm{var}\,z = I$. The linear transformation $z = \Sigma^{-1/2}(x - \mu)$ also satisfies $E\,z = 0$ and $\mathrm{var}\,z = I$.

When the vector $x \in \mathbb{R}^n$ is partitioned as $x = (y', z')'$, where $y \in \mathbb{R}^r$, $z \in \mathbb{R}^s$, and $n = r + s$, it is useful to define the covariance between two vectors. The covariance matrix between $y$ and $z$ is, by definition,
\[
\mathrm{cov}(y, z) = \left(\mathrm{cov}(y_i, z_j)\right) \in \mathbb{R}^{r \times s}.
\]
Then, we may write
\[
\mathrm{var}(x) = \begin{pmatrix} \mathrm{var}(y) & \mathrm{cov}(y, z) \\ \mathrm{cov}(z, y) & \mathrm{var}(z) \end{pmatrix}.
\]
Sometimes, the expected value of $y$ is easier to calculate by conditioning on another random vector $z$. In this regard, the conditional mean theorem and conditional variance theorem are stated. A general proof of the conditional mean theorem can be found in Billingsley (1995, Section 34).

Proposition 2.7 (Conditional mean formula) $E[E(y|z)] = E\,y$.

An immediate consequence is the conditional variance formula.

Proposition 2.8 (Conditional variance formula) $\mathrm{var}\,y = E[\mathrm{var}(y|z)] + \mathrm{var}[E(y|z)]$.

Example 2.2 Define a group variable $I$ such that $P(I = 1) = 1 - \epsilon$, $P(I = 2) = \epsilon$. Conditionally on $I$, assume $x|I{=}1 \sim N(\mu_1, \sigma_1^2)$ and $x|I{=}2 \sim N(\mu_2, \sigma_2^2)$. Then
\[
f_x(x) = (1 - \epsilon)\,(2\pi)^{-1/2}\frac{1}{\sigma_1}\exp\left[-\frac{1}{2\sigma_1^2}(x - \mu_1)^2\right] + \epsilon\,(2\pi)^{-1/2}\frac{1}{\sigma_2}\exp\left[-\frac{1}{2\sigma_2^2}(x - \mu_2)^2\right]
\]
is a mixture or $\epsilon$-contaminated normal density. It follows from the construction of $x$ that
\begin{align*}
E\,x &= E[E(x|I)] = (1 - \epsilon)\mu_1 + \epsilon\mu_2 \equiv \mu, \\
\mathrm{var}\,x &= E[\mathrm{var}(x|I)] + \mathrm{var}[E(x|I)] \\
&= (1 - \epsilon)\sigma_1^2 + \epsilon\sigma_2^2 + (1 - \epsilon)(\mu_1 - \mu)^2 + \epsilon(\mu_2 - \mu)^2.
\end{align*}
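Both formulas are easy to check by simulation. The following is a minimal sketch, not part of the text, that samples the contaminated normal of Example 2.2 in two stages (first the group variable $I$, then the conditional normal) and compares the empirical mean and variance against the closed-form expressions above; the parameter values are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative parameters (arbitrary choices, not from the text)
eps, mu1, mu2, s1, s2 = 0.1, 0.0, 3.0, 1.0, 2.0

rng = np.random.default_rng(0)
n = 1_000_000

# Two-stage sampling: group label I, then the conditional normal x | I
I = rng.random(n) < eps                      # True with probability eps (group 2)
x = np.where(I, rng.normal(mu2, s2, n), rng.normal(mu1, s1, n))

# Closed-form values from the conditional mean/variance formulas
mu = (1 - eps) * mu1 + eps * mu2
var = ((1 - eps) * s1**2 + eps * s2**2
       + (1 - eps) * (mu1 - mu)**2 + eps * (mu2 - mu)**2)

print(x.mean(), mu)    # empirical vs. (1 - eps)*mu1 + eps*mu2
print(x.var(), var)    # empirical vs. E[var(x|I)] + var[E(x|I)]
```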
2.7 Characteristic functions

We require only the most basic facts about characteristic functions.

Definition 2.5 The characteristic function of $x$ is the function $c : \mathbb{R}^n \to \mathbb{C}$ defined by
\[
c(t) = c_x(t) = E\,e^{it'x}.
\]
Note:

1. $c(0) = 1$, $|c(t)| \le 1$, and $c(-t) = \overline{c(t)}$.

2. $c(t)$ is uniformly continuous:
\[
|c(t) - c(s)| = \left| E\left(e^{i(t-s)'x} - 1\right)e^{is'x} \right| \le E\left|e^{i(t-s)'x} - 1\right|.
\]
Since $|e^{i(t-s)'x} - 1| \le 2$, continuity follows by the D.C.T. Uniformity holds since $|e^{i(t-s)'x} - 1|$ depends only on $t - s$.

The main result is perhaps the “inversion formula” proven in Appendix A:
\[
P_x(a, b] = \lim_{N\to\infty} \frac{1}{(2\pi)^n} \int_{(a,b]} \int_{\mathbb{R}^n} e^{-it'x} c(t)\, e^{-t't/2N^2}\, dt\, dx,
\]
$\forall a, b$ such that $P_x\left(\partial(a, b]\right) = 0$. Thus, the C.E.T. may be applied immediately to produce the technically equivalent:

Proposition 2.9 (Uniqueness) $x \overset{d}{=} y \iff c_x(t) = c_y(t)$, $\forall t \in \mathbb{R}^n$.
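For a concrete feel of the inversion formula, here is a minimal numerical sketch in one dimension ($n = 1$), assuming the standard normal characteristic function $c(t) = e^{-t^2/2}$ derived in Example 2.3 below. A fixed large $N$ stands in for the limit, the double integral is approximated by crude Riemann sums, and the grid sizes are arbitrary choices.

```python
import numpy as np
from math import erf

# Standard normal c.f. (see Example 2.3): c(t) = exp(-t^2/2)
c = lambda t: np.exp(-t**2 / 2)

a, b, N = -1.0, 2.0, 50.0          # interval (a, b] and smoothing parameter N

# Grids for the double Riemann sum (crude but adequate here)
t = np.linspace(-30, 30, 4001)
x = np.linspace(a, b, 1201)
dt, dx = t[1] - t[0], x[1] - x[0]

# Inner integral over t for each x, with the Gaussian damping factor
inner = np.array([np.sum(np.exp(-1j * t * xi) * c(t) * np.exp(-t**2 / (2 * N**2))) * dt
                  for xi in x]).real

approx = np.sum(inner) * dx / (2 * np.pi)
exact = 0.5 * (erf(b / np.sqrt(2)) - erf(a / np.sqrt(2)))   # P(a < z <= b)
print(approx, exact)   # agree to a few decimals for large N
```

The damping factor $e^{-t^2/2N^2}$ is what makes the inner integral absolutely convergent for a general characteristic function; here it is nearly 1 over the truncated range.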
Now if we consider the linear functionals of $x$: $t'x$ with $t \in \mathbb{R}^n$, it is clear that $c_{t'x}(s) = c_x(st)$, $\forall s \in \mathbb{R}$, $t \in \mathbb{R}^n$, so that the characteristic function of $x$ determines all those of $t'x$, $t \in \mathbb{R}^n$, and vice versa. Let $S^{n-1} = \{s \in \mathbb{R}^n : |s| = 1\}$ be the “unit sphere” in $\mathbb{R}^n$, and we have

Proposition 2.10 (Cramér-Wold) $x \overset{d}{=} y \iff t'x \overset{d}{=} t'y$, $\forall t \in S^{n-1}$.

Proof. Since $c_{t'x}(s) = c_x(st)$, $\forall s \in \mathbb{R}$, $t \in \mathbb{R}^n$, it is clear that
\[
x \overset{d}{=} y \iff t'x \overset{d}{=} t'y,\ \forall t \in \mathbb{R}^n.
\]
Since $t'x = |t|\left(\frac{t}{|t|}\right)'x$, $\forall t \ne 0$, it is also clear that
\[
t'x \overset{d}{=} t'y,\ \forall t \in \mathbb{R}^n \iff t'x \overset{d}{=} t'y,\ \forall t \in S^{n-1}. \qquad \Box
\]
By this result, it is clear that one may reduce a good many issues concerning random vectors to the univariate level. In the specific matter of computation, the reader should know that in the special case of a univariate random variable $X$:

If $E\,e^{\pm\delta X} < \infty$ for some $\delta > 0$, the Laplace transform of $X$ is determined in the strip $|\mathrm{Re}(z)| \le \delta$ as the (absolutely convergent) power series
\[
L_X(z) = \sum_{n=0}^{\infty} E\,X^n z^n / n!,
\]
and since such a power series is completely determined by its coefficients, we find that one may legitimately obtain the characteristic function
\[
c_X(t) = L_X(it),\ \forall t \in \mathbb{R},
\]
by merely observing the coefficients in an expansion of the moment-generating function, since they are necessarily the same as those of the Laplace transform:
\[
m_X(t) = L_X(t),\ \forall |t| \le \delta.
\]
Example 2.3 Suppose $f_z(s) = (2\pi)^{-1/2}e^{-s^2/2}$. One easily computes the moment-generating function (m.g.f.), finding $m_z(t) = e^{t^2/2}$, which has the obvious expansion for every $t$, whereupon
\[
c_z(t) = e^{-t^2/2}. \tag{2.1}
\]
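Equation (2.1) may also be checked directly from the definition $c_z(t) = E\,e^{itz}$. Below is a minimal sketch that approximates the defining integral by a Riemann sum over a truncated range (the truncation and grid are arbitrary choices) and compares it with $e^{-t^2/2}$.

```python
import numpy as np

# Standard normal density from Example 2.3
f = lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi)

s = np.linspace(-10, 10, 20001)
ds = s[1] - s[0]

for t in (0.0, 0.5, 1.0, 2.0):
    cf = np.sum(np.exp(1j * t * s) * f(s)) * ds     # E e^{itz} by Riemann sum
    print(t, cf.real, np.exp(-t**2 / 2))            # matches e^{-t^2/2}; imag part ~ 0
```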
2.8 Absolutely continuous distributions

Lebesgue measure, $\lambda$, is the extension to all Borel sets of our natural sense of volume measure in $\mathbb{R}^n$. Thus, we define
\[
\lambda(a, b] = \prod_{i=1}^{n}(b_i - a_i),\ \forall a < b \text{ in } \mathbb{R}^n,
\]
\[
\lambda(G) = \sum_{i=1}^{\infty} \lambda(a_i, b_i],\ \forall G = \bigcup_{i=1}^{\infty}(a_i, b_i] \text{ in } \mathcal{G}^n,
\]
and
\[
\lambda(A) = \inf_{A \subset G} \lambda(G),\ \forall A \text{ in } \mathcal{B}^n.
\]
As before, the C.E.T. guarantees that $\lambda$ is a measure on $\mathcal{B}^n$. We will often denote Lebesgue measure explicitly as volume: $\lambda(A) = \mathrm{vol}(A)$. Incidentally, something is said to happen “almost everywhere” (a.e.) if the set where it fails to happen has zero volume.

Now, the general conception of a random vector continuously distributed in space is that the probabilities of events will depend continuously on the volume of the events. Thus,

Definition 2.6 $x$ is absolutely continuous, denoted $x \ll \lambda$, iff $\forall \epsilon > 0$, $\exists \delta > 0$ such that $\mathrm{vol}(A) < \delta \implies P(x \in A) < \epsilon$.

But, in that case,

Proposition 2.11 $x \ll \lambda \iff [\,\mathrm{vol}(A) = 0 \implies P(x \in A) = 0\,]$.

Proof. Assume $x \ll \lambda$. If $\mathrm{vol}(A) = 0$ but $P(x \in A) \ne 0$, we may take $\epsilon = P(x \in A)/2$ to find the contradiction that $P(x \in A) < \epsilon$. Conversely, suppose $\mathrm{vol}(A) = 0 \implies P(x \in A) = 0$ but that $x \not\ll \lambda$. Then, $\exists \epsilon_0 > 0$ such that $\forall n$, $\exists A_n$ with $\mathrm{vol}(A_n) < 1/2^n$ but $P(x \in A_n) \ge \epsilon_0$. Letting $A = \limsup A_n = \bigcap_{n=1}^{\infty}\bigcup_{k \ge n} A_k$, since $\bigcup_{k \ge n} A_k$ is a monotone (decreasing) sequence with $\mathrm{vol}\left(\bigcup_{k \ge n} A_k\right) \le \sum_{k \ge n} 2^{-k} = 2^{1-n}$, we find the contradiction that $\mathrm{vol}(A) = 0$ but $P(x \in A) \ge \epsilon_0$. $\Box$

Thus, a distribution which depends continuously on volume satisfies the relatively simple criterion
\[
x \ll \lambda \iff [\,\mathrm{vol}(A) = 0 \implies P(x \in A) = 0\,].
\]
However, it is on this particular criterion, by the theorem of Radon-Nikodym, that absolute continuity is finally characterized in terms of densities:

Proposition 2.12 (Radon-Nikodym) $x$ is absolutely continuous $\iff$ there is an (a.e.-unique) probability density function (p.d.f.) $f : \mathbb{R}^n \to [0, \infty)$ such that
\[
P(x \in A) = \int_A f(t)\, dt,\ \forall A \in \mathcal{B}^n.
\]
But since the p.d.f. then determines such a distribution completely, we may simply write $x \sim f$ or $x \sim f_x$. It is, of course, by the extension process that defines expectation (in stages) that automatically
\[
E\,g(x) = \int g(t) f(t)\, dt,\ \forall g \text{ measurable}
\]
such that $E\,g(x)$ is defined.
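As a quick illustration of the expectation formula, the sketch below takes $f$ to be the standard normal density and $g(x) = x^2$ (arbitrary illustrative choices), so that $E\,g(x) = 1$ exactly, and compares a quadrature approximation of $\int g(t)f(t)\,dt$ against a direct sampling estimate.

```python
import numpy as np

# Density and test function (illustrative choices): f is the standard
# normal p.d.f. and g(x) = x^2, so E g(x) = var x = 1 exactly.
f = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
g = lambda t: t**2

t = np.linspace(-10, 10, 20001)
dt = t[1] - t[0]

quad = np.sum(g(t) * f(t)) * dt                                # E g(x) = ∫ g(t) f(t) dt
mc = np.mean(g(np.random.default_rng(0).normal(size=10**6)))   # sampling check
print(quad, mc)                                                # both near 1
```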
Now, in particular, the distribution function may itself be expressed as
\[
F(t) = P(x \le t) = \int_{-\infty}^{t} f(s)\, ds,\ \forall t \in \mathbb{R}^n.
\]
In practice, we will often be able to invoke the fundamental theorems of calculus to obtain an explicit representation of the p.d.f. by simply differentiating the d.f.:

1. By the first fundamental theorem of calculus,
\[
f(t) = \partial^n F(t)/\partial t_1 \cdots \partial t_n
\]
at every $t$ where $f(t)$ is continuous.

2. Also, by the second fundamental theorem, if $f(t) = \partial^n F(t)/\partial t_1 \cdots \partial t_n$ exists and is continuous (a.e.) on some rectangle $I$, then
\[
P(x \in A) = \int_A f(t)\, dt,\ \forall A \subset I.
\]
Finally, in relation to the inversion formula, when the characteristic function $c(t)$ is absolutely integrable, i.e., $\int_{\mathbb{R}^n} |c(t)|\, dt < \infty$, the corresponding distribution function is absolutely continuous with p.d.f. (v. Appendix A):
\[
f(s) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{-it's} c(t)\, dt. \tag{2.2}
\]

2.9 Uniform distributions

The most fundamental absolutely continuous distribution would, of course, be conveyed by volume measure itself. Consider any event $C$ for which $0 < \mathrm{vol}(C) < \infty$.

Definition 2.7 $x$ is uniformly distributed on $C$, denoted $x \sim \mathrm{unif}(C)$, iff
\[
P(x \in A) = \mathrm{vol}(A \cap C)/\mathrm{vol}(C),\ \forall A \in \mathcal{B}^n.
\]
If $\mathrm{vol}(\partial C) = 0$, as is often the case, we may just as well include as exclude the boundary, so if $x \sim \mathrm{unif}(C)$, $y \sim \mathrm{unif}(C^\circ)$, and $z \sim \mathrm{unif}(\bar{C})$, then $x$, $y$, and $z$ are equidistributed: $x \overset{d}{=} y \overset{d}{=} z$.

Now, for $x \sim \mathrm{unif}(C)$ we may immediately reexpress each probability as an integral:
\[
P(x \in A) = \int_A k \cdot I_C(t)\, dt,\ \forall A \in \mathcal{B}^n,
\]
with $k = 1/\mathrm{vol}(C)$.
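A minimal simulation sketch of Definition 2.7: take $C$ to be the unit disk in $\mathbb{R}^2$ (an arbitrary illustrative choice), draw uniform points on $C$ by rejection from the enclosing square, and check that $P(x \in A) = \mathrm{vol}(A \cap C)/\mathrm{vol}(C)$ for $A$ the positive quadrant, where the exact ratio is $(\pi/4)/\pi = 1/4$.

```python
import numpy as np

# Uniform distribution on the unit disk C = {t : |t| <= 1} via rejection
# sampling from the uniform distribution on the enclosing square.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(4_000_000, 2))
disk = pts[np.sum(pts**2, axis=1) <= 1]        # accepted points are unif(C)

# Event A = positive quadrant: vol(A ∩ C)/vol(C) = (pi/4)/pi = 1/4
in_A = np.all(disk > 0, axis=1)
print(in_A.mean(), 0.25)                        # empirical vs. exact
```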