1.4 Convergence in rth mean

A stronger condition than convergence in probability is mean square convergence.

Definition: Let $\{b_T\}$ be a sequence of real-valued random variables such that for some $r > 0$, $E|b_T|^r < \infty$. If there exists a real number $b$ such that $E(|b_T - b|^r) \to 0$ as $T \to \infty$, then $b_T$ converges in the $r$th mean to $b$, written as $b_T \xrightarrow{r.m.} b$.

The most commonly encountered situation is that in which $r = 2$, in which case convergence is said to occur in quadratic mean, denoted $b_T \xrightarrow{q.m.} b$, or in mean square, denoted $b_T \xrightarrow{m.s.} b$.

Proposition (Generalized Chebyshev inequality): Let $Z$ be a random variable such that $E|Z|^r < \infty$, $r > 0$. Then for every $\varepsilon > 0$,
$$\Pr(|Z| > \varepsilon) \leq \frac{E|Z|^r}{\varepsilon^r}.$$
When $r = 1$ we have Markov's inequality, and when $r = 2$ we have the familiar Chebyshev inequality.

Theorem: If $b_T \xrightarrow{r.m.} b$ for some $r > 0$, then $b_T \xrightarrow{p} b$.

Proof: Since $E(|b_T - b|^r) \to 0$ as $T \to \infty$, $E(|b_T - b|^r) < \infty$ for all $T$ sufficiently large. It follows from the Generalized Chebyshev inequality that, for every $\varepsilon > 0$,
$$\Pr(s : |b_T(s) - b| > \varepsilon) \leq \frac{E|b_T - b|^r}{\varepsilon^r}.$$
Hence $\Pr(s : |b_T(s) - b| \leq \varepsilon) \geq 1 - E|b_T - b|^r/\varepsilon^r \to 1$ as $T \to \infty$, since $b_T \xrightarrow{r.m.} b$. It follows that $b_T \xrightarrow{p} b$.

Without further conditions, no necessary relationship holds between convergence in the $r$th mean and almost sure convergence.
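A simple worked example shows how the definition and the theorem operate together.

Example: Let $\{Z_t\}$ be i.i.d. with mean $\mu$ and finite variance $\sigma^2$, and take $b_T = \bar{Z}_T = T^{-1}\sum_{t=1}^T Z_t$. Then
$$E(|\bar{Z}_T - \mu|^2) = Var(\bar{Z}_T) = \frac{\sigma^2}{T} \to 0 \quad \text{as } T \to \infty,$$
so $\bar{Z}_T \xrightarrow{q.m.} \mu$, and by the theorem above $\bar{Z}_T \xrightarrow{p} \mu$ as well (a weak law of large numbers).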
2 Convergence in Distribution

The most fundamental concept is that of convergence in distribution.

Definition: Let $\{b_T\}$ be a sequence of scalar random variables with cumulative distribution functions $\{F_T\}$. If $F_T(z) \to F(z)$ as $T \to \infty$ at every continuity point $z$ of $F$, where $F$ is the (cumulative) distribution function of a random variable $Z$, then $b_T$ converges in distribution to the random variable $Z$, written as $b_T \xrightarrow{d} Z$.

When $b_T \xrightarrow{d} Z$, we also say that $b_T$ converges in law to $Z$, written as $b_T \xrightarrow{L} Z$, or that $b_T$ is asymptotically distributed as $F$, denoted $b_T \stackrel{A}{\sim} F$. Then $F$ is called the limiting distribution of $b_T$.

Example: Let $\{Z_t\}$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Define
$$b_T \equiv \frac{\bar{Z}_T - E(\bar{Z}_T)}{(Var(\bar{Z}_T))^{1/2}} = \frac{T^{-1/2}\sum_{t=1}^T (Z_t - \mu)}{\sigma} = \frac{\sqrt{T}(\bar{Z}_T - \mu)}{\sigma}.$$
Then by the Lindeberg-Levy central limit theorem, $b_T \stackrel{A}{\sim} N(0, 1)$. See the plot in Hamilton, p. 185.

The above definitions are unchanged if the scalar $b_T$ is replaced with a $(k \times 1)$ vector $\mathbf{b}_T$. A simple way to verify convergence in distribution of a vector is the following.

Proposition (Cramér-Wold device): Let $\{\mathbf{b}_T\}$ be a sequence of random $k \times 1$ vectors and suppose that for every real $k \times 1$ vector $\lambda$ (normalized so that $\lambda'\lambda = 1$, without loss of generality), the scalar $\lambda'\mathbf{b}_T \stackrel{A}{\sim} \lambda'\mathbf{z}$, where $\mathbf{z}$ is a $k \times 1$ vector with joint (cumulative) distribution function $F$. Then the limiting distribution function of $\mathbf{b}_T$ exists and equals $F$.

Lemma: If $b_T \xrightarrow{L} Z$, then $b_T = O_p(1)$.
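To see how the Cramér-Wold device is used in practice, the following sketch (under the additional assumption that the covariance matrix $\Sigma$ is positive definite) derives the multivariate CLT from the scalar Lindeberg-Levy theorem.

Example: Let $\{\mathbf{Z}_t\}$ be i.i.d. $k \times 1$ random vectors with mean $\boldsymbol{\mu}$ and positive definite covariance matrix $\Sigma$, and define $\mathbf{b}_T \equiv \sqrt{T}(\bar{\mathbf{Z}}_T - \boldsymbol{\mu})$. For any real $k \times 1$ vector $\lambda$ with $\lambda'\lambda = 1$, the scalars $\{\lambda'\mathbf{Z}_t\}$ are i.i.d. with mean $\lambda'\boldsymbol{\mu}$ and variance $\lambda'\Sigma\lambda > 0$, so by the Lindeberg-Levy theorem
$$\lambda'\mathbf{b}_T = \sqrt{T}\left(\overline{\lambda'\mathbf{Z}}_T - \lambda'\boldsymbol{\mu}\right) \xrightarrow{L} N(0, \lambda'\Sigma\lambda),$$
which is the distribution of $\lambda'\mathbf{z}$ for $\mathbf{z} \sim N(\mathbf{0}, \Sigma)$. The Cramér-Wold device then yields $\mathbf{b}_T \xrightarrow{L} N(\mathbf{0}, \Sigma)$, the multivariate CLT.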
Lemma (Product rule): Recall that if $A_T = o_p(1)$ and $b_T = O_p(1)$, then $A_T b_T = o_p(1)$. Hence, if $A_T \xrightarrow{p} 0$ and $b_T \xrightarrow{d} Z$, then $A_T b_T \xrightarrow{p} 0$.

Lemma (Asymptotic equivalence): Let $\{a_T\}$ and $\{b_T\}$ be two sequences of random vectors. If $a_T - b_T \xrightarrow{p} 0$ and $b_T \xrightarrow{d} Z$, then $a_T \xrightarrow{d} Z$.

This result is helpful in situations in which we wish to find the asymptotic distribution of $a_T$ but cannot do so directly. Often, however, it is easy to find a $b_T$ that has a known asymptotic distribution and that satisfies $a_T - b_T \xrightarrow{p} 0$. The lemma then ensures that $a_T$ has the same limiting distribution as $b_T$, and we say that $a_T$ is "asymptotically equivalent" to $b_T$.

Lemma (Continuous mapping): Given $g : \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\mathbf{b}_T$ such that $\mathbf{b}_T \xrightarrow{L} \mathbf{z}$, where $\mathbf{z}$ is $k \times 1$, if $g$ is continuous (not dependent on $T$) at $\mathbf{z}$, then $g(\mathbf{b}_T) \xrightarrow{L} g(\mathbf{z})$.

Example: Suppose that $X_T \xrightarrow{L} N(0, 1)$. Then the square of $X_T$ asymptotically behaves as the square of a $N(0, 1)$ variable: $X_T^2 \xrightarrow{L} \chi^2(1)$.

Lemma (Slutsky): Let $\{x_T\}$ be a sequence of random $(n \times 1)$ vectors with $x_T \xrightarrow{p} c$, and let $\{y_T\}$ be a sequence of random $(n \times 1)$ vectors with $y_T \xrightarrow{L} y$. Then the sequence constructed from the sum $\{x_T + y_T\}$ converges in distribution to $c + y$, and the sequence constructed from the product $\{x_T' y_T\}$ converges in distribution to $c'y$.

Example: Let $\{X_T\}$ be a sequence of random $(m \times n)$ matrices with $X_T \xrightarrow{p} C$, and let $\{y_T\}$ be a sequence of random $(n \times 1)$ vectors with $y_T \xrightarrow{L} y \sim N(\mu, \Omega)$. Then the limiting distribution of $X_T y_T$ is the same as that of $Cy$; that is, $X_T y_T \xrightarrow{L} N(C\mu, C\Omega C')$.
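A standard application combines these lemmas. The following sketch (in the i.i.d. setting of the CLT example above) shows why the studentized sample mean, with $\sigma$ replaced by an estimate, has the same limiting distribution.

Example: Under the assumptions of the CLT example, let $s_T^2 = T^{-1}\sum_{t=1}^T (Z_t - \bar{Z}_T)^2$. Then $s_T^2 \xrightarrow{p} \sigma^2$ by the weak law of large numbers, and hence $\sigma/s_T \xrightarrow{p} 1$ (by continuity of the square root and the reciprocal at $\sigma^2 > 0$). Since $\sqrt{T}(\bar{Z}_T - \mu)/\sigma \xrightarrow{L} N(0, 1)$, the preceding lemma gives
$$\frac{\sqrt{T}(\bar{Z}_T - \mu)}{s_T} = \frac{\sigma}{s_T} \cdot \frac{\sqrt{T}(\bar{Z}_T - \mu)}{\sigma} \xrightarrow{L} N(0, 1).$$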
Lemma (Cramér δ, the delta method): Let $\{x_T\}$ be a sequence of random $(n \times 1)$ vectors such that $T^b(x_T - a) \xrightarrow{L} x$ for some $b > 0$. If $g(x)$ is a real-valued function with gradient $g'(a)$ $\left(= \left.\frac{\partial g}{\partial x'}\right|_{x=a}\right)$, then
$$T^b\left(g(x_T) - g(a)\right) \xrightarrow{L} g'(a)\,x.$$

Example: Let $\{Y_1, Y_2, \ldots, Y_T\}$ be an i.i.d. sample of size $T$ drawn from a distribution with mean $\mu \neq 0$ and variance $\sigma^2$. Consider the distribution of the reciprocal of the sample mean, $S_T = 1/\bar{Y}_T$, where $\bar{Y}_T = (1/T)\sum_{t=1}^T Y_t$. We know from the CLT that $\sqrt{T}(\bar{Y}_T - \mu) \xrightarrow{L} Y$, where $Y \sim N(0, \sigma^2)$. Also, $g(y) = 1/y$ is continuous at $y = \mu$, with $g'(\mu)$ $(= \partial g/\partial y|_{y=\mu}) = -1/\mu^2$. Then $\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} g'(\mu)Y$; in other words, $\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} N(0, \sigma^2/\mu^4)$.
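The lemma rests on a first-order Taylor argument; a brief sketch of the reasoning (assuming in addition that $g$ is continuously differentiable at $a$): by the mean value theorem, $g(x_T) = g(a) + g'(\tilde{x}_T)(x_T - a)$ for some $\tilde{x}_T$ on the segment between $x_T$ and $a$, so
$$T^b\left(g(x_T) - g(a)\right) = g'(\tilde{x}_T)\, T^b(x_T - a).$$
Since $T^b(x_T - a) = O_p(1)$ and $T^{-b} \to 0$, we have $x_T \xrightarrow{p} a$, hence $g'(\tilde{x}_T) \xrightarrow{p} g'(a)$, and the lemma combining a probability limit with a limit in distribution then gives $T^b(g(x_T) - g(a)) \xrightarrow{L} g'(a)x$. In the example, the limiting variance is simply $g'(\mu)^2\sigma^2 = (-1/\mu^2)^2\sigma^2 = \sigma^2/\mu^4$.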