In words, the random walk has independent increments, and those increments have a limiting normal distribution with a variance reflecting the length of the interval (b − a) over which the increment is taken. It should not be surprising, therefore, that the limit of the sequence of functions W_T(·) constructed from the random walk preserves these properties, in an appropriate sense. In fact, these properties form the basis of the definition of the Wiener process.

Definition: Let (S, F, P) be a complete probability space. Then W : S × [0, 1] → ℝ is a standard Wiener process if, for each r ∈ [0, 1], W(·, r) is F-measurable and, in addition:

(1). The process starts at zero: P[W(·, 0) = 0] = 1.

(2). The increments are independent: if 0 ≤ a_0 ≤ a_1 ≤ ... ≤ a_k ≤ 1, then W(·, a_i) − W(·, a_{i−1}) is independent of W(·, a_j) − W(·, a_{j−1}) for j = 1, ..., k, j ≠ i, and all i = 1, ..., k.

(3). The increments are normally distributed: for 0 ≤ a ≤ b ≤ 1, the increment W(·, b) − W(·, a) is distributed as N(0, b − a).

In the definition, we have written W(·, a) for explicitness; whenever convenient, however, we will write W(a) instead of W(·, a), analogous to our notation elsewhere. The Wiener process is also called a Brownian motion; it is named in honor of Norbert Wiener (1924), who provided the mathematical foundation for the theory of the random motions observed and described by the nineteenth-century botanist Robert Brown in 1827.

2.2 Functional Central Limit Theorems

We earlier defined convergence in law for random variables, and now we need to extend the definition to cover random functions. Let S(·) represent a continuous-time stochastic process, with S(r) representing its value at some date r, for r ∈ [0, 1]. Suppose, further, that any given realization S(·) is a continuous function of r with probability 1. For {S_T(·)}_{T=1}^∞ a sequence of such continuous functions,
we say that the sequence of probability measures induced by {S_T(·)}_{T=1}^∞ converges weakly to the probability measure induced by S(·), denoted by S_T(·) ⇒ S(·), if all of the following hold:

(1). For any finite collection of k particular dates, 0 ≤ r_1 < r_2 < ... < r_k ≤ 1, the sequence of k-dimensional random vectors {y_T}_{T=1}^∞ converges in distribution to the vector y, where

y_T ≡ (S_T(r_1), S_T(r_2), ..., S_T(r_k))′ and y ≡ (S(r_1), S(r_2), ..., S(r_k))′;

(2). For each ε > 0, the probability that S_T(r_1) differs from S_T(r_2) by more than ε, for any dates r_1 and r_2 within δ of each other, goes to zero uniformly in T as δ → 0;

(3). P{|S_T(0)| > λ} → 0 uniformly in T as λ → ∞.

This definition applies to sequences of continuous functions, though the function in (9) is a discontinuous step function. Fortunately, the discontinuities occur at a countable set of points. Formally, S_T(·) can be replaced with a similar continuous function that interpolates between the steps.

The Functional Central Limit Theorem (FCLT) provides conditions under which W_T converges to the standard Wiener process, W. The simplest FCLT is a generalization of the Lindeberg–Lévy CLT, known as Donsker's theorem.

Theorem (Donsker): Let ε_t be a sequence of i.i.d. random scalars with mean zero. If σ² ≡ Var(ε_t) < ∞ and σ² ≠ 0, then W_T ⇒ W.

Because pointwise convergence in distribution, W_T(·, r) →_L W(·, r) for each r ∈ [0, 1], is necessary (but not sufficient) for weak convergence W_T ⇒ W, the Lindeberg–Lévy CLT (W_T(·, 1) →_L W(·, 1)) follows immediately from Donsker's theorem. Donsker's theorem is strictly stronger than Lindeberg–Lévy, however: both use identical assumptions, but Donsker's theorem delivers a much stronger
conclusion. Donsker called his result an invariance principle; consequently, the FCLT is often referred to as an invariance principle.

So far, we have assumed that the sequence ε_t used to construct W_T is i.i.d. Nevertheless, just as we can obtain central limit theorems when ε_t is not necessarily i.i.d., versions of the FCLT hold for each CLT previously given in Chapter 4.

Theorem (Continuous Mapping Theorem): If S_T(·) ⇒ S(·) and g(·) is a continuous functional, then g(S_T(·)) ⇒ g(S(·)).

In the above theorem, continuity of a functional g(·) means that for any ς > 0 there exists a δ > 0 such that if h(r) and k(r) are any continuous bounded functions on [0, 1], h : [0, 1] → ℝ and k : [0, 1] → ℝ, with |h(r) − k(r)| < δ for all r ∈ [0, 1], then |g(h(·)) − g(k(·))| < ς.
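Donsker's theorem and the continuous mapping theorem can be illustrated with a small Monte Carlo experiment. The sketch below is illustrative only (the uniform innovation distribution, T, and the number of replications are our choices, not from the text): it builds W_T from non-normal i.i.d. innovations and checks that W_T(1) is approximately N(0, 1), that increments have variance close to the interval length, and that the continuous functional g(S) = ∫₀¹ S(r)² dr of the simulated paths has mean close to ∫₀¹ E[W(r)²] dr = ∫₀¹ r dr = 1/2.

```python
import numpy as np

# Monte Carlo sketch of Donsker's theorem and the continuous mapping theorem.
# All parameters (T, n_paths, the uniform innovations) are illustrative.
rng = np.random.default_rng(0)
T, n_paths = 500, 10_000
sigma = np.sqrt(1.0 / 3.0)                 # std dev of Uniform(-1, 1) innovations
eps = rng.uniform(-1.0, 1.0, size=(n_paths, T))

# W_T(t/T) = T^{-1/2} (eps_1 + ... + eps_t) / sigma, one simulated path per row.
W = np.cumsum(eps, axis=1) / (sigma * np.sqrt(T))

# Donsker: W_T(1) is approximately N(0, 1) despite the non-normal innovations.
WT1 = W[:, -1]
print(round(WT1.mean(), 2), round(WT1.var(), 2))   # near 0 and 1

# Increments over [0.25, 0.75] have variance close to the interval length 0.5.
inc = W[:, int(0.75 * T) - 1] - W[:, int(0.25 * T) - 1]
print(round(inc.var(), 2))                         # near 0.5

# Continuous mapping: g(W_T) = integral of W_T(r)^2 dr (a Riemann sum on the
# grid t/T) converges weakly to g(W), whose mean is 1/2.
g_WT = (W ** 2).mean(axis=1)
print(round(g_WT.mean(), 2))                       # near 0.5
```

Increasing T tightens all three approximations, which is the weak-convergence statement S_T(·) ⇒ S(·) read through continuous functionals.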
3 Regression with a Unit Root

3.1 Dickey–Fuller Test, Y_t is an AR(1) Process

Consider the following simple AR(1) process with a unit root,

Y_t = βY_{t−1} + u_t,   (11)

β = 1,   (12)

where Y_0 = 0 and u_t is i.i.d. with mean zero and variance σ². We consider the three least-squares regressions

Y_t = β̆Y_{t−1} + ŭ_t,   (13)

Y_t = α̂ + β̂Y_{t−1} + û_t,   (14)

and

Y_t = α̃ + β̃Y_{t−1} + δ̃t + ũ_t,   (15)

where β̆, (α̂, β̂), and (α̃, β̃, δ̃) are the conventional least-squares regression coefficients. Dickey and Fuller (1979) were concerned with the limiting distributions of β̆, (α̂, β̂), and (α̃, β̃, δ̃) in regressions (13), (14), and (15) under the null hypothesis that the data are generated by (11) and (12).

We first provide the following asymptotic results for the sample moments, which are useful for deriving the asymptotics of the OLS estimators.

Lemma: Let u_t be an i.i.d. sequence with mean zero and variance σ², and let

Y_t = u_1 + u_2 + ... + u_t for t = 1, 2, ..., T,   (16)

with Y_0 = 0. Then

(a) T^{−1/2} Σ_{t=1}^T u_t →_L σW(1),

(b) T^{−2} Σ_{t=1}^T Y_{t−1}² →_L σ² ∫₀¹ [W(r)]² dr,
(c) T^{−3/2} Σ_{t=1}^T Y_{t−1} →_L σ ∫₀¹ W(r) dr,

(d) T^{−1} Σ_{t=1}^T Y_{t−1}u_t →_L (1/2)σ²[W(1)² − 1],

(e) T^{−3/2} Σ_{t=1}^T t·u_t →_L σ[W(1) − ∫₀¹ W(r) dr],

(f) T^{−5/2} Σ_{t=1}^T t·Y_{t−1} →_L σ ∫₀¹ rW(r) dr,

(g) T^{−3} Σ_{t=1}^T t·Y_{t−1}² →_L σ² ∫₀¹ r[W(r)]² dr.

Joint weak convergence of the sample moments given above to their respective limits is easily established and will be used below.

Proof: (a) is a straightforward consequence of Donsker's theorem with r = 1.

(b) First rewrite T^{−2} Σ_{t=1}^T Y_{t−1}² in terms of W_T(r_{t−1}) ≡ T^{−1/2}Y_{t−1}/σ = T^{−1/2} Σ_{s=1}^{t−1} u_s/σ, where r_{t−1} = (t − 1)/T, so that T^{−2} Σ_{t=1}^T Y_{t−1}² = σ²T^{−1} Σ_{t=1}^T W_T(r_{t−1})². Because W_T(r) is constant for (t − 1)/T ≤ r < t/T, we have

T^{−1} Σ_{t=1}^T W_T(r_{t−1})² = Σ_{t=1}^T ∫_{(t−1)/T}^{t/T} W_T(r)² dr = ∫₀¹ W_T(r)² dr.

The continuous mapping theorem applies to h(W_T) = ∫₀¹ W_T(r)² dr. It follows that h(W_T) ⇒ h(W), so that T^{−2} Σ_{t=1}^T Y_{t−1}² ⇒ σ² ∫₀¹ W(r)² dr, as claimed.

(c) The proof of item (c) is analogous to that of (b). First rewrite T^{−3/2} Σ_{t=1}^T Y_{t−1} in terms of W_T(r_{t−1}) ≡ T^{−1/2}Y_{t−1}/σ, so that T^{−3/2} Σ_{t=1}^T Y_{t−1} = σT^{−1} Σ_{t=1}^T W_T(r_{t−1}). Because W_T(r) is constant for (t − 1)/T ≤ r < t/T, the same argument gives T^{−1} Σ_{t=1}^T W_T(r_{t−1}) = ∫₀¹ W_T(r) dr, and the continuous mapping theorem yields T^{−3/2} Σ_{t=1}^T Y_{t−1} ⇒ σ ∫₀¹ W(r) dr.
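The limits in items (a)–(c) of the lemma can also be checked numerically. The sketch below is illustrative only (T, the number of replications, and σ are our choices): it simulates unit-root paths and compares the sample moments with the moments of their limits. In particular, σW(1) has variance σ², σ²∫₀¹[W(r)]² dr has mean σ²/2, and σ∫₀¹W(r) dr is Gaussian with mean 0 and variance σ²∫₀¹∫₀¹ min(r, s) dr ds = σ²/3.

```python
import numpy as np

# Monte Carlo check of lemma items (a)-(c); T, n_paths, sigma are illustrative.
rng = np.random.default_rng(0)
T, n_paths, sigma = 500, 10_000, 1.5
u = rng.normal(0.0, sigma, size=(n_paths, T))
Y = np.cumsum(u, axis=1)                               # Y_t = Y_{t-1} + u_t, Y_0 = 0
Ylag = np.hstack([np.zeros((n_paths, 1)), Y[:, :-1]])  # Y_0, ..., Y_{T-1}

stat_a = u.sum(axis=1) / np.sqrt(T)         # (a): -> sigma * W(1)
stat_b = (Ylag ** 2).sum(axis=1) / T ** 2   # (b): -> sigma^2 * int W(r)^2 dr
stat_c = Ylag.sum(axis=1) / T ** 1.5        # (c): -> sigma * int W(r) dr

print(round(stat_a.var(), 2))               # near sigma^2 = 2.25
print(round(stat_b.mean(), 2))              # near sigma^2 / 2 = 1.125
print(round(stat_c.mean(), 2), round(stat_c.var(), 2))  # near 0 and sigma^2 / 3
```

Note the nonstandard normalizations T^{−2} and T^{−3/2}: under the unit root the sample moments grow faster than in the stationary case, which is exactly why the limiting distributions involve functionals of W rather than ordinary normal limits.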