3 The model and estimators

3.1 A generally nonstationary I(d) process

Let $y_t$ be a nonstationary fractional difference process generated by
$$(1-L)^{\tilde d}\, y_t = \varepsilon_t, \qquad (3)$$
where $0.5 < \tilde d < 1.5$, and $\varepsilon_t$, the fractional difference of $y_t$, satisfies Assumption 1. The process $y_t$ can also be represented equivalently as
$$y_t = \beta y_{t-1} + u_t, \qquad (4)$$
$$\beta = 1 \quad\text{and}\quad (1-L)^{d} u_t = \varepsilon_t, \qquad (5)$$
where $\tilde d = 1 + d$ and $-0.5 < d < 0.5$. Initial conditions for (3) are set at $t = 0$ with $y_0 = 0$.

We consider the two least-squares regression equations
$$y_t = \hat\beta y_{t-1} + \hat u_t, \qquad (6)$$
$$y_t = \breve\alpha + \breve\beta y_{t-1} + \breve u_t, \qquad (7)$$
where $\hat\beta$ and $(\breve\alpha, \breve\beta)$ are the conventional least-squares regression coefficients. We shall be concerned with the limiting distribution of the regression coefficients in (6) and (7) under the hypothesis that the data are generated by (3), or equivalently by (4) and (5). Thus, for the null values $\tilde d = \tilde d_0$, we have $\beta = 1$, $\alpha = 0$, and $d = d_0$.

Under (4) and (5), sample moments of $y_t$ and $u_t$ that are useful for deriving the OLS estimators are collected in the following lemma.

Lemma 2. As $T \to \infty$,

(a) $T^{-\frac{1}{2}-d} \sum_{t=1}^{T} u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon B_d(1)$,

(b) $T^{-2-2d} \sum_{t=1}^{T} y_{t-1}^2 \Rightarrow V_d \sigma_\varepsilon^2 \int_0^1 [B_d(r)]^2\,dr$,

(c) $T^{-1-2d}\, y_T^2 \Rightarrow V_d \sigma_\varepsilon^2 [B_d(1)]^2$,

(d) $T^{-\frac{3}{2}-d} \sum_{t=1}^{T} y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon \int_0^1 B_d(r)\,dr$,
(e) $T^{-1} \sum_{t=1}^{T} u_t^2 \xrightarrow{p} \sigma_u^2 = E(u_t^2)$,

(f) $T^{-1-2d} \sum_{t=1}^{T} y_{t-1}u_t \Rightarrow \frac{1}{2} V_d \sigma_\varepsilon^2 [B_d(1)]^2$ if $d > 0$,

(g) $T^{-1} \sum_{t=1}^{T} y_{t-1}u_t \xrightarrow{p} -\frac{1}{2}\sigma_u^2$ if $d < 0$,

(h) $T^{-\frac{3}{2}-d} \sum_{t=1}^{T} t u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon \left[B_d(1) - \int_0^1 B_d(r)\,dr\right]$,

(i) $T^{-\frac{5}{2}-d} \sum_{t=1}^{T} t y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon \int_0^1 r B_d(r)\,dr$,

(j) $T^{-3-2d} \sum_{t=1}^{T} t y_{t-1}^2 \Rightarrow V_d \sigma_\varepsilon^2 \int_0^1 r [B_d(r)]^2\,dr$.

Joint weak convergence of the sample moments given above to their respective limits is easily established by the Cramér–Wold theorem and will be used below.

3.2 Limiting Distributions of the Statistics

In this section we characterize the limiting distributions of the coefficient estimators $\hat\beta$, $\breve\alpha$ and $\breve\beta$ under the maintained hypothesis that the time series $y_t$ is generated by (4) and (5).

Theorem 1. For the regression model (6), as $T \to \infty$:

(a) $T(\hat\beta - 1) \Rightarrow \dfrac{\frac{1}{2}[B_d(1)]^2}{\int_0^1 [B_d(r)]^2\,dr}$, when $d > 0$;

(b) $T^{1+2d}(\hat\beta - 1) \Rightarrow -\dfrac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2 \int_0^1 [B_d(r)]^2\,dr}$, when $d < 0$; and

(c) $T(\hat\beta - 1) \Rightarrow \dfrac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}}{\int_0^1 [B(r)]^2\,dr}$, when $d = 0$.

For the regression model (7), as $T \to \infty$: when $d > 0$,

(d) $T^{\frac{1}{2}-d}\,\breve\alpha \Rightarrow \dfrac{V_d^{1/2}\sigma_\varepsilon B_d(1)\left\{\int_0^1 [B_d(r)]^2\,dr - \frac{1}{2}B_d(1)\int_0^1 B_d(r)\,dr\right\}}{\int_0^1 [B_d(r)]^2\,dr - \left[\int_0^1 B_d(r)\,dr\right]^2}$, and
(e) $T(\breve\beta - 1) \Rightarrow \dfrac{\frac{1}{2}[B_d(1)]^2 - B_d(1)\int_0^1 B_d(r)\,dr}{\int_0^1 [B_d(r)]^2\,dr - \left[\int_0^1 B_d(r)\,dr\right]^2}$;

when $d < 0$,

(f) $T^{\frac{1}{2}+d}\,\breve\alpha \Rightarrow \dfrac{\frac{1}{2}\sigma_u^2 \int_0^1 B_d(r)\,dr}{V_d^{1/2}\sigma_\varepsilon\left\{\int_0^1 [B_d(r)]^2\,dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}}$, and

(g) $T^{1+2d}(\breve\beta - 1) \Rightarrow -\dfrac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2\left\{\int_0^1 [B_d(r)]^2\,dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}}$;

when $d = 0$,

(h) $T^{\frac{1}{2}}\,\breve\alpha \Rightarrow \dfrac{\sigma_\varepsilon\left\{B(1)\int_0^1 [B(r)]^2\,dr - \frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}\int_0^1 B(r)\,dr\right\}}{\int_0^1 [B(r)]^2\,dr - \left[\int_0^1 B(r)\,dr\right]^2}$, and

(i) $T(\breve\beta - 1) \Rightarrow \dfrac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\} - B(1)\int_0^1 B(r)\,dr}{\int_0^1 [B(r)]^2\,dr - \left[\int_0^1 B(r)\,dr\right]^2}$.

We first discuss the results from model (6). The convergence rates of $(\hat\beta - 1)$ depend intrinsically on the degree of fractional difference in the $u_t$ process. The distribution of $T^{\min[1,\,1+2d]}(\hat\beta_T - 1)$ is therefore called a generalized fractional unit root distribution. This fact is also discussed in Sowell (1990) and Tanaka (1999, Corollary 2.4), where $\varepsilon_t$ in (3) is assumed to be i.i.d. and an infinite-order moving average process, respectively. It is easily shown that when the innovation process $\varepsilon_t$ is i.i.d.$(0, \sigma^2)$, we have $\sigma_u^2 = [\Gamma(1-2d)/\Gamma(1-d)^2]\sigma^2$ (see, for example, Baillie, 1996), leading to the following simplification of parts (b) and (c) of Theorem 1:
$$T^{1+2d}(\hat\beta - 1) \Rightarrow -\frac{\left(\frac{1}{2}+d\right)\frac{\Gamma(1+d)}{\Gamma(1-d)}}{\int_0^1 [B_d(r)]^2\,dr}, \quad\text{when } d < 0; \qquad (8)$$
and
$$T(\hat\beta - 1) \Rightarrow \frac{\frac{1}{2}\left\{[B(1)]^2 - 1\right\}}{\int_0^1 [B(r)]^2\,dr}, \quad\text{when } d = 0. \qquad (9)$$
Result (8) was first given by Sowell (1990) and result (9) was given by Dickey
and Fuller (1979). Theorem 1 therefore extends (8) and (9) to the very general case in which the $d$-times-differenced data are weakly dependent.

It is interesting to note that when $d > 0$, the assumptions on $\varepsilon_t$ play no role in determining the limiting distribution: it converges to the same distribution as in Sowell (1990) and Tanaka (1999). When $d < 0$, the distribution of $T^{1+2d}(\hat\beta - 1)$ has the same general form for a very wide class of innovation processes $\varepsilon_t$; it reduces to the distribution of Phillips (1987, Theorem 3.1(c)) when $d = 0$. A similar conclusion applies to the results from model (7). The simplification of part (g) of Theorem 1 when $\varepsilon_t$ is i.i.d.$(0, \sigma^2)$ is
$$T^{1+2d}(\breve\beta - 1) \Rightarrow -\frac{\left(\frac{1}{2}+d\right)\frac{\Gamma(1+d)}{\Gamma(1-d)}}{\int_0^1 [B_d(r)]^2\,dr - \left[\int_0^1 B_d(r)\,dr\right]^2}. \qquad (10)$$

4 Statistical Inference on the Fractional Difference Parameter

4.1 Test for $0.5 < \tilde d < 1$

The limiting distributions of the regression coefficients when $-0.5 < d < 0$ (i.e., $0.5 < \tilde d < 1$) given in the last section depend upon the nuisance parameters $\sigma_u^2$ and $\sigma_\varepsilon^2$. These distributions are therefore not directly usable for statistical testing. However, since $\sigma_u^2$ and $\sigma_\varepsilon^2$ may be consistently estimated, and the estimates may be used to construct modified statistics whose limiting distributions are independent of $(\sigma_u^2, \sigma_\varepsilon^2)$, there exist simple transformations of the test statistics which eliminate the nuisance parameters asymptotically.

This idea was first developed by Phillips (1987) and Phillips and Perron (1988) in the context of testing for a unit root. Here we show how a similar procedure may be extended to testing for the value of the fractional difference parameter in a quite general fractionally integrated process. First, by the ergodicity assumption on $u_t$, a consistent estimate of $\sigma_u^2$ is provided by $\tilde\sigma_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - y_{t-1})^2$ for data
generated by (4) and (5). Since $\hat\beta$ and $(\breve\alpha, \breve\beta)$ are consistent by Theorem 1, we may also use $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat\beta y_{t-1})^2$ and $\breve\sigma_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - \breve\alpha - \breve\beta y_{t-1})^2$ as consistent estimators of $\sigma_u^2$ for models (6) and (7), respectively.

Consistent estimation of $\sigma_\varepsilon^2$ can proceed in the same spirit as Phillips and Perron (1988), using the following simple estimator based on truncated sample autocovariances:
$$s_{Tl}^2 = T^{-1}\sum_{t=1}^{T}\tilde\varepsilon_t^2 + 2T^{-1}\sum_{\tau=1}^{l} w_{\tau l}\sum_{t=\tau+1}^{T}\tilde\varepsilon_t\tilde\varepsilon_{t-\tau}, \qquad (11)$$
where $\tilde\varepsilon_t = (1-L)^d(y_t - y_{t-1}) = (1-L)^d u_t$ and $w_{\tau l} = 1 - \tau/(l+1)$. We may also use $\hat\varepsilon_t = (1-L)^d(y_t - \hat\beta y_{t-1})$ and $\breve\varepsilon_t = (1-L)^d(y_t - \breve\alpha - \breve\beta y_{t-1})$ as alternative estimates of $\tilde\varepsilon_t$ in the construction of $s_{Tl}^2$. Conditions for the consistency of $s_{Tl}^2$ are explored by Phillips (1987, Theorem 4.2). We now define some simple transformations of the conventional test statistics from regressions (6) and (7) which eliminate the nuisance-parameter dependencies asymptotically. Specifically, we define
$$Z(d) = \frac{s_{Tl}^2}{\tilde\sigma_u^2}\, T^{1+2d}(\hat\beta - 1), \qquad (12)$$
and
$$Z_\mu(d) = \frac{s_{Tl}^2}{\tilde\sigma_u^2}\, T^{1+2d}(\breve\beta - 1). \qquad (13)$$
$Z(d)$ is the transformation of the standard estimator $T^{1+2d}(\hat\beta - 1)$ and $Z_\mu(d)$ is the transformation of $T^{1+2d}(\breve\beta - 1)$. The limiting distributions of $Z(d)$ and $Z_\mu(d)$ are given by:

Theorem 2. Assume that $l = o(T^{1/2})$; then as $T \to \infty$,
$$Z(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_0}\int_0^1 [B_{d_0}(r)]^2\,dr},$$
and
$$Z_\mu(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_0}\left(\int_0^1 [B_{d_0}(r)]^2\,dr - \left[\int_0^1 B_{d_0}(r)\,dr\right]^2\right)},$$
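The quantities entering (11)–(13) are all directly computable from a sample: the fractional filter $(1-L)^d$ via its binomial expansion, $\hat\beta$ from the regression (6), $\tilde\sigma_u^2$ from the first differences, and $s_{Tl}^2$ with Bartlett weights $w_{\tau l}$. The following Python sketch is illustrative only and not from the paper; it truncates the fractional filter at the sample start (consistent with the $y_0 = 0$ initialization), and all function names are our own.

```python
import numpy as np

def fracdiff_weights(d, n):
    # Binomial expansion of (1 - L)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def apply_filter(w, x):
    # y_t = sum_{k=0}^{t} w_k x_{t-k}, truncated at the sample start (pre-sample values = 0).
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])

def simulate_I_d(T, d_tilde, rng):
    # (1 - L)^{d_tilde} y_t = eps_t with i.i.d. N(0,1) innovations, so y = (1 - L)^{-d_tilde} eps.
    eps = rng.standard_normal(T)
    return apply_filter(fracdiff_weights(-d_tilde, T), eps)

def Z_statistic(y, d, l):
    # Z(d) = (s_Tl^2 / sigma_u~^2) * T^{1+2d} * (beta_hat - 1), as in eq. (12).
    T = len(y)
    ylag = np.concatenate(([0.0], y[:-1]))      # y_0 = 0 convention
    beta_hat = (y @ ylag) / (ylag @ ylag)       # OLS without intercept, regression (6)
    u = y - ylag                                # first differences: u_t under beta = 1
    sigma_u2 = (u @ u) / T                      # sigma_u~^2
    e = apply_filter(fracdiff_weights(d, T), u) # eps~_t = (1 - L)^d u_t
    s2 = (e @ e) / T                            # eq. (11) with Bartlett weights
    for tau in range(1, l + 1):
        s2 += 2.0 * (1.0 - tau / (l + 1.0)) * (e[tau:] @ e[:-tau]) / T
    return (s2 / sigma_u2) * T ** (1 + 2 * d) * (beta_hat - 1.0)

rng = np.random.default_rng(0)
y = simulate_I_d(2000, 0.7, rng)                # d_tilde = 0.7, i.e. d = -0.3
z = Z_statistic(y, d=-0.3, l=int(2000 ** 0.25)) # l = o(T^{1/2}), e.g. l ~ T^{1/4}
print(z)
```

The truncation lag here is set to roughly $T^{1/4}$, which satisfies the rate condition $l = o(T^{1/2})$ of Theorem 2; in large samples with $d < 0$ the statistic is typically negative, in line with the sign of the limit.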