Theorem 2 (Granger's Representation Theorem): Let the process $y_t$ satisfy equation (3) for $t = 1, 2, \ldots$, and let $\xi_0 = -BA'$ for $A$ and $B$ of dimension $k \times h$ and rank $h$,$^1$ and let $B_\perp' C(1) A_\perp$ have full rank $k - h$. We define
$$\Psi = A_\perp \left(B_\perp' C(1) A_\perp\right)^{-1} B_\perp'.$$
Then $\triangle y_t$ and $A' y_t$ can be given initial distributions such that

(a) $\triangle y_t$ is stationary,
(b) $A' y_t$ is stationary,
(c) $y_t$ is nonstationary, with linear trend $\tau t = \Psi c\, t$.

Further,

(d) $E(A' y_t) = (B'B)^{-1} B' c + (B'B)^{-1}\left(B' C(1) A_\perp\right)\left(B_\perp' C(1) A_\perp\right)^{-1} B_\perp' c$,
(e) $E(\triangle y_t) = \tau$.

If $B_\perp' c = 0$, then $\tau = 0$ and the linear trend disappears. However, the cointegrating vector still contains a constant term, i.e. $E(A' y_t) = (B'B)^{-1} B' c$, when $B_\perp' c = 0$.

(f) $\triangle y_t = \Psi(L)(\varepsilon_t + c)$ with $\Psi(1) = \Psi$. For $\Psi_1(L) = (\Psi(L) - \Psi(1))/(1 - L)$, so that $\Psi(L) = \Psi(1) + (1 - L)\Psi_1(L)$, the process has the representation

(g) $y_t = y_0 + \Psi \sum_{i=1}^{t} \varepsilon_i + \tau t + S_t - S_0$, where $S_t = \Psi_1(L)\varepsilon_t$.

Proof: See Johansen (1991), p. 1559.

$^1$Define the orthogonal complement $P_\perp$ of any matrix $P$ of rank $q$ and dimension $n \times q$ as follows ($0 < q < n$): (a) $P_\perp$ is of dimension $n \times (n - q)$; (b) $P_\perp' P = 0_{(n-q)\times q}$, $P' P_\perp = 0_{q \times (n-q)}$; (c) $P_\perp$ has rank $n - q$, and its column space lies in the null space of $P'$.
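To see how the objects in Theorem 2 fit together numerically, the following is a minimal sketch in Python/NumPy (not from the text; the matrices `A`, `B`, `C1` and the intercept `c` are hypothetical placeholders) that builds orthogonal complements from the SVD and forms $\Psi = A_\perp(B_\perp' C(1) A_\perp)^{-1} B_\perp'$ and the trend slope $\tau = \Psi c$ of part (c).

```python
import numpy as np

def orth_complement(P):
    """Return an n x (n-q) matrix whose columns span the null space of P'.

    For P of dimension n x q with rank q, the last n-q left-singular vectors
    of P form an orthonormal basis for that null space, so P_perp' P = 0.
    """
    n, q = P.shape
    U, _, _ = np.linalg.svd(P, full_matrices=True)
    return U[:, q:]

# Hypothetical example with k = 3 variables and h = 1 cointegrating relation.
A = np.array([[1.0], [-1.0], [0.5]])    # cointegrating vector (k x h)
B = np.array([[-0.2], [0.1], [0.05]])   # adjustment coefficients (k x h)
C1 = np.eye(3)                          # stand-in for C(1); assumed known here
c = np.array([0.1, 0.0, 0.05])          # intercept of the VAR

A_perp, B_perp = orth_complement(A), orth_complement(B)

# Psi = A_perp (B_perp' C(1) A_perp)^{-1} B_perp'   (Theorem 2)
Psi = A_perp @ np.linalg.inv(B_perp.T @ C1 @ A_perp) @ B_perp.T
tau = Psi @ c                           # slope of the linear trend in y_t

print(Psi)
print(tau)                              # tau = 0 whenever B_perp' c = 0
```

Choosing $c$ in the column space of $B$ makes $B_\perp' c = 0$, and the computed $\tau$ is then the zero vector, matching the statement that the linear trend disappears.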
3 Maximum Likelihood Estimation of a Gaussian VAR for Cointegration and the Test for Cointegration Rank

Consider a general VAR model$^2$ for the $k \times 1$ vector $y_t$ with Gaussian errors,
$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t, \qquad (4)$$
where $E(\varepsilon_t) = 0$ and
$$E(\varepsilon_t \varepsilon_s') = \begin{cases} \Omega & \text{for } t = s, \\ 0 & \text{otherwise.} \end{cases}$$

We may rewrite (4) in the error correction form
$$\triangle y_t = \xi_1 \triangle y_{t-1} + \xi_2 \triangle y_{t-2} + \cdots + \xi_{p-1} \triangle y_{t-p+1} + c + \xi_0 y_{t-1} + \varepsilon_t, \qquad (5)$$
where $\xi_s \equiv -(\Phi_{s+1} + \Phi_{s+2} + \cdots + \Phi_p)$ for $s = 1, \ldots, p-1$ and
$$\xi_0 \equiv -(I - \Phi_1 - \Phi_2 - \cdots - \Phi_p) = -\Phi(1).$$

Suppose that $y_t$ is I(1) with $h$ cointegrating relationships, which implies that
$$\xi_0 = -BA' \qquad (6)$$
for $B$ and $A$ $(k \times h)$ matrices. That is, under the hypothesis of $h$ cointegrating relations, only $h$ separate linear combinations of the levels $y_{t-1}$ appear in (5).

$^2$Here, the $y_t$ in this VAR model need not be I(1) variates and need not be cointegrated.
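The passage from (4) to (5) is a purely algebraic re-parameterisation. The following minimal sketch (Python/NumPy, with hypothetical coefficient matrices, not from the text) carries it out using the $\xi_s$ formulas above and then checks the reduced-rank restriction (6) on $\xi_0$.

```python
import numpy as np

def var_to_vecm(Phi):
    """Map VAR(p) coefficients Phi = [Phi_1, ..., Phi_p] (each k x k) to the
    error-correction form (5): returns (xi0, [xi_1, ..., xi_{p-1}]).

    xi0  = -(I - Phi_1 - ... - Phi_p) = -Phi(1)
    xi_s = -(Phi_{s+1} + ... + Phi_p),  s = 1, ..., p-1
    """
    k = Phi[0].shape[0]
    p = len(Phi)
    xi0 = -(np.eye(k) - sum(Phi))
    xis = [-sum(Phi[s:]) for s in range(1, p)]   # xi_1, ..., xi_{p-1}
    return xi0, xis

# Hypothetical bivariate VAR(2) whose levels share one common stochastic trend.
Phi1 = np.array([[0.7, 0.3], [0.2, 0.8]])
Phi2 = np.array([[0.1, -0.1], [-0.1, 0.1]])
xi0, xis = var_to_vecm([Phi1, Phi2])

# Under h cointegrating relations, xi0 = -BA' has rank h < k (equation (6)).
print(np.linalg.matrix_rank(xi0))   # prints 1 for these hypothetical matrices
```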
Consider a sample of size $T + p$ observations on $y$, denoted $(y_{-p+1}, y_{-p+2}, \ldots, y_T)$. If the disturbances $\varepsilon_t$ are Gaussian, then the log (conditional) likelihood of $(y_1, y_2, \ldots, y_T)$ conditional on $(y_{-p+1}, y_{-p+2}, \ldots, y_0)$ is given by
$$
\begin{aligned}
L(\Omega, \xi_1, \xi_2, \ldots, \xi_{p-1}, c, \xi_0)
 ={}& -(Tk/2)\log(2\pi) - (T/2)\log|\Omega| \\
 & - (1/2)\sum_{t=1}^{T}\big(\triangle y_t - \xi_1\triangle y_{t-1} - \xi_2\triangle y_{t-2} - \cdots - \xi_{p-1}\triangle y_{t-p+1} - c - \xi_0 y_{t-1}\big)' \\
 & \qquad \times\, \Omega^{-1}\big(\triangle y_t - \xi_1\triangle y_{t-1} - \xi_2\triangle y_{t-2} - \cdots - \xi_{p-1}\triangle y_{t-p+1} - c - \xi_0 y_{t-1}\big). \qquad (7)
\end{aligned}
$$

The goal is to choose $(\Omega, \xi_1, \xi_2, \ldots, \xi_{p-1}, c, \xi_0)$ so as to maximize (7) subject to the constraint that $\xi_0$ can be written in the form of (6).

3.1 Concentrated Log-likelihood Function

3.1.1 Concentrated Likelihood Function

We often encounter in practice the situation where the parameter vector $\theta_0$ can be naturally partitioned into two sub-vectors $\alpha_0$ and $\beta_0$ as $\theta_0 = (\alpha_0' \;\; \beta_0')'$. Let the likelihood function be $L(\alpha, \beta)$. The MLE is obtained by maximizing $L$ simultaneously with respect to $\alpha$ and $\beta$, i.e.
$$\frac{\partial \ln L}{\partial \alpha} = 0; \qquad (8)$$
$$\frac{\partial \ln L}{\partial \beta} = 0. \qquad (9)$$
However, it is sometimes easier to maximize $L$ in two steps. First, maximize it with respect to $\beta$ taking $\alpha$ as given, and insert the maximizing value of $\beta$ back into $L$; second, maximize $L$ with respect to $\alpha$. More precisely, define
$$L^{*}(\alpha) = L[\alpha, \hat{\beta}(\alpha)], \qquad (10)$$
where $\hat{\beta}(\alpha)$ is defined as the solution to
$$\left.\frac{\partial \ln L}{\partial \beta}\right|_{\hat{\beta}} = 0, \qquad (11)$$
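As a concrete (hypothetical) illustration of (10)–(11), take an i.i.d. Gaussian sample with $\alpha = \mu$ and $\beta = \sigma^2$: solving (11) for fixed $\mu$ gives $\hat{\sigma}^2(\mu) = T^{-1}\sum_t (x_t - \mu)^2$, and maximizing the resulting concentrated likelihood $L^{*}(\mu)$ recovers the same $\hat{\mu}$ as joint maximization. The sketch below (Python with NumPy and SciPy; not from the text) verifies this numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # hypothetical i.i.d. Gaussian sample
T = x.size

def loglik(mu, sigma2):
    """Joint Gaussian log-likelihood L(mu, sigma2)."""
    return -(T / 2) * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

def concentrated_loglik(mu):
    """L*(mu) = L[mu, sigma2_hat(mu)], with sigma2_hat(mu) solving (11) in closed form."""
    sigma2_hat = np.mean((x - mu) ** 2)
    return loglik(mu, sigma2_hat)

# Step two: maximize the concentrated likelihood over mu alone.
res = minimize_scalar(lambda mu: -concentrated_loglik(mu), bounds=(-10, 10), method="bounded")

# The concentrated-likelihood MLE coincides with the joint MLE (the sample mean).
print(res.x, x.mean())
```

The same two-step device is the role the concentrated likelihood plays here: it is what allows (7) to be maximized subject to the reduced-rank restriction (6).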