Ch. 24 Johansen's MLE for Cointegration

We have so far considered only single-equation estimation and testing for cointegration. While single-equation estimation is convenient and often consistent, for some purposes only the estimation of a system provides sufficient information. This is true, for example, when we consider the estimation of multiple cointegrating vectors and inference about the number of such vectors. This chapter examines methods of finding the cointegrating rank and derives the asymptotic distributions. To develop these results, we first begin with a discussion of canonical correlation analysis.

1 Canonical Correlation

1.1 Population Canonical Correlations

Let the (n1 × 1) vector y_t and the (n2 × 1) vector x_t denote stationary random vectors that are measured as deviations from their population means, so that E(y_t y_t') represents the variance-covariance matrix of y_t. In general, there might be complicated correlations among the elements of y_t and x_t, i.e.

E \begin{bmatrix} y_t \\ x_t \end{bmatrix} \begin{bmatrix} y_t \\ x_t \end{bmatrix}' = \begin{bmatrix} E(y_t y_t') & E(y_t x_t') \\ E(x_t y_t') & E(x_t x_t') \end{bmatrix} = \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix}.

If the two sets are very large, the investigator may wish to study only a few linear combinations of y_t and x_t that are most highly correlated. He may find that the interrelation is completely described by the correlations between the first few canonical variates.

We now define two new (n × 1) random vectors, η_t and ξ_t, where n is the smaller of n1 and n2. These vectors are linear combinations of y_t and x_t, respectively:

η_t ≡ K' y_t,    ξ_t ≡ A' x_t.

Here, K' and A' are (n × n1) and (n × n2) matrices, respectively. The matrices K' and A' are chosen such that the following conditions hold.
(a) E(η_t η_t') = K' Σ_yy K = I_n and E(ξ_t ξ_t') = A' Σ_xx A = I_n.

(b) E(ξ_t η_t') = A' Σ_xy K = R, where

R = \begin{bmatrix} r_1 & 0 & \cdots & 0 \\ 0 & r_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_n \end{bmatrix},

and r_i ≥ 0 for i = 1, 2, ..., n.

(c) The elements of η_t and ξ_t are ordered in such a way that

1 ≥ r_1 ≥ r_2 ≥ ... ≥ r_n ≥ 0.

The correlation r_i is known as the ith population canonical correlation between y_t and x_t. The population canonical correlations and the values of A and K can be calculated as follows.

Theorem 1: Let

Σ = \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix}

be a positive definite symmetric matrix, and let (λ_1, λ_2, ..., λ_{n1}) be the eigenvalues of Σ_yy^{-1} Σ_yx Σ_xx^{-1} Σ_xy, ordered λ_1 ≥ λ_2 ≥ ... ≥ λ_{n1}. Let (k_1, k_2, ..., k_{n1}) be the associated (n1 × 1) eigenvectors, normalized by k_i' Σ_yy k_i = 1 for i = 1, 2, ..., n1. Let (μ_1, μ_2, ..., μ_{n2}) be the eigenvalues of Σ_xx^{-1} Σ_xy Σ_yy^{-1} Σ_yx, ordered μ_1 ≥ μ_2 ≥ ... ≥ μ_{n2}, and let (a_1, a_2, ..., a_{n2}) be the associated (n2 × 1) eigenvectors, normalized by a_i' Σ_xx a_i = 1 for i = 1, 2, ..., n2. Let n be the smaller of n1 and n2, and collect the first n vectors k_i and the first n vectors a_j in the matrices

K = [k_1 k_2 ... k_n],    A = [a_1 a_2 ... a_n].
Assuming that λ_1, λ_2, ..., λ_n are distinct, then:

(a) 0 ≤ λ_i < 1 for i = 1, 2, ..., n1 and 0 ≤ μ_j < 1 for j = 1, 2, ..., n2;

(b) λ_i = μ_i for i = 1, 2, ..., n;

(c) K' Σ_yy K = I_n and A' Σ_xx A = I_n;

(d) A' Σ_xy K = R, where

R^2 = \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{bmatrix}.

We may interpret the canonical correlations as follows. The first canonical variates η_{1t} and ξ_{1t} can be interpreted as those linear combinations of y_t and x_t, respectively, such that the correlation between η_{1t} and ξ_{1t} is as large as possible. The variates η_{2t} and ξ_{2t} give those linear combinations of y_t and x_t that are uncorrelated with η_{1t} and ξ_{1t} and yield the largest remaining correlation between η_{2t} and ξ_{2t}, and so on.

1.2 Sample Canonical Correlations

The canonical correlations r_i calculated by the procedure just described are population parameters; they are functions of the population moments Σ_yy, Σ_xy, and Σ_xx. To find their sample analogs, all we have to do is start from the sample moments of Σ_yy, Σ_xy, and Σ_xx. Suppose we have a sample of T observations on the (n1 × 1) vector y_t and the (n2 × 1) vector x_t, whose sample moments are given by

\hat{\Sigma}_{yy} = (1/T) \sum_{t=1}^{T} y_t y_t', \quad \hat{\Sigma}_{yx} = (1/T) \sum_{t=1}^{T} y_t x_t', \quad \hat{\Sigma}_{xx} = (1/T) \sum_{t=1}^{T} x_t x_t'.
Again, in many applications, y_t and x_t would be measured in deviations from their sample means. Then all the sample canonical correlations can be calculated from \hat{\Sigma}_{yy}, \hat{\Sigma}_{yx}, and \hat{\Sigma}_{xx} by the procedure described in Theorem 1.
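The eigenvalue recipe of Theorem 1, applied to the sample moments, is straightforward to sketch numerically. The following is a minimal illustration with simulated (hypothetical) data, assuming NumPy is available; the variable names (Syy, r_hat, etc.) are chosen here for exposition and do not come from the text.

```python
import numpy as np

# Simulated (hypothetical) data: a shared scalar factor makes y_t and x_t correlated.
rng = np.random.default_rng(0)
T, n1, n2 = 500, 3, 2
f = rng.standard_normal((T, 1))
y = f @ rng.standard_normal((1, n1)) + rng.standard_normal((T, n1))
x = f @ rng.standard_normal((1, n2)) + rng.standard_normal((T, n2))

# Measure in deviations from sample means.
y -= y.mean(axis=0)
x -= x.mean(axis=0)

# Sample moments: Sigma_hat_yy, Sigma_hat_yx, Sigma_hat_xx.
Syy, Syx, Sxx = y.T @ y / T, y.T @ x / T, x.T @ x / T

# Theorem 1 applied to the sample moments: eigenvalues of
# Syy^{-1} Syx Sxx^{-1} Sxy and of Sxx^{-1} Sxy Syy^{-1} Syx.
lam = np.sort(np.linalg.eigvals(
    np.linalg.solve(Syy, Syx) @ np.linalg.solve(Sxx, Syx.T)).real)[::-1]
mu = np.sort(np.linalg.eigvals(
    np.linalg.solve(Sxx, Syx.T) @ np.linalg.solve(Syy, Syx)).real)[::-1]

n = min(n1, n2)
assert np.allclose(lam[:n], mu[:n])     # Theorem 1(b): common nonzero eigenvalues
assert np.all(lam < 1) and lam[0] >= 0  # Theorem 1(a): eigenvalues lie in [0, 1)
r_hat = np.sqrt(lam[:n])                # sample canonical correlations, descending
```

The first n eigenvalues of the two problems coincide because the nonzero eigenvalues of AB and BA are always equal; taking square roots gives the sample canonical correlations in descending order.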
2 Johansen's Granger Representation Theorem

Consider a general k-dimensional VAR model with Gaussian errors, written in error-correction form:

Δy_t = ξ_1 Δy_{t−1} + ξ_2 Δy_{t−2} + ... + ξ_{p−1} Δy_{t−p+1} + c + ξ_0 y_{t−1} + ε_t,    (1)

where E(ε_t) = 0 and

E(ε_t ε_s') = Ω for t = s, and 0 otherwise.

The model defined by (1) can be rewritten as

ξ(L) y_t = −ξ_0 y_t + C(L) Δy_t = c + ε_t,

where

ξ(L) = (1 − L) I − \sum_{i=1}^{p−1} ξ_i (1 − L) L^i − ξ_0 L    (2)

and

C(L) = (ξ(L) − ξ(1)) / (1 − L) = I + ξ_0 − \sum_{i=1}^{p−1} ξ_i L^i.    (3)

Note that

−ξ_0 y_t + C(L) Δy_t = −ξ_0 y_t + ξ(L) y_t − ξ(1) y_t = −ξ_0 y_t + ξ(L) y_t + ξ_0 y_t = ξ(L) y_t,

from the fact in (2) that ξ(1) = −ξ_0.

Johansen (1991) provides the following fundamental result about error-correction models of order 1 and their structure. The basic result is due to Granger (1983) and Engle and Granger (1987). In addition, he provides an explicit condition for the process to be integrated of order 1 and clarifies the role of the constant term.
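As a check on the algebra in (1) and (2), a VAR(p) with coefficient matrices Φ_1, ..., Φ_p maps into error-correction coefficients ξ_0 = −(I − Φ_1 − ... − Φ_p) and ξ_i = −(Φ_{i+1} + ... + Φ_p), and the lag polynomial ξ(L) of (2) must then coincide with the VAR polynomial Φ(L) = I − Φ_1 L − ... − Φ_p L^p, with ξ(1) = −ξ_0. A minimal numerical sketch, assuming NumPy and using arbitrary (hypothetical) coefficient values:

```python
import numpy as np

rng = np.random.default_rng(2)
k, p = 2, 3
Phi = [0.2 * rng.standard_normal((k, k)) for _ in range(p)]  # VAR(p) coefficients

# Error-correction coefficients: xi_0 = -(I - Phi_1 - ... - Phi_p),
# xi_i = -(Phi_{i+1} + ... + Phi_p) for i = 1, ..., p-1.
xi0 = -(np.eye(k) - sum(Phi))
xi = [-sum(Phi[i:]) for i in range(1, p)]

def var_poly(z):
    """Phi(z) = I - Phi_1 z - ... - Phi_p z^p."""
    return np.eye(k) - sum(P * z ** (i + 1) for i, P in enumerate(Phi))

def ecm_poly(z):
    """xi(z) = (1 - z) I - sum_i xi_i (1 - z) z^i - xi_0 z, as in (2)."""
    return ((1 - z) * np.eye(k)
            - sum(X * (1 - z) * z ** (i + 1) for i, X in enumerate(xi))
            - xi0 * z)

# The two lag polynomials agree at arbitrary points, and xi(1) = -xi_0.
for z in (0.3, -0.7, 1.0):
    assert np.allclose(var_poly(z), ecm_poly(z))
assert np.allclose(ecm_poly(1.0), -xi0)
```

Since both ξ(L)y_t and Φ(L)y_t equal c + ε_t, the two polynomials must be identical; evaluating them at a few scalar points is a quick way to confirm the coefficient mapping.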