Ch. 16 Stochastic Model Building

Unlike a linear regression model, which usually has a theoretical economic model behind it somewhere in the economics literature, time series analysis of a stochastic process requires the ability to relate a stationary ARMA model to real data. This is usually best achieved by a three-stage iterative procedure based on identification, estimation, and diagnostic checking, as suggested by Box and Jenkins (1976).

1 Model Identification

By identification we mean the use of the data, and of any information on how the series was generated, to suggest a subclass of parsimonious models worth entertaining. We usually transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one. At this stage we then make an initial guess of small values of p and q for an ARMA(p, q) model that might describe the transformed data.

1.1 Identifying the Degree of Differencing

Trend stationary or difference stationary? See Ch. 19.

1.2 Use of the Autocorrelation and Partial Autocorrelation Function in Identification

1.2.1 Autocorrelation

Recall that if the data really follow an MA(q) process, then its (population) autocorrelation $r_j (= \gamma_j/\gamma_0)$ will be zero for $j > q$. By contrast, if the data follow an AR(p) process, then $r_j$ will gradually decay toward zero as a mixture of exponentials or damped sinusoids. One guide for distinguishing between MA and AR representations, then, is the decay pattern of $r_j$. It is also useful to have a rough check on whether $r_j$ is effectively zero beyond a certain lag.
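As a small numerical illustration (not part of the original notes), the Python sketch below computes the population autocorrelations of an MA(1) and an AR(1) process from their closed-form expressions; the parameter values are arbitrary. It shows the MA autocorrelation cutting off after lag 1, while the AR autocorrelation decays geometrically.

```python
import numpy as np

theta = 0.6   # MA(1) coefficient (arbitrary illustrative value)
phi = 0.6     # AR(1) coefficient (arbitrary illustrative value)
lags = np.arange(1, 11)

# MA(1): r_1 = theta / (1 + theta^2), r_j = 0 for j > 1  (cuts off after lag q = 1)
r_ma1 = np.where(lags == 1, theta / (1 + theta**2), 0.0)

# AR(1): r_j = phi^j  (decays toward zero, never exactly zero)
r_ar1 = phi ** lags

for j, ma, ar in zip(lags, r_ma1, r_ar1):
    print(f"lag {j:2d}: MA(1) acf = {ma:6.3f}   AR(1) acf = {ar:6.3f}")
```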
A natural estimate of the population autocorrelation $r_j$ is provided by the corresponding sample moment (remember that at this stage you still have no "model" to estimate, so it is natural to use a moment estimator):
$$\hat{r}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0},$$
where
$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar{Y})(Y_{t-j} - \bar{Y}) \qquad \text{for } j = 0, 1, 2, \ldots, T-1,$$
$$\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t.$$

If the data were really generated by a Gaussian MA(q) process, then the variance of the estimated autocorrelation $\hat{r}_j$ can be approximated by (see Box et al. (1994), p. 33)
$$Var(\hat{r}_j) \cong \frac{1}{T}\Big(1 + 2\sum_{i=1}^{q} r_i^2\Big) \qquad \text{for } j = q+1, q+2, \ldots \tag{1}$$

To use (1) in practice, the estimated autocorrelations $\hat{r}_j$ ($j = 1, 2, \ldots, q$) are substituted for the theoretical autocorrelations $r_j$, and when this is done we shall refer to the square root of (1) as the large-lag standard error. In particular, if we suspect that the data were generated by Gaussian white noise, then $\hat{r}_j \sim N(0, 1/T)$ for $j \neq 0$; that is, $\hat{r}_j$ should lie between $\pm 2/\sqrt{T}$ about 95% of the time.

Example:
The following estimated autocorrelations were obtained from a time series of length T = 200 observations, generated from a stochastic process for which it was known that $r_1 = -0.4$ and $r_j = 0$ for $j \geq 2$:
$$\hat{r}_1 = -0.38,\ \hat{r}_2 = -0.08,\ \hat{r}_3 = 0.11,\ \hat{r}_4 = -0.08,\ \hat{r}_5 = 0.02,\ \hat{r}_6 = 0.00,\ \hat{r}_7 = 0.00,\ \hat{r}_8 = 0.00,\ \hat{r}_9 = 0.07,\ \hat{r}_{10} = -0.08.$$
On the assumption that the series is completely random, $H_0: q = 0$, (1) yields, for all lags,
$$Var(\hat{r}_1) \cong \frac{1}{T} = \frac{1}{200} = 0.005.$$
Under the null hypothesis, $\hat{r}_1 \sim N(0, 0.005)$, so the 95% confidence interval is
$$-2 < \frac{\hat{r}_1}{\sqrt{0.005}} < 2 \quad \equiv \quad -0.14 < \hat{r}_1 < 0.14.$$
Since the estimated value $\hat{r}_1 = -0.38$ lies outside this confidence interval, the hypothesis that q = 0 is rejected.

It might be reasonable to ask next whether the series is compatible with the hypothesis that q = 1. Using (1) with q = 1, the estimated large-lag variance under this assumption is
$$Var(\hat{r}_2) \cong \frac{1}{200}\big[1 + 2(-0.38)^2\big] = 0.0064.$$
Under the null hypothesis, $\hat{r}_2 \sim N(0, 0.0064)$, so the 95% confidence interval is
$$-2 < \frac{\hat{r}_2}{\sqrt{0.0064}} < 2 \quad \equiv \quad -0.16 < \hat{r}_2 < 0.16.$$
Since the estimated value $\hat{r}_2 = -0.08$ lies inside this confidence interval, the hypothesis that q = 1 cannot be rejected.
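The sketch below (an illustration, not part of the original notes) reproduces this identification logic on simulated data: it computes the sample autocorrelations by the moment estimator above and checks each $\hat{r}_{q+1}$ against twice the large-lag standard error from (1), increasing q until the null is no longer rejected. The simulated MA(1) coefficient is chosen arbitrarily so that $r_1 \approx -0.4$, as in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200

# Simulate an MA(1) process Y_t = e_t + theta * e_{t-1} (theta chosen arbitrarily)
theta = -0.5
e = rng.standard_normal(T + 1)
y = e[1:] + theta * e[:-1]

def sample_acf(y, nlags):
    """Moment estimator r_hat_j = gamma_hat_j / gamma_hat_0."""
    ybar = y.mean()
    d = y - ybar
    gamma0 = np.mean(d * d)
    return np.array([np.sum(d[j:] * d[:len(y) - j]) / len(y) / gamma0
                     for j in range(1, nlags + 1)])

r = sample_acf(y, nlags=10)

# Large-lag standard error from (1): start with q = 0 (white noise) and
# increase q as long as r_hat_{q+1} falls outside +/- 2 * SE.
for q in range(len(r)):
    var_large_lag = (1 + 2 * np.sum(r[:q] ** 2)) / T
    se = np.sqrt(var_large_lag)
    if abs(r[q]) > 2 * se:
        print(f"lag {q+1}: |r_hat| = {abs(r[q]):.3f} > 2*SE = {2*se:.3f} -> reject q = {q}")
    else:
        print(f"lag {q+1}: |r_hat| = {abs(r[q]):.3f} <= 2*SE = {2*se:.3f} -> q = {q} not rejected")
        break
```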
1.2.2 Partial Autocorrelation Function

Another useful measure is the partial autocorrelation, a device that exploits the fact that, whereas an AR(p) process has an autocorrelation function that is infinite in extent, it can by its very nature be described in terms of p nonzero functions of the autocorrelations. The mth population partial autocorrelation (denoted $\alpha_m^{(m)}$) is defined as the last coefficient in a linear projection of Y on its m most recent values:
$$\hat{Y}_{t+1|t} - \mu = \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + \ldots + \alpha_m^{(m)}(Y_{t-m+1} - \mu). \tag{2}$$
We saw in (15) of Chapter 15 that the vector $\alpha^{(m)}$ can be calculated from
$$\begin{bmatrix} \alpha_1^{(m)} \\ \alpha_2^{(m)} \\ \vdots \\ \alpha_m^{(m)} \end{bmatrix}
= \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0 \end{bmatrix}^{-1}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}.$$

Recall that if the data were really generated by an AR(p) process, only the p most recent values of Y would be useful for forecasting. In this case, the projection coefficients on Y's more than p periods in the past are equal to zero:
$$\alpha_m^{(m)} = 0 \qquad \text{for } m = p+1, p+2, \ldots$$
By contrast, if the data really were generated by an MA(q) process with $q \geq 1$, then the partial autocorrelation $\alpha_m^{(m)}$ asymptotically approaches zero instead of cutting off abruptly.

Since the forecast error $\varepsilon_{t+1}$ is uncorrelated with $x_t$ (the vector containing the m most recent values of Y), we can rewrite (2) as
$$Y_{t+1} - \mu = \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + \ldots + \alpha_m^{(m)}(Y_{t-m+1} - \mu) + \varepsilon_{t+1}, \quad t \in T,$$
or
$$Y_t - \mu = \alpha_1^{(m)}(Y_{t-1} - \mu) + \alpha_2^{(m)}(Y_{t-2} - \mu) + \ldots + \alpha_m^{(m)}(Y_{t-m} - \mu) + \varepsilon_t, \quad t \in T. \tag{3}$$

The reason why the quantity $\alpha_m^{(m)}$ defined through (2) is called the partial autocorrelation of the process $\{Y_t\}$ at lag m is clear from (3): it is equal to the partial correlation between the variables $Y_t$ and $Y_{t-m}$ adjusted for the intermediate variables $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$. That is, $\alpha_m^{(m)}$ measures the correlation between $Y_t$ and $Y_{t-m}$ after adjusting for the effect of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$ (or the correlation between $Y_t$ and $Y_{t-m}$ not accounted for by $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$). See the counterpart result for the sample on p. 6 of Chapter 6.

A natural estimate of the mth partial autocorrelation is the last coefficient in an OLS regression of Y on a constant and its m most recent values:
$$Y_t = \hat{c} + \hat{\alpha}_1^{(m)} Y_{t-1} + \hat{\alpha}_2^{(m)} Y_{t-2} + \ldots + \hat{\alpha}_m^{(m)} Y_{t-m} + \hat{e}_t, \tag{4}$$
where $\hat{e}_t$ denotes the OLS regression residual. If the data were really generated by an AR(p) process, then the sample estimate $\hat{\alpha}_m^{(m)}$ would have a variance around the true value (0) that could be approximated by (see Box et al. 1994, p. 68)
$$Var(\hat{\alpha}_m^{(m)}) \cong \frac{1}{T} \qquad \text{for } m = p+1, p+2, \ldots$$
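As an illustration (again not part of the original notes), the sketch below estimates the first few sample partial autocorrelations by running the OLS regression (4) for m = 1, 2, ..., and compares each $\hat{\alpha}_m^{(m)}$ with twice the approximate standard error $1/\sqrt{T}$; the simulated AR(1) coefficient is arbitrary, so only the lag-1 partial autocorrelation should typically appear significant.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# Simulate an AR(1) process Y_t = phi * Y_{t-1} + e_t (phi chosen arbitrarily)
phi = 0.7
y = np.zeros(T)
e = rng.standard_normal(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + e[t]

def sample_pacf(y, m):
    """Last OLS coefficient in a regression of Y_t on a constant and Y_{t-1},...,Y_{t-m}."""
    n = len(y)
    X = np.column_stack([np.ones(n - m)] + [y[m - j:n - j] for j in range(1, m + 1)])
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return beta[-1]   # alpha_hat_m^(m)

se = 1 / np.sqrt(T)   # approximate standard error for m > p
for m in range(1, 6):
    a = sample_pacf(y, m)
    flag = "significant" if abs(a) > 2 * se else "not significant"
    print(f"m = {m}: alpha_hat = {a:6.3f}  ({flag}, 2*SE = {2*se:.3f})")
```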
1.3 Use of Model Selection Criteria

Another approach to model selection is the use of information criteria, such as the AIC proposed by Akaike (1974) or the BIC of Schwarz (1978). In this approach, a range of potential ARMA models is estimated by maximum likelihood methods (to be discussed in Chapter 17), and for each model a criterion such as the AIC (normalized by sample size T), given by
$$AIC_{p,q} = \frac{-2\ln(\text{maximized likelihood}) + 2m}{T} \approx \ln(\hat{\sigma}^2) + \frac{2m}{T},$$
or the related BIC, given by
$$BIC_{p,q} = \ln(\hat{\sigma}^2) + \frac{m \ln(T)}{T},$$
is evaluated, where $\hat{\sigma}^2$ denotes the maximum likelihood estimate of $\sigma^2$ and $m = p + q + 1$ denotes the number of parameters estimated in the model, including a constant term. In the criteria above, the first term essentially corresponds to minus 2/T times the log of the maximized likelihood, while the second term is a "penalty factor" for the inclusion of additional parameters in the model. In the information criteria approach, models that yield a minimum value of the criterion are preferred, and the AIC or BIC values are compared across various models as the basis for model selection. However, one immediate