Ch. 15 Forecasting

Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series. For the present we proceed as if the model were known exactly.

Forecasting is an important concern of time series analysis. In a regression model we usually have an existing economic-theory model whose parameters we estimate; the estimated coefficients already have a role to play, such as confirming some economic theory, so whether or not to forecast from the estimated model depends on the researcher's own interest. By contrast, the estimated coefficients of a time series model carry no direct economic-theory meaning. An important role of time series analysis is therefore to forecast precisely from this purely mechanical model.

1 Principle of Forecasting

1.1 Forecasts Based on Conditional Expectations

Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $x_t$ observed at date $t$. For example, we might want to forecast $Y_{t+1}$ based on its $m$ most recent values; in this case, $x_t = [Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'$. Let $Y^*_{t+1|t}$ denote a forecast of $Y_{t+1}$ based on $x_t$ (a function of $x_t$, so it depends on how $x_t$ is realized). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast $Y^*_{t+1|t}$ so as to minimize
\[
MSE(Y^*_{t+1|t}) = E(Y_{t+1} - Y^*_{t+1|t})^2,
\]
which is known as the mean squared error.
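As a concrete illustration (a minimal simulation sketch of my own, not from the text), suppose $Y_{t+1} = x_t^2 + \varepsilon_{t+1}$ with $x_t$ and $\varepsilon_{t+1}$ independent standard normals, so that $E(Y_{t+1}|x_t) = x_t^2$. Comparing sample MSEs across candidate forecast rules previews the theorem below: the conditional mean attains the smallest loss.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                # predictor observed at date t
y = x**2 + rng.normal(size=n)         # Y_{t+1}, with E(Y_{t+1}|x_t) = x_t^2

forecasts = {
    "conditional mean x^2": x**2,     # the theoretical optimum
    "constant forecast 1":  np.ones(n),
    "naive forecast x":     x,
}
for name, f in forecasts.items():
    mse = np.mean((y - f) ** 2)
    print(f"{name:22s} MSE = {mse:.3f}")   # roughly 1.0, 3.0, 5.0
\end{verbatim}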
Theorem: The forecast $Y^*_{t+1|t}$ with the smallest mean squared error is the expectation of $Y_{t+1}$ conditional on $x_t$:
\[
Y^*_{t+1|t} = E(Y_{t+1}|x_t).
\]

Proof: Let $g(x_t)$ be a forecasting function of $Y_{t+1}$ other than the conditional expectation $E(Y_{t+1}|x_t)$. Then the MSE associated with $g(x_t)$ would be
\[
\begin{aligned}
E[Y_{t+1} - g(x_t)]^2 &= E[Y_{t+1} - E(Y_{t+1}|x_t) + E(Y_{t+1}|x_t) - g(x_t)]^2 \\
&= E[Y_{t+1} - E(Y_{t+1}|x_t)]^2 \\
&\quad + 2E\{[Y_{t+1} - E(Y_{t+1}|x_t)][E(Y_{t+1}|x_t) - g(x_t)]\} \\
&\quad + E\{[E(Y_{t+1}|x_t) - g(x_t)]^2\}.
\end{aligned}
\]
Denote by $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}|x_t)][E(Y_{t+1}|x_t) - g(x_t)]$ the random variable inside the middle expectation. We have
\[
\begin{aligned}
E(\eta_{t+1}|x_t) &= [E(Y_{t+1}|x_t) - g(x_t)] \times E\big([Y_{t+1} - E(Y_{t+1}|x_t)]\,\big|\,x_t\big) \\
&= [E(Y_{t+1}|x_t) - g(x_t)] \times 0 = 0,
\end{aligned}
\]
since conditional on $x_t$ the factor $E(Y_{t+1}|x_t) - g(x_t)$ is a constant. By the law of iterated expectations, it follows that $E(\eta_{t+1}) = E_{x_t}\!\big(E[\eta_{t+1}|x_t]\big) = 0$, so the cross term vanishes. Therefore we have
\[
E[Y_{t+1} - g(x_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|x_t)]^2 + E\{[E(Y_{t+1}|x_t) - g(x_t)]^2\}. \tag{1}
\]
The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on $g(x_t)$. The function $g(x_t)$ that makes the mean squared error (1) as small as possible is the one that sets the second term in (1) to zero:
\[
g(x_t) = E(Y_{t+1}|x_t).
\]
The MSE of this optimal forecast is
\[
E[Y_{t+1} - g(x_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|x_t)]^2.
\]
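Decomposition (1) is easy to check numerically. The sketch below (again my own, reusing the simulated design above; the rule $g(x) = 0.5x + 2$ is an arbitrary choice) confirms that the MSE of $g$ equals the MSE of the conditional mean plus the mean squared gap between $g(x_t)$ and $E(Y_{t+1}|x_t)$, up to sampling noise.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)      # E(Y_{t+1}|x_t) = x_t^2 as before

g = 0.5 * x + 2.0                  # an arbitrary competing forecast rule
lhs = np.mean((y - g) ** 2)        # E[Y_{t+1} - g(x_t)]^2
rhs = np.mean((y - x**2) ** 2) + np.mean((x**2 - g) ** 2)
print(lhs, rhs)                    # the two sides agree up to sampling error
\end{verbatim}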
1.2 Forecasts Based on Linear Projection

Suppose we now consider only the class of forecasts in which $Y^*_{t+1|t}$ is a linear function of $x_t$:
\[
Y^*_{t+1|t} = \alpha' x_t.
\]

Definition: The forecast $\alpha' x_t$ is called the linear projection of $Y_{t+1}$ on $x_t$ if the forecast error $(Y_{t+1} - \alpha' x_t)$ is uncorrelated with $x_t$:
\[
E[(Y_{t+1} - \alpha' x_t)\, x_t'] = \mathbf{0}'. \tag{2}
\]

Theorem: The linear projection produces the smallest mean squared error among the class of linear forecasting rules.

Proof: Let $g' x_t$ be any arbitrary linear forecasting function of $Y_{t+1}$. Then the MSE associated with $g' x_t$ would be
\[
\begin{aligned}
E[Y_{t+1} - g' x_t]^2 &= E[Y_{t+1} - \alpha' x_t + \alpha' x_t - g' x_t]^2 \\
&= E[Y_{t+1} - \alpha' x_t]^2 \\
&\quad + 2E\{[Y_{t+1} - \alpha' x_t][\alpha' x_t - g' x_t]\} \\
&\quad + E[\alpha' x_t - g' x_t]^2.
\end{aligned}
\]
Denote $\eta_{t+1} \equiv [Y_{t+1} - \alpha' x_t][\alpha' x_t - g' x_t]$. We have
\[
\begin{aligned}
E(\eta_{t+1}) &= E\{[Y_{t+1} - \alpha' x_t][\alpha' - g'] x_t\} \\
&= \big(E[Y_{t+1} - \alpha' x_t]\, x_t'\big)[\alpha - g] \\
&= \mathbf{0}'[\alpha - g] = 0,
\end{aligned}
\]
where the last line uses the orthogonality condition (2). Therefore we have
\[
E[Y_{t+1} - g' x_t]^2 = E[Y_{t+1} - \alpha' x_t]^2 + E[\alpha' x_t - g' x_t]^2. \tag{3}
\]
The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on $g' x_t$. The function $g' x_t$ that makes the mean squared error (3) as small as possible is the one that sets the second term in (3) to zero:
\[
g' x_t = \alpha' x_t.
\]
The MSE of this optimal forecast is
\[
E[Y_{t+1} - g' x_t]^2 = E[Y_{t+1} - \alpha' x_t]^2.
\]
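Condition (2) also pins down $\alpha$ in closed form: taking expectations gives $E(Y_{t+1} x_t') = \alpha' E(x_t x_t')$, hence $\alpha' = E(Y_{t+1} x_t')\,[E(x_t x_t')]^{-1}$ whenever $E(x_t x_t')$ is nonsingular. The sketch below (my own illustration; the design with true coefficients $[0.8, -0.3]$ is assumed) estimates $\alpha$ from sample moments and verifies the orthogonality condition.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=(n, 2))               # two predictors observed at date t
y = x @ np.array([0.8, -0.3]) + rng.normal(size=n)   # Y_{t+1}

Exx = x.T @ x / n                         # sample analogue of E[x_t x_t']
Exy = x.T @ y / n                         # sample analogue of E[x_t Y_{t+1}]
alpha = np.linalg.solve(Exx, Exy)         # alpha = [E(x x')]^{-1} E(x Y)
resid = y - x @ alpha                     # forecast error Y_{t+1} - alpha' x_t
print("alpha:", alpha)                    # close to [0.8, -0.3]
print("E[error * x]:", x.T @ resid / n)   # close to zero, as (2) requires
\end{verbatim}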
Since $\alpha' x_t$ is the linear projection of $Y_{t+1}$ on $x_t$, we will use the notation
\[
\hat{P}(Y_{t+1}|x_t) = \alpha' x_t
\]
to indicate the linear projection of $Y_{t+1}$ on $x_t$. Notice that
\[
MSE[\hat{P}(Y_{t+1}|x_t)] \geq MSE[E(Y_{t+1}|x_t)],
\]
since the conditional expectation offers the best possible forecast.

For most applications a constant term will be included in the projection. We will use the symbol $\hat{E}$ to indicate a linear projection on a vector of random variables $x_t$ along with a constant term:
\[
\hat{E}(Y_{t+1}|x_t) \equiv \hat{P}(Y_{t+1}|1, x_t).
\]
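When $E(Y_{t+1}|x_t)$ is nonlinear, the inequality is typically strict. In the running design $Y_{t+1} = x_t^2 + \varepsilon_{t+1}$ with $x_t$ standard normal (my example, not from the text), $Cov(x_t, x_t^2) = 0$, so $\hat{E}(Y_{t+1}|x_t)$ collapses to the constant $E(x_t^2) = 1$ and its MSE is about 3, versus about 1 (the noise variance) for the conditional mean.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)            # E(Y|x) = x^2, nonlinear in x

X = np.column_stack([np.ones(n), x])     # prepend a constant: E-hat = P-hat(Y|1, x)
alpha = np.linalg.solve(X.T @ X / n, X.T @ y / n)
proj = X @ alpha                         # about 1 + 0*x, since Cov(x, x^2) = 0
print("MSE of E-hat(Y|x):", np.mean((y - proj) ** 2))   # about 3
print("MSE of E(Y|x):   ", np.mean((y - x**2) ** 2))    # about 1
\end{verbatim}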
2 Forecasts Based on an Infinite Number of Observations

Recall that a general stationary and invertible ARMA(p, q) process is written in this form:
\[
\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \tag{4}
\]
where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \ldots + \theta_q L^q$, and all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle.

2.1 Forecasting Based on Lagged $\varepsilon$'s, MA($\infty$) Form

Consider an MA($\infty$) form of (4):
\[
Y_t - \mu = \varphi(L)\varepsilon_t \tag{5}
\]
with $\varepsilon_t$ white noise and
\[
\varphi(L) = \theta(L)\phi^{-1}(L) = \sum_{j=0}^{\infty} \varphi_j L^j, \qquad \varphi_0 = 1, \qquad \sum_{j=0}^{\infty} |\varphi_j| < \infty.
\]
Suppose that we have an infinite number of observations on $\varepsilon$ through date $t$, that is $\{\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$, and further know the values of $\mu$ and $\{\varphi_1, \varphi_2, \ldots\}$. Say we want to forecast the value of $Y_{t+s}$, that is, $s$ periods from now. Note that (5) implies
\[
Y_{t+s} = \mu + \varepsilon_{t+s} + \varphi_1 \varepsilon_{t+s-1} + \ldots + \varphi_{s-1}\varepsilon_{t+1} + \varphi_s \varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \ldots.
\]
The best linear forecast takes the form
\begin{align}
\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) &= \mu + \varphi_s \varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \ldots \tag{6} \\
&= [\mu, \varphi_s, \varphi_{s+1}, \ldots]\,[1, \varepsilon_t, \varepsilon_{t-1}, \ldots]' \tag{7} \\
&= \alpha' x_t. \tag{8}
\end{align}
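As a concrete sketch (mine; the ARMA(1,1) parameters $\phi_1 = 0.7$, $\theta_1 = 0.4$, $\mu = 2$ and the horizon $s = 3$ are assumed for illustration), the weights solve $(1 - \phi_1 L)\varphi(L) = 1 + \theta_1 L$, giving $\varphi_0 = 1$, $\varphi_1 = \phi_1 + \theta_1$, and $\varphi_j = \phi_1 \varphi_{j-1}$ for $j \geq 2$; the forecast (6) is then a (truncated) dot product against the shock history.

\begin{verbatim}
import numpy as np

phi1, theta1, mu, s = 0.7, 0.4, 2.0, 3   # assumed ARMA(1,1) parameters, horizon
J = 50                                   # truncation order for the MA(inf) weights

# varphi_j solve (1 - phi1*L) * varphi(L) = 1 + theta1*L
varphi = np.empty(J)
varphi[0] = 1.0
varphi[1] = phi1 + theta1
for j in range(2, J):
    varphi[j] = phi1 * varphi[j - 1]

rng = np.random.default_rng(4)
eps = rng.normal(size=J)                 # eps[k] plays the role of eps_{t-k}
# Equation (6), truncated at J terms:
# mu + varphi_s*eps_t + varphi_{s+1}*eps_{t-1} + ...
forecast = mu + varphi[s:] @ eps[: J - s]
print("first varphi weights:", varphi[:4])   # [1.0, 1.1, 0.77, 0.539]
print(f"{s}-step-ahead forecast:", forecast)
\end{verbatim}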