Ch. 17 Maximum Likelihood Estimation

The identification process having led to a tentative formulation for the model, we then need to obtain efficient estimates of the parameters. After the parameters have been estimated, the fitted model will be subjected to diagnostic checks. This chapter contains a general account of the likelihood method for estimating the parameters of the stochastic model.

Consider an ARMA model (arrived at through model identification) of the form
$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$$
with $\varepsilon_t$ white noise:
$$E(\varepsilon_t) = 0, \qquad E(\varepsilon_t \varepsilon_\tau) = \begin{cases} \sigma^2 & \text{for } t = \tau, \\ 0 & \text{otherwise.} \end{cases}$$
This chapter explores how to estimate the values of $(c, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2)$ on the basis of observations on $Y$.

The primary principle on which estimation will be based is maximum likelihood estimation. Let $\theta = (c, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2)'$ denote the vector of population parameters. Suppose we have observed a sample of size $T$: $(y_1, y_2, \ldots, y_T)$. The approach will be to calculate the joint probability density
$$f_{Y_T, Y_{T-1}, \ldots, Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta), \qquad (1)$$
which might loosely be viewed as the probability of having observed this particular sample. The maximum likelihood estimate (MLE) of $\theta$ is the value for which this sample is most likely to have been observed; that is, it is the value of $\theta$ that maximizes (1).

This approach requires specifying a particular distribution for the white noise process $\varepsilon_t$. Typically we will assume that $\varepsilon_t$ is Gaussian white noise:
$$\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2).$$
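To make the principle concrete before any likelihood is derived, the following sketch maximizes the log likelihood of the simplest possible case, an i.i.d. Gaussian sample, by numerical optimization. The simulated data, starting values, and function names are illustrative assumptions, not part of the text.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, y):
    """Negative log likelihood of an i.i.d. N(mu, sigma^2) sample."""
    mu, log_sigma2 = params                    # optimize log(sigma^2) so sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    T = len(y)
    loglik = (-0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2)
              - np.sum((y - mu) ** 2) / (2 * sigma2))
    return -loglik                             # the optimizer minimizes, so flip the sign

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)   # simulated sample of size T = 500
res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(y,))
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma2_hat)                      # close to the sample mean and variance
```

The same structure, write down the log likelihood of the sample and then maximize it numerically over $\theta$, carries over to the AR(1) and ARMA likelihoods developed below.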
1 MLE of a Gaussian AR(1) Process

1.1 Evaluating the Likelihood Function Using (Scalar) Conditional Density

A stationary Gaussian AR(1) process takes the form
$$Y_t = c + \phi Y_{t-1} + \varepsilon_t, \qquad (2)$$
with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$ and $|\phi| < 1$ (how do we know this at this stage?). For this case, $\theta = (c, \phi, \sigma^2)'$.

Consider the p.d.f. of $Y_1$, the first observation in the sample. This is a random variable with mean and variance
$$E(Y_1) = \mu = \frac{c}{1 - \phi}, \qquad \operatorname{Var}(Y_1) = \frac{\sigma^2}{1 - \phi^2}.$$
Since $\{\varepsilon_t\}_{t=-\infty}^{\infty}$ is Gaussian, $Y_1$ is also Gaussian. Hence,
$$f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; c, \phi, \sigma^2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2/(1 - \phi^2)}} \exp\left[-\frac{1}{2} \cdot \frac{\{y_1 - [c/(1 - \phi)]\}^2}{\sigma^2/(1 - \phi^2)}\right].$$

Next consider the distribution of the second observation $Y_2$ conditional on observing $Y_1 = y_1$. From (2),
$$Y_2 = c + \phi Y_1 + \varepsilon_2. \qquad (3)$$
Conditioning on $Y_1 = y_1$ means treating the random variable $Y_1$ as if it were the deterministic constant $y_1$. In that case, (3) gives $Y_2$ as the constant $(c + \phi y_1)$ plus the $N(0, \sigma^2)$ variable $\varepsilon_2$. Hence
$$(Y_2 \mid Y_1 = y_1) \sim N(c + \phi y_1, \ \sigma^2),$$
meaning that
$$f_{Y_2 \mid Y_1}(y_2 \mid y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2} \cdot \frac{(y_2 - c - \phi y_1)^2}{\sigma^2}\right].$$
The joint density of observations 1 and 2 is then just
$$f_{Y_2, Y_1}(y_2, y_1; \theta) = f_{Y_2 \mid Y_1}(y_2 \mid y_1; \theta) \, f_{Y_1}(y_1; \theta).$$
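As a quick numerical illustration of these two densities, the sketch below evaluates $f_{Y_1}$, $f_{Y_2 \mid Y_1}$, and their product. The parameter values $c = 1.0$, $\phi = 0.5$, $\sigma^2 = 2.0$ and the observations $y_1 = 2.3$, $y_2 = 1.7$ are assumptions chosen for the example, not taken from the text.

```python
import numpy as np
from scipy.stats import norm

c, phi, sigma2 = 1.0, 0.5, 2.0                      # illustrative parameter values
y1, y2 = 2.3, 1.7                                   # hypothetical observed values

# Unconditional distribution of Y_1: N(c/(1 - phi), sigma^2/(1 - phi^2))
mu = c / (1 - phi)
var1 = sigma2 / (1 - phi ** 2)
f_y1 = norm.pdf(y1, loc=mu, scale=np.sqrt(var1))

# Conditional distribution of Y_2 given Y_1 = y1: N(c + phi*y1, sigma^2)
f_y2_given_y1 = norm.pdf(y2, loc=c + phi * y1, scale=np.sqrt(sigma2))

# Joint density of the first two observations
f_joint = f_y2_given_y1 * f_y1
print(f_y1, f_y2_given_y1, f_joint)
```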
Similarly, the distribution of the third observation conditional on the first two is
$$f_{Y_3 \mid Y_2, Y_1}(y_3 \mid y_2, y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2} \cdot \frac{(y_3 - c - \phi y_2)^2}{\sigma^2}\right],$$
from which
$$f_{Y_3, Y_2, Y_1}(y_3, y_2, y_1; \theta) = f_{Y_3 \mid Y_2, Y_1}(y_3 \mid y_2, y_1; \theta) \, f_{Y_2, Y_1}(y_2, y_1; \theta) = f_{Y_3 \mid Y_2, Y_1}(y_3 \mid y_2, y_1; \theta) \, f_{Y_2 \mid Y_1}(y_2 \mid y_1; \theta) \, f_{Y_1}(y_1; \theta).$$

In general, the values of $Y_1, Y_2, \ldots, Y_{t-1}$ matter for $Y_t$ only through the value of $Y_{t-1}$, and the density of observation $t$ conditional on the preceding $t - 1$ observations is given by
$$f_{Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_1}(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1; \theta) = f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2} \cdot \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2}\right].$$

The likelihood of the complete sample can thus be calculated as
$$f_{Y_T, Y_{T-1}, Y_{T-2}, \ldots, Y_1}(y_T, y_{T-1}, y_{T-2}, \ldots, y_1; \theta) = f_{Y_1}(y_1; \theta) \cdot \prod_{t=2}^{T} f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1}; \theta). \qquad (4)$$
The log likelihood function (denoted $\mathcal{L}(\theta)$) is therefore
$$\mathcal{L}(\theta) = \log f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T} \log f_{Y_t \mid Y_{t-1}}(y_t \mid y_{t-1}; \theta). \qquad (5)$$
The log likelihood for a sample of size $T$ from a Gaussian AR(1) process is seen to be
$$\mathcal{L}(\theta) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log[\sigma^2/(1 - \phi^2)] - \frac{\{y_1 - [c/(1 - \phi)]\}^2}{2\sigma^2/(1 - \phi^2)} - \frac{T - 1}{2}\log(2\pi) - \frac{T - 1}{2}\log(\sigma^2) - \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2}. \qquad (6)$$
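Expression (6) translates directly into a few lines of code. The sketch below is a minimal implementation of this exact log likelihood, assuming $|\phi| < 1$; the function name `ar1_exact_loglik` is introduced here purely for illustration.

```python
import numpy as np

def ar1_exact_loglik(y, c, phi, sigma2):
    """Equation (6): exact log likelihood of a stationary Gaussian AR(1) sample."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = c / (1 - phi)
    var1 = sigma2 / (1 - phi ** 2)
    # contribution of the first observation (its unconditional density)
    loglik = -0.5 * np.log(2 * np.pi) - 0.5 * np.log(var1) - (y[0] - mu) ** 2 / (2 * var1)
    # contributions of observations 2, ..., T (conditional densities)
    resid = y[1:] - c - phi * y[:-1]
    loglik += (-0.5 * (T - 1) * np.log(2 * np.pi) - 0.5 * (T - 1) * np.log(sigma2)
               - np.sum(resid ** 2) / (2 * sigma2))
    return loglik
```

Maximizing this function numerically over $(c, \phi, \sigma^2)$, for instance with the optimizer used in the earlier sketch, yields the exact MLE for the AR(1) case.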
1.2 Evaluating the Likelihood Function Using (Vector) Joint Density

A different description of the likelihood function for a sample of size $T$ from a Gaussian AR(1) process is sometimes useful. Collect the full set of observations in a $(T \times 1)$ vector,
$$\mathbf{y} \equiv (Y_1, Y_2, \ldots, Y_T)'.$$
The mean of this $(T \times 1)$ vector is
$$E(\mathbf{y}) = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_T) \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix} = \boldsymbol{\mu},$$
where $\mu = c/(1 - \phi)$. The variance-covariance matrix of $\mathbf{y}$ is
$$\boldsymbol{\Omega} = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] = \frac{\sigma^2}{1 - \phi^2} \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{bmatrix} = \sigma^2 \mathbf{V},$$
where
$$\mathbf{V} = \frac{1}{1 - \phi^2} \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{bmatrix}.$$
The sample likelihood function is therefore the multivariate Gaussian density
$$f_{\mathbf{Y}}(\mathbf{y}; \theta) = (2\pi)^{-T/2} |\boldsymbol{\Omega}^{-1}|^{1/2} \exp\left[-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})' \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right],$$
with log likelihood
$$\mathcal{L}(\theta) = -\frac{T}{2}\log(2\pi) + \frac{1}{2}\log|\boldsymbol{\Omega}^{-1}| - \frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})' \boldsymbol{\Omega}^{-1} (\mathbf{y} - \boldsymbol{\mu}). \qquad (7)$$
Expressions (6) and (7) must represent the identical likelihood function.
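That (6) and (7) coincide can be checked numerically. The sketch below builds $\boldsymbol{\Omega} = \sigma^2 \mathbf{V}$ element by element using $V_{ij} = \phi^{|i-j|}/(1 - \phi^2)$, evaluates (7), and compares the result with the `ar1_exact_loglik` function sketched after (6); the parameter values are the same illustrative assumptions as before.

```python
import numpy as np

def ar1_joint_loglik(y, c, phi, sigma2):
    """Equation (7): log of the multivariate Gaussian density with Omega = sigma^2 * V."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = c / (1 - phi)
    idx = np.arange(T)
    V = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)  # V[i, j] = phi^|i-j| / (1 - phi^2)
    Omega = sigma2 * V
    dev = y - mu
    _, logdet = np.linalg.slogdet(Omega)           # log|Omega|; note log|Omega^{-1}| = -log|Omega|
    quad = dev @ np.linalg.solve(Omega, dev)       # (y - mu)' Omega^{-1} (y - mu)
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

# Both forms give the same number for any sample y and any admissible theta, e.g.:
# y = np.random.default_rng(1).normal(size=50)
# print(np.isclose(ar1_joint_loglik(y, 1.0, 0.5, 2.0), ar1_exact_loglik(y, 1.0, 0.5, 2.0)))
```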
It is easy to verify by direct multiplication that $\mathbf{L}'\mathbf{L} = \mathbf{V}^{-1}$, with
$$\mathbf{L} = \begin{bmatrix} \sqrt{1 - \phi^2} & 0 & 0 & \cdots & 0 \\ -\phi & 1 & 0 & \cdots & 0 \\ 0 & -\phi & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & -\phi & 1 \end{bmatrix}.$$
Then (7) becomes
$$\mathcal{L}(\theta) = -\frac{T}{2}\log(2\pi) + \frac{1}{2}\log|\sigma^{-2}\mathbf{L}'\mathbf{L}| - \frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}). \qquad (8)$$
Define the $(T \times 1)$ vector $\tilde{\mathbf{y}}$ to be
$$\tilde{\mathbf{y}} \equiv \mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \begin{bmatrix} \sqrt{1 - \phi^2} & 0 & \cdots & 0 \\ -\phi & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -\phi & 1 \end{bmatrix} \begin{bmatrix} Y_1 - \mu \\ Y_2 - \mu \\ \vdots \\ Y_T - \mu \end{bmatrix} = \begin{bmatrix} \sqrt{1 - \phi^2}\,(Y_1 - \mu) \\ (Y_2 - \mu) - \phi(Y_1 - \mu) \\ (Y_3 - \mu) - \phi(Y_2 - \mu) \\ \vdots \\ (Y_T - \mu) - \phi(Y_{T-1} - \mu) \end{bmatrix} = \begin{bmatrix} \sqrt{1 - \phi^2}\,[Y_1 - c/(1 - \phi)] \\ Y_2 - c - \phi Y_1 \\ Y_3 - c - \phi Y_2 \\ \vdots \\ Y_T - c - \phi Y_{T-1} \end{bmatrix}.$$
The last term in (8) can thus be written
$$\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \frac{1}{2\sigma^2}\tilde{\mathbf{y}}'\tilde{\mathbf{y}} = \frac{1}{2\sigma^2}(1 - \phi^2)[Y_1 - c/(1 - \phi)]^2 + \frac{1}{2\sigma^2}\sum_{t=2}^{T}(Y_t - c - \phi Y_{t-1})^2.$$
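The factorization $\mathbf{L}'\mathbf{L} = \mathbf{V}^{-1}$ can also be confirmed numerically for a small case. The sketch below uses the illustrative values $T = 5$ and $\phi = 0.5$, which are assumptions for the example rather than values from the text.

```python
import numpy as np

T, phi = 5, 0.5                                    # small illustrative case

# V[i, j] = phi^|i-j| / (1 - phi^2): the AR(1) autocovariance matrix divided by sigma^2
idx = np.arange(T)
V = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi ** 2)

# L: sqrt(1 - phi^2) in the (1, 1) position, ones on the rest of the diagonal,
# and -phi on the first subdiagonal
L = np.eye(T) - phi * np.eye(T, k=-1)
L[0, 0] = np.sqrt(1 - phi ** 2)

print(np.allclose(L.T @ L, np.linalg.inv(V)))      # True: L'L = V^{-1}
```

With $\tilde{\mathbf{y}} = \mathbf{L}(\mathbf{y} - \boldsymbol{\mu})$, the quadratic form in (8) collapses to the sum of squares displayed above, which is what makes this triangular factorization convenient for computing the likelihood.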