Time Discrete, Hidden Markov Models: Our results will in part be valid in an even more general time-discrete setting which also covers Hidden Markov Models: we start with
\[
P(X_0 \in A) = \int_A p^{X_0}_0(x)\,\mu_0(dx) \tag{2.7}
\]
and assume that the unobservable state evolves according to a Markov transition:
\[
P(X_t \in A \mid X_{t-1}=x_{t-1},\dots,X_0=x_0) = P(X_t \in A \mid X_{t-1}=x_{t-1}) = \int_A p^{X_t|X_{t-1}=x_{t-1}}_t(x)\,\mu_t(dx). \tag{2.8}
\]
Again we only have a transformation $Y_t$ of $X_t$ available, which in this case is distributed according to
\[
P(Y_t \in B \mid X_t=x_t) = \int_B q^{Y_t|X_t=x_t}_t(y)\,\nu_t(dy). \tag{2.9}
\]
In this setting, we assume known (and existing) [conditional] densities $p^{X_0}_0$, $p^{X_t|X_{t-1}=x_{t-1}}_t$, $q^{Y_t|X_t=x_t}_t$.

Somewhere in between the model formulation of this paragraph and the Euclidean SSM one may place the dynamic (generalized) linear models as discussed in West et al. (1985) and West and Harrison (1989). These are also covered by Theorem 3.3 as soon as a squared error makes sense in the state space.

Continuous setting: In applications of Mathematical Finance we also need to cover continuous-time settings, as given by an unobservable state evolving according to the SDE
\[
dX_t = f(t,X_t)\,dt + q(t,X_t)\,dW_t \tag{2.10}
\]
where, for consistency, we observe $Y_t$ according to
\[
dY_t = z(t,X_t)\,dt + v(t)\,dW'_t, \qquad Y_0 = 0. \tag{2.11}
\]
For $X_0$ we assume (2.7), while $W_t$, $W'_t$ are independent Wiener processes, and $f$, $q$, $z$, $v$ are suitably measurable, known functions.

This formulation with a time-continuous observation process as in (2.11) may be found in Tang (1998) and James (2005). More often, however, observations will be made discretely, so that a formulation like the one of Nielsen et al. (2000) and Singer (2002) is more adequate, i.e., for discrete times $t_1 < \dots < t_N$ we have observations
\[
Y_{t_k} = z_{t_k}(X_{t_k}) + \varepsilon_{t_k}. \tag{2.12}
\]
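To fix ideas, a trajectory of the continuous-discrete model (2.10)-(2.12) can be simulated by an Euler–Maruyama discretization of the state SDE, with observations drawn on the coarser grid $t_1 < \dots < t_N$. The following Python sketch is purely illustrative: the concrete choices of $f$, $q$, $z$ and the noise scale are hypothetical placeholders, not part of the model assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) choices for f, q, z; the model only requires
# them to be suitably measurable, known functions.
f = lambda t, x: -0.5 * x        # drift in (2.10)
q = lambda t, x: 1.0             # diffusion coefficient in (2.10)
z = lambda t, x: x               # observation function in (2.12)
sigma_eps = 0.1                  # std. dev. of the observation noise eps_{t_k}

dt = 1e-3                        # Euler-Maruyama step size
steps_per_obs = 1000             # one discrete observation per unit of time
n_obs = 10                       # number of observation times t_1 < ... < t_N

x, t = 0.0, 0.0                  # start in X_0 = 0 for simplicity
Y_obs = []
for k in range(n_obs * steps_per_obs):
    # Euler-Maruyama step for dX_t = f(t, X_t) dt + q(t, X_t) dW_t
    x += f(t, x) * dt + q(t, x) * np.sqrt(dt) * rng.standard_normal()
    t += dt
    if (k + 1) % steps_per_obs == 0:
        # discrete observation Y_{t_k} = z_{t_k}(X_{t_k}) + eps_{t_k}, cf. (2.12)
        Y_obs.append(z(t, x) + sigma_eps * rng.standard_normal())
```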
In this context, a straightforward approach linearizes the corresponding functions $f$ and $z$ to give the (continuous-discrete) Extended Kalman Filter (EKF) or, improved to second-order moment fitting, the second-order nonlinear filter (SNF) introduced in Jazwinski (1970); also confer Singer (2002, sec. 4.3.1). After this linearization we are again in the context of a (time-inhomogeneous) linear SSM, hence the methodology we develop in the sequel applies to this setting as well (for a minimal code sketch of the linearization idea, see below, after the discussion of controls).

More recently, approaches to improve on this simple linearization have been introduced, notably the unscented Kalman filter (UKF) (Julier et al., 2000) and Hermite expansions as in Aït-Sahalia (2002). We do not cover them here, though. For a survey of these methods, confer Singer (2002, sec. 4.3). For techniques to deal with non-linear time-discrete situations, see Tanizaki (1996).

Control: Going one more step ahead, to cover applications such as optimal portfolio selection, we may allow for controls $U_t$ to be set or determined by the statistician and to be fed back into the state equation. In the context of the continuous-time model from (2.10) and (2.12), this is also known as SDEX; confer Nielsen et al. (2000).

In this setting, the controls $U_t$ are assumed measurable w.r.t. $\sigma(Y_s,\ s < t)$, or usually even measurable w.r.t. $\sigma(Y_{t-})$. To integrate these controls into our setting, we just have to generalize the functions $f$, $z$, $q$ and the densities $p^{\,\cdot|\cdot}_t$, $q^{\,\cdot|\cdot}_t$ to $f = f(t, X_t, U_t)$ (and $z$, $q$ likewise), and to modify
\[
p^{\,\cdot|\cdot}_t = p^{X_t|X_{t-1}=x_{t-1},\,U_{t-1}=u_{t-1}}_t(x), \qquad
q^{\,\cdot|\cdot}_t = q^{Y_t|X_t=x_t,\,U_{t-1}=u_{t-1}}_t(y).
\]
For the application of stochastic control to portfolio optimization, confer Korn (1997).
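As announced above, here is a minimal Python sketch of the linearization idea behind the EKF, written as one cycle of a discrete-time EKF: it propagates with an (assumed already discretized) linear state transition and linearizes the observation function around the prediction. The continuous-discrete variant would instead integrate moment equations driven by $f$ and its Jacobian between observation times; all names and signatures here are illustrative, not taken from the references.

```python
import numpy as np

def ekf_step(x_prev, P_prev, y, F, Q, z_fun, z_jac, V):
    """One cycle of a discrete-time EKF: linear prediction, then a
    Kalman-type correction after linearizing the observation function
    z around the predicted state. All arguments are illustrative."""
    # prediction with an (already discretized) linear state transition
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + Q
    # linearize: z(x) ~ z(x_pred) + Z (x - x_pred), Z the Jacobian of z
    Z = z_jac(x_pred)
    S = Z @ P_pred @ Z.T + V                 # innovation covariance
    K = P_pred @ Z.T @ np.linalg.inv(S)      # extended Kalman gain
    x_filt = x_pred + K @ (y - z_fun(x_pred))
    P_filt = (np.eye(len(x_filt)) - K @ Z) @ P_pred
    return x_filt, P_filt
```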
2.2 Deviations from the ideal model

As usual in Robust Statistics, we do not confine ourselves to ideal model assumptions but rather allow for (small) deviations from these assumptions, most prominently generated by outliers. In our notation, sub/superscript "id" denotes the ideal setting, "di" the distorting (contaminating) situation, and "re" the realistic, contaminated situation.

Contrary to the independent setting, outliers may occur in quite different ways: Following the terminology of Fox (1972), we distinguish innovation outliers (or IO's) and additive outliers (or AO's). Historically, AO's denote gross errors affecting the observation errors, i.e.,
\[
\text{AO:}\qquad \varepsilon^{\rm re}_t \sim (1-r_{\rm AO})\,\mathcal{L}(\varepsilon^{\rm id}_t) + r_{\rm AO}\,\mathcal{L}(\varepsilon^{\rm di}_t) \tag{2.13}
\]
where $\mathcal{L}(\varepsilon^{\rm di}_t)$ is arbitrary, unknown and uncontrollable, and $0 \le r_{\rm AO} \le 1$ is the AO-contamination radius, i.e., the probability for an AO. IO's, on the other hand, are usually defined as outliers which affect the innovations,
\[
\text{IO:}\qquad v^{\rm re}_t \sim (1-r_{\rm IO})\,\mathcal{L}(v^{\rm id}_t) + r_{\rm IO}\,\mathcal{L}(v^{\rm di}_t) \tag{2.14}
\]
where again $\mathcal{L}(v^{\rm di}_t)$ is arbitrary, unknown and uncontrollable, and $0 \le r_{\rm IO} \le 1$ is the corresponding IO-contamination radius.

We stick to this distinction for consistency with the literature, although we will rather use these terms in the following sense: IO's denote endogenous outliers affecting the state equation in general, hence distorting several subsequent states. This also covers level shifts or linear trends; if $|F_t| < 1$, these are not included in the classical definition, as IO's would then decay geometrically in $t$. We also extend the meaning of AO's to denote general exogenous outliers which enter the observation equation only and thus only cause distortions at single time points. This also covers substitutive outliers or SO's, defined as
\[
\text{SO:}\qquad Y^{\rm re}_t \sim (1-r_{\rm SO})\,\mathcal{L}(Y^{\rm id}_t) + r_{\rm SO}\,\mathcal{L}(Y^{\rm di}_t) \tag{2.15}
\]
where again $\mathcal{L}(Y^{\rm di}_t)$ is arbitrary, unknown and uncontrollable, and $0 \le r_{\rm SO} \le 1$ is the corresponding SO-contamination radius.

Apparently, the SO-ball of radius $r$, consisting of all $\mathcal{L}(Y^{\rm re}_t)$ according to (2.15), contains the corresponding AO-ball of the same radius when $Y^{\rm re}_t = Z_t X_t + \varepsilon^{\rm re}_t$. However, for technical reasons, we make the additional assumption that
\[
Y^{\rm id}_t,\ Y^{\rm di}_t \quad \text{are stochastically independent,} \tag{2.16}
\]
and then the "contains" relation no longer holds.

The more general definitions of AO's and IO's in the sequel will be labeled "wide-sense" to distinguish them from the "narrow-sense" definitions (2.13) and (2.14).

Remark 2.1. Whether (narrow-sense) AO's or SO's are better suited to capture model deviations will depend on the actual application; seen from mathematical operability, SO's are clearly easier to treat, compare Remark 3.4(b). They will also lead to different least favorable situations, compare Remark 3.4(d).

Different and competing goals are induced by endogenous and exogenous outliers: In the presence of (wide-sense) AO's we would like to attenuate their effect to avoid "false alarms", while when there are (wide-sense) IO's, the usual goal in online applications would be tracking, i.e., to detect structural changes as fast as possible and/or to react to the changed situation.

Obviously we are faced with an identification problem here: Immediately after a suspicious observation, we cannot tell (wide-sense) AO's from (wide-sense) IO's. Such a simultaneous treatment will only be possible with a certain delay; see Section 5.
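To make the contamination balls (2.13)-(2.15) concrete, here is a small Python sketch drawing "realistic" errors from such a mixture. The distorting law $\mathcal{N}(10, 0.1)$ anticipates the example of Section 2.3 (reading its second parameter as a variance is our assumption), and the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def contaminated_sample(ideal, distorting, r, size, rng):
    """Draw `size` samples from (1 - r) * L(ideal) + r * L(distorting),
    the mixture form shared by (2.13), (2.14) and (2.15)."""
    hit = rng.random(size) < r          # which draws are outliers
    out = ideal(size)                   # start from the ideal law
    out[hit] = distorting(hit.sum())    # replace the contaminated ones
    return out, hit

# AO's in the sense of (2.13): ideal N(0,1) observation errors,
# distorting law N(10, 0.1) (second parameter read as variance), r_AO = 0.1
eps_re, is_AO = contaminated_sample(
    lambda n: rng.standard_normal(n),
    lambda n: rng.normal(10.0, np.sqrt(0.1), n),
    r=0.1, size=100, rng=rng)
```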
In other, more off-line situations, such as spectral analysis for low-flow estimation or for inter-individual heart frequency spectra, one would like to recover the situation without structural changes, and hence a cleaning from both (wide-sense) IO's and AO's is required; after this cleaning, the powerful instruments of spectral analysis become available. For this and other issues in robust density estimation, confer Kleiner et al. (1979) and Spangl (2008). We will not pursue this goal in this paper, however.

2.3 Example: Steady State Model

Our running example will be a one-dimensional steady state model with hyper-parameters
\[
p = q = 1, \qquad F_t = Z_t = 1, \qquad \text{in the ideal model: } v_t, \varepsilon_t \stackrel{\rm i.i.d.}{\sim} \mathcal{N}(0,1). \tag{2.17}
\]
In Figure 1, we display a typical realization of an SSM in model (2.17), where outliers are generated according to $r_{\rm IO} = r_{\rm AO} = 0.1$, $v^{\rm di}_t, \varepsilon^{\rm di}_t \stackrel{\rm i.i.d.}{\sim} \mathcal{N}(10, 0.1)$.

[Figure 1 here: four panels, "1-dim steady state - ideal" (shown twice), "1-dim steady state - under AO", and "1-dim steady state - under IO", each plotting the state X and the observations Y over time.]

Figure 1: Model (2.17) in the ideal model and under (narrow-sense) AO's and IO's; while AO's only affect single observations, under IO's we never return to the original level. Instances of outliers are marked with red circles.
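A realization as displayed in Figure 1 can be reproduced along the following lines; this is a sketch under the contamination scheme just stated, again reading the second parameter of $\mathcal{N}(10, 0.1)$ as a variance.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_steady_state(T=100, r_IO=0.0, r_AO=0.0, rng=rng):
    """Simulate model (2.17): X_t = X_{t-1} + v_t, Y_t = X_t + eps_t,
    with ideal N(0,1) errors; contaminated errors are drawn from
    N(10, 0.1) with the given radii, as in (2.13) and (2.14)."""
    X, Y = np.empty(T), np.empty(T)
    x = 0.0
    for t in range(T):
        v = rng.normal(10.0, np.sqrt(0.1)) if rng.random() < r_IO \
            else rng.standard_normal()
        x += v                                   # state equation
        eps = rng.normal(10.0, np.sqrt(0.1)) if rng.random() < r_AO \
            else rng.standard_normal()
        X[t], Y[t] = x, x + eps                  # observation equation
    return X, Y

X_id, Y_id = simulate_steady_state()             # ideal model
X_ao, Y_ao = simulate_steady_state(r_AO=0.1)     # under (narrow-sense) AO's
X_io, Y_io = simulate_steady_state(r_IO=0.1)     # under (narrow-sense) IO's
```

Varying $r_{\rm IO}$ and $r_{\rm AO}$ reproduces the qualitative behavior of Figure 1: isolated spikes in $Y$ under AO's, and persistent level shifts under IO's.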
2.4 Classical Method: Kalman–Filter

Filter Problem The most important problem in the SSM formulation is to somehow reconstruct the unobservable states $X_t$ based on the observations $Y_t$. For abbreviation, let us denote
\[
Y_{1:t} = (Y_1, \dots, Y_t), \qquad Y_{1:0} := \emptyset. \tag{2.18}
\]
Then, using mean squared error (MSE) risk, the reconstruction problem becomes
\[
\mathbb{E}\,\big|X_t - f_t(Y_{1:s})\big|^2 = \min_{f_t} \tag{2.19}
\]
Depending on the horizon $s$ of the observations used to reconstruct $X_t$, we speak of a prediction problem for $s < t$, of a filtering problem if $s = t$, and of a smoothing problem if $s > t$. In the sequel we will confine ourselves to the filtering problem.

Kalman–Filter It is well known that the general solution to (2.19) is the corresponding conditional expectation $\mathbb{E}[X_t|Y_{1:s}]$. Except for the Gaussian case, however, this exact conditional expectation is rather expensive to compute. Hence, similarly to the Gauss-Markov setting, it is a natural restriction to confine oneself to linear filters. In this context, the seminal work of Kalman (1960) (discrete-time setting) and Kalman and Bucy (1961) (continuous-time setting) introduced a recursive scheme to compute this optimal linear filter:
\[
\text{Initialization:}\qquad X_{0|0} = a_0, \qquad \Sigma_{0|0} = Q_0 \tag{2.20}
\]
\[
\text{Prediction:}\qquad X_{t|t-1} = F_t X_{t-1|t-1}, \qquad \Sigma_{t|t-1} = F_t \Sigma_{t-1|t-1} F_t^\tau + Q_t \tag{2.21}
\]
\[
\text{Correction:}\qquad X_{t|t} = X_{t|t-1} + M^0_t \Delta Y_t, \quad \Delta Y_t = Y_t - Z_t X_{t|t-1}, \quad M^0_t = \Sigma_{t|t-1} Z_t^\tau \Delta_t^{-1},
\]
\[
\Sigma_{t|t} = (\mathbb{I}_p - M^0_t Z_t)\,\Sigma_{t|t-1}, \qquad \Delta_t = Z_t \Sigma_{t|t-1} Z_t^\tau + V_t \tag{2.22}
\]
where $\Sigma_{t|t} = \mathrm{Cov}(X_t - X_{t|t})$, $\Sigma_{t|t-1} = \mathrm{Cov}(X_t - X_{t|t-1})$, and $M^0_t$ is the so-called Kalman gain; a code sketch of these recursions is given at the end of this subsection. Using orthogonality of $\{\Delta Y_t\}_t$, we may set up similar recursions for the corresponding best linear smoother; see, e.g., Anderson and Moore (1979) or Durbin and Koopman (2001).

Optimality of the Kalman–Filter To see that the (classical) Kalman filter solves problem (2.19) (for $s = t$) among all linear filters, let us write
\[
\mathrm{lin}(X) := \text{closed linear space generated by } X \tag{2.23}
\]
\[
\mathrm{oP}(\,\cdot\,|X) := \text{orthogonal projection onto } \mathrm{lin}(X) \tag{2.24}
\]
and define (recursively)
\[
\Delta Y_t = Y_t - \mathrm{oP}(Y_t|Y_{1:t-1}). \tag{2.25}
\]
Hence the $\Delta Y_t$ are mutually orthogonal and
\[
X_{t|t-1} = \mathrm{oP}(X_t|Y_{1:t-1}) = F_t\,\mathrm{oP}(X_{t-1}|Y_{1:t-1}) = F_t X_{t-1|t-1} \tag{2.26}
\]
\[
X_{t|t} = \mathrm{oP}(X_t|Y_{1:t}) = \mathrm{oP}(X_t|Y_{1:t-1}) + \mathrm{oP}(X_t|\Delta Y_t) = X_{t|t-1} + \mathrm{oP}(X_t - X_{t|t-1}|\Delta Y_t) = X_{t|t-1} + M^0_t \Delta Y_t \tag{2.27}
\]
For later purposes, we also introduce a symbol for the prediction error,
\[
\Delta X_t = X_t - X_{t|t-1}. \tag{2.28}
\]
Similarly to the Gauss-Markov Theorem, under normality, i.e., assuming (2.3), (2.4), (2.5), this optimality extends as follows: $X_{t|t[-1]} = \mathbb{E}[X_t|Y_{1:t[-1]}]$, i.e., the Kalman filter is optimal among all $Y_{1:t[-1]}$-measurable filters. It is also the posterior mode of $\mathcal{L}(X_t|Y_{1:t})$, and $X_{t|t}$ can also be seen to be the ML estimator in a regression model with random parameter; for the last property, compare Duncan and Horn (1972).
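To conclude this subsection, here is a minimal Python sketch of the recursions (2.20)-(2.22) for time-invariant hyper-parameters; the notation follows the display above, while the function signature and the demo data are of course only one possible choice.

```python
import numpy as np

def kalman_filter(Y, F, Z, Q, V, a0, Q0):
    """Classical Kalman filter, recursions (2.20)-(2.22), for
    time-invariant hyper-parameters F, Z, Q, V (ndarrays).
    Y has shape (T, q); returns the filtered states X_{t|t}."""
    x, S = a0.copy(), Q0.copy()              # initialization (2.20)
    p = len(a0)
    X_filt = []
    for y in Y:
        x = F @ x                            # prediction (2.21)
        S = F @ S @ F.T + Q
        Delta = Z @ S @ Z.T + V              # innovation covariance Delta_t
        M = S @ Z.T @ np.linalg.inv(Delta)   # Kalman gain M_t^0
        x = x + M @ (y - Z @ x)              # correction (2.22)
        S = (np.eye(p) - M @ Z) @ S
        X_filt.append(x)
    return np.array(X_filt)

# demo on the steady state model (2.17): all hyper-parameters equal to 1
rng = np.random.default_rng(0)
one = np.eye(1)
X_true = np.cumsum(rng.standard_normal(100))             # state random walk
Y = (X_true + rng.standard_normal(100)).reshape(-1, 1)   # Y_t = X_t + eps_t
X_filt = kalman_filter(Y, one, one, one, one, np.zeros(1), one)
```

For model (2.17), the gain $M^0_t$ converges to a constant as $t$ grows, so that the filter eventually acts as a fixed exponential smoother of the observations.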