Features of the Kalman filter

The Kalman filter stands out for its simple and understandable structure: we have an initialization, a prediction, and a correction step; all steps are linear, hence easily evaluable and interpretable. Due to the strict recursivity / Markovian structure of the state equation, all information from the past that is useful for the future may be captured in the value of $X_{t|t-1}$, so only very limited memory is needed.

From a robustness point of view, this linearity is at the same time a weakness of this filter: $y$ enters unbounded into the correction step, which hence is prone to outliers.

A good robustification of this approach would try to retain as much as possible of these positive properties of the Kalman filter while revising the unboundedness in the correction step.

3 The rLS as optimally robust filter

3.1 Definition

robustifying recursive Least Squares: rLS

In a first step we limit ourselves to (wide-sense) AO's. Notationally, where clear from the context, we suppress the time index $t$.

As no (new) observations enter the initialization and prediction steps, these steps may be left unchanged. In the correction step, we will have to modify the orthogonal projection $\mathrm{oP}(\Delta X|\Delta Y)$ present in (2.27). Suggested by H. Rieder and worked out in Ruckdeschel (2001, ch. 2), the following robustification of the correction step is straightforward: instead of $M^0\Delta Y$ we use a Huberization of this correction,
\[
H_b(M^0\Delta Y) = M^0\Delta Y \,\min\{1,\, b/|M^0\Delta Y|\}
\tag{3.1}
\]
for some suitably chosen clipping height $b$. Apparently, this proposal removes the unboundedness problem of the classical Kalman filter while still remaining reasonably simple; in particular, this modification is non-iterative, hence especially useful for online purposes.

However, it should be noted that by departing from the Kalman filter and at the same time insisting on strict recursivity, we possibly exclude "better" non-recursive procedures, compare Remark 3.6. These procedures, on the other hand, would be much more expensive to compute.

Remark 3.1. In expression (3.1), $|M^0\Delta Y|$ denotes the Euclidean norm on $\mathbb{R}^q$; instead, however, you could also use other norms like a Mahalanobis-type norm. With respect to Theorem 3.3, optimality is preserved when, instead of the Euclidean norm used in the MSE, you use the corresponding alternative norm.
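To make the clipped correction (3.1) concrete, here is a minimal Python/NumPy sketch of one rLS correction step for a linear state-space model with observation matrix Z and observation error covariance V. The variable names (x_pred, S_pred, Z, V) and the covariance bookkeeping, which is kept as in the classical filter, are our own illustrative choices and not prescribed by the text.

```python
import numpy as np

def rls_correction(x_pred, S_pred, y, Z, V, b):
    """One rLS correction step: the classical Kalman correction M0 @ dY
    is replaced by its Huberization H_b(M0 @ dY), cf. (3.1)."""
    dY = y - Z @ x_pred                     # innovation Delta Y
    C = Z @ S_pred @ Z.T + V                # innovation covariance
    M0 = S_pred @ Z.T @ np.linalg.inv(C)    # Kalman gain M^0
    corr = M0 @ dY                          # classical correction M^0 Delta Y
    norm = np.linalg.norm(corr)             # Euclidean norm, cf. Remark 3.1
    if norm > b:
        corr = corr * (b / norm)            # Huberization: clip to length b
    x_filt = x_pred + corr
    # covariance bookkeeping kept as in the classical filter (one possible
    # convention; the rLS literature may track it differently)
    S_filt = S_pred - M0 @ Z @ S_pred
    return x_filt, S_filt

# toy usage: 2-dimensional state observed through its first coordinate,
# with an observation far from the prediction that gets clipped
x_pred, S_pred = np.zeros(2), np.eye(2)
Z, V = np.array([[1.0, 0.0]]), np.array([[1.0]])
x_filt, S_filt = rls_correction(x_pred, S_pred, y=np.array([25.0]), Z=Z, V=V, b=1.0)
```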
Choice of the clipping height b

As to the choice of the clipping height $b$, we make the simplifying assumption that the conditional expectation $\mathbb{E}_{\mathrm{id}}[\Delta X|\Delta Y]$ is linear, which will turn out to be only approximately right. In this setting, we have two proposals:

The first one is an Anscombe insurance criterion. For a given "insurance premium" $\delta$ to be paid in terms of loss of efficiency in the ideal model compared to the optimal procedure in this (ideal) setting, i.e., the classical Kalman filter, we choose $b = b(\delta)$ such that
\[
\mathbb{E}_{\mathrm{id}}\bigl|\Delta X - H_b(M^0\Delta Y)\bigr|^2 \overset{!}{=} (1+\delta)\,\mathbb{E}_{\mathrm{id}}\bigl|\Delta X - M^0\Delta Y\bigr|^2
\tag{3.2}
\]
The other possibility will become clearer in the next section: for a given size of the (SO-)neighborhood $\mathcal{U}^{\mathrm{SO}}(r)$ specified by a radius $r \in [0,1]$, we determine $b = b(r)$ such that
\[
(1-r)\,\mathbb{E}_{\mathrm{id}}\bigl(|M^0\Delta Y| - b\bigr)_+ \overset{!}{=} r\,b
\tag{3.3}
\]
If this radius is unknown, we could follow the idea worked out in Rieder et al. (2008), that is, single out a least favorable radius $r_0$ defined by the following expressions
\[
r_0 = \operatorname*{argmin}_{s \in [0,1]} \rho_0(s), \qquad \rho_0(s) = \max_{r \in [0,1]} \rho(r,s),
\tag{3.4}
\]
\[
\rho(r,s) = \frac{\max_{\mathcal{U}^{\mathrm{SO}}(r)} \mathrm{MSE}\bigl(\mathrm{rLS}(b(s))\bigr)}{\max_{\mathcal{U}^{\mathrm{SO}}(r)} \mathrm{MSE}\bigl(\mathrm{rLS}(b(r))\bigr)}
\tag{3.5}
\]
and use the corresponding $b(r_0)$.

If we have limited knowledge about $r$, say $r \in [r_l, r_u]$, $0 < r_l < r_u < 1$, we would restrict the variation range of $s$ and $r$ in the respective optimization problems correspondingly. To this end, define
\[
A_r = \mathbb{E}_{\mathrm{id}}\Bigl[\operatorname{tr}\operatorname{Cov}_{\mathrm{id}}[\Delta X|\Delta Y^{\mathrm{id}}] + \bigl(|M^0\Delta Y^{\mathrm{id}}| - b(r)\bigr)_+^2\Bigr]
\tag{3.6}
\]
\[
B_r = \mathbb{E}_{\mathrm{id}}\Bigl[|M^0\Delta Y^{\mathrm{id}}|^2 - \bigl(|M^0\Delta Y^{\mathrm{id}}| - b(r)\bigr)_+^2\Bigr] + b(r)^2
\tag{3.7}
\]
Then we can show the following variant of Kohl (2005, Lemma 2.2.3):

Lemma 3.2. In equations (3.4) and (3.5), let $r$, $s$ vary in $[r_l, r_u]$ with $0 \le r_l < r_u \le 1$. Then
\[
\rho_0(r) = \max\{A_r/A_{r_l},\, B_r/B_{r_u}\}
\tag{3.8}
\]
and there exists some $\tilde r_0 \in [r_l, r_u]$ such that
\[
A_{\tilde r_0}/A_{r_l} = B_{\tilde r_0}/B_{r_u}
\tag{3.9}
\]
and it holds that
\[
\min_{r \in [r_l, r_u]} \rho_0(r) = \rho_0(\tilde r_0), \quad \text{i.e., } r_0 = \tilde r_0.
\tag{3.10}
\]
Moreover, if $r_u = 1$, then $r_0 = r_u$.

In particular, the last equality shows that one should restrict $r_u$ to be strictly smaller than 1 to get a sensible procedure.
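Both calibration rules can be evaluated numerically once the ideal-model expectations are approximated, e.g. by Monte Carlo. Note that under the linearity assumption made above, the cross term in (3.2) vanishes, so (3.2) reduces to $\mathbb{E}_{\mathrm{id}}(|M^0\Delta Y|-b)_+^2 = \delta\,\mathbb{E}_{\mathrm{id}}|\Delta X - M^0\Delta Y|^2$. The following sketch solves both equations for $b$ by one-dimensional root finding; the function names, the toy distribution of $|M^0\Delta Y|$, and the supplied ideal MSE are our own assumptions for illustration.

```python
import numpy as np
from scipy.optimize import brentq

def calibrate_b_radius(norms, r):
    """Solve (1 - r) E_id(|M0 dY| - b)_+ = r b for b, cf. (3.3); the
    ideal-model expectation is replaced by a Monte Carlo average over
    `norms`, i.e. samples of |M^0 Delta Y| drawn under the ideal model."""
    def g(b):
        return (1.0 - r) * np.mean(np.maximum(norms - b, 0.0)) - r * b
    # g(0) >= 0 and g(max(norms)) <= 0, so a root lies in between
    return brentq(g, 0.0, norms.max())

def calibrate_b_anscombe(norms, delta, ideal_mse):
    """Solve the Anscombe criterion (3.2) for b; under the linearity
    assumption it reduces to E_id(|M0 dY| - b)_+^2 = delta * ideal_mse,
    where ideal_mse = E_id|dX - M0 dY|^2 has to be supplied."""
    def g(b):
        return np.mean(np.maximum(norms - b, 0.0) ** 2) - delta * ideal_mse
    # assumes delta is small enough that g changes sign on [0, max(norms)]
    return brentq(g, 0.0, norms.max())

# toy usage: scalar case with M^0 Delta Y ~ N(0,1) and ideal MSE = 0.5
rng = np.random.default_rng(0)
norms = np.abs(rng.standard_normal(100_000))
b_r = calibrate_b_radius(norms, r=0.1)                          # rule (3.3)
b_d = calibrate_b_anscombe(norms, delta=0.05, ideal_mse=0.5)    # rule (3.2)
```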
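Continuing the Monte Carlo sketch above (it reuses `calibrate_b_radius`, `brentq` and the `norms` samples), the least favorable radius of Lemma 3.2 can be located by a one-dimensional root search on (3.9): $A_r/A_{r_l}$ increases in $r$ while $B_r/B_{r_u}$ decreases, so their difference changes sign on $[r_l, r_u]$. The constant `trace_cov` stands for $\operatorname{tr}\operatorname{Cov}_{\mathrm{id}}[\Delta X|\Delta Y^{\mathrm{id}}]$, which we assume has been computed beforehand; all concrete numbers are again arbitrary toy choices.

```python
def least_favorable_radius(norms, trace_cov, r_l, r_u):
    """Find r0 on [r_l, r_u] via Lemma 3.2: the r where
    A_r / A_{r_l} = B_r / B_{r_u}, cf. (3.6)-(3.9), with ideal-model
    expectations again replaced by Monte Carlo averages over `norms`."""
    def A(r):                                    # cf. (3.6)
        b = calibrate_b_radius(norms, r)
        return trace_cov + np.mean(np.maximum(norms - b, 0.0) ** 2)

    def B(r):                                    # cf. (3.7)
        b = calibrate_b_radius(norms, r)
        return np.mean(norms ** 2 - np.maximum(norms - b, 0.0) ** 2) + b ** 2

    A_l, B_u = A(r_l), B(r_u)
    # A_r/A_{r_l} grows in r while B_r/B_{r_u} falls, so the difference
    # changes sign on [r_l, r_u] and brentq finds the crossing point (3.9)
    return brentq(lambda r: A(r) / A_l - B(r) / B_u, r_l, r_u)

# continuing the toy example: trace_cov = 0.5, radius only known up to [0.01, 0.5]
r0 = least_favorable_radius(norms, trace_cov=0.5, r_l=0.01, r_u=0.5)
b0 = calibrate_b_radius(norms, r0)               # the corresponding b(r0)
```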
3.2 (One-Step)-Optimality of the rLS

The seemingly ad-hoc robustification proposed in the rLS filter has a remarkable optimality property, though. To see this, let us first forget about the time structure and instead consider the following simplified, but general "Bayesian" model:

We have an unobservable but interesting signal $X \sim P^X(dx)$, where for technical reasons we assume that in the ideal model $\mathbb{E}|X|^2 < \infty$. Instead of $X$ we rather observe a random variable $Y$ of which we know the ideal transition probabilities; more specifically, we assume that these transition probabilities are dominated, again in the ideal model, hence have densities w.r.t. some measure $\mu$,
\[
P^{Y|X=x}(dy) = \pi(y,x)\,\mu(dy)
\tag{3.11}
\]
Our approach relies on the MSE, so we assume that the range of $X$ is such that the MSE makes sense, which essentially amounts to saying that the range of $X$ is a subset of some Hilbert space.

As (wide-sense) AO model, we consider an SO outlier model, i.e.,
\[
Y^{\mathrm{re}} = (1-U)\,Y^{\mathrm{id}} + U\,Y^{\mathrm{di}}, \qquad U \sim \mathrm{Bin}(1,r)
\tag{3.12}
\]
for $U$ independent of $(X, Y^{\mathrm{id}}, Y^{\mathrm{di}})$ and some distorting random variable $Y^{\mathrm{di}}$ for which, in a slight variation of condition (2.16), we assume
\[
Y^{\mathrm{di}},\, X \ \text{independent}
\tag{3.13}
\]
and the law of which is arbitrary, unknown and uncontrollable. As a first step, consider the set $\partial\mathcal{U}^{\mathrm{SO}}(r)$ defined as
\[
\partial\mathcal{U}^{\mathrm{SO}}(r) = \bigl\{\, \mathcal{L}(X, Y^{\mathrm{re}}) \,\big|\, Y^{\mathrm{re}} \ \text{acc. to (3.12) and (3.13)} \,\bigr\}
\tag{3.14}
\]
Because of condition (3.13), in the sequel we refer to the random variables $Y^{\mathrm{re}}$ and $Y^{\mathrm{di}}$ instead of their respective (marginal) distributions only, while in the common gross error model, reference to the respective distributions would suffice. Condition (3.13) also entails that, in general, contrary to the gross error model, $\mathcal{L}(X, Y^{\mathrm{id}})$ is not an element of $\partial\mathcal{U}^{\mathrm{SO}}(r)$, i.e., not representable itself as some $\mathcal{L}(X, Y^{\mathrm{re}})$ in this neighborhood.

As corresponding (convex) neighborhood we define
\[
\mathcal{U}^{\mathrm{SO}}(r) = \bigcup_{0 \le s \le r} \partial\mathcal{U}^{\mathrm{SO}}(s)
\tag{3.15}
\]
hence the symbol "$\partial$" in $\partial\mathcal{U}^{\mathrm{SO}}$, as the latter can be interpreted as the corresponding surface of this ball. Of course, $\mathcal{U}^{\mathrm{SO}}(r)$ contains $\mathcal{L}(X, Y^{\mathrm{id}})$.
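As an illustration of the SO neighborhood (3.12)/(3.13), the following toy sketch contaminates a scalar Gaussian location model with an SO outlier and compares the simulated MSE of the classical correction $M^0\Delta Y$ with its Huberization $H_b(M^0\Delta Y)$. The gross error value 10 and all other parameters are arbitrary choices of ours, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, b = 100_000, 0.1, 1.0              # sample size, radius, clipping height

# toy ideal model: X ~ N(0,1), Y^id = X + e, e ~ N(0,1), hence M^0 = 1/2
X = rng.standard_normal(n)
Y_id = X + rng.standard_normal(n)
M0 = 0.5

# SO contamination, cf. (3.12)/(3.13): Y^di independent of X (here a
# deterministic gross error at +10), U ~ Bin(1, r) independent of the rest
Y_di = np.full(n, 10.0)
U = rng.binomial(1, r, size=n)
Y_re = (1 - U) * Y_id + U * Y_di

corr_class = M0 * Y_re                   # classical correction M^0 Delta Y
corr_rls = np.clip(corr_class, -b, b)    # H_b(M^0 Delta Y), scalar case of (3.1)

mse = lambda c: np.mean((X - c) ** 2)
print("MSE classical:", mse(corr_class), "  MSE rLS:", mse(corr_rls))
```

In runs of this sketch, the clipped correction pays a small efficiency premium on the clean majority of the observations but is not dragged towards the gross error, so its simulated MSE is markedly smaller than that of the classical correction.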