both quantities are at most $\mathcal{O}(T)$. Furthermore, because the universal dynamic regret studied in this paper holds against any comparator sequence, it specializes to the static regret when all comparators are set to the best fixed decision in hindsight, i.e., $u_1 = \cdots = u_T = x^\ast \in \arg\min_{x \in \mathcal{X}} \sum_{t=1}^T f_t(x)$. Under such a circumstance, the path length $P_T = \sum_{t=2}^T \lVert u_{t-1} - u_t\rVert_2$ becomes zero, so the regret bound in Theorem 3 implies an $\mathcal{O}(\sqrt{1 + V_T})$ gradient-variation static regret bound, recovering the result of Chiang et al. (2012).

4.3 One-Gradient Feedback: Sword++

So far, we have designed an online algorithm (Sword) with the gradient-variation dynamic regret. While it achieves a favorable regret guarantee, one caveat is that Sword runs $N = \mathcal{O}(\log T)$ base-learners simultaneously and each base-learner requires her own gradient direction for the update. Consequently, the overall algorithm necessitates $\mathcal{O}(\log T)$ gradient queries at each iteration, making it time-consuming and only applicable to the multi-gradient feedback model. In contrast, algorithms designed for static regret minimization typically work well under the more challenging one-gradient feedback model, namely, they only require the gradient information $\nabla f_t(x_t)$ for updates. Given this, it is natural to ask whether it is possible to design online algorithms that achieve favorable dynamic regret while using only one gradient query per iteration, making them applicable to one-gradient feedback online learning.

We resolve the question affirmatively by designing an algorithm that requires only one gradient query per iteration and provably enjoys the same gradient-variation dynamic regret as Sword. The new algorithm, called Sword++, also implements an online ensemble structure. Compared to Sword presented in Section 4.2, the key novel ingredient is the framework of collaborative online ensemble. We carefully introduce correction terms to the online loss and optimism, forming a biased surrogate loss and a surrogate optimism, which are then fed to the meta-algorithm. By further exploiting the negative terms in the meta and base levels, the overall algorithm ensures effective collaboration between the two layers, thereby achieving the favorable gradient-variation dynamic regret with only one gradient query per iteration.

In the following, we describe the details of Sword++. The algorithm maintains multiple base-learners denoted by $\mathcal{B}_1, \ldots, \mathcal{B}_N$, which are run with different step sizes and then combined by a meta-algorithm to track the best one. An exponential step size grid is adopted as the schedule, denoted by $\mathcal{H} = \{\eta_i = c \cdot 2^{i} \mid i \in [N]\}$ with $N = \mathcal{O}(\log T)$ for some constant $c > 0$ (usually scaling with $\mathrm{poly}(1/T)$), whose specific setting will be given later.

Base-algorithm. Instead of performing updates over the original loss $f_t$ as shown in (11), all the base-learners of Sword++ update over the linearized surrogate loss $g_t : \mathcal{X} \mapsto \mathbb{R}$ defined as $g_t(x) = \langle \nabla f_t(x_t), x\rangle$, and moreover, the optimism is chosen as $M_t = \nabla g_{t-1}(x_{t-1,i})$ for each base-learner $\mathcal{B}_i$ with $i \in [N]$. By definition, we have $\nabla g_t(x_{t,i}) = \nabla f_t(x_t)$, so each base-learner $\mathcal{B}_i$ essentially performs the following update at each iteration:
$$
x_{t,i} = \Pi_{\mathcal{X}}\big[\widehat{x}_{t,i} - \eta_i \nabla f_{t-1}(x_{t-1})\big], \qquad \widehat{x}_{t+1,i} = \Pi_{\mathcal{X}}\big[\widehat{x}_{t,i} - \eta_i \nabla f_t(x_t)\big]. \tag{19}
$$
Using the above update steps, we no longer need to evaluate the gradient $\nabla f_t(x_{t,i})$ at the local decisions of every base-learner, as was done by Sword (see its update rule in (11)). Instead, a single call of $\nabla f_t(x_t)$ suffices at each round for the update in Sword++.
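For concreteness, below is a minimal sketch of the one-gradient base update (19) for a single base-learner $\mathcal{B}_i$. It is only an illustration under simplifying assumptions: the feasible set, the projection routine, and all names (e.g., `SwordPPBaseLearner`, `project_l2_ball`) are hypothetical and not part of the paper. The point it conveys is that both steps of (19) reuse gradients evaluated at the combined decision, so the base-learner never issues its own gradient query.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    # Hypothetical feasible set: an l2 ball of the given radius; any Euclidean
    # projection onto the actual domain X can be substituted here.
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

class SwordPPBaseLearner:
    """One base-learner B_i running the optimistic update (19).

    Both steps reuse gradients evaluated at the combined decision x_t,
    so the base-learner never queries a gradient at its local decision.
    """

    def __init__(self, dim, step_size, project=project_l2_ball):
        self.eta = step_size        # eta_i from the step size grid H
        self.x_hat = np.zeros(dim)  # auxiliary iterate \hat{x}_{t,i}
        self.project = project

    def predict(self, grad_prev):
        # x_{t,i} = Pi_X[ \hat{x}_{t,i} - eta_i * grad f_{t-1}(x_{t-1}) ]
        return self.project(self.x_hat - self.eta * grad_prev)

    def update(self, grad_curr):
        # \hat{x}_{t+1,i} = Pi_X[ \hat{x}_{t,i} - eta_i * grad f_t(x_t) ]
        self.x_hat = self.project(self.x_hat - self.eta * grad_curr)
```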
We note that although the linearized trick has previously been employed in the meta-base structure for achieving the minimax dynamic regret $\mathcal{O}(\sqrt{T(1+P_T)})$ with one gradient per iteration (Zhang et al., 2018a), this modification alone is far from enough to obtain problem-dependent dynamic regret. To see this, we can examine the regret of a base-learner updated with the surrogate loss $g_t(x)$. An argument similar to that of Lemma 1 shows that the base-regret over the linearized loss, $(\#) \triangleq \sum_{t=1}^T g_t(x_{t,i}) - \sum_{t=1}^T g_t(u_t)$, satisfies
$$
(\#) \le \eta_i \big(G^2 + 2 V_T\big) + \frac{D^2 + 2 D P_T}{2\eta_i} + 4L^2 \sum_{t=2}^T \lVert x_t - x_{t-1}\rVert_2^2 - \frac{1}{\eta_i} \sum_{t=2}^T \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 .
$$
In the analysis of Sword, because the gradient $\nabla f_t(x_{t,i})$ is evaluated at every base-learner's own decision $x_{t,i}$, the additional positive term (the third one) is $4L^2 \sum_{t=2}^T \lVert x_{t,i} - x_{t-1,i}\rVert_2^2$, which can be cancelled by the negative term $-\sum_{t=2}^T \lVert x_{t,i} - x_{t-1,i}\rVert_2^2/\eta_i$ whenever the step size is set appropriately. However, when the base-learner updates her decision over the surrogate loss, the additional positive term becomes $4L^2 \sum_{t=2}^T \lVert x_t - x_{t-1}\rVert_2^2$, which cannot be handled by the negative term of any single base-learner. Thus, more advanced mechanisms are required to achieve the problem-dependent dynamic regret under the one-gradient query model.

To tackle the difficulty, our primary idea is to facilitate collaboration between the meta and base levels. Specifically, we aim to leverage negative terms from both levels to handle the positive term. However, it turns out that the positive term cannot be entirely offset by the combined negative terms from the meta and base levels. To address this issue, we introduce correction terms to the feedback loss and optimism of the meta-algorithm. This generates a new negative term that, together with the negative term from the meta level, effectively cancels out the positive term. Nevertheless, another new positive term emerges due to the injected correction, which we ensure can be managed by the negative term from the base level. As a result, the overall undesired positive term is finally addressed. The above forms the main idea of our proposed collaborative online ensemble framework. In the following, we outline the specific setup of the meta-algorithm. We provide a brief explanation of the design of the corrections in Remark 4 and offer a more comprehensive elaboration on the general framework of collaborative online ensemble in Section 5.

Meta-algorithm. We still employ Optimistic Hedge (OMD with the negative-entropy regularizer) as the meta-algorithm, but nevertheless require an innovative design of the feedback loss and optimism. Specifically, instead of simply using the linearized surrogate loss $\ell_{t,i} \triangleq \langle \nabla f_t(x_t), x_{t,i}\rangle$ as the feedback loss like Sword (see the update rule in (14)), we carefully construct the surrogate loss in the following way and send it to the meta-algorithm; a small illustrative sketch follows the two items below.

• The feedback loss $\ell_t \in \mathbb{R}^N$ is constructed as follows: for each $i \in [N]$, $\ell_{1,i} = \langle \nabla f_1(x_1), x_{1,i}\rangle$, and for $t \ge 2$ it composes the linearized surrogate loss $\langle \nabla f_t(x_t), x_{t,i}\rangle$ with a decision-deviation correction term, namely,
$$
\ell_{t,i} = \langle \nabla f_t(x_t), x_{t,i}\rangle + \lambda \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 . \tag{20}
$$

• The optimism $m_t \in \mathbb{R}^N$ is configured similarly: $m_1 = 0$, and for $t \ge 2$ and $i \in [N]$, the optimism also admits a decision-deviation correction term, namely,
$$
m_{t,i} = \langle M_t, x_{t,i}\rangle + \lambda \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 = \langle \nabla f_{t-1}(x_{t-1}), x_{t,i}\rangle + \lambda \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 . \tag{21}
$$
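As a small illustration of how these corrected quantities could be assembled, the sketch below builds the feedback loss (20) and the optimism (21) from the stacked local decisions. The vectorized form and the function names are our own illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def corrected_feedback(grad_curr, X_curr, X_prev, lam):
    """Feedback loss (20): ell_{t,i} = <grad f_t(x_t), x_{t,i}> + lam * ||x_{t,i} - x_{t-1,i}||_2^2.

    X_curr and X_prev are (N, d) arrays stacking the local decisions x_{t,i} and x_{t-1,i}.
    """
    linear = X_curr @ grad_curr                                # <grad f_t(x_t), x_{t,i}> per learner
    correction = lam * np.sum((X_curr - X_prev) ** 2, axis=1)  # decision-deviation term
    return linear + correction

def corrected_optimism(grad_prev, X_curr, X_prev, lam):
    """Optimism (21): m_{t,i} = <grad f_{t-1}(x_{t-1}), x_{t,i}> + lam * ||x_{t,i} - x_{t-1,i}||_2^2."""
    return X_curr @ grad_prev + lam * np.sum((X_curr - X_prev) ** 2, axis=1)
```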
Both the feedback loss and the optimism admit an additional correction term $\lambda \lVert x_{t,i} - x_{t-1,i}\rVert_2^2$ that measures the stability of the local decisions returned by base-learner $\mathcal{B}_i$, where $\lambda > 0$ is the correction coefficient to be determined later. Overall, the meta-algorithm of Sword++ updates the weight vector $p_{t+1} \in \mathbb{R}^N$ as follows: for any $i \in [N]$,
$$
p_{t+1,i} \propto \exp\!\Big(-\varepsilon \Big(\sum_{s=1}^{t} \ell_{s,i} + m_{t+1,i}\Big)\Big), \tag{22}
$$
where $\varepsilon > 0$ is (for simplicity) chosen as a fixed learning rate of the meta-algorithm, and the feedback loss $\ell_t$ and optimism $m_t$ are defined in (20) and (21), respectively. Notably, the meta-algorithm only requires the gradient information $\nabla f_t(x_t)$ at round $t$ and is thus feasible in the one-gradient feedback model.

Remark 4 (design of correction term). We emphasize that the correction term $\lambda \lVert x_{t,i} - x_{t-1,i}\rVert_2^2$, appearing in the construction of both the feedback loss and the optimism, is crucial for the design and is the most challenging part of this method. We briefly explain the motivation. As mentioned earlier, the use of the linearized surrogate loss $g_t(x)$ introduces an additional term $\sum_{t=2}^T \lVert x_t - x_{t-1}\rVert_2^2$, which cannot be canceled by the negative term of any base-regret, namely $-\sum_{t=2}^T \lVert x_{t,i} - x_{t-1,i}\rVert_2^2$. To address the difficulty, we scrutinize the positive term and find that it can actually be further expanded as
$$
\begin{aligned}
\lVert x_t - x_{t-1}\rVert_2^2 &= \Big\lVert \sum_{i=1}^N p_{t,i} x_{t,i} - \sum_{i=1}^N p_{t-1,i} x_{t-1,i} \Big\rVert_2^2 \\
&\le 2 \Big\lVert \sum_{i=1}^N p_{t,i} x_{t,i} - \sum_{i=1}^N p_{t,i} x_{t-1,i} \Big\rVert_2^2 + 2 \Big\lVert \sum_{i=1}^N p_{t,i} x_{t-1,i} - \sum_{i=1}^N p_{t-1,i} x_{t-1,i} \Big\rVert_2^2 \\
&\le 2 \Big( \sum_{i=1}^N p_{t,i} \lVert x_{t,i} - x_{t-1,i}\rVert_2 \Big)^{\!2} + 2 \Big( \sum_{i=1}^N \lvert p_{t,i} - p_{t-1,i}\rvert \, \lVert x_{t-1,i}\rVert_2 \Big)^{\!2} \\
&\le 2 \sum_{i=1}^N p_{t,i} \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 + 2 D^2 \lVert p_t - p_{t-1}\rVert_1^2 ,
\end{aligned}
$$
where the last step uses Jensen's inequality (as $p_t$ is a probability vector) together with the boundedness of the domain. Summing over $t$ yields
$$
\sum_{t=2}^T \lVert x_t - x_{t-1}\rVert_2^2 \le 2 \sum_{t=2}^T \sum_{i=1}^N p_{t,i} \lVert x_{t,i} - x_{t-1,i}\rVert_2^2 + 2 D^2 \sum_{t=2}^T \lVert p_t - p_{t-1}\rVert_1^2 . \tag{23}
$$
Consequently, it is crucial to exploit negative terms in both the meta-regret and the base-regret to cancel out the positive term. The second positive term (the deviation of the meta-algorithm's weights) can be readily canceled by the negative term of the meta-regret. However, addressing the first positive term is more intricate, as it is essentially a weighted combination of the base-learners' decision variations. We tackle this positive term by algorithmically adding the decision-deviation correction term to the feedback loss and optimism of the meta-algorithm, as well as by leveraging the negative term of the base-regret. The underlying intuition is to penalize base-learners with large decision variations, so as to ensure a small enough variation of the final decisions. As such, we have facilitated the collaboration between the base and meta levels.
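Finally, the following sketch strings the pieces together into one round of a Sword++-style procedure: base predictions via (19), corrected optimism (21) and feedback (20), the Optimistic Hedge weights (22), and a single gradient query at the combined decision. It reuses the hypothetical helpers from the earlier sketches (`SwordPPBaseLearner`, `corrected_feedback`, `corrected_optimism`) and deliberately omits the concrete choices of $\varepsilon$, $\lambda$, and the step size grid, which are specified later in the paper; it is meant only to illustrate the one-gradient data flow, not to reproduce the exact algorithm.

```python
import numpy as np

def sword_pp_round(base_learners, cum_loss, grad_prev, X_prev, grad_oracle, eps, lam):
    """One round t of a Sword++-style procedure (illustrative sketch).

    base_learners : list of SwordPPBaseLearner, one per step size eta_i in H
    cum_loss      : length-N array, running sum of corrected losses ell_{s,i} for s < t
    grad_prev     : grad f_{t-1}(x_{t-1}) obtained in the previous round
    X_prev        : (N, d) array of previous local decisions x_{t-1,i}
    grad_oracle   : callable x -> grad f_t(x); invoked exactly once per round
    """
    # 1. Base-learners submit local decisions x_{t,i} via the first step of (19).
    X_curr = np.stack([bl.predict(grad_prev) for bl in base_learners])

    # 2. Corrected optimism, cf. (21), and Optimistic Hedge weights p_t, cf. (22).
    m = corrected_optimism(grad_prev, X_curr, X_prev, lam)
    logits = -eps * (cum_loss + m)
    p = np.exp(logits - logits.max())
    p /= p.sum()

    # 3. Final decision is the weighted combination of the local decisions.
    x_t = p @ X_curr

    # 4. The single gradient query of the round, at the combined decision x_t.
    grad_curr = grad_oracle(x_t)

    # 5. Corrected feedback loss (20) accumulates into the meta-algorithm's history.
    cum_loss = cum_loss + corrected_feedback(grad_curr, X_curr, X_prev, lam)

    # 6. Base-learners take the second step of (19) with the same gradient.
    for bl in base_learners:
        bl.update(grad_curr)

    return x_t, grad_curr, X_curr, cum_loss
```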