to as tracking regret or shifting regret in the prediction with expert advice setting (Herbster and Warmuth, 1998, 2001; Bousquet and Warmuth, 2002; Cesa-Bianchi et al., 2012; György and Szepesvári, 2016). It is known that in the worst case, a sublinear dynamic regret is not attainable unless certain regularities are imposed on the comparator sequence or the function sequence (Besbes et al., 2015; Jadbabaie et al., 2015). This paper focuses on the most common regularity, called the path length, introduced by Zinkevich (2003), which measures the fluctuation of the comparators and is defined by
$$P_T = \sum_{t=2}^{T} \lVert u_{t-1} - u_t \rVert_2. \qquad (5)$$
Note that we simply focus on the Euclidean norm throughout this paper, and it is straightforward to extend the notions and results to general primal-dual norms. Other regularities include the squared path length introduced by Zhang et al. (2017), defined as $S_T = \sum_{t=2}^{T} \lVert u_{t-1} - u_t \rVert_2^2$, and the function variation introduced by Besbes et al. (2015), which measures the cumulative variation with respect to the function values and is defined as $V_T^f = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \lvert f_{t-1}(x) - f_t(x) \rvert$.

There are two kinds of dynamic regret notions in previous studies. The universal dynamic regret, as defined in (2), aims to compare with any feasible comparator sequence, while the worst-case dynamic regret, defined in (3), specifies the comparator sequence to be the sequence of minimizers of the online functions. In the following, we present the related works respectively. Notice that we will use the notations $P_T$ and $S_T$ for the path length and squared path length of the comparator sequence $\{u_t\}_{t=1,\ldots,T}$, while adopting the notations $P_T^*$ and $S_T^*$ for those of the sequence $\{x_t^*\}_{t=1,\ldots,T}$, where $x_t^*$ is the minimizer of the online function $f_t$; namely, $P_T^* = \sum_{t=2}^{T} \lVert x_{t-1}^* - x_t^* \rVert_2$ and $S_T^* = \sum_{t=2}^{T} \lVert x_{t-1}^* - x_t^* \rVert_2^2$.

Universal dynamic regret. The seminal work of Zinkevich (2003) demonstrates that online gradient descent (OGD) enjoys an $\mathcal{O}(\sqrt{T}(1 + P_T))$ universal dynamic regret, which holds against any feasible comparator sequence. Nevertheless, the result is far from the $\Omega(\sqrt{T(1 + P_T)})$ lower bound established by Zhang et al. (2018a), who further close the gap by proposing a novel online algorithm that attains the optimal rate of $\mathcal{O}(\sqrt{T(1 + P_T)})$ for convex functions (Zhang et al., 2018a). Our work further exploits the easiness of the problem instances and achieves problem-dependent regret guarantees, which can be better than the minimax rate. Zhao et al. (2021a) study the universal dynamic regret for bandit convex optimization under both one-point and two-point feedback models. The universal dynamic regret is also studied for variants of the standard OCO model, such as OCO with memory (Zhao et al., 2022) and OCO with switching cost (Zhang et al., 2021). We note that the aforementioned works and ours all build on the two-layer meta-base structure.

Concurrent to our conference version paper (Zhao et al., 2020b), Cutkosky (2020) proposes a novel online algorithm that achieves the same minimax optimal dynamic regret for convex functions as Zhang et al. (2018a), without relying on meta-base aggregation. Instead, their method employs the combination strategy developed in parameter-free online learning (Cutkosky and Orabona, 2018; Cutkosky, 2019). We note that it may be possible to modify the algorithm of Cutkosky (2020) to achieve small-loss bounds; however, it is generally challenging to attain gradient-variation bounds, especially under the one-gradient feedback model. More specifically, it is not hard to modify their framework to be compatible
with optimistic online learning, but one usually needs to exploit additional negative terms to convert the optimistic quantity $\lVert \nabla f_t(x_t) - \nabla f_{t-1}(x_{t-1}) \rVert_2^2$ into the gradient variation $\sup_{x \in \mathcal{X}} \lVert \nabla f_t(x) - \nabla f_{t-1}(x) \rVert_2^2$, in order to eliminate the difference between the decisions $x_t$ and $x_{t-1}$. Our proposed algorithms are built upon the meta-base two-layer framework, which involves a careful exploitation of negative terms in the regret analysis of both the meta and base algorithms, as well as the introduction of additional correction terms. However, as far as we can see, with only one gradient feedback per round, it is challenging for the framework of Cutkosky (2020) to achieve the gradient-variation bound due to the lack of negative terms in the regret analysis.

Worst-case dynamic regret. There are many efforts devoted to studying the worst-case dynamic regret. Yang et al. (2016) prove that OGD enjoys an $\mathcal{O}(\sqrt{T(1 + P_T^*)})$ worst-case dynamic regret for convex functions when the path length $P_T^*$ is known. For strongly convex and smooth functions, Mokhtari et al. (2016) show that an $\mathcal{O}(P_T^*)$ dynamic regret is achievable, and Zhang et al. (2017) further propose the online multiple gradient descent algorithm with an $\mathcal{O}(\min\{P_T^*, S_T^*\})$ guarantee. Yang et al. (2016) show that the $\mathcal{O}(P_T^*)$ rate is attainable for convex and smooth functions, provided that all the minimizers $x_t^*$'s lie in the interior of the domain $\mathcal{X}$. The above results mainly use the (squared) path length as the non-stationarity measure, which captures the cumulative variation of the comparator sequence. In another line of research, researchers use the variation with respect to the function values as the measure. Besbes et al. (2015) show that OGD with a restarting strategy attains an $\mathcal{O}(T^{2/3} (V_T^f)^{1/3})$ regret for convex functions when the function variation $V_T^f$ is available, which is improved to $\mathcal{O}(T^{1/3} (V_T^f)^{2/3})$ for the one-dimensional square loss (Baby and Wang, 2019). Chen et al. (2019) extend the results of Besbes et al. (2015) to more general function-variation measures capable of capturing local temporal and spatial changes. To take advantage of variations in both comparator sequences and function values, Zhao and Zhang (2021) provide an improved analysis for online multiple gradient descent and prove an $\mathcal{O}(\min\{P_T^*, S_T^*, V_T^f\})$ worst-case dynamic regret for strongly convex and smooth functions. For convex and smooth functions, they also demonstrate that the simple greedy strategy (i.e., $x_{t+1} = x_t^* \in \arg\min_{x \in \mathcal{X}} f_t(x)$) can effectively optimize the worst-case dynamic regret (Zhao and Zhang, 2021, Section 4.2).

3. Problem Setup and Algorithmic Framework

In this section, we first formally state the problem setup, then introduce the foundational algorithmic framework for dynamic regret minimization, and finally list several assumptions that might be used in the theoretical analysis.

3.1 Problem Setup

Online Convex Optimization (OCO) can be modeled as an iterated game between the player and the environments. At iteration $t \in [T]$, the player first chooses the decision $x_t$ from a convex feasible set $\mathcal{X} \subseteq \mathbb{R}^d$; then the environments reveal the loss function $f_t: \mathcal{X} \mapsto \mathbb{R}$, and the player suffers the loss $f_t(x_t)$ and observes certain information about the function $f_t(\cdot)$. The interaction protocol is summarized in the sketch below.
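The following Python sketch summarizes the interaction protocol just described, here under one-gradient feedback. The `Player` interface and the function names are illustrative placeholders introduced for exposition only; they are not part of the paper's algorithms.

```python
from typing import Callable, List, Protocol
import numpy as np

class Player(Protocol):
    # Placeholder interface for any OCO algorithm (e.g., OGD, OMD, OEGD).
    def decide(self) -> np.ndarray: ...
    def observe(self, grad: np.ndarray) -> None: ...

def run_oco(player: Player,
            losses: List[Callable[[np.ndarray], float]],
            grads: List[Callable[[np.ndarray], np.ndarray]]) -> float:
    """Iterated game between the player and the environments.

    Full-information, one-gradient feedback: after committing to x_t, the
    player suffers f_t(x_t) and is shown only the gradient at x_t.
    """
    total_loss = 0.0
    for f_t, grad_t in zip(losses, grads):
        x_t = player.decide()         # player chooses a decision x_t in X
        total_loss += f_t(x_t)        # player suffers the loss f_t(x_t)
        player.observe(grad_t(x_t))   # environment reveals the gradient at x_t
    return total_loss
```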
According to the revealed information, online learning problems are typically classified into full-information online learning and partial-information online learning (also called bandit online learning). In this paper, we focus on the full-information setting, which can be further categorized into the following two setups:

(i) multi-gradient feedback: the player can access the entire gradient function $\nabla f_t(\cdot)$ and thus can evaluate the gradient multiple times;

(ii) one-gradient feedback: the player can only observe the gradient information $\nabla f_t(x_t)$ after submitting the decision $x_t$.

In Section 4.2, we develop an online algorithm called Sword with gradient-variation dynamic regret bounds under the multi-gradient feedback model. In Section 4.3, we present an improved algorithm called Sword++ that achieves the same guarantee under the more challenging one-gradient feedback model.

The typical performance measure is the static regret, which benchmarks the algorithm against a fixed comparator. To handle non-stationary environments, we focus on the strengthened measure called dynamic regret, which compares the online algorithm to a sequence of time-varying comparators $u_1, \ldots, u_T \in \mathcal{X}$, as defined in (2). An upper bound on the dynamic regret should be a function of the comparators, and typically the bound depends on some regularities that measure the fluctuation of the comparator sequence, such as the path length $P_T = \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert_2$. Throughout the paper, we focus on the Euclidean norm for simplicity, and it is straightforward to extend our results to general primal-dual norms.

In addition to the regret measure, we further consider the gradient query complexity. Note that algorithms designed for the multi-gradient feedback model may query the gradient multiple times at each round. However, most algorithms designed for static regret minimization require only one gradient per iteration, namely, using $\nabla f_t(x_t)$ for the next update only. Therefore, it is more desirable to achieve the favorable regret guarantees under the one-gradient feedback model. In other words, our aim is to develop first-order methods for dynamic regret minimization that require only one gradient query per iteration.

3.2 Optimistic Mirror Descent

We employ the algorithmic framework of Optimistic Mirror Descent (OMD) (Chiang et al., 2012; Rakhlin and Sridharan, 2013) as a generic building block for designing algorithms for non-stationary online learning. OMD is a methodology for optimistic online learning. Compared to the standard online learning setup, the player now additionally receives an optimistic vector $M_t \in \mathbb{R}^d$ at each round, which serves as a hint of the future gradient. OMD starts from an initial point $\widehat{x}_1 \in \mathcal{X}$ and performs the following two-step update:
$$x_t = \arg\min_{x \in \mathcal{X}} \; \eta_t \langle M_t, x \rangle + D_\psi(x, \widehat{x}_t), \qquad \widehat{x}_{t+1} = \arg\min_{x \in \mathcal{X}} \; \eta_t \langle \nabla f_t(x_t), x \rangle + D_\psi(x, \widehat{x}_t), \qquad (6)$$
which first updates by the optimism and then updates by the received gradient. In the above, $\eta_t > 0$ is a (potentially) time-varying step size, and $D_\psi(\cdot, \cdot)$ denotes the Bregman divergence associated with the regularizer $\psi$, defined as $D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle$. For concreteness, a minimal code sketch of the update rule (6) with the Euclidean regularizer is given below; we then have the following general result regarding the dynamic regret of optimistic mirror descent.
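Below is a minimal Python sketch of the two-step update (6) instantiated with the Euclidean regularizer $\psi(x) = \frac{1}{2}\lVert x \rVert_2^2$, for which the Bregman divergence becomes $\frac{1}{2}\lVert x - y \rVert_2^2$ and each arg-min reduces to a projected gradient step. The helper names (`project`, `omd_step`) and the $\ell_2$-ball feasible set are illustrative assumptions, not part of any algorithm in this paper.

```python
import numpy as np

def project(x, radius=1.0):
    # Euclidean projection onto an l2-ball of the given radius, standing in
    # for the projection onto a general convex feasible set X.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def omd_step(x_hat, optimism, grad_fn, eta, radius=1.0):
    """One round of the two-step OMD update (6) with psi(x) = 0.5 * ||x||_2^2.

    With the Euclidean regularizer, D_psi(x, y) = 0.5 * ||x - y||_2^2, so each
    arg-min is a projected gradient step taken from the auxiliary point x_hat.
    """
    x_t = project(x_hat - eta * optimism, radius)              # update by the optimism M_t
    x_hat_next = project(x_hat - eta * grad_fn(x_t), radius)   # update by the observed gradient
    return x_t, x_hat_next

# Single-round usage on f(x) = 0.5 * ||x - c||_2^2, with a zero hint here;
# choosing M_t = grad f_{t-1}(x_{t-1}) is the specialization used in Section 4.1.
d, eta = 5, 0.1
x_hat, hint, c = np.zeros(d), np.zeros(d), np.ones(d)
x_t, x_hat = omd_step(x_hat, hint, lambda x: x - c, eta)
```

Different regularizers only change how the two arg-min steps are computed; the guarantee below applies to any 1-strongly convex choice of $\psi$.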
Theorem 1. Suppose that the regularizer $\psi: \mathcal{X} \mapsto \mathbb{R}$ is 1-strongly convex with respect to the norm $\lVert \cdot \rVert$, and let $\lVert \cdot \rVert_*$ be the dual norm of $\lVert \cdot \rVert$. The dynamic regret of Optimistic Mirror Descent, whose update rule is specified in (6), is bounded as follows:
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t) \le \sum_{t=1}^{T} \eta_t \lVert \nabla f_t(x_t) - M_t \rVert_*^2 + \sum_{t=1}^{T} \frac{1}{\eta_t} \Big( D_\psi(u_t, \widehat{x}_t) - D_\psi(u_t, \widehat{x}_{t+1}) \Big) - \sum_{t=1}^{T} \frac{1}{\eta_t} \Big( D_\psi(\widehat{x}_{t+1}, x_t) + D_\psi(x_t, \widehat{x}_t) \Big), \qquad (7)$$
which holds for any comparator sequence $u_1, \ldots, u_T \in \mathcal{X}$.

Remark 1. The dynamic regret upper bound in Theorem 1 consists of three terms:

(i) the first term $\sum_{t=1}^{T} \eta_t \lVert \nabla f_t(x_t) - M_t \rVert_*^2$ is the adaptivity term that measures the deviation between the gradients and the optimism;

(ii) the second term $\sum_{t=1}^{T} \frac{1}{\eta_t} \big( D_\psi(u_t, \widehat{x}_t) - D_\psi(u_t, \widehat{x}_{t+1}) \big)$ reflects the non-stationarity of the environments and will be converted to the path length of the comparators;

(iii) the last negative term $-\sum_{t=1}^{T} \frac{1}{\eta_t} \big( D_\psi(\widehat{x}_{t+1}, x_t) + D_\psi(x_t, \widehat{x}_t) \big)$ is crucial and will be greatly useful for problem-dependent bounds, particularly the gradient-variation one.

Moreover, we emphasize that the above regret guarantee is very general due to the flexibility in choosing the regularizer $\psi$ and the comparators $\{u_t\}_{t=1,\ldots,T}$. For example, by choosing the negative-entropy regularizer and competing with the best fixed prediction, the result recovers the static regret bound of Optimistic Hedge (Syrgkanis et al., 2015); by choosing the Euclidean regularizer and competing with time-varying comparators, it recovers the dynamic regret bound of Online Gradient Descent (Zinkevich, 2003). The versatility of this optimistic mirror descent framework motivates us to use it as a unified building block for both algorithm design and theoretical analysis.

3.3 Assumptions

In this part, we list several common assumptions that might be used in the theorems.

Assumption 1. The norm of the gradients of the online functions over the domain $\mathcal{X}$ is bounded by $G$, i.e., $\lVert \nabla f_t(x) \rVert_2 \le G$ for all $x \in \mathcal{X}$ and $t \in [T]$.

Assumption 2. The domain $\mathcal{X} \subseteq \mathbb{R}^d$ contains the origin $0$, and the diameter of the domain $\mathcal{X}$ is at most $D$, i.e., $\lVert x - x' \rVert_2 \le D$ for any $x, x' \in \mathcal{X}$.

Assumption 3. All the online functions are $L$-smooth, i.e., $\lVert \nabla f_t(x) - \nabla f_t(x') \rVert_2 \le L \lVert x - x' \rVert_2$ for any $x, x' \in \mathbb{R}^d$ and $t \in [T]$.

Assumption 4. All the online functions are non-negative over $\mathbb{R}^d$.

We have the following remarks regarding the assumptions. The generic dynamic regret bound of OMD (Theorem 1) does not require the smoothness assumption (Assumption 3). Nevertheless, smoothness is required in attaining the problem-dependent dynamic regret bounds. In fact,
even in the static regret analysis, smoothness is demonstrated to be necessary for first-order methods to achieve gradient-variation bounds (cf. Lemma 9 of Chiang et al. (2012) and Theorem 1 of Yang et al. (2014)). Therefore, throughout the paper we focus on the problem-dependent dynamic regret of convex and smooth functions. Moreover, note that in Assumption 4 we require the online functions to be non-negative outside the domain $\mathcal{X}$, which is a precondition for establishing the self-bounding property of smooth functions and is commonly used to establish small-loss bounds (Srebro et al., 2010; Zhang et al., 2019; Zhang and Zhou, 2019). Meanwhile, we treat double logarithmic factors in $T$ as a constant, following previous studies (Adamskiy et al., 2012; Luo and Schapire, 2015).

4. Gradient-Variation Dynamic Regret

Our paper aims to develop online algorithms that achieve problem-dependent dynamic regret bounds, which scale with two problem-dependent quantities: the gradient-variation term $V_T$ and the small-loss term $F_T$, as defined in (4). As we will demonstrate in the next section, the gradient-variation bound is more fundamental than the small-loss bound. Consequently, we start by focusing on the gradient-variation dynamic regret in this section. In Section 6, we will then present the small-loss bound and the best-of-both-worlds bound as implications of the results obtained in this section.

4.1 A Gentle Start

In the study of static regret, Chiang et al. (2012) propose the online extra-gradient descent (OEGD) algorithm and prove that the algorithm enjoys a gradient-variation static regret bound. Specifically, OEGD starts from $\widehat{x}_1, x_1 \in \mathcal{X}$ and then updates by
$$x_t = \Pi_{\mathcal{X}}\big[\widehat{x}_t - \eta \nabla f_{t-1}(x_{t-1})\big], \qquad \widehat{x}_{t+1} = \Pi_{\mathcal{X}}\big[\widehat{x}_t - \eta \nabla f_t(x_t)\big]. \qquad (8)$$
We here consider the algorithm with a fixed step size $\eta > 0$ for simplicity, and $\Pi_{\mathcal{X}}[\cdot]$ denotes the Euclidean projection onto the nearest point in $\mathcal{X}$. For convex and smooth functions, Chiang et al. (2012) prove that OEGD achieves an $\mathcal{O}(\sqrt{1 + V_T})$ static regret bound, where $V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \lVert \nabla f_t(x) - \nabla f_{t-1}(x) \rVert_2^2$ is the gradient variation.

In the following, we further demonstrate that OEGD also enjoys a gradient-variation dynamic regret guarantee. Actually, OEGD can be viewed as a specialization of the optimistic mirror descent (6) presented in Section 3.2, obtained by choosing the regularizer $\psi(x) = \frac{1}{2}\lVert x \rVert_2^2$, the optimism $M_t = \nabla f_{t-1}(x_{t-1})$, and a fixed step size $\eta > 0$. Then, the general result of Theorem 1 directly implies the following dynamic regret bound for OEGD; the proof can be found in Appendix A.

Lemma 1. Under Assumptions 1, 2, and 3, by choosing $\eta \le \frac{1}{4L}$, the dynamic regret of OMD with the regularizer $\psi(x) = \frac{1}{2}\lVert x \rVert_2^2$ and the optimism $M_t = \nabla f_{t-1}(x_{t-1})$ is bounded as
$$\sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t) \le \eta (G^2 + 2 V_T) + \frac{1}{2\eta}\big(D^2 + 2 D P_T\big), \qquad (9)$$
where $V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \lVert \nabla f_t(x) - \nabla f_{t-1}(x) \rVert_2^2$ is the gradient variation and $P_T = \sum_{t=2}^{T} \lVert u_{t-1} - u_t \rVert_2$ is the path length. The result holds for any comparator sequence $u_1, \ldots, u_T \in \mathcal{X}$.
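To make the update rule (8) and the quantities appearing in the bound (9) concrete, the following sketch runs OEGD on a toy sequence of slowly drifting quadratic losses over an $\ell_2$-ball and also computes the resulting gradient variation $V_T$ and the path length $P_T$ of an illustrative comparator sequence. This is a minimal sketch under assumed names (`oegd`, `project`) and an assumed toy instance; it is not the paper's implementation.

```python
import numpy as np

def project(x, radius=1.0):
    # Euclidean projection onto an l2-ball, standing in for Pi_X in (8).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def oegd(grad_fns, eta, d, radius=1.0):
    """Online extra-gradient descent with a fixed step size, update rule (8).

    Each per-round gradient function is evaluated exactly once (at x_t), so
    the run also fits the one-gradient feedback model.
    """
    x_hat, last_grad = np.zeros(d), np.zeros(d)   # last_grad plays the role of grad f_{t-1}(x_{t-1})
    decisions = []
    for grad_fn in grad_fns:
        x_t = project(x_hat - eta * last_grad, radius)    # optimistic step with M_t = grad f_{t-1}(x_{t-1})
        last_grad = grad_fn(x_t)                          # the single gradient query of round t
        x_hat = project(x_hat - eta * last_grad, radius)  # extra-gradient step on the auxiliary point
        decisions.append(x_t)
    return decisions

# Toy instance: slowly drifting quadratics f_t(x) = 0.5 * ||x - c_t||_2^2.
rng = np.random.default_rng(1)
d, T, eta = 3, 200, 0.05
centers = np.cumsum(rng.normal(scale=0.01, size=(T, d)), axis=0)
grad_fns = [lambda x, c=c: x - c for c in centers]
xs = oegd(grad_fns, eta, d)

# Quantities in the bound (9). For these quadratics, grad f_t(x) - grad f_{t-1}(x)
# equals c_{t-1} - c_t for every x, so the sup over X is attained everywhere.
# As an illustrative comparator sequence we take the projected per-round minimizers.
V_T = sum(float(np.linalg.norm(c_prev - c) ** 2) for c_prev, c in zip(centers, centers[1:]))
comparators = [project(c) for c in centers]
P_T = sum(float(np.linalg.norm(u_prev - u)) for u_prev, u in zip(comparators, comparators[1:]))
```

Note that balancing the two terms on the right-hand side of (9) requires setting the fixed step size using $V_T$ and $P_T$, which are generally unknown in advance.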