Goal

• In general, there are two performance measures (essentially the same).

- Convergence: $f(x_T) - f(x^*) \le \varepsilon(T)$.
  Qualitatively: $\varepsilon(T) \to 0$ as $T \to \infty$.
  Quantitatively: $\mathcal{O}(1/\sqrt{T})$ / $\mathcal{O}(1/T)$ / $\mathcal{O}(1/T^2)$ / $\mathcal{O}(\exp(-T))$ / ...

- Complexity:
  Definition: the number of iterations required to achieve $f(x_T) - f(x^*) \le \varepsilon$.
  Quantitatively: $\mathcal{O}(1/\varepsilon^2)$ / $\mathcal{O}(1/\varepsilon)$ / $\mathcal{O}(1/\sqrt{\varepsilon})$ / $\mathcal{O}(\ln(1/\varepsilon))$ / ..., corresponding to $\mathcal{O}(1/\sqrt{T})$ / $\mathcal{O}(1/T)$ / $\mathcal{O}(1/T^2)$ / $\mathcal{O}(\exp(-T))$.
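As an illustration of why the two measures are essentially the same, a convergence rate can be inverted to obtain the corresponding complexity. The following is a short sketch; the constant $c$ is a generic placeholder that does not appear on the slide.

```latex
% From rate to complexity: suppose the rate is O(1/\sqrt{T}), i.e.,
f(x_T) - f(x^*) \le \frac{c}{\sqrt{T}} .
% Forcing the right-hand side below \varepsilon gives
\frac{c}{\sqrt{T}} \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{c^2}{\varepsilon^2},
% so an O(1/\sqrt{T}) rate corresponds to an O(1/\varepsilon^2) complexity.
% The same inversion maps O(1/T) to O(1/\varepsilon), O(1/T^2) to O(1/\sqrt{\varepsilon}),
% and O(\exp(-T)) to O(\ln(1/\varepsilon)).
```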
Gradient Descent

• GD Template: $x_{t+1} = \Pi_{\mathcal{X}}[x_t - \eta_t \nabla f(x_t)]$

- $x_1$ can be an arbitrary point inside the domain.
- $\eta_t > 0$ is the potentially time-varying step size (also called the learning rate).
- The projection $\Pi_{\mathcal{X}}[y] = \arg\min_{x \in \mathcal{X}} \|x - y\|$ ensures feasibility.
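As a concrete illustration, here is a minimal sketch of the projected GD template in Python. The feasible set $\mathcal{X}$, the projection `project_l2_ball`, and the quadratic objective in the usage lines are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def project_l2_ball(y, radius=1.0):
    """Euclidean projection onto an l2 ball (an illustrative choice of feasible set X)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def gradient_descent(grad_f, x1, eta, T, project=project_l2_ball):
    """Projected GD template: x_{t+1} = Pi_X[x_t - eta_t * grad f(x_t)].

    grad_f : callable returning the gradient at a point
    x1     : arbitrary starting point inside the domain
    eta    : a constant step size, or a callable t -> eta_t for time-varying steps
    """
    x = np.asarray(x1, dtype=float)
    for t in range(1, T + 1):
        eta_t = eta(t) if callable(eta) else eta
        x = project(x - eta_t * grad_f(x))
    return x

# Illustrative usage: minimize f(x) = 0.5 * ||x - b||^2 over the unit l2 ball;
# the constrained minimizer is b / ||b|| since ||b|| > 1 here.
b = np.array([2.0, 0.0])
x_T = gradient_descent(grad_f=lambda x: x - b, x1=np.zeros(2), eta=0.1, T=200)
```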
Why Gradient Descent?

Let's focus on the unconstrained setting for simplicity.

• Idea: surrogate optimization

We aim to find a sequence of local upper bounds $U_1, \ldots, U_T$, where each surrogate function $U_t : \mathbb{R}^d \to \mathbb{R}$ may depend on $x_t$, such that
(i) $f(x_t) = U_t(x_t)$;
(ii) $f(x) \le U_t(x)$ holds for all $x \in \mathbb{R}^d$;
(iii) $U_t(x)$ should be simple enough to minimize.

Then, our proposed algorithm would be $x_{t+1} = \arg\min_{x} U_t(x)$.
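In code form, the surrogate-optimization principle is just a loop that rebuilds and minimizes the surrogate at every step. The sketch below assumes two hypothetical callables, `make_surrogate` and `minimize_surrogate`, which are not named in the lecture.

```python
def surrogate_optimization(make_surrogate, minimize_surrogate, x1, T):
    """Generic surrogate-optimization template: x_{t+1} = argmin_x U_t(x).

    make_surrogate(x_t)     -> builds the local upper bound U_t around the current iterate
    minimize_surrogate(U_t) -> returns its exact minimizer (assumed cheap, property (iii))
    Both callables are hypothetical placeholders for this sketch.
    """
    x = x1
    for _ in range(T):
        U_t = make_surrogate(x)       # properties (i), (ii): tight upper bound at x_t
        x = minimize_surrogate(U_t)   # property (iii): simple enough to minimize exactly
    return x
```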
Why Gradient Descent?

• Following the "surrogate optimization" principle, let's invent GD for convex and smooth functions.

Proposition 1. Suppose that $f$ is convex and differentiable. Moreover, suppose that $f$ is $L$-smooth with respect to the $\ell_2$-norm. Define the surrogate $U_t : \mathbb{R}^d \to \mathbb{R}$ as
$$U_t(x) = f(x_t) + \langle \nabla f(x_t), x - x_t \rangle + \frac{L}{2}\|x - x_t\|_2^2.$$
Then, we have
(i) $f(x_t) = U_t(x_t)$;
(ii) $f(x) \le U_t(x)$ holds for all $x \in \mathbb{R}^d$;
(iii) $x_{t+1} = \arg\min_{x} U_t(x)$ is equivalent to $x_{t+1} = x_t - \frac{1}{L}\nabla f(x_t)$.
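For completeness, a one-line check of claim (iii): $U_t$ is a strongly convex quadratic, so setting its gradient to zero gives its unique minimizer.

```latex
\nabla U_t(x) = \nabla f(x_t) + L\,(x - x_t) = 0
\quad\Longrightarrow\quad
x_{t+1} = \arg\min_{x \in \mathbb{R}^d} U_t(x) = x_t - \frac{1}{L}\nabla f(x_t).
% Claim (i) follows by plugging x = x_t into U_t, and claim (ii) is exactly
% the descent lemma implied by L-smoothness.
```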
Gradient Descent

• GD Template: $x_{t+1} = \Pi_{\mathcal{X}}[x_t - \eta_t \nabla f(x_t)]$

- $x_1$ can be an arbitrary point inside the domain.
- $\eta_t > 0$ is the potentially time-varying step size (also called the learning rate).
- The projection $\Pi_{\mathcal{X}}[y] = \arg\min_{x \in \mathcal{X}} \|x - y\|$ ensures feasibility.

This lecture will focus on GD analysis for Lipschitz functions; the next lecture will discuss smooth functions.