Formulation

At each round $t = 1, 2, \ldots$
(1) the player first picks an arm $a_t \in [K]$ from the $K$ candidate arms;
(2) simultaneously, the environment picks a loss vector $\ell_t \in [0,1]^K$;
(3) the player suffers and only observes the loss $\ell_{t,a_t}$, then updates the model.

Goal: to minimize the expected regret
$$\mathbb{E}[\mathrm{Regret}_T] = \mathbb{E}\Big[\sum_{t=1}^T \ell_{t,a_t}\Big] - \min_{i \in [K]} \sum_{t=1}^T \ell_{t,i},$$
where the expectation is taken over the randomness of the algorithm.

Note: deterministic algorithms will suffer $\Omega(T)$ regret in the worst case under the bandit setting!
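To make the protocol concrete, here is a minimal Python sketch (not from the lecture) of the interaction loop and of the regret for one realization of the player's randomness. The `player` object and its `select_arm` / `update` methods are hypothetical placeholders for any randomized bandit algorithm; averaging the returned regret over independent runs approximates the expectation above.

```python
import numpy as np

def empirical_regret(loss_matrix, arm_choices):
    """Regret of one run: cumulative loss of the pulled arms minus the
    cumulative loss of the best fixed arm in hindsight.

    loss_matrix: (T, K) array with entries in [0, 1], row t is ell_t.
    arm_choices: length-T integer array, entry t is the pulled arm a_t.
    """
    T = loss_matrix.shape[0]
    player_loss = loss_matrix[np.arange(T), arm_choices].sum()
    best_fixed_arm_loss = loss_matrix.sum(axis=0).min()
    return player_loss - best_fixed_arm_loss

def run_bandit(player, loss_matrix, rng):
    """Bandit protocol: the player only sees loss_matrix[t, a_t] each round."""
    T, K = loss_matrix.shape
    arms = np.empty(T, dtype=int)
    for t in range(T):
        arms[t] = player.select_arm(rng)     # step (1): pick a_t (hypothetical interface)
        feedback = loss_matrix[t, arms[t]]   # step (3): observe only ell_{t, a_t}
        player.update(arms[t], feedback)     # then update the model
    return arms
```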
Comparison

Full-Information Problem           | Domain                   | Loss Functions                                      | Feedback
Prediction with Experts' Advice    | $\Delta_d$               | $f_t(p_t) = \langle \ell_t, p_t \rangle$            | $f_t(p_t)$, $\ell_t$
Online Convex Optimization         | $\mathcal{X}$            | $f_t(x)$                                            | $f_t(x_t)$, $\nabla f_t(x_t)$, ...

Bandit Problem                     | Domain                   | Loss Functions                                      | Feedback
Multi-Armed Bandits                | $\{e_1, \ldots, e_K\}$   | $f_t(e_{a_t}) = \langle \ell_t, e_{a_t} \rangle$    | $f_t(e_{a_t}) = \ell_{t,a_t}$
Bandit Convex Optimization         | $\mathcal{X}$            | $f_t(x)$                                            | $f_t(x_t)$

Notation: $e_i \in \mathbb{R}^K$ is the one-hot vector with $i$-th entry being 1 (the simplex is the convex hull of $\{e_1, \ldots, e_K\}$).
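A quick numerical check (not from the slides) of the identity in the MAB row: pulling arm $a_t$ corresponds to playing the one-hot vector $e_{a_t}$, and the linear loss $\langle \ell_t, e_{a_t} \rangle$ equals the single entry $\ell_{t,a_t}$ that the player observes.

```python
import numpy as np

K = 5
rng = np.random.default_rng(0)
loss = rng.uniform(0.0, 1.0, size=K)   # loss vector ell_t in [0, 1]^K
a_t = 2                                # arm pulled this round

e = np.zeros(K)
e[a_t] = 1.0                           # one-hot vector e_{a_t}

# <ell_t, e_{a_t}> is exactly the observed entry ell_{t, a_t}
assert np.isclose(loss @ e, loss[a_t])
```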
A Natural Solution for MAB

MAB bears much similarity to the PEA problem (except for the amount of feedback information), so a natural idea is to deploy Hedge to the MAB problem (see the code sketch after the box below).

Hedge for PEA
At each round $t = 1, 2, \ldots$
(1) compute $p_t \in \Delta_K$ such that $p_{t,i} \propto \exp(-\eta L_{t-1,i})$ for $i \in [K]$;
(2) the player submits $p_t$, suffers loss $\langle p_t, \ell_t \rangle$, and observes the loss vector $\ell_t \in \mathbb{R}^K$;
(3) update $L_t = L_{t-1} + \ell_t$.
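A minimal Python sketch of the Hedge box above for the full-information PEA setting, assuming the entire loss vector is revealed each round; the class name and the particular step-size tuning are illustrative, not prescribed by the lecture.

```python
import numpy as np

class Hedge:
    """Hedge for prediction with experts' advice (full-information feedback)."""

    def __init__(self, K, eta):
        self.eta = eta                   # step size (learning rate)
        self.cum_loss = np.zeros(K)      # L_{t-1}: cumulative losses of the K experts

    def play(self):
        # p_{t,i} proportional to exp(-eta * L_{t-1,i}); shift by the min for numerical stability
        w = np.exp(-self.eta * (self.cum_loss - self.cum_loss.min()))
        return w / w.sum()

    def update(self, loss_vector):
        # Full information: the entire loss vector ell_t in R^K is observed.
        self.cum_loss += loss_vector

# usage on random losses; step size of order sqrt(ln K / T) is the standard tuning
rng = np.random.default_rng(0)
T, K = 1000, 10
alg = Hedge(K, eta=np.sqrt(np.log(K) / T))
total = 0.0
for t in range(T):
    p_t = alg.play()
    ell_t = rng.uniform(0.0, 1.0, size=K)
    total += p_t @ ell_t                 # expected loss <p_t, ell_t>
    alg.update(ell_t)
```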
However, Hedge does not fit the MAB setting due to its limited feedback: Hedge requires $\ell_{t,i}$ for all $i \in [K]$, but only $\ell_{t,a_t}$ is available in MAB.