NJUA 南京大学 人工智能学院 SCHOOL OF ARTFICIAL INTELUGENCE,NANJING UNFVERSITY Lecture 10.Online Learning in Games Advanced Optimization(Fall 2023) Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University
Lecture 10. Online Learning in Games Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University Advanced Optimization (Fall 2023)
Outline Two-player Zero-sum Games Minimax Theorem ·Repeated Play Faster Convergence via Adaptivity Advanced Optimization(Fall 2023) Lecture 10.Online Learning in Games 2
Advanced Optimization (Fall 2023) Lecture 10. Online Learning in Games 2 Outline • Two-player Zero-sum Games • Minimax Theorem • Repeated Play • Faster Convergence via Adaptivity
Classic Game:Rock-Paper-Scissors game Rock-Paper-Scissors game Game rules Rock Paper Scissors Seissoys apar Rock 01-1 Paper Paper 0 Scissors Strategy -Pure strategy:a fixed action,e.g.,"Rock". -Mixed strategy:a distribution on all actions,e.g., ("Rock","Paper","Scissors")=(1/3,1/3,1/3). Advanced Optimization(Fall 2023) Lecture 10.Online Learning in Games 3
Advanced Optimization (Fall 2023) Lecture 10. Online Learning in Games 3 Classic Game: Rock-Paper-Scissors game • Rock-Paper-Scissors game • Strategy Rock Paper Scissors Rock Paper Scissors
Two-Player Zero-Sum Games Terminology Rock Paper Scissors Rock 0 1 game/payoff matrix A[-1,1]mxm Paper -1 0 1 two players Scissors 1 -1 0 -player #1:x-player,row player,min player Game rules -player #2:y-player,colume player,max player action set(focusing on mixed strategy) -player#1:△m={p|∑1p:=1,andp:≥0,i∈[m}. -player#2:△n={g∑-1g=1,andg≥0,j∈m. Advanced Optimization(Fall 2023) Lecture 10.Online Learning in Games 4
Advanced Optimization (Fall 2023) Lecture 10. Online Learning in Games 4 Two-Player Zero-Sum Games • Terminology Rock Paper Scissors Rock Paper Scissors
Two-Player Zero-Sum Games ·The protocol: The repeated game is denoted by a(payoff)matrix A[-1,1]mxm. The x-player has m actions,and the y-player has n actions. The goal of x-player is to minimize her loss and the goal of y-player is to maximize her reward. ·Given the action(x,y)∈△m×△n,the loss and reward are the same. -expected loss of x-player isosAy. -expected reward of y-player is Ereward]AAy. Advanced Optimization(Fall 2023) Lecture 10.Online Learning in Games 5
Advanced Optimization (Fall 2023) Lecture 10. Online Learning in Games 5 Two-Player Zero-Sum Games • The protocol: •