LRWM:expected loss of our algorithm Lmin:loss of the best expert Theorem.For e<1/2,the loss on any sequence of {0,1}in time T satisfies LRWM (1+e)Lmin In(n)/E. ▣ n:number of experts.(The more experts,the harder to catch the best one.) 11
◼ 𝐿𝑅𝑊𝑀: expected loss of our algorithm ◼ 𝐿𝑚𝑖𝑛: loss of the best expert ◼ Theorem. For 𝜖 < 1/2, the loss on any sequence of 0,1 in time 𝑇 satisfies 𝐿𝑅𝑊𝑀 ≤ 1 + 𝜖 𝐿𝑚𝑖𝑛 + ln(𝑛)/𝜖. ❑ 𝑛: number of experts. (The more experts, the harder to catch the best one.) 11
Proof Key:Consider the total weight W()at time t. a Fact:Any time our algorithm has significant expected loss,the total weight drops substantially. 1 if expert i is wrong at step t(and 0 otherwise) Let F(=(②:ue=1wO)/w.Two meanings: 0 The fraction of the weight on wrong experts The expected loss of our algorithm at step t Note:W(t+1)=F(w((1-E)+(1-F()W() =W(t)(1-eF(G) 12
Proof ◼ Key: Consider the total weight 𝑊(𝑡) at time 𝑡. ◼ Fact: Any time our algorithm has significant expected loss, the total weight drops substantially. ◼ 𝑙 𝑖 (𝑡) : 1 if expert 𝑖 is wrong at step 𝑡 (and 0 otherwise) ◼ Let 𝐹 (𝑡) = (σ 𝑖:𝑙 𝑖 (𝑡)=1 𝑤𝑖 (𝑡) )/𝑊(𝑡) . Two meanings: ❑ The fraction of the weight on wrong experts ❑ The expected loss of our algorithm at step 𝑡 ◼ Note:𝑊(𝑡+1) = 𝐹 (𝑡)𝑊 𝑡 (1 − 𝜖) + (1 − 𝐹 (𝑡) )𝑊(𝑡) = 𝑊(𝑡) (1– 𝜖𝐹 (𝑡) ) 12
Last slide:W(t+1)=W()(1-EF() a S0W(T+1)=W(T(1-eF(T) =W(T-1(1-eF(T-1)(1-eF(T) =W(1)(1-eF(1).(1-eF(T) On the other hand, WT+1)≥max w+D=(1-)n S0(1-e)n≤w1(1-eF(1)(1-∈F四) Note:Lis the loss of the best expert. 13
◼ Last slide: 𝑊(𝑡+1) = 𝑊 𝑡 1– 𝜖𝐹 𝑡 ◼ So 𝑊(𝑇+1) = 𝑊(𝑇) (1– 𝜖𝐹 𝑇 ) = 𝑊 𝑇−1 1– 𝜖𝐹 𝑇−1 1– 𝜖𝐹 𝑇 = … = 𝑊(1) (1– 𝜖𝐹 1 ) … (1– 𝜖𝐹 𝑇 ) ◼ On the other hand, 𝑊(𝑇+1) ≥ max 𝑖 𝑤𝑖 𝑇+1 = 1 − 𝜖 𝐿𝑚𝑖𝑛 (𝑇) ◼ So 1 − 𝜖 𝐿𝑚𝑖𝑛 (𝑇) ≤ 𝑊 1 (1 − 𝜖𝐹 (1) ) …(1 − 𝜖𝐹 (𝑇) ) ◼ Note: 𝐿𝑚𝑖𝑛 (𝑇) is the loss of the best expert. 13
(1-e)m≤W①(1-eF1)(1-eFT) ■ Note that W()=nsince w=1,vi ■Take log: nn(1-e)<In()+n(1-eF() ≤ln(m-∑t悬reFd~ln(1-z)≤-z =ln(n)- (T) 1(T) ELRWM LWM=F is the loss of our algorithm. URWM ■ Rearranging the inequality and using -ln(1-z)≤z+z2, 0≤z≤1/2 we get the inequality in the theorem. LRWM (1+E)Lmin In(n)/E. 14
1 − 𝜖 𝐿𝑚𝑖𝑛 (𝑇) ≤ 𝑊 1 (1 − 𝜖𝐹 (1) ) … (1 − 𝜖𝐹 (𝑇) ) ◼ Note that 𝑊(1) = 𝑛 since 𝑤𝑖 (1) = 1, ∀𝑖 ◼ Take log: 𝐿𝑚𝑖𝑛 𝑇 ln 1 − 𝜖 ≤ ln 𝑛 + σ𝑡=1,…,𝑇 ln(1 − 𝜖𝐹 (𝑡) ) ≤ ln 𝑛 − σ𝑡=1,…,𝑇 𝜖𝐹 𝑡 ∵ ln 1 − 𝑧 ≤ −𝑧 = ln 𝑛 − 𝜖𝐿𝑅𝑊𝑀 𝑇 ∵ 𝐿𝑅𝑊𝑀 𝑇 = σ𝑡=1,…,𝑇 𝐹 𝑡 ❑ 𝐿𝑅𝑊𝑀 𝑇 is the loss of our algorithm. ◼ Rearranging the inequality and using – ln 1 − 𝑧 ≤ 𝑧 + 𝑧 2 , 0 ≤ 𝑧 ≤ 1/2 we get the inequality in the theorem. 𝐿𝑅𝑊𝑀 ≤ 1 + 𝜖 𝐿𝑚𝑖𝑛 + ln(𝑛)/𝜖. 14