Intuitive Reason
• How surprising the current gradient is: dividing by the accumulated past gradients creates a contrast effect.

$$w^{t+1} \leftarrow w^t - \frac{\eta}{\sqrt{\sum_{i=0}^{t} (g^i)^2}}\, g^t, \qquad g^t = \frac{\partial C(\theta^t)}{\partial w}, \qquad \eta^t = \frac{\eta}{\sqrt{t+1}}$$

g^0     g^1     g^2     g^3     g^4     ……
0.001   0.001   0.003   0.002   0.1     ……   (by contrast, g^4 is especially large)

g^0     g^1     g^2     g^3     g^4     ……
10.8    20.9    31.7    12.1    0.1     ……   (by contrast, g^4 is especially small)
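The contrast effect can be checked numerically with the two gradient sequences on this slide. The following is a minimal sketch, not a full optimizer: the helper name `adagrad_scale` is hypothetical, and it only computes the factor $g^t / \sqrt{\sum_{i=0}^{t} (g^i)^2}$ that multiplies $\eta$ in the update.

```python
import math


def adagrad_scale(grads):
    """Return g^t / sqrt(sum_i (g^i)^2), where g^t is the last entry of grads.

    Hypothetical helper for illustration; grads holds g^0 ... g^t.
    """
    root_sum_sq = math.sqrt(sum(g * g for g in grads))
    return grads[-1] / root_sum_sq


# The two gradient histories from the slide; both end with g^4 = 0.1.
small_history = [0.001, 0.001, 0.003, 0.002, 0.1]
large_history = [10.8, 20.9, 31.7, 12.1, 0.1]

surprising_large = adagrad_scale(small_history)
surprising_small = adagrad_scale(large_history)

print(f"after small gradients: {surprising_large:.4f}")
print(f"after large gradients: {surprising_small:.4f}")
```

The same gradient value 0.1 yields a factor of roughly 0.999 after a history of tiny gradients and roughly 0.0024 after a history of large ones, which is the contrast the slide describes.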
Larger gradient, larger steps?
$y = ax^2 + bx + c$, $\left|\frac{\partial y}{\partial x}\right| = |2ax + b|$
Starting from $x_0$, the minimum is at $-\frac{b}{2a}$, so the distance to it is $\left|x_0 + \frac{b}{2a}\right|$.
Best step (for $a > 0$): $\frac{|2ax_0 + b|}{2a}$, where $|2ax_0 + b|$ is the magnitude of the first derivative at $x_0$.
Larger 1st order derivative means far from the minima
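The best-step formula for the quadratic can be verified with a few starting points. This is a small sketch assuming $a > 0$; the function name `best_step` and the coefficients are illustrative choices, not values from the slide.

```python
def best_step(a, b, x0):
    """Distance from x0 to the minimum of y = a*x**2 + b*x + c, assuming a > 0.

    Computed as |first derivative at x0| / (2a), as on the slide.
    """
    first_derivative = abs(2 * a * x0 + b)
    return first_derivative / (2 * a)


# Illustrative coefficients (assumed): minimum at x = -b / (2a) = 2.0
a, b = 2.0, -8.0
x_min = -b / (2 * a)

for x0 in (3.0, 5.0, -1.0):
    step = best_step(a, b, x0)
    # The best step equals the actual distance to the minimum.
    print(x0, step, abs(x0 - x_min))
```

For every starting point, the step matches the distance to the minimum, and a larger first derivative corresponds to a starting point farther away.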
Comparison between different parameters
[Figure: contour plot of the loss over $w_1$ and $w_2$; points a and b are marked along $w_1$, points c and d along $w_2$.]
a > b (along $w_1$), c > d (along $w_2$)
Larger 1st order derivative means far from the minima
• This holds only within a single parameter: do not compare across parameters.
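Why the rule breaks across parameters can be illustrated with two one-dimensional quadratics of different curvature. This is a hedged sketch under assumed numbers: the curvatures and starting points below are invented for illustration and are not read off the slide's contour plot, and `grad_and_distance` is a hypothetical helper.

```python
def grad_and_distance(curvature, w0):
    """For loss = curvature * w**2 (minimum at w = 0), return
    (|d loss / d w| at w0, distance from w0 to the minimum)."""
    return abs(2 * curvature * w0), abs(w0)


# Assumed setup: the loss is flat along w1 and steep along w2.
grad_w1, dist_w1 = grad_and_distance(curvature=0.5, w0=4.0)
grad_w2, dist_w2 = grad_and_distance(curvature=5.0, w0=1.0)

print("w1 direction:", grad_w1, dist_w1)
print("w2 direction:", grad_w2, dist_w2)
```

In this setup the point on the $w_2$ direction has the larger first derivative yet lies closer to its minimum, so gradient magnitude alone does not rank distances across different parameters.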