Output layer weight update

Backward Propagation:

Step 1: total cost
$E_{total} = \sum_{n} \tfrac{1}{2}(target_n - output_n)^2$
$out(y) = \frac{1}{1 + e^{-y}}$
$net(x) = w_1 x_1 + w_2 x_2 + b$

Step 2: output -> hidden layer weight update
$w_5 \leftarrow w_5 - \eta \, \frac{\partial E_{total}}{\partial w_5}$   (similar to a single perceptron)
$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$
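To make Step 2 concrete, here is a minimal NumPy sketch (not part of the original slides) of the three-factor chain-rule product for one output weight; the forward values out_h1, w5, b2, the target target_o1, and the learning rate eta are illustrative assumptions.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Illustrative forward values (assumed, not taken from the slides)
out_h1 = 0.59        # hidden activation feeding w5
w5, b2 = 0.40, 0.60  # output-layer weight and bias
eta = 0.5            # learning rate
target_o1 = 0.01

# Forward pass for the output neuron o1 (single-input simplification)
net_o1 = w5 * out_h1 + b2
out_o1 = sigmoid(net_o1)

# The three factors of the chain rule in Step 2
dE_dout   = out_o1 - target_o1        # dE_total/dout_o1 for E = 1/2 (target - out)^2
dout_dnet = out_o1 * (1.0 - out_o1)   # derivative of the sigmoid
dnet_dw5  = out_h1                    # dnet_o1/dw5

dE_dw5 = dE_dout * dout_dnet * dnet_dw5
w5 = w5 - eta * dE_dw5                # gradient-descent update of w5
print(w5)
```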
Hidden layer weight update

Backward Propagation:

Step 3: hidden layer weight update
$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$
$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$, where $E_{total} = E_{o1} + E_{o2}$
$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$
$\frac{\partial E_{total}}{\partial w_1} = \left( \sum_{o} \frac{\partial E_{o}}{\partial net_{o}} \cdot \frac{\partial net_{o}}{\partial out_{h1}} \right) \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$
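The summation in Step 3 (the error at h1 collects contributions from both outputs) can be sketched numerically. This is an illustrative NumPy fragment, not the slides' code; the concrete values i1, out_h1, out_o1, out_o2, the weights w5, w7, and the targets are assumptions.

```python
import numpy as np

def sigmoid_prime_from_out(out):
    # derivative of the logistic function written in terms of its output
    return out * (1.0 - out)

# Illustrative forward values (assumed, not taken from the slides)
i1 = 0.05                        # input feeding w1
out_h1 = 0.593                   # hidden activation
out_o1, out_o2 = 0.751, 0.773    # output activations
target_o1, target_o2 = 0.01, 0.99
w5, w7 = 0.40, 0.50              # weights from h1 to o1 and o2

# delta at each output neuron: dE_o / dnet_o
delta_o1 = (out_o1 - target_o1) * sigmoid_prime_from_out(out_o1)
delta_o2 = (out_o2 - target_o2) * sigmoid_prime_from_out(out_o2)

# Step 3: the error signal at h1 sums over both outputs (dE_total/dout_h1)
dE_douth1 = delta_o1 * w5 + delta_o2 * w7

# multiply by the local derivatives dout_h1/dnet_h1 and dnet_h1/dw1
dE_dw1 = dE_douth1 * sigmoid_prime_from_out(out_h1) * i1
print(dE_dw1)
```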
Gradient Descent

Network parameters: $\theta = \{w_1, w_2, \ldots, b_1, b_2, \ldots\}$

Starting parameters $\theta^0$:
Compute $\nabla L(\theta^0)$:  $\theta^1 = \theta^0 - \eta \nabla L(\theta^0)$
Compute $\nabla L(\theta^1)$:  $\theta^2 = \theta^1 - \eta \nabla L(\theta^1)$
......

$\nabla L(\theta) = \begin{bmatrix} \partial L(\theta)/\partial w_1 \\ \partial L(\theta)/\partial w_2 \\ \vdots \\ \partial L(\theta)/\partial b_1 \\ \partial L(\theta)/\partial b_2 \\ \vdots \end{bmatrix}$

Millions of parameters ......

To compute the gradients efficiently, we use back propagation.

Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf
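As a sketch of the update rule $\theta^{t+1} = \theta^t - \eta \nabla L(\theta^t)$, the fragment below iterates gradient descent on a stand-in quadratic loss; grad_L is a hypothetical placeholder for the gradient that backpropagation would actually deliver for a network's loss.

```python
import numpy as np

def grad_L(theta):
    # Hypothetical gradient of the simple loss L(theta) = 1/2 * ||theta||^2,
    # standing in for the gradient that backpropagation would compute.
    return theta

eta = 0.1
theta = np.array([0.5, -0.3, 0.8])   # theta^0: starting parameters (illustrative)

for step in range(100):
    theta = theta - eta * grad_L(theta)   # theta^{t+1} = theta^t - eta * grad L(theta^t)

print(theta)   # approaches the minimizer of the stand-in loss
```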
Chain Rule

Case 1: $y = g(x)$, $z = h(y)$
$\Delta x \rightarrow \Delta y \rightarrow \Delta z$
$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$

Case 2: $x = g(s)$, $y = h(s)$, $z = k(x, y)$
$\frac{dz}{ds} = \frac{\partial z}{\partial x} \cdot \frac{dx}{ds} + \frac{\partial z}{\partial y} \cdot \frac{dy}{ds}$

Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf
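Case 2 can be checked numerically: with assumed concrete functions g, h, and k (chosen here only for illustration), the chain-rule expression for dz/ds should match a finite-difference estimate.

```python
import numpy as np

# Case 2 with concrete (assumed) functions: x = g(s), y = h(s), z = k(x, y)
g = lambda s: s ** 2          # x = s^2
h = lambda s: np.sin(s)       # y = sin(s)
k = lambda x, y: x * y        # z = x * y

def z_of_s(s):
    return k(g(s), h(s))

s = 1.3

# chain rule: dz/ds = dz/dx * dx/ds + dz/dy * dy/ds
dz_dx, dz_dy = h(s), g(s)     # partials of k(x, y) = x * y
dx_ds, dy_ds = 2 * s, np.cos(s)
analytic = dz_dx * dx_ds + dz_dy * dy_ds

# finite-difference check of the same derivative
eps = 1e-6
numeric = (z_of_s(s + eps) - z_of_s(s - eps)) / (2 * eps)
print(analytic, numeric)      # the two values agree closely
```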
Backpropagation

$L(\theta) = \sum_{n=1}^{N} l^n(\theta)$
$\frac{\partial L(\theta)}{\partial w} = \sum_{n=1}^{N} \frac{\partial l^n(\theta)}{\partial w}$

[Figure: a network NN with parameters $\theta$ maps the input $x^n$ (components $x_1, x_2$) to the output $y^n$ (components $y_1, y_2$); comparing $y^n$ with the target $\hat{y}^n$ gives the per-example loss $l^n$.]

Source of the slide: http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2017/Lecture/BP%20(v2).pdf
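A small sketch of the summation above: the gradient of the total loss is accumulated from per-example gradients. The one-neuron model, the dataset X, and the targets Y_hat are illustrative assumptions, not from the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grad(w, x, y_hat):
    """Gradient of l^n = 1/2 * (y_hat - sigmoid(w.x))^2 with respect to w, for one example."""
    y = sigmoid(np.dot(w, x))
    return (y - y_hat) * y * (1.0 - y) * x

# Tiny illustrative dataset (assumed): N examples x^n with targets y_hat^n
X = np.array([[0.1, 0.8], [0.9, 0.2], [0.4, 0.5]])
Y_hat = np.array([1.0, 0.0, 1.0])
w = np.zeros(2)

# dL/dw = sum over n of dl^n/dw, exactly as on the slide
grad_L = sum(per_example_grad(w, x, y_hat) for x, y_hat in zip(X, Y_hat))
print(grad_L)
```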