7.2.3. 采用 S 形激活函数的神经元

（图：神经元结构，从左到右依次为 inputs、weights、activation、output）

▶ $a = \sum_{i=1}^{n} w_i x_i$，$y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$
▶ $\varphi(a)$ 称为逻辑斯谛函数（logistic function）
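下面给出一个极简的 Python 示意，按上式实现单个 S 形神经元的前向计算（函数名 `sigmoid_neuron` 与示例数据均为假设，仅用于说明公式）：

```python
import numpy as np

def sigmoid(a):
    """逻辑斯谛函数 φ(a) = 1 / (1 + e^{-a})。"""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_neuron(x, w):
    """单个 S 形神经元的前向计算：a = Σ_i w_i x_i, y = φ(a)。"""
    a = np.dot(w, x)       # 加权求和
    return sigmoid(a)      # 激活输出 y ∈ (0, 1)

# 示例：3 个输入
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
print(sigmoid_neuron(x, w))
```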
采用 S 形激活函数的带偏置神经元

▶ 引入偏置输入 $x_0 = -1$：$a = \sum_{i=0}^{n} w_i x_i$，$y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$
▶ $\dfrac{d\varphi(x)}{dx} = \varphi(x)\,(1 - \varphi(x))$
▶ 单层感知机的梯度下降训练算法：
   $E[w_0, w_1, \ldots, w_n] = \dfrac{1}{2}\sum_d (t_d - y_d)^2$
   $\dfrac{\partial E}{\partial w_i} = -\sum_d (t_d - y_d)\, y_d (1 - y_d)\, x_{di}$
▶ $(\mathbf{x}_d, t_d)$：训练样本，$y_d$：神经网络的实际输出
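下面是按上述误差函数与梯度公式对单个 S 形神经元做批量梯度下降的一个最小示意（数据集、学习率 $\eta$ 的取值与函数名 `train_sigmoid_unit` 均为假设的示例，并非原文给出的实现）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sigmoid_unit(X, t, eta=0.5, epochs=1000):
    """按 E = 1/2 Σ_d (t_d - y_d)^2 做批量梯度下降。
    X: (D, n) 输入矩阵（已含偏置列 x_0 = -1），t: (D,) 目标值。"""
    D, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        y = sigmoid(X @ w)                       # 各样本的输出 y_d
        grad = -(X.T @ ((t - y) * y * (1 - y)))  # ∂E/∂w_i = -Σ_d (t_d - y_d) y_d (1 - y_d) x_{di}
        w -= eta * grad                          # w_i ← w_i - η ∂E/∂w_i
    return w

# 示例：学习逻辑与（AND），第一列为偏置输入 x_0 = -1
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w = train_sigmoid_unit(X, t)
print(np.round(sigmoid(X @ w), 2))   # 输出逐渐逼近目标 [0, 0, 0, 1]
```

其中 `grad` 一行对应梯度公式中对所有样本求和，`w -= eta * grad` 即沿负梯度方向更新权重。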
S 形函数的梯度下降规则

▶ The integral of any smooth, positive, “bump-shaped” function is sigmoidal; thus the cumulative distribution functions of many common probability distributions, such as the normal distribution, are sigmoidal.
▶ 对单个训练样本 $d$：$E[w_1, \ldots, w_n] = \dfrac{1}{2}(t_d - y_d)^2$
   $\dfrac{\partial E}{\partial w_i} = \dfrac{\partial}{\partial w_i}\,\dfrac{1}{2}(t_d - y_d)^2 = \dfrac{\partial}{\partial w_i}\,\dfrac{1}{2}\Big(t_d - \varphi\big(\textstyle\sum_i w_i x_{di}\big)\Big)^2 = -(t_d - y_d)\,\varphi'\big(\textstyle\sum_i w_i x_{di}\big)\, x_{di}$
▶ 对逻辑斯谛函数 $y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$：$\varphi'(a) = \dfrac{e^{-a}}{(1 + e^{-a})^2} = \varphi(a)\,(1 - \varphi(a))$
▶ 于是权重更新规则为 $w_i^{\text{new}} = w_i + \Delta w_i = w_i + \eta\, y_d (1 - y_d)(t_d - y_d)\, x_{di}$
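下面用数值差分验证 $\varphi'(a) = \varphi(a)(1 - \varphi(a))$，并按上面的更新规则对单个样本执行一步权重更新（数值与变量名均为假设的示例）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# 数值验证 φ'(a) = φ(a)(1 - φ(a))
a, h = 0.7, 1e-6
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)   # 中心差分近似导数
analytic = sigmoid(a) * (1 - sigmoid(a))
print(abs(numeric - analytic) < 1e-8)                   # True

# 按 w_i^new = w_i + η y_d (1 - y_d)(t_d - y_d) x_{di} 做一步更新
eta = 0.1
x_d = np.array([-1.0, 0.5, 1.5])    # 含偏置输入 x_0 = -1
w = np.array([0.1, -0.2, 0.3])
t_d = 1.0
y_d = sigmoid(w @ x_d)
w_new = w + eta * y_d * (1 - y_d) * (t_d - y_d) * x_d
print(w_new)
```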
Gradient Descent Learning Rule

（图：前突触神经元 $i$ 的输出 $x_i$ 经权重 $w_{ji}$ 连接到后突触神经元 $j$，其输出为 $y_j$）

$\Delta w_{ji} = \eta\; y_{dj}(1 - y_{dj})\;(t_{dj} - y_{dj})\; x_{di}$

▶ $\eta$：学习率（learning rate）
▶ $y_{dj}(1 - y_{dj})$：激活函数（logistic function）的导数 derivative of the activation function
▶ $(t_{dj} - y_{dj})$：后突触神经元 $j$ 的输出与目标值之间的误差 $\delta_j$
▶ $x_{di}$：前突触神经元 $i$ 的输出
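下面把该规则向量化到一层 S 形神经元上做一步更新（权重矩阵 `W` 的第 $j$ 行对应后突触神经元 $j$；函数名与数据均为示例性假设）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_update(W, x_d, t_d, eta=0.1):
    """对一层 S 形神经元按 Δw_ji = η y_j (1 - y_j)(t_j - y_j) x_i 更新。
    W: (m, n) 权重矩阵，x_d: (n,) 输入，t_d: (m,) 目标输出。"""
    y = sigmoid(W @ x_d)                   # 各后突触神经元的输出 y_j
    delta = y * (1 - y) * (t_d - y)        # 每个神经元的误差项 δ_j 乘以导数项
    return W + eta * np.outer(delta, x_d)  # Δw_ji = η · delta_j · x_i

# 示例：2 个神经元、3 个输入（含偏置 x_0 = -1）
W = np.zeros((2, 3))
x_d = np.array([-1.0, 0.3, 0.8])
t_d = np.array([1.0, 0.0])
W = delta_rule_update(W, x_d, t_d)
print(W)
```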
梯度下降的原理

▶ Minimize the cost function $C(x_1, x_2)$. By calculus, $C$ changes as follows:
   $\Delta C \approx \dfrac{\partial C}{\partial x_1}\Delta x_1 + \dfrac{\partial C}{\partial x_2}\Delta x_2$
▶ Define $\Delta \mathbf{x} = (\Delta x_1, \Delta x_2)^{T}$ and $\nabla C = \Big(\dfrac{\partial C}{\partial x_1}, \dfrac{\partial C}{\partial x_2}\Big)^{T}$, and get:
   $\Delta C \approx \nabla C \cdot \Delta \mathbf{x}$
▶ To make $\Delta C$ negative, set $\Delta \mathbf{x} = -\eta \nabla C$, and get:
   $\Delta C \approx -\eta\, \nabla C \cdot \nabla C = -\eta\, \|\nabla C\|^{2} \le 0$
▶ Then from $\mathbf{x}_{t+1} - \mathbf{x}_t = \Delta \mathbf{x} = -\eta \nabla C$, get:
   $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla C$
▶ The negative gradient is only the fastest descent direction, not the only descent direction:
   $\Delta C \approx \nabla C \cdot \Delta \mathbf{x} = \|\nabla C\|\,\|\Delta \mathbf{x}\| \cos\theta \le 0$ for $\pi/2 \le \theta \le 3\pi/2$,
   and for a fixed step length $\|\Delta \mathbf{x}\|$ the decrease is largest when $\cos\theta = -1$, i.e. when $\Delta \mathbf{x}$ points along $-\nabla C$.
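下面用一个简单的二次代价函数演示更新式 $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla C$（代价函数 `C`、学习率与迭代次数均为假设的示例）：

```python
import numpy as np

def C(x):
    """示例代价函数 C(x1, x2) = (x1 - 3)^2 + 2 (x2 + 1)^2。"""
    return (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2

def grad_C(x):
    """∇C = (∂C/∂x1, ∂C/∂x2)^T。"""
    return np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])

eta = 0.1                      # 学习率 η
x = np.array([0.0, 0.0])       # 初始点 x_0
for t in range(100):
    x = x - eta * grad_C(x)    # x_{t+1} = x_t - η ∇C
print(x, C(x))                 # 收敛到最小值点 (3, -1) 附近
```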