7.2.3. 采用 S 形激活函数的神经元

（图：神经元结构，从左到右依次为 inputs、weights、activation、output）

▶ $a = \sum_{i=1}^{n} w_i x_i$，$y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$
▶ $\varphi(a)$ 称为逻辑斯谛函数（logistic function）
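下面给出一个极简的 Python 示意，按上式实现单个 S 形神经元的前向计算（函数名 `sigmoid_neuron` 与示例数据均为假设，仅用于说明公式）：

```python
import numpy as np

def sigmoid(a):
    """逻辑斯谛函数 φ(a) = 1 / (1 + e^{-a})。"""
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_neuron(x, w):
    """单个 S 形神经元的前向计算：a = Σ_i w_i x_i, y = φ(a)。"""
    a = np.dot(w, x)       # 加权求和
    return sigmoid(a)      # 激活输出 y ∈ (0, 1)

# 示例：3 个输入
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
print(sigmoid_neuron(x, w))
```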
采用 S 形激活函数的带偏置神经元

▶ 引入偏置输入 $x_0 = -1$：$a = \sum_{i=0}^{n} w_i x_i$，$y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$
▶ $\dfrac{d\varphi(x)}{dx} = \varphi(x)\,(1 - \varphi(x))$
▶ 单层感知机的梯度下降训练算法：
   $E[w_0, w_1, \ldots, w_n] = \dfrac{1}{2}\sum_d (t_d - y_d)^2$
   $\dfrac{\partial E}{\partial w_i} = -\sum_d (t_d - y_d)\, y_d (1 - y_d)\, x_{di}$
▶ $(\mathbf{x}_d, t_d)$：训练样本，$y_d$：神经网络的实际输出
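下面是按上述误差函数与梯度公式对单个 S 形神经元做批量梯度下降的一个最小示意（数据集、学习率 $\eta$ 的取值与函数名 `train_sigmoid_unit` 均为假设的示例，并非原文给出的实现）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sigmoid_unit(X, t, eta=0.5, epochs=1000):
    """按 E = 1/2 Σ_d (t_d - y_d)^2 做批量梯度下降。
    X: (D, n) 输入矩阵（已含偏置列 x_0 = -1），t: (D,) 目标值。"""
    D, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        y = sigmoid(X @ w)                       # 各样本的输出 y_d
        grad = -(X.T @ ((t - y) * y * (1 - y)))  # ∂E/∂w_i = -Σ_d (t_d - y_d) y_d (1 - y_d) x_{di}
        w -= eta * grad                          # w_i ← w_i - η ∂E/∂w_i
    return w

# 示例：学习逻辑与（AND），第一列为偏置输入 x_0 = -1
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w = train_sigmoid_unit(X, t)
print(np.round(sigmoid(X @ w), 2))   # 输出逐渐逼近目标 [0, 0, 0, 1]
```

其中 `grad` 一行对应梯度公式中对所有样本求和，`w -= eta * grad` 即沿负梯度方向更新权重。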
S 形函数的梯度下降规则

▶ The integral of any smooth, positive, “bump-shaped” function is sigmoidal; thus the cumulative distribution functions of many common probability distributions, such as the normal distribution, are sigmoidal.
▶ 对单个训练样本 $d$：$E[w_1, \ldots, w_n] = \dfrac{1}{2}(t_d - y_d)^2$
   $\dfrac{\partial E}{\partial w_i} = \dfrac{\partial}{\partial w_i}\,\dfrac{1}{2}(t_d - y_d)^2 = \dfrac{\partial}{\partial w_i}\,\dfrac{1}{2}\Big(t_d - \varphi\big(\textstyle\sum_i w_i x_{di}\big)\Big)^2 = -(t_d - y_d)\,\varphi'\big(\textstyle\sum_i w_i x_{di}\big)\, x_{di}$
▶ 对逻辑斯谛函数 $y = \varphi(a) = \dfrac{1}{1 + e^{-a}}$：$\varphi'(a) = \dfrac{e^{-a}}{(1 + e^{-a})^2} = \varphi(a)\,(1 - \varphi(a))$
▶ 于是权重更新规则为 $w_i^{\text{new}} = w_i + \Delta w_i = w_i + \eta\, y_d (1 - y_d)(t_d - y_d)\, x_{di}$
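下面用数值差分验证 $\varphi'(a) = \varphi(a)(1 - \varphi(a))$，并按上面的更新规则对单个样本执行一步权重更新（数值与变量名均为假设的示例）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# 数值验证 φ'(a) = φ(a)(1 - φ(a))
a, h = 0.7, 1e-6
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)   # 中心差分近似导数
analytic = sigmoid(a) * (1 - sigmoid(a))
print(abs(numeric - analytic) < 1e-8)                   # True

# 按 w_i^new = w_i + η y_d (1 - y_d)(t_d - y_d) x_{di} 做一步更新
eta = 0.1
x_d = np.array([-1.0, 0.5, 1.5])    # 含偏置输入 x_0 = -1
w = np.array([0.1, -0.2, 0.3])
t_d = 1.0
y_d = sigmoid(w @ x_d)
w_new = w + eta * y_d * (1 - y_d) * (t_d - y_d) * x_d
print(w_new)
```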
Gradient Descent Learning Rule

（图：前突触神经元 $i$ 的输出 $x_i$ 经权重 $w_{ji}$ 连接到后突触神经元 $j$，其输出为 $y_j$）

$\Delta w_{ji} = \eta\; y_{dj}(1 - y_{dj})\;(t_{dj} - y_{dj})\; x_{di}$

▶ $\eta$：学习率（learning rate）
▶ $y_{dj}(1 - y_{dj})$：激活函数（logistic function）的导数 derivative of the activation function
▶ $(t_{dj} - y_{dj})$：后突触神经元 $j$ 的输出与目标值之间的误差 $\delta_j$
▶ $x_{di}$：前突触神经元 $i$ 的输出
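下面把该规则向量化到一层 S 形神经元上做一步更新（权重矩阵 `W` 的第 $j$ 行对应后突触神经元 $j$；函数名与数据均为示例性假设）：

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_update(W, x_d, t_d, eta=0.1):
    """对一层 S 形神经元按 Δw_ji = η y_j (1 - y_j)(t_j - y_j) x_i 更新。
    W: (m, n) 权重矩阵，x_d: (n,) 输入，t_d: (m,) 目标输出。"""
    y = sigmoid(W @ x_d)                   # 各后突触神经元的输出 y_j
    delta = y * (1 - y) * (t_d - y)        # 每个神经元的误差项 δ_j 乘以导数项
    return W + eta * np.outer(delta, x_d)  # Δw_ji = η · delta_j · x_i

# 示例：2 个神经元、3 个输入（含偏置 x_0 = -1）
W = np.zeros((2, 3))
x_d = np.array([-1.0, 0.3, 0.8])
t_d = np.array([1.0, 0.0])
W = delta_rule_update(W, x_d, t_d)
print(W)
```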
梯度下降的原理

▶ Minimize the cost function $C(x_1, x_2)$. By calculus, $C$ changes as follows:
   $\Delta C \approx \dfrac{\partial C}{\partial x_1}\Delta x_1 + \dfrac{\partial C}{\partial x_2}\Delta x_2$
▶ Define $\Delta \mathbf{x} = (\Delta x_1, \Delta x_2)^{T}$ and $\nabla C = \Big(\dfrac{\partial C}{\partial x_1}, \dfrac{\partial C}{\partial x_2}\Big)^{T}$, and get:
   $\Delta C \approx \nabla C \cdot \Delta \mathbf{x}$
▶ To make $\Delta C$ negative, set $\Delta \mathbf{x} = -\eta \nabla C$, and get:
   $\Delta C \approx -\eta\, \nabla C \cdot \nabla C = -\eta\, \|\nabla C\|^{2} \le 0$
▶ Then from $\mathbf{x}_{t+1} - \mathbf{x}_t = \Delta \mathbf{x} = -\eta \nabla C$, get:
   $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla C$
▶ The negative gradient is only the fastest descent direction, not the only descent direction:
   $\Delta C \approx \nabla C \cdot \Delta \mathbf{x} = \|\nabla C\|\,\|\Delta \mathbf{x}\| \cos\theta \le 0$ for $\pi/2 \le \theta \le 3\pi/2$,
   and for a fixed step length $\|\Delta \mathbf{x}\|$ the decrease is largest when $\cos\theta = -1$, i.e. when $\Delta \mathbf{x}$ points along $-\nabla C$.
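下面用一个简单的二次代价函数演示更新式 $\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \nabla C$（代价函数 `C`、学习率与迭代次数均为假设的示例）：

```python
import numpy as np

def C(x):
    """示例代价函数 C(x1, x2) = (x1 - 3)^2 + 2 (x2 + 1)^2。"""
    return (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2

def grad_C(x):
    """∇C = (∂C/∂x1, ∂C/∂x2)^T。"""
    return np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])

eta = 0.1                      # 学习率 η
x = np.array([0.0, 0.0])       # 初始点 x_0
for t in range(100):
    x = x - eta * grad_C(x)    # x_{t+1} = x_t - η ∇C
print(x, C(x))                 # 收敛到最小值点 (3, -1) 附近
```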