▶ So the algorithm for calculating the parameter $w$ is
$$w_{t+1} = w_t + \eta\,(d - w_t^T x)\,x$$
This is also known as the Widrow-Hoff learning rule, which applies to the case where $\Omega$ contains only one sample.
▶ For the case of $N$ samples, it can be modified as follows:
$$w_{t+1} = w_t + \eta \sum_{i=1}^{N} (d^i - w_t^T x^i)\,x^i$$
Note that each iteration of the algorithm uses the entire training set to update the parameters. This is called batch gradient descent (BGD).
10 / 78
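A minimal NumPy sketch of this batch update is given below. The synthetic data, learning rate, and number of epochs are illustrative assumptions, not values from the slides; the point is that every epoch sums the error over all N samples before changing w, which is exactly the BGD behaviour described above.

```python
import numpy as np

def batch_gradient_descent(X, d, eta=0.005, epochs=500):
    """Batch (Widrow-Hoff style) update: w <- w + eta * sum_i (d^i - w^T x^i) x^i.

    X: (N, p) array whose rows are the samples x^i
    d: (N,) array of desired outputs d^i
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = d - X @ w            # d^i - w_t^T x^i for every sample at once
        w = w + eta * (X.T @ errors)  # accumulate the update over the whole training set
    return w

# Illustrative usage on synthetic data (assumed, not from the slides)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.0, -2.0, 0.5])
d = X @ true_w
w_hat = batch_gradient_descent(X, d)
print(w_hat)  # should approach true_w
```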
Outline (Level 2-3)
o Numeric Approach
  o Neural Network Terminologies
  o Gradient Descent Principle
  o Stochastic Gradient Descent Algorithm
11 / 78
2.1.1. Neural Network Terminologies
▶ one pass: one forward pass + one backward pass. Note: do not count the forward pass and the backward pass as two separate passes
▶ one epoch: one pass over all the training samples
▶ batch size: the number of training samples used in one pass, also called the mini-batch size. The larger the batch size, the more memory is needed
▶ number of iterations: the number of passes, each pass using [batch size] samples; see the worked example after this list
12 / 78
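As a concrete illustration of how these terms relate, the short sketch below counts the passes needed per epoch; the sample count, batch size, and epoch count are made-up numbers for illustration only.

```python
import math

num_samples = 10_000   # size of the training set (illustrative)
batch_size = 32        # samples consumed by one pass (one mini-batch)
num_epochs = 5

# One epoch must cover every training sample once, so the number of
# iterations (passes) per epoch is ceil(num_samples / batch_size).
iterations_per_epoch = math.ceil(num_samples / batch_size)
total_iterations = iterations_per_epoch * num_epochs

print(iterations_per_epoch)  # 313 passes per epoch
print(total_iterations)      # 1565 passes over 5 epochs
```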
Outline (Level 2-3)
o Numeric Approach
  o Neural Network Terminologies
  o Gradient Descent Principle
  o Stochastic Gradient Descent Algorithm
13 / 78
2.1.2. Gradient Descent Principle
▶ We want to minimize the cost function $C(x_1, x_2)$. By calculus, $C$ changes as follows:
$$\Delta C \approx \frac{\partial C}{\partial x_1}\Delta x_1 + \frac{\partial C}{\partial x_2}\Delta x_2$$
▶ Define $\Delta x = (\Delta x_1, \Delta x_2)^T$ and $\nabla C = \left(\frac{\partial C}{\partial x_1}, \frac{\partial C}{\partial x_2}\right)^T$, and get:
$$\Delta C \approx \nabla C \cdot \Delta x$$
▶ To make $\Delta C$ negative, set $\Delta x = -\eta \nabla C$, and get:
$$\Delta C \approx -\eta\,\nabla C \cdot \nabla C = -\eta\,\|\nabla C\|^2 \le 0$$
▶ Then from $x_{t+1} - x_t = \Delta x = -\eta \nabla C$, get the update rule:
$$x_{t+1} = x_t - \eta \nabla C$$
▶ The negative gradient is only the direction of fastest descent, not the only descent direction: any step $\Delta x$ at an angle $\pi/2 \le \theta \le 3\pi/2$ to $\nabla C$ also decreases $C$, since
$$\Delta C \approx \nabla C \cdot \Delta x = \|\nabla C\|\,\|\Delta x\|\cos\theta \le 0, \quad \pi/2 \le \theta \le 3\pi/2$$
14 / 78
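A minimal sketch of this update rule on a concrete two-variable cost is shown below; the quadratic bowl, starting point, and learning rate are illustrative choices, not from the slides.

```python
import numpy as np

def grad_C(x):
    """Gradient of the example cost C(x1, x2) = x1**2 + 3*x2**2 (assumed for illustration)."""
    return np.array([2.0 * x[0], 6.0 * x[1]])

eta = 0.1                    # learning rate
x = np.array([4.0, -2.0])    # arbitrary starting point

for t in range(100):
    x = x - eta * grad_C(x)  # x_{t+1} = x_t - eta * grad C(x_t)

print(x)  # converges toward the minimizer (0, 0)
```

Note that the guarantee $\Delta C \le 0$ rests on the first-order approximation, so it only holds when $\eta$ is small enough; for this particular bowl the steps along $x_2$ overshoot once $\eta \ge 1/3$.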