Why discount?
Most Markov reward and decision processes are discounted. Why?
- It is mathematically convenient to discount rewards.
- Discounting avoids infinite returns in cyclic Markov processes (see the sketch after this list).
- Uncertainty about the future may not be fully represented.
- If the reward is financial, immediate rewards may earn more interest than delayed rewards.
- Animal/human behaviour shows a preference for immediate reward.
- It is sometimes possible to use undiscounted Markov reward processes (i.e. $\gamma = 1$), e.g. if all sequences terminate.
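A minimal sketch of the "infinite returns" point: for a cyclic reward stream the undiscounted return grows with the horizon, while any $\gamma < 1$ keeps the return bounded. The looping reward stream here is purely illustrative and not part of the lecture.

```python
# Illustrative only: a state that loops back to itself, emitting reward +1 each step.
def discounted_return(rewards, gamma):
    """Return G = sum_k gamma^k * rewards[k]."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

cyclic_rewards = [1.0] * 10_000                        # long (effectively unbounded) reward stream
print(discounted_return(cyclic_rewards, gamma=1.0))    # grows with the horizon: 10000.0
print(discounted_return(cyclic_rewards, gamma=0.9))    # converges toward 1 / (1 - 0.9) = 10.0
```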
Value Function
The value function v(s) gives the long-term value of state s.

Definition: The state value function v(s) of an MRP is the expected return starting from state s,
$$v(s) = \mathbb{E}[G_t \mid S_t = s]$$
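Expanding the return inside this expectation (using the same definition of the return that appears with the sample returns below) makes explicit what is being averaged:
$$v(s) = \mathbb{E}\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s\right]$$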
Example: Student MRP Returns
Sample returns for the Student MRP, starting from $S_1 = $ C1 with $\gamma = \tfrac{1}{2}$:
$$G_1 = R_2 + \gamma R_3 + \dots + \gamma^{T-2} R_T$$

C1 C2 C3 Pass Sleep:
  $v_1 = -2 - 2 \cdot \tfrac{1}{2} - 2 \cdot \tfrac{1}{4} + 10 \cdot \tfrac{1}{8} = -2.25$
C1 FB FB C1 C2 Sleep:
  $v_1 = -2 - 1 \cdot \tfrac{1}{2} - 1 \cdot \tfrac{1}{4} - 2 \cdot \tfrac{1}{8} - 2 \cdot \tfrac{1}{16} = -3.125$
C1 C2 C3 Pub C2 C3 Pass Sleep:
  $v_1 = -2 - 2 \cdot \tfrac{1}{2} - 2 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{8} - 2 \cdot \tfrac{1}{16} \dots = -3.41$
C1 FB FB C1 C2 C3 Pub C1 ... FB FB FB C1 C2 C3 Pub C2 Sleep:
  $v_1 = -2 - 1 \cdot \tfrac{1}{2} - 1 \cdot \tfrac{1}{4} - 2 \cdot \tfrac{1}{8} - 2 \cdot \tfrac{1}{16} \dots = -3.20$
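A small sketch that recomputes these sample returns from the per-state rewards used in the calculations above (the reward table and helper name are mine, read off from the slide's arithmetic):

```python
# Per-state rewards as used in the calculations above (assumed from the slide).
REWARDS = {"C1": -2, "C2": -2, "C3": -2, "FB": -1, "Pub": +1, "Pass": +10, "Sleep": 0}

def sample_return(trajectory, gamma=0.5):
    """G_1 = R_2 + gamma*R_3 + ... for a trajectory starting in S_1 = C1.

    The first reward R_2 is the one attached to the first state of the trajectory,
    matching the indexing used on the slide."""
    return sum((gamma ** k) * REWARDS[s] for k, s in enumerate(trajectory))

print(sample_return(["C1", "C2", "C3", "Pass", "Sleep"]))        # -2.25
print(sample_return(["C1", "FB", "FB", "C1", "C2", "Sleep"]))    # -3.125
```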
Example: State-Value Function for Student MRP (1)
[Diagram: Student MRP annotated with state values v(s) for γ = 0]
Example: State-Value Function for Student MRP (2)
[Diagram: Student MRP annotated with state values v(s) for γ = 0.9]
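One rough way to check values like these against the definition $v(s) = \mathbb{E}[G_t \mid S_t = s]$ is Monte Carlo simulation. The sketch below assumes the Student MRP transition probabilities as read from the diagram (0.5/0.5 out of C1, 0.8/0.2 out of C2, 0.6/0.4 out of C3, 0.2/0.4/0.4 out of Pub, 0.9/0.1 out of FB); it is an illustration under those assumptions, not part of the lecture.

```python
import random

# Student MRP as read from the diagram (probabilities and rewards are assumptions).
TRANSITIONS = {
    "C1":    [("C2", 0.5), ("FB", 0.5)],
    "C2":    [("C3", 0.8), ("Sleep", 0.2)],
    "C3":    [("Pass", 0.6), ("Pub", 0.4)],
    "Pass":  [("Sleep", 1.0)],
    "Pub":   [("C1", 0.2), ("C2", 0.4), ("C3", 0.4)],
    "FB":    [("FB", 0.9), ("C1", 0.1)],
    "Sleep": [],                                   # terminal state
}
REWARDS = {"C1": -2, "C2": -2, "C3": -2, "Pass": 10, "Pub": 1, "FB": -1, "Sleep": 0}

def sample_episode(start):
    """Follow the MRP from `start` until the terminal Sleep state."""
    s, states = start, []
    while TRANSITIONS[s]:
        states.append(s)
        nexts, probs = zip(*TRANSITIONS[s])
        s = random.choices(nexts, probs)[0]
    states.append(s)
    return states

def mc_value(start, gamma=0.9, episodes=100_000):
    """Monte Carlo estimate of v(start) = E[G_t | S_t = start]."""
    total = 0.0
    for _ in range(episodes):
        total += sum((gamma ** k) * REWARDS[s] for k, s in enumerate(sample_episode(start)))
    return total / episodes

print(round(mc_value("C1"), 1))   # should be close to the diagram's value for C1 at gamma = 0.9
```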