Partially Observable Markov Decision Processes

• Additional reading:
  – Leslie Pack Kaelbling, Michael L. Littman and Anthony R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains," Artificial Intelligence, Vol. 101, 1998.
  – J. Pineau, G. Gordon and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs," International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, Aug. 2003.

Issues

• Problem statement:
  – If we do not know the current state of the world, what should we do to act optimally?
• Inputs (see the sketch after this list):
  – Model of states, actions, observations, transition and emission functions, reward function, initial distribution, and discount factor
• Outputs from different algorithms:
  – Complete/partial, exact/approximate value function
  – Complete/partial, exact/approximate finite state machine
• Choices:
  – Policy vs. plan
  – Exact vs. approximate
  – Representation
  – Discrete vs. continuous states, actions, observations
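As a concrete illustration of the inputs listed above, the sketch below bundles a toy discrete POMDP in Python. The sizes and numbers are invented for illustration, and the array layout (T[a, s, s'], O[a, s', z], R[s, a]) is an assumption made here, not anything prescribed by the slides.

import numpy as np

# Toy POMDP with 2 states, 2 actions, 2 observations (numbers are made up).
# Layout assumptions: T[a, s, s'] = P(s' | s, a), O[a, s', z] = P(z | s', a),
# R[s, a] = immediate reward.
T = np.array([[[0.9, 0.1],
               [0.1, 0.9]],      # action 0: state mostly stays put
              [[0.5, 0.5],
               [0.5, 0.5]]])     # action 1: state becomes uncertain

O = np.array([[[0.8, 0.2],
               [0.2, 0.8]],      # action 0: informative observations
              [[0.5, 0.5],
               [0.5, 0.5]]])     # action 1: uninformative observations

R = np.array([[ 1.0, -0.5],
              [-1.0,  2.0]])

b0 = np.array([0.5, 0.5])        # initial state distribution
gamma = 0.95                     # discount factor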
Graphical Models (Sondik, 1971)

(Figure: two-slice graphical model. Hidden: states s1, s2 with transition model T(sj | ai, si) and rewards R1, R2. Observable: actions a1, observations z1, z2 with observation model O(zj | si), and beliefs b1, b2.)

Reliable Navigation

• Conventional trajectories may not be robust to localization error
(Figure: map showing estimated robot position, true robot position, and goal position.)
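The transition model T(sj | ai, si) and observation model O(zj | si) in the graphical model above are exactly what is needed to maintain the belief over the hidden state. The sketch below is a standard Bayes-filter belief update, written against the array layout assumed in the earlier toy model (T[a, s, s'], O[a, s', z]); it is an illustration, not code from the slides.

import numpy as np

def belief_update(b, a, z, T, O):
    # Prediction step: P(s' | b, a) = sum_s T(s' | a, s) b(s)
    predicted = T[a].T @ b
    # Correction step: weight by the likelihood O(z | s', a), then normalize.
    unnormalized = O[a][:, z] * predicted
    return unnormalized / unnormalized.sum()

# Example with the toy arrays defined earlier: after taking action 0 and
# seeing observation 1, the belief shifts toward state s2.
# b1 = belief_update(b0, a=0, z=1, T=T, O=O)   # -> approx [0.2, 0.8]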
Control Models

• Markov Decision Processes
(Figure: world state → probabilistic perception model P(x) → control; the controller acts on argmax P(x).)
• Partially Observable Markov Decision Processes
(Figure: world state → probabilistic perception model P(x) → control; the controller acts on the full distribution P(x).)

Navigation as a POMDP

• State space: the state is hidden from the controller
• Controller and environment interact only through actions and observations
• Controller chooses actions based on probability distributions (see the sketch below)
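To make the distinction concrete, the sketch below contrasts the two controllers suggested by the diagrams: an MDP-style controller that collapses P(x) to its most likely state via argmax, and a controller that scores actions against the whole distribution (here using a simple belief-weighted, QMDP-style heuristic as one illustration). All names and numbers are invented for the example.

import numpy as np

def mdp_style_action(belief, state_policy):
    # Collapse P(x) to its most likely state, then act as if that state were certain.
    return state_policy[int(np.argmax(belief))]

def belief_weighted_action(belief, Q):
    # Score each action against the whole distribution: value(a) = sum_s b(s) * Q[s, a].
    return int(np.argmax(belief @ Q))

# Hypothetical per-state action values: action 0 is great in state 0 but
# disastrous in state 1; action 1 is safe everywhere.
Q = np.array([[ 1.0, 0.9],
              [-5.0, 0.9]])
state_policy = [0, 1]            # best action if the state were known exactly
belief = np.array([0.6, 0.4])

print(mdp_style_action(belief, state_policy))   # -> 0 (ignores the 40% risk)
print(belief_weighted_action(belief, Q))        # -> 1 (hedges against state 1)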
                  Passive        Controlled
Fully Observable  Markov Model   MDP
Hidden State      HMM            POMDP

• Stationary policy: Best action is fixed
• Non-stationary policy: Best action depends on time
• States can be discrete, continuous, or hybrid

Tradeoffs

• MDP
  + Tractable to solve
  + Relatively easy to specify
  – Assumes perfect knowledge of state
• POMDP
  + Treats all sources of uncertainty (action, sensing, environment) in a uniform framework
  + Allows for taking actions that gain information
  – Difficult to specify all the conditional probabilities
  – Hugely intractable to solve optimally

POMDP Advantages

• Models information gathering
• Computes trade-off between getting reward and being uncertain (see the sketch below)

"Am I here, or over there?"
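One way to see the reward-versus-uncertainty trade-off is to score each candidate action by two numbers: its expected immediate reward under the current belief, and the expected entropy of the belief after acting and observing. The sketch below does this for the toy model defined earlier (same assumed array layout); it is illustrative only. In that toy model, action 0 earns little reward but yields an informative observation (lower expected entropy), while action 1 earns more expected reward but leaves the belief flat.

import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def reward_vs_uncertainty(b, a, T, O, R):
    # Expected immediate reward under the current belief.
    exp_reward = float(b @ R[:, a])
    # Expected entropy of the posterior belief, averaged over observations.
    predicted = T[a].T @ b                       # P(s' | b, a)
    exp_entropy = 0.0
    for z in range(O.shape[2]):
        p_z = float(O[a][:, z] @ predicted)      # P(z | b, a)
        if p_z > 0:
            posterior = (O[a][:, z] * predicted) / p_z
            exp_entropy += p_z * entropy(posterior)
    return exp_reward, exp_entropy

# With the toy T, O, R, b0 defined earlier:
# reward_vs_uncertainty(b0, 0, T, O, R)  -> (0.0,  ~0.72 bits)
# reward_vs_uncertainty(b0, 1, T, O, R)  -> (0.75,  1.0 bits)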
A Simple POMDP

(Figure: a probability distribution p(s) over three states s1, s2, s3.)
• State is hidden

POMDP Policies

(Figure: a policy maps the current belief p(s), a point in belief space, to an optimal action.)
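A standard way to represent such a belief-to-action mapping (and the representation behind the point-based value iteration paper listed under additional reading) is a set of alpha-vectors: linear value functions over states, each tagged with an action, whose upper envelope over the belief simplex is the value function. The sketch below is illustrative only; the vectors, actions, and beliefs are made up.

import numpy as np

# Hypothetical alpha-vectors for a 2-state problem, each tagged with an action.
alpha_vectors = np.array([[ 2.0, -1.0],    # dominates when the belief favors s1
                          [-1.0,  2.0],    # dominates when the belief favors s2
                          [ 0.8,  0.8]])   # a "safe" action for uncertain beliefs
alpha_actions = [0, 1, 2]

def value(belief):
    # V(b) = max_k alpha_k . b : piecewise-linear and convex over belief space.
    return float(np.max(alpha_vectors @ belief))

def policy(belief):
    # Optimal action = the action attached to the maximizing alpha-vector.
    return alpha_actions[int(np.argmax(alpha_vectors @ belief))]

print(policy(np.array([0.9, 0.1])))   # -> 0
print(policy(np.array([0.5, 0.5])))   # -> 2
print(policy(np.array([0.1, 0.9])))   # -> 1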