Example: State-Value Function for Student MRP (3)

$v(s)$ for $\gamma = 1$

[Figure: Student MRP transition graph annotated with rewards, transition probabilities, and state values $v(s)$]
Bellman Equation for MRPs (1)

The value function can be decomposed into two parts:
- immediate reward $R_{t+1}$
- discounted value of successor state $\gamma v(S_{t+1})$

$$
\begin{aligned}
v(s) &= \mathbb{E}[G_t \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma (R_{t+2} + \gamma R_{t+3} + \dots) \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
\end{aligned}
$$
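The recursive step $G_t = R_{t+1} + \gamma G_{t+1}$ used in the derivation can be checked numerically on a single reward sequence. The following is a minimal sketch: the reward list and the discount value are made-up illustration values, and `discounted_return` is a hypothetical helper name, not something defined in the lecture.

```python
def discounted_return(rewards, gamma):
    """Return R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... for a finite episode."""
    g = 0.0
    # Work backwards from the last reward, applying G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode rewards R_{t+1}, R_{t+2}, ... (illustration only).
rewards = [-2.0, -2.0, -2.0, 10.0]
gamma = 0.5  # assumed discount factor

g_t = discounted_return(rewards, gamma)         # return from time t
g_next = discounted_return(rewards[1:], gamma)  # return from time t+1

# Direct sum of gamma^k * R_{t+k+1}, for comparison with the recursion.
direct = sum((gamma ** k) * r for k, r in enumerate(rewards))

print(g_t, rewards[0] + gamma * g_next, direct)
```

All three printed numbers should agree, since the direct sum and the one-step recursion are two ways of writing the same return.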
Bellman Equation for MRPs (2)

$$v(s) = \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]$$

[Figure: one-step backup diagram from state $s$, with value $v(s)$, to successor states $s'$, with values $v(s')$]

$$v(s) = \mathcal{R}_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'} \, v(s')$$
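The step from the expectation to the sum over successor states follows from linearity of expectation. The expansion below is a sketch assuming the usual MRP definitions $\mathcal{R}_s = \mathbb{E}[R_{t+1} \mid S_t = s]$ and $\mathcal{P}_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s]$, which are not restated in this section:

```latex
\begin{aligned}
v(s) &= \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} \mid S_t = s] + \gamma \, \mathbb{E}[v(S_{t+1}) \mid S_t = s] \\
     &= \mathcal{R}_s + \gamma \sum_{s' \in \mathcal{S}} \mathbb{P}[S_{t+1} = s' \mid S_t = s] \, v(s') \\
     &= \mathcal{R}_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'} \, v(s')
\end{aligned}
```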
Example: Bellman Equation for Student MRP

$$4.3 \approx -2 + 0.6 \times 10 + 0.4 \times 0.8$$

(The right-hand side evaluates to 4.32; the figure shows state values rounded to one decimal place.)

[Figure: Student MRP transition graph annotated with rewards, transition probabilities, and state values]
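The arithmetic in this example is a single one-step backup: the immediate reward plus the probability-weighted values of the successor states. A minimal sketch follows, using only the numbers that appear in the slide's equation. `bellman_backup` is a hypothetical helper name, and $\gamma = 1$ is assumed because no discount factor appears in the equation.

```python
def bellman_backup(reward, successors, gamma=1.0):
    """One-step backup: reward + gamma * sum(prob * value) over successor states.

    `successors` is a list of (transition_probability, successor_value) pairs.
    """
    return reward + gamma * sum(p * v for p, v in successors)


# Numbers from the slide's equation: immediate reward -2, one successor
# reached with probability 0.6 (value 10), another with probability 0.4 (value 0.8).
value = bellman_backup(-2.0, [(0.6, 10.0), (0.4, 0.8)], gamma=1.0)

# The slide reports state values to one decimal place.
rounded = round(value, 1)

print(value, rounded)
```

The unrounded result is approximately 4.32, which rounds to the 4.3 shown for that state.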
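Bellman Equation in Matrix Form

The Bellman equation can be expressed concisely using matrices,

$$v = \mathcal{R} + \gamma \mathcal{P} v$$

where $v$ is a column vector with one entry per state

$$
\begin{bmatrix} v(1) \\ \vdots \\ v(n) \end{bmatrix}
=
\begin{bmatrix} \mathcal{R}_1 \\ \vdots \\ \mathcal{R}_n \end{bmatrix}
+ \gamma
\begin{bmatrix}
\mathcal{P}_{11} & \cdots & \mathcal{P}_{1n} \\
\vdots & & \vdots \\
\mathcal{P}_{n1} & \cdots & \mathcal{P}_{nn}
\end{bmatrix}
\begin{bmatrix} v(1) \\ \vdots \\ v(n) \end{bmatrix}
$$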
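To make the matrix form concrete, the sketch below applies $v \leftarrow \mathcal{R} + \gamma \mathcal{P} v$ repeatedly to a small MRP until $v$ stops changing, then checks that the result satisfies the equation. The 3-state reward vector, transition matrix, and discount factor are hypothetical illustration values, not the Student MRP. Repeated application is just one simple way to find such a $v$, chosen here as an assumption; it converges in this example because $\gamma < 1$.

```python
def mat_vec(P, v):
    """Multiply matrix P (list of rows) by column vector v (list)."""
    return [sum(p_ij * v_j for p_ij, v_j in zip(row, v)) for row in P]


def bellman_update(R, P, v, gamma):
    """Return R + gamma * P v, the right-hand side of the matrix Bellman equation."""
    Pv = mat_vec(P, v)
    return [r + gamma * pv for r, pv in zip(R, Pv)]


# Hypothetical 3-state MRP (illustration only).
R = [1.0, 0.0, -1.0]      # one reward entry per state
P = [[0.5, 0.5, 0.0],     # each row sums to 1
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
gamma = 0.5               # assumed discount factor

# Start from zero and apply the update until it has effectively converged.
v = [0.0, 0.0, 0.0]
for _ in range(200):
    v = bellman_update(R, P, v, gamma)

# Largest violation of v = R + gamma * P v across states.
residual = max(abs(a - b) for a, b in zip(v, bellman_update(R, P, v, gamma)))

print(v, residual)
```

For these numbers the equation can also be solved by hand, one state at a time from the last row upwards, giving $v = (10/9, -2/3, -2)$, which the iteration should reproduce.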