where $x(\cdot)$ is the solution of the initial value problem
$$\dot{x}(s) = f(x(s), u(s)), \quad t \le s \le t_1, \qquad x(t) = x. \tag{4}$$
Here, $t \in [t_0, t_1]$ is a "variable" initial time, $u(\cdot)$ is a control defined on $[t, t_1]$ taking values in, say, $U \subset \mathbb{R}^m$ ($U$ closed), and $x(\cdot)$ is the state trajectory in $\mathbb{R}^n$. We denote by $\mathcal{U}_{t,t_1}$ a space of admissible controls, containing at least the piecewise continuous controls. The value function is defined by
$$V(t, x) = \inf_{u(\cdot) \in \mathcal{U}_{t,t_1}} J(t, x; u(\cdot)) \tag{5}$$
for $(t, x) \in [t_0, t_1] \times \mathbb{R}^n$. The dynamic programming principle states that for every $r \in [t, t_1]$,
$$V(t, x) = \inf_{u(\cdot) \in \mathcal{U}_{t,r}} \left\{ \int_t^r L(x(s), u(s)) \, ds + V(r, x(r)) \right\} \tag{6}$$
(we will prove this later on). From this, one can derive formally the equation
$$\frac{\partial}{\partial t} V(t, x) + H(x, \nabla_x V(t, x)) = 0 \quad \text{in } (t_0, t_1) \times \mathbb{R}^n, \tag{7}$$
with terminal data
$$V(t_1, x) = \psi(x) \quad \text{in } \mathbb{R}^n. \tag{8}$$
Here, the Hamiltonian is given by
$$H(x, \lambda) = \inf_{v \in U} \{ \lambda \cdot f(x, v) + L(x, v) \}. \tag{9}$$
The nonlinear first order PDE (7) is the dynamic programming PDE or Hamilton-Jacobi-Bellman (HJB) equation. The pair (7), (8) specifies what is called a Cauchy problem, and can be viewed as a special case of (1) together with suitable boundary conditions, using $\Omega = (t_0, t_1) \times \mathbb{R}^n$. Notice that the Hamiltonian (9) is concave in the variable $\lambda$ (since it is the infimum of functions that are linear in $\lambda$).

Let us see how (7) is obtained. Set $r = t + h$, $h > 0$, and rearrange (6) to yield
$$\inf_{u(\cdot)} \left\{ \frac{1}{h} \big( V(t+h, x(t+h)) - V(t, x) \big) + \frac{1}{h} \int_t^{t+h} L(x(s), u(s)) \, ds \right\} = 0.$$
If $V$ and $u(\cdot)$ are sufficiently smooth, then
$$\frac{1}{h} \big( V(t+h, x(t+h)) - V(t, x) \big) \to \frac{\partial}{\partial t} V(t, x) + \nabla_x V(t, x) \cdot f(x, u(t)) \quad \text{as } h \to 0$$
and
$$\frac{1}{h} \int_t^{t+h} L(x(s), u(s)) \, ds \to L(x, u(t)) \quad \text{as } h \to 0.$$
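The first limit is simply the chain rule along the state trajectory. As a rough numerical illustration (not part of the original argument), the following sketch checks the difference quotient against its claimed limit; the choices of $V$, $f$, $u$, and the evaluation point are hypothetical.

```python
import numpy as np

# Illustration of the chain-rule limit above for hypothetical smooth choices.
V = lambda t, x: np.exp(-t) * x**2      # smooth test function
f = lambda x, u: -x + u                 # scalar dynamics (assumed)
u = lambda t: np.sin(t)                 # a fixed smooth control (assumed)

t0, x0 = 0.5, 1.0
Vt = -np.exp(-t0) * x0**2               # partial V / partial t at (t0, x0)
Vx = 2 * np.exp(-t0) * x0               # partial V / partial x at (t0, x0)
limit = Vt + Vx * f(x0, u(t0))          # the claimed limit

for h in [1e-2, 1e-3, 1e-4]:
    xh = x0 + h * f(x0, u(t0))          # one Euler step approximates x(t0 + h)
    quotient = (V(t0 + h, xh) - V(t0, x0)) / h
    print(h, quotient, limit)           # quotient approaches the limit as h -> 0
```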
Combining these displays, one is formally led to (7). A proof of (7) when $V$ is sufficiently smooth requires a careful derivation of two inequalities which combine to give (7). Below we will prove that $V$ is a viscosity solution of (7); in fact, the unique one satisfying the terminal condition (8).

Verification. Let $\tilde{V}(t, x)$ be a $C^1$ solution of (7), (8). Let $u(\cdot) \in \mathcal{U}_{t_0,t_1}$ be any control. Then, using (7) and the fact that $H(x, \lambda) \le \lambda \cdot f(x, v) + L(x, v)$ for every $v \in U$,
$$\frac{d}{dt} \tilde{V}(t, x(t)) = \frac{\partial}{\partial t} \tilde{V}(t, x(t)) + \nabla \tilde{V}(t, x(t)) \, \dot{x}(t) = \frac{\partial}{\partial t} \tilde{V}(t, x(t)) + \nabla \tilde{V}(t, x(t)) \, f(x(t), u(t)) \ge -L(x(t), u(t)).$$
Integrating, we get
$$\tilde{V}(t_1, x(t_1)) - \tilde{V}(t_0, x_0) \ge -\int_{t_0}^{t_1} L(x(t), u(t)) \, dt,$$
or
$$\tilde{V}(t_0, x_0) \le \int_{t_0}^{t_1} L(x(t), u(t)) \, dt + \tilde{V}(t_1, x(t_1)) = \int_{t_0}^{t_1} L(x(t), u(t)) \, dt + \psi(x(t_1)),$$
using (8). This shows that $\tilde{V}(t_0, x_0) \le V(t_0, x_0)$ ($V$ is the value function defined by (5)).

Now this same calculation for the control $u(\cdot) = u^*(\cdot) \in \mathcal{U}_{t_0,t_1}$ satisfying
$$u^*(t) \in \operatorname*{argmin}_{v \in U} \left\{ \nabla_x \tilde{V}(t, x^*(t)) \cdot f(x^*(t), v) + L(x^*(t), v) \right\}, \tag{10}$$
for $t \in [t_0, t_1]$, where $x^*(\cdot)$ is the corresponding state trajectory, gives
$$\tilde{V}(t_0, x_0) = \int_{t_0}^{t_1} L(x^*(t), u^*(t)) \, dt + \psi(x^*(t_1)),$$
showing that in fact $u^*$ is optimal and $\tilde{V}(t_0, x_0) = V(t_0, x_0)$. Indeed, we have $\tilde{V} = V$ in $[t_0, t_1] \times \mathbb{R}^n$ by this argument, and so we have shown that any smooth solution to (7), (8) must equal the value function; this is a uniqueness result. Unfortunately, in general there may be no such smooth solutions.

Optimal feedback. The above calculations suggest how one might obtain an optimal feedback controller. To simplify a bit, suppose that
$$U = \mathbb{R}^m, \quad f(x, u) = f(x) + g(x)u, \quad L(x, u) = \ell(x) + \tfrac{1}{2}|u|^2.$$
Then evaluating the infimum in (9) gives $u^* = -g(x)'\lambda'$ and
$$H(x, \lambda) = \lambda f(x) - \tfrac{1}{2} \lambda g(x) g(x)' \lambda' + \ell(x).$$
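As a sanity check on this computation, a brute-force grid search over $v$ should recover both the minimizer $u^* = -g(x)'\lambda'$ and the closed-form $H$. The scalar functions $f$, $g$, $\ell$, the point $(x, \lambda)$, and the grid below are hypothetical choices for illustration.

```python
import numpy as np

# Numerical check of the infimum in (9) for the input-affine case
# f(x, u) = f(x) + g(x) u, L(x, u) = l(x) + |u|^2 / 2 (scalar, assumed).
fx = lambda x: -x**3            # drift f(x)
g  = lambda x: 1.0 + x**2       # input gain g(x)
l  = lambda x: x**2             # state cost l(x)

x, lam = 0.7, -1.3
v = np.linspace(-20.0, 20.0, 400001)                 # control grid
vals = lam * (fx(x) + g(x) * v) + l(x) + 0.5 * v**2  # integrand of (9)

u_star = v[np.argmin(vals)]
H_closed = lam * fx(x) - 0.5 * (lam * g(x))**2 + l(x)

print(u_star, -g(x) * lam)      # grid argmin matches -g(x)' lam'
print(vals.min(), H_closed)     # grid minimum matches the closed-form H
```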
Hence the HJB equation can be written as
$$\frac{\partial}{\partial t} V + \nabla V f - \frac{1}{2} \nabla V g g' \nabla V' + \ell = 0 \tag{11}$$
with optimal feedback controller
$$u^*(t, x) = -g(x)' \nabla V(t, x)'. \tag{12}$$
This means that the optimal control $u^*(\cdot) \in \mathcal{U}$ is given by
$$u^*(t) = u^*(t, x^*(t)), \quad t_0 \le t \le t_1.$$
Of course, this makes sense only when $V$ is sufficiently smooth. The equation (11) is sometimes referred to as a nonlinear Riccati equation.

LQR. Take $U = \mathbb{R}^m$, $f(x, u) = Ax + Bu$, $L(x, u) = \tfrac{1}{2}|x|^2 + \tfrac{1}{2}|u|^2$, $\psi(x) = \tfrac{1}{2} x' \Psi x$. As a trial solution of (11) we use $\tilde{V}(t, x) = \tfrac{1}{2} x' P(t) x$, where $P(t) \ge 0$ (symmetric) is to be determined. Now $\frac{\partial}{\partial t} \tilde{V}(t, x) = \tfrac{1}{2} x' \dot{P}(t) x$, and $\nabla \tilde{V}(t, x) = x' P(t)$. Plugging these into (11) gives
$$\tfrac{1}{2} x' \dot{P}(t) x + x' P(t) A x - \tfrac{1}{2} x' P(t) B B' P(t) x + \tfrac{1}{2} x' x = 0.$$
Since this holds for all $x \in \mathbb{R}^n$ (and since $x' P A x = \tfrac{1}{2} x' (A'P + PA) x$), we must have
$$\dot{P}(t) + A' P(t) + P(t) A - P(t) B B' P(t) + I = 0. \tag{13}$$
At time $t = t_1$ we have $\tilde{V}(t_1, x) = \tfrac{1}{2} x' \Psi x$, and so
$$P(t_1) = \Psi. \tag{14}$$
Therefore, if there exists a $C^1$ solution $P(t)$ to the Riccati differential equation (13) on $[t_0, t_1]$ with terminal condition (14), we obtain a smooth solution $\tfrac{1}{2} x' P(t) x$ to (7), (8), and as argued above the value function for the LQR problem is given by
$$V(t, x) = \tfrac{1}{2} x' P(t) x. \tag{15}$$
The optimal feedback controller is given by
$$u^*(t, x) = -B' P(t) x. \tag{16}$$
This gives the optimal control $u^*(\cdot) \in \mathcal{U}$:
$$u^*(t) = -B' P(t) x^*(t), \quad t_0 \le t \le t_1. \tag{17}$$
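In practice, (13), (14) can be integrated backward in time numerically. Below is a minimal sketch using SciPy; the matrices $A$, $B$, $\Psi$ and the horizon are illustrative assumptions, not data from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical 2-state example: A, B, Psi, and the horizon are assumptions.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Psi = np.eye(2)                          # terminal weight in psi(x) = x' Psi x / 2
t0, t1 = 0.0, 5.0

def riccati_rhs(t, p):
    """RHS of (13) rearranged as Pdot = -(A'P + PA - P B B' P + I)."""
    P = p.reshape(2, 2)
    Pdot = -(A.T @ P + P @ A - P @ B @ B.T @ P + np.eye(2))
    return Pdot.ravel()

# Integrate backward from the terminal condition P(t1) = Psi, as in (14).
sol = solve_ivp(riccati_rhs, [t1, t0], Psi.ravel(), dense_output=True)

def feedback(t, x):
    """Optimal feedback u*(t, x) = -B' P(t) x from (16)."""
    P = sol.sol(t).reshape(2, 2)
    return -(B.T @ P @ x)

x = np.array([1.0, 0.0])
print(feedback(0.0, x))                  # optimal control at the initial state
```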
2.1.3 Distance Function

As another example, we consider the distance function $d(x, \partial\Omega)$ to the boundary $\partial\Omega$ of an open, bounded set $\Omega \subset \mathbb{R}^n$. In some ways the HJ equation for this function is simpler than that of the optimal control problem described above, and we can more easily explain viscosity solutions, issues of uniqueness, etc., in this context.

The distance function is defined by
$$d(x, \partial\Omega) = \inf_{y \in \partial\Omega} |x - y|. \tag{18}$$
Note that the infimum here is always attained (though not necessarily uniquely), since $\partial\Omega$ is compact and $y \mapsto |x - y|$ is continuous; denote by $\pi(x) \subset \partial\Omega$ the set of minimizing $y$. We write
$$V(x) = d(x, \partial\Omega) \tag{19}$$
for simplicity, and consider $V(x)$ as a function on the closed set $\overline{\Omega}$. It can be verified that $V(x)$ is a non-negative Lipschitz continuous function. In fact, we shall see that $V$ is the unique continuous viscosity solution of
$$|\nabla V| - 1 = 0 \quad \text{in } \Omega \tag{20}$$
satisfying the boundary condition
$$V = 0 \quad \text{on } \partial\Omega. \tag{21}$$
Equations (20) and (21) constitute a Dirichlet problem.

Example 2.1 $\Omega = (-1, 1) \subset \mathbb{R}^1$. Here, $\partial\Omega = \{-1, 1\}$ and $\overline{\Omega} = [-1, 1]$. Then
$$V(x) = \begin{cases} 1 + x & \text{if } -1 \le x \le 0, \\ 1 - x & \text{if } 0 \le x \le 1, \end{cases}$$
which is Lipschitz continuous, and differentiable except at $x = 0$. At each point $x \ne 0$, $V$ solves the HJ equation (20), and $V$ satisfies the boundary condition (21) ($V(-1) = V(1) = 0$); see Figure 2. Note that $\pi(x) = \{-1\}$ for $-1 \le x < 0$, $\pi(x) = \{1\}$ for $0 < x \le 1$, and $\pi(0) = \{-1, 1\}$. The Lipschitz function $V_1(x) = |x| - 1$ also satisfies (20) a.e. and (21); there are many other such functions.

Dynamic programming. The distance function satisfies a simple version of the dynamic programming principle: for any $r > 0$ we have
$$V(x) = \inf_{|x - z| < r} \{ |x - z| + V(z) \}. \tag{22}$$
We will use this later to show that $V$ is a viscosity solution of (20), but for now we discuss and derive (22).
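Before the derivation (which follows below), here is a quick numerical check of (22) for Example 2.1, where $V(x) = 1 - |x|$; the point $x$, radius $r$, and grid resolution are arbitrary illustrative choices.

```python
import numpy as np

# Check the dynamic programming identity (22) for Example 2.1 on (-1, 1).
V = lambda x: 1.0 - np.abs(x)             # distance function of Example 2.1

x, r = 0.3, 0.5                           # assumed interior point and radius
z = np.linspace(x - r, x + r, 2001)       # candidates with |x - z| <= r
rhs = np.min(np.abs(x - z) + V(z))        # right-hand side of (22)

print(rhs, V(x))                          # both equal 0.7
# Many z attain the minimum: every z between x and its nearest boundary
# point, matching the remark about non-unique minimizers below.
```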
Figure 2: Distance function $V$ and another Lipschitz solution $V_1$.

Fix $x \in \Omega$ and $r > 0$, and let $|x - z| < r$. Choose $y^*(z) \in \pi(z)$, so that $V(z) = |z - y^*(z)|$. Then
$$V(x) \le |x - y^*(z)| \le |x - z| + |z - y^*(z)| = |x - z| + V(z).$$
Since this holds for all $|x - z| < r$, we have
$$V(x) \le \inf_{|x - z| < r} \{ |x - z| + V(z) \}.$$
To see that equality holds, simply take $z = x$. This establishes (22). Note that there are many minimizers $z^*$ for the RHS of (22), viz. points on the segments joining $x$ to points in $\pi(x)$.

2.1.4 Viscosity Solutions

We turn now to the concept of viscosity solution for the HJ equation (1). The terminology comes from the vanishing viscosity method, which finds a solution $V$ of (1) as a limit $V^\varepsilon \to V$ of solutions to
$$-\frac{\varepsilon}{2} \Delta V^\varepsilon(x) + F(x, V^\varepsilon(x), \nabla V^\varepsilon(x)) = 0. \tag{23}$$
The Laplacian term $\frac{\varepsilon}{2} \Delta V^\varepsilon = \frac{\varepsilon}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} V^\varepsilon$ can be used to model fluid viscosity. The definition below is quite independent of this limiting construction, and is closely related to dynamic programming; however, the definition applies also to equations that do not necessarily correspond to optimal control.
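To make the limiting construction concrete, the following sketch approximates $V^\varepsilon$ for the Dirichlet problem (20), (21) of Example 2.1 by relaxing a time-dependent version of (23) to steady state; the grid size, $\varepsilon$, time step, and iteration count are all arbitrary choices for this illustration.

```python
import numpy as np

# Vanishing viscosity sketch for |V'| - 1 = 0 on (-1, 1), V(+-1) = 0:
# relax V_t = (eps/2) V_xx - (|V_x| - 1) to steady state (parameters assumed).
n, eps = 201, 0.05
x = np.linspace(-1.0, 1.0, n)
h = x[1] - x[0]
dt = 0.4 * h**2 / eps                    # small enough for explicit stability
V = np.zeros(n)                          # boundary values V[0] = V[-1] stay 0

for _ in range(20000):
    Vxx = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / h**2    # centered second difference
    Vx = (V[2:] - V[:-2]) / (2.0 * h)                # centered first difference
    V[1:-1] += dt * (0.5 * eps * Vxx - (np.abs(Vx) - 1.0))

# The viscous solution is smooth and close to the distance function
# 1 - |x|, with the kink at x = 0 rounded off on a scale set by eps.
print(np.max(np.abs(V - (1.0 - np.abs(x)))))         # small for small eps
```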