APPENDIX IV

OPTIMAL CONTROL THEORY

This appendix provides a concise review of optimal control theory. Many economic problems require the use of optimal control theory. For example, optimization over time, such as the maximization of utility over an individual's lifetime or of a country's profit and social welfare over time, and optimization over space, such as the problems analyzed in this book, fit into its framework. Although these problems may be solved by conventional techniques such as Lagrange's method and nonlinear programming if we formulate them in discrete form by dividing time (or distance) into a finite number of intervals, continuous-time (or space) models are usually more convenient and yield more transparent results. Optimization over continuous time, however, introduces some technical difficulties. In the continuous-time model the number of choice variables is no longer finite: since decisions may be taken at each instant of time, there is a continuously infinite number of choice variables. The rigorous treatment of optimization in an infinite-dimensional space requires very advanced mathematics. Fortunately, once proven, the major results are quite simple and analogous to those for optimization in a finite-dimensional space.

There are three approaches to optimal control theory: calculus of variations, the maximum principle, and dynamic programming. Calculus of variations is the oldest of the three and treats only interior solutions. In applications, as it turned out, choice variables are often bounded and may jump from one bound to the other in the interval considered. The maximum principle was developed to include such cases. Roughly speaking, calculus of variations and the maximum principle are derived by using appropriate forms of differentiation in an infinite-dimensional space. Dynamic programming, however, exploits the recursive nature of the problem. Many problems, including those treated by calculus of variations and the maximum principle, have the property that the optimal policy from any arbitrary time on depends only on the state of the system at that time and not on the paths that the choice variables have taken up to that time. In such cases the maximum value of the objective function beyond time t can be considered a function of the state of the system at time t. This function is called the value function. The value function yields the value achieved by the best possible performance from t to the end of the interval. The dynamic programming approach solves the optimization problem by first obtaining the value function. Although the maximum principle and dynamic programming yield the same results where they can both be applied, dynamic programming is less general than the approach based on the maximum principle, since it requires differentiability of the value function.

We first try to facilitate an intuitive understanding of control theory in section 1. In order to do so, a very simple control problem is formulated and the necessary conditions for the optimum are derived heuristically. Following the dynamic programming approach, Pontryagin's maximum principle is derived from the partial differential equation of dynamic programming. As mentioned above, this approach is not the most general one, but it facilitates economic interpretation of the necessary conditions. In section 2 the results in section 1 are applied to an example taken from Chapter VII. Section 3 considers a more general form of the control problem (due to Bolza and Hestenes), and Hestenes' theorem, giving the necessary conditions for the optimum, is stated without proof. This theorem is general enough to include most problems that appear in this book. Finally, in section 4, Hestenes' theorem is used to solve the control problems in Chapter I.
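Before turning to the formal development, the recursive idea behind dynamic programming can be illustrated with a small discrete-time sketch. The example below is only a hedged illustration: the one-sector capital-accumulation problem, the grids, the functional forms, and all parameter values are assumptions chosen for concreteness, not taken from the text. It computes an approximate value function by backward induction, working from the terminal scrap value back to the initial time.

import numpy as np

# A hypothetical discrete-time illustration of the value function idea: capital x
# evolves as x_next = x + (u - delta * x) * dt, the planner earns profit(x) - u per
# period, and a scrap value proportional to terminal capital is received at the end.
T, dt = 20, 0.25                       # number of periods and period length (assumed)
delta, scrap = 0.1, 0.5                # depreciation rate and unit scrap value (assumed)
x_grid = np.linspace(0.0, 10.0, 101)   # grid for the state variable (capital stock)
u_grid = np.linspace(0.0, 2.0, 21)     # grid for the control variable (investment rate)

def profit(x):
    return np.sqrt(x)                  # an assumed instantaneous profit function

V = scrap * x_grid                     # terminal condition: value equals the scrap value
policy = np.zeros((T, x_grid.size))    # best control at each time and state

# Backward induction: the value at time t is the best current payoff plus the
# value of the state to which that choice leads at time t + dt.
for t in reversed(range(T)):
    V_next = V.copy()                  # values for time t + dt, fixed during this sweep
    for i, x in enumerate(x_grid):
        x_new = x + (u_grid - delta * x) * dt              # next state for every control
        candidates = (profit(x) - u_grid) * dt + np.interp(x_new, x_grid, V_next)
        best = np.argmax(candidates)
        V[i], policy[t, i] = candidates[best], u_grid[best]

print("approximate value at the initial state x0 = 4:", np.interp(4.0, x_grid, V))

The continuous-time theory developed in this appendix can be viewed as the limit of such constructions as the period length shrinks to zero.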
1. A Simple Control Problem

Consider a dynamic process which starts at initial time t_0 and ends at terminal time t_1. Both t_0 and t_1 are taken as given in this section. For simplicity, the state of the system is described by only one variable, x(t), called the state variable. In most economic problems the state variable is a stock, such as the amount of capital equipment or inventories available at time t. In Chapters IV and V of our book the volume of traffic at a radius is a state variable.

The state of the system is influenced by the choice of control variables, u_1(t), u_2(t), ..., u_r(t), which are summarized as the control vector,

    u(t) = (u_1(t), u_2(t), \ldots, u_r(t)).                                   (1.1)

The control vector must lie inside a given subset U of a Euclidean r-dimensional space:

    u(t) \in U, \qquad t_0 \le t \le t_1,                                      (1.2)

where U is assumed to be closed and unchanging. Note that control variables are chosen at each point of time. The rate of investment in capital equipment is one of the control variables in most models of capital accumulation; the rate of inventory investment is a control variable in inventory adjustment models; and the population per unit distance is a control variable for the models in this book. An entire path of the control vector, u(t), t_0 ≤ t ≤ t_1, is a vector-valued function u(t) from the interval [t_0, t_1] into the r-dimensional space and is simply called a control. A control is admissible if it satisfies the constraint (1.2) and some other regularity conditions which will be specified in section 3.

The state variable moves according to the differential equation

    \frac{dx}{dt} = \dot{x}(t) = f_1(x(t), u(t), t),                           (1.3)

where f_1 is assumed to be continuously differentiable. Notice that the function f_1 is not the same as f_0. In this section the initial state, x(t_0), is given,

    x(t_0) = x_0,                                                              (1.4)

where x_0 is some constant, but the terminal state, x(t_1), is unrestricted. For example, the capital stock at initial time is fixed; the rate of change of the capital stock equals the rate of investment minus depreciation; and the capital stock at terminal time is not restricted.
The problem to be solved is that of maximizing the objective functional

    J = \int_{t_0}^{t_1} f_0(x(t), u(t), t) \, dt + S_0(x(t_1), t_1)           (1.5)

with respect to the control vector, u(t), t_0 ≤ t ≤ t_1, subject to the constraints (1.2), (1.3), and (1.4), where f_0 and S_0, the functions which make up the objective functional, are continuously differentiable. A functional is defined as a function of a function or functions, that is, a mapping from a space of functions to a space of numbers. In the investment decision problem for a firm, for example, f_0(x(t), u(t), t) dt is the amount of profit earned in the time interval [t, t + dt] and S_0(x(t_1), t_1) is the scrap value of the amount of capital x(t_1) at terminal time t_1.
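For concreteness, the firm's investment problem just described can be written in the form (1.2)-(1.5). The specification below is only an illustrative sketch: the profit function \pi, the investment cost function c, the depreciation rate \delta, the unit scrap value s, and the upper bound \bar{u} on the investment rate are assumptions introduced here for the example, not notation used elsewhere in this appendix.

    \max_{u(\cdot)} \; J = \int_{t_0}^{t_1} \bigl[ \pi(x(t)) - c(u(t)) \bigr] \, dt + s \, x(t_1)
    \quad \text{subject to} \quad
    \dot{x}(t) = u(t) - \delta x(t), \qquad x(t_0) = x_0, \qquad u(t) \in U = [0, \bar{u}],

so that f_0(x, u, t) = \pi(x) - c(u), f_1(x, u, t) = u - \delta x, and S_0(x(t_1), t_1) = s x(t_1). Here x(t) is the capital stock and u(t) the rate of gross investment, so the state equation says that capital changes at the rate of investment minus depreciation.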
[Figure 1a. A trajectory of the state variable. Figure 1b. The objective functional.]

The problem is illustrated in Figure 1. In Fig. 1a, a possible trajectory of the state variable with the initial value x_0 is depicted. If the trajectory of the control vector is specified for the entire time horizon [t_0, t_1], the trajectory of the state variable is completely characterized. The value of the state variable at time t and the choice of the control vector then jointly determine f_0(x(t), u(t), t). In Fig. 1b we graph the part of the value of the objective functional which has been realized at any time t for the particular trajectory of the control vector. f_0, therefore, appears as the slope in Fig. 1b, while the value of the objective functional is the sum of the integral of f_0 from t_0 to t_1 and S_0, the scrap value at terminal time. Our problem is to obtain the trajectory of the control vector that maximizes the objective functional.

The major difficulty of this problem lies in the fact that an entire time path of the control vector must be chosen. This amounts to a continuously infinite number of control variables. In other words, what must be found is not just optimal numbers but optimal functions. The basic idea of control theory is to transform the problem of choosing the entire optimal path of control variables into the problem of finding the optimal values of control variables at each instant of time. In this way the problem of choosing an infinite number of variables is decomposed into an infinite number of more elementary problems, each of which involves determining a finite number of variables.

The objective functional can be broken, for any time t, into three pieces: a past, a present and a future:

    J = \int_{t_0}^{t} f_0(x(t'), u(t'), t') \, dt'
        + \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t') \, dt'
        + \int_{t+\Delta t}^{t_1} f_0(x(t'), u(t'), t') \, dt' + S_0(x(t_1), t_1).

The decisions taken at any time have two effects. They directly affect the present term, \int_{t}^{t+\Delta t} f_0(x(t'), u(t'), t') \, dt', by changing f_0. They also change \dot{x}, and hence the future path of x(t), through \dot{x} = f_1(x(t), u(t), t). The new path of x(t) changes the future part of the functional. For example, if a firm increases investment at time t, the rate at which profits are earned at that time falls because the firm must pay for the investment. The investment, however, increases the amount of capital available in the future and therefore the profits earned in the future. The firm must make investment decisions weighing these two effects. In general, the choice of the control variables at any instant of time must take into account both the instantaneous effect on the current earnings, f_0 Δt, and the indirect effect on the future earnings, \int_{t+\Delta t}^{t_1} f_0 \, dt' + S_0, through a change in the state variable. The transformation of the problem is accomplished if a simple way to represent these two effects is found.
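These two effects can also be seen numerically. The following sketch is a minimal illustration under assumed functional forms (profit \sqrt{x}, linear investment cost, dynamics \dot{x} = u - \delta x, and arbitrary parameter values): it integrates the state equation forward by Euler steps for a given control path and then raises investment at a single instant. The current term of the objective functional falls, while the part accumulated after that instant, including the scrap value, rises.

import numpy as np

# Discretize [t0, t1] and evaluate the objective functional for a given control
# path by Euler integration of the assumed state equation x' = u - delta * x.
t0, t1, N = 0.0, 5.0, 200
dt = (t1 - t0) / N
delta, scrap, x0 = 0.1, 0.5, 4.0       # assumed depreciation, unit scrap value, initial state

def f0(x, u):
    return np.sqrt(x) - u              # assumed instantaneous payoff: profit minus investment

def split_objective(u_path, k):
    """Return the payoff earned at step k and the payoff accumulated after step k,
    including the scrap value of the terminal state."""
    x, current, future = x0, 0.0, 0.0
    for i, u in enumerate(u_path):
        if i == k:
            current = f0(x, u) * dt
        elif i > k:
            future += f0(x, u) * dt
        x += (u - delta * x) * dt      # Euler step of the state equation
    return current, future + scrap * x

u_base = np.full(N, 0.5)               # a constant reference control path
k = 50                                  # the instant at which investment is perturbed
u_more = u_base.copy()
u_more[k] += 0.5                        # invest more at that single instant only

cur0, fut0 = split_objective(u_base, k)
cur1, fut1 = split_objective(u_more, k)
print("change in the current term:", cur1 - cur0)   # negative: the investment is paid for now
print("change in the future part :", fut1 - fut0)   # positive: more capital, more profit later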
This leads us to the concept of the value function, which might be used by a planner who wanted to recalculate the optimal policy at time t after the dynamic process began. Consider the problem of maximizing

    \int_{t}^{t_1} f_0(x(t'), u(t'), t') \, dt' + S_0(x(t_1), t_1)             (1.6)

when the state variable at time t is x; x(t) = x. The maximized value is then a function of x and t:

    J^*(x, t),                                                                 (1.7)

which is called the value function. The optimal value of the objective functional for the original problem (1.2)-(1.5) is

    J^*(x^*(t_0), t_0) = J^*(x_0, t_0).                                        (1.8)

The usefulness of the value function must be obvious by now: it facilitates the characterization of the indirect effect through a change in the state variable by summarizing the maximum possible value of the objective functional from time t on as a function of the state variable at time t (and of t).

The next step in the derivation of the necessary conditions for the optimum involves the celebrated Principle of Optimality due to Bellman. The principle exploits the fact that the value of the state variable at time t captures all the information necessary for decision making from time t on: the paths of the control vector and the state variable up to time t do not make any difference as long as the state variable at time t is the same. This implies that if a planner recalculates the optimal policy at time t given the optimal value of the state variable at that time, the new optimal policy coincides with the original optimal policy. Thus if u^*(t), t_0 ≤ t ≤ t_1, is the optimal control for the original problem and x^*(t), t_0 ≤ t ≤ t_1, the corresponding trajectory of the state variable, the value function satisfies

    J^* = \int_{t_0}^{t_1} f_0(x^*(t'), u^*(t'), t') \, dt' + S_0(x^*(t_1), t_1).    (1.9)

Applying the principle of optimality again, we can rewrite (1.9) as

    J^*(x^*(t), t) = \int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t') \, dt'
                     + \int_{t+\Delta t}^{t_1} f_0(x^*(t'), u^*(t'), t') \, dt' + S_0(x^*(t_1), t_1)
                   = \int_{t}^{t+\Delta t} f_0(x^*(t'), u^*(t'), t') \, dt'
                     + J^*(x^*(t+\Delta t), t+\Delta t),                       (1.10)

for any t and t + Δt such that t_0 ≤ t ≤ t + Δt ≤ t_1. This construction allows us to