11 PLANNING

In which we see how an agent can take advantage of the structure of a problem to construct complex plans of action.

The task of coming up with a sequence of actions that will achieve a goal is called planning. We have seen two examples of planning agents so far: the search-based problem-solving agent of Chapter 3 and the logical planning agent of Chapter 10. This chapter is concerned primarily with scaling up to complex planning problems that defeat the approaches we have seen so far.

Section 11.1 develops an expressive yet carefully constrained language for representing planning problems, including actions and states. The language is closely related to the propositional and first-order representations of actions in Chapters 7 and 10. Section 11.2 shows how forward and backward search algorithms can take advantage of this representation, primarily through accurate heuristics that can be derived automatically from the structure of the representation. (This is analogous to the way in which effective heuristics were constructed for constraint satisfaction problems in Chapter 5.) Sections 11.3 through 11.5 describe planning algorithms that go beyond forward and backward search, taking advantage of the representation of the problem. In particular, we explore approaches that are not constrained to consider only totally ordered sequences of actions.

For this chapter, we consider only environments that are fully observable, deterministic, finite, static (change happens only when the agent acts), and discrete (in time, action, objects, and effects). These are called classical planning environments. In contrast, nonclassical planning is for partially observable or stochastic environments and involves a different set of algorithms and agent designs, outlined in Chapters 12 and 17.
11.1 THE PLANNING PROBLEM

Let us consider what can happen when an ordinary problem-solving agent using standard search algorithms—depth-first, A*, and so on—comes up against large, real-world problems. That will help us design better planning agents.
The most obvious difficulty is that the problem-solving agent can be overwhelmed by irrelevant actions. Consider the task of buying a copy of AI: A Modern Approach from an online bookseller. Suppose there is one buying action for each 10-digit ISBN number, for a total of 10 billion actions. The search algorithm would have to examine the outcome states of all 10 billion actions to find one that satisfies the goal, which is to own a copy of ISBN 0137903952. A sensible planning agent, on the other hand, should be able to work back from an explicit goal description such as Have(ISBN 0137903952) and generate the action Buy(ISBN 0137903952) directly. To do this, the agent simply needs the general knowledge that Buy(x) results in Have(x). Given this knowledge and the goal, the planner can decide in a single unification step that Buy(ISBN 0137903952) is the right action.

The next difficulty is finding a good heuristic function. Suppose the agent's goal is to buy four different books online. Then there will be 10^40 plans of just four steps, so searching without an accurate heuristic is out of the question. It is obvious to a human that a good heuristic estimate for the cost of a state is the number of books that remain to be bought; unfortunately, this insight is not obvious to a problem-solving agent, because it sees the goal test only as a black box that returns true or false for each state. Therefore, the problem-solving agent lacks autonomy; it requires a human to supply a heuristic function for each new problem. On the other hand, if a planning agent has access to an explicit representation of the goal as a conjunction of subgoals, then it can use a single domain-independent heuristic: the number of unsatisfied conjuncts. For the book-buying problem, the goal would be Have(A) ∧ Have(B) ∧ Have(C) ∧ Have(D), and a state containing Have(A) ∧ Have(C) would have cost 2.
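The unsatisfied-conjunct heuristic amounts to a set difference. A minimal sketch (the function names are ours, not part of any standard; states and goals are modeled as sets of ground literals):

```python
def satisfies(state, goal):
    """A state satisfies a goal if it contains all the goal's atoms."""
    return goal <= state

def h(state, goal):
    """Domain-independent heuristic: number of unsatisfied goal conjuncts."""
    return len(goal - state)

goal = {"Have(A)", "Have(B)", "Have(C)", "Have(D)"}
state = {"Have(A)", "Have(C)"}

print(h(state, goal))          # 2, matching the cost in the text
print(satisfies(state, goal))  # False: Have(B) and Have(D) are missing
```

Because the goal is exposed as an explicit conjunction rather than a black-box test, the same two-line heuristic works unchanged on any problem stated in this form.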
Thus, the agent automatically gets the right heuristic for this problem, and for many others. We shall see later in the chapter how to construct more sophisticated heuristics that examine the available actions as well as the structure of the goal.

Finally, the problem solver might be inefficient because it cannot take advantage of problem decomposition. Consider the problem of delivering a set of overnight packages to their respective destinations, which are scattered across Australia. It makes sense to find out the nearest airport for each destination and divide the overall problem into several subproblems, one for each airport. Within the set of packages routed through a given airport, whether further decomposition is possible depends on the destination city. We saw in Chapter 5 that the ability to do this kind of decomposition contributes to the efficiency of constraint satisfaction problem solvers. The same holds true for planners: in the worst case, it can take O(n!) time to find the best plan to deliver n packages, but only O((n/k)! × k) time if the problem can be decomposed into k equal parts.

As we noted in Chapter 5, perfectly decomposable problems are delicious but rare.¹ The design of many planning systems—particularly the partial-order planners described in Section 11.3—is based on the assumption that most real-world problems are nearly decomposable. That is, the planner can work on subgoals independently, but might need to do some additional work to combine the resulting subplans. For some problems, this assumption breaks down because working on one subgoal is likely to undo another subgoal. These interactions among subgoals are what makes puzzles (like the 8-puzzle) puzzling.

¹ Notice that even the delivery of a package is not perfectly decomposable. There may be cases in which it is better to assign packages to a more distant airport if that renders a flight to the nearest airport unnecessary. Nevertheless, most delivery companies prefer the computational and organizational simplicity of sticking with decomposed solutions.

The language of planning problems

The preceding discussion suggests that the representation of planning problems—states, actions, and goals—should make it possible for planning algorithms to take advantage of the logical structure of the problem. The key is to find a language that is expressive enough to describe a wide variety of problems, but restrictive enough to allow efficient algorithms to operate over it. In this section, we first outline the basic representation language of classical planners, known as the STRIPS language.² Later, we point out some of the many possible variations in STRIPS-like languages.

Representation of states. Planners decompose the world into logical conditions and represent a state as a conjunction of positive literals. We will consider propositional literals; for example, Poor ∧ Unknown might represent the state of a hapless agent. We will also use first-order literals; for example, At(Plane1, Melbourne) ∧ At(Plane2, Sydney) might represent a state in the package delivery problem. Literals in first-order state descriptions must be ground and function-free. Literals such as At(x, y) or At(Father(Fred), Sydney) are not allowed. The closed-world assumption is used, meaning that any conditions that are not mentioned in a state are assumed false.

Representation of goals. A goal is a partially specified state, represented as a conjunction of positive ground literals, such as Rich ∧ Famous or At(P2, Tahiti). A propositional state s satisfies a goal g if s contains all the atoms in g (and possibly others). For example, the state Rich ∧ Famous ∧ Miserable satisfies the goal Rich ∧ Famous.

Representation of actions.
An action is specified in terms of the preconditions that must hold before it can be executed and the effects that ensue when it is executed. For example, an action for flying a plane from one location to another is:

Action(Fly(p, from, to),
  PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬At(p, from) ∧ At(p, to))

This is more properly called an action schema, meaning that it represents a number of different actions that can be derived by instantiating the variables p, from, and to to different constants. In general, an action schema consists of three parts:

• The action name and parameter list—for example, Fly(p, from, to)—serves to identify the action.
• The precondition is a conjunction of function-free positive literals stating what must be true in a state before the action can be executed. Any variables in the precondition must also appear in the action's parameter list.
• The effect is a conjunction of function-free literals describing how the state changes when the action is executed. A positive literal P in the effect is asserted to be true in

² STRIPS stands for STanford Research Institute Problem Solver.
the state resulting from the action, whereas a negative literal ¬P is asserted to be false. Variables in the effect must also appear in the action's parameter list. To improve readability, some planning systems divide the effect into the add list for positive literals and the delete list for negative literals.

Having defined the syntax for representations of planning problems, we can now define the semantics. The most straightforward way to do this is to describe how actions affect states. (An alternative method is to specify a direct translation into successor-state axioms, whose semantics comes from first-order logic; see Exercise 11.3.) First, we say that an action is applicable in any state that satisfies the precondition; otherwise, the action has no effect. For a first-order action schema, establishing applicability will involve a substitution θ for the variables in the precondition. For example, suppose the current state is described by

At(P1, JFK) ∧ At(P2, SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO) .

This state satisfies the precondition

At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)

with substitution {p/P1, from/JFK, to/SFO} (among others—see Exercise 11.2). Thus, the concrete action Fly(P1, JFK, SFO) is applicable.

Starting in state s, the result of executing an applicable action a is a state s′ that is the same as s except that any positive literal P in the effect of a is added to s′ and any negative literal ¬P is removed from s′. Thus, after Fly(P1, JFK, SFO), the current state becomes

At(P1, SFO) ∧ At(P2, SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO) .

Note that if a positive effect is already in s it is not added twice, and if a negative effect is not in s, then that part of the effect is ignored. This definition embodies the so-called STRIPS assumption: that every literal not mentioned in the effect remains unchanged.
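In the propositional case, the applicability test and state-update rule just defined come down to a few lines of set manipulation. A sketch (the class and function names are ours; states are sets of ground literals, per the closed-world assumption):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GroundAction:
    name: str
    precond: frozenset  # positive literals that must hold
    add: frozenset      # positive effect literals (the add list)
    delete: frozenset   # literals made false (the delete list)

def applicable(action, state):
    """An action is applicable in any state that satisfies its precondition."""
    return action.precond <= state

def result(state, action):
    """STRIPS update: remove the delete list, then add the add list.
    Every literal not mentioned in the effect stays unchanged."""
    return (state - action.delete) | action.add

# The ground action Fly(P1, JFK, SFO) from the example above:
fly = GroundAction("Fly(P1,JFK,SFO)",
                   precond=frozenset({"At(P1,JFK)", "Plane(P1)",
                                      "Airport(JFK)", "Airport(SFO)"}),
                   add=frozenset({"At(P1,SFO)"}),
                   delete=frozenset({"At(P1,JFK)"}))

state = frozenset({"At(P1,JFK)", "At(P2,SFO)", "Plane(P1)", "Plane(P2)",
                   "Airport(JFK)", "Airport(SFO)"})

assert applicable(fly, state)
state = result(state, fly)
# state now contains At(P1,SFO) and no longer contains At(P1,JFK)
```

Note that set union and difference silently give the behavior described above: an already-present positive effect is not added twice, and deleting an absent literal is a no-op.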
In this way, STRIPS avoids the representational frame problem described in Chapter 10.

Finally, we can define the solution for a planning problem. In its simplest form, this is just an action sequence that, when executed in the initial state, results in a state that satisfies the goal. Later in the chapter, we will allow solutions to be partially ordered sets of actions, provided that every action sequence that respects the partial order is a solution.

Expressiveness and extensions

The various restrictions imposed by the STRIPS representation were chosen in the hope of making planning algorithms simpler and more efficient, without making it too difficult to describe real problems. One of the most important restrictions is that literals be function-free. With this restriction, we can be sure that any action schema for a given problem can be propositionalized—that is, turned into a finite collection of purely propositional action representations with no variables. (See Chapter 9 for more on this topic.) For example, in the air cargo domain for a problem with 10 planes and five airports, we could translate the Fly(p, from, to) schema into 10 × 5 × 5 = 250 purely propositional actions. The planners
in Sections 11.4 and 11.5 work directly with propositionalized descriptions. If we allow function symbols, then infinitely many states and actions can be constructed.

STRIPS Language:
• Only positive literals in states: Poor ∧ Unknown
• Closed World Assumption: unmentioned literals are false.
• Effect P ∧ ¬Q means add P and delete Q.
• Only ground literals in goals: Rich ∧ Famous
• Goals are conjunctions: Rich ∧ Famous
• Effects are conjunctions.
• No support for equality.
• No support for types.

ADL Language:
• Positive and negative literals in states: ¬Rich ∧ ¬Famous
• Open World Assumption: unmentioned literals are unknown.
• Effect P ∧ ¬Q means add P and ¬Q, and delete ¬P and Q.
• Quantified variables in goals: ∃x At(P1, x) ∧ At(P2, x) is the goal of having P1 and P2 in the same place.
• Goals allow conjunction and disjunction: ¬Poor ∧ (Famous ∨ Smart)
• Conditional effects allowed: when P : E means E is an effect only if P is satisfied.
• Equality predicate (x = y) is built in.
• Variables can have types, as in (p : Plane).

Figure 11.1 Comparison of STRIPS and ADL languages for representing planning problems. In both cases, goals behave as the preconditions of an action with no parameters.

In recent years, it has become clear that STRIPS is insufficiently expressive for some real domains. As a result, many language variants have been developed. Figure 11.1 briefly describes one important one, the Action Description Language or ADL, by comparing it with the basic STRIPS language. In ADL, the Fly action could be written as

Action(Fly(p : Plane, from : Airport, to : Airport),
  PRECOND: At(p, from) ∧ (from ≠ to)
  EFFECT: ¬At(p, from) ∧ At(p, to)) .

The notation p : Plane in the parameter list is an abbreviation for Plane(p) in the precondition; this adds no expressive power, but can be easier to read. (It also cuts down on the number of possible propositional actions that can be constructed.)
The precondition (from ≠ to) expresses the fact that a flight cannot be made from an airport to itself. This could not be expressed succinctly in STRIPS.

The various planning formalisms used in AI have been systematized within a standard syntax called the Planning Domain Definition Language, or PDDL. This language allows researchers to exchange benchmark problems and compare results. PDDL includes sublanguages for STRIPS, ADL, and the hierarchical task networks we will see in Chapter 12.
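As a closing check on the propositionalization arithmetic, the following sketch (plane and airport names are made up) grounds the Fly(p, from, to) schema over 10 planes and 5 airports, and shows how the ADL constraint from ≠ to shrinks the set of ground actions:

```python
from itertools import product

planes = [f"P{i}" for i in range(1, 11)]        # 10 planes
airports = ["JFK", "SFO", "MEL", "SYD", "LAX"]  # 5 airports

# One purely propositional action per way of instantiating (p, from, to).
ground = [("Fly", p, frm, to)
          for p, frm, to in product(planes, airports, airports)]
print(len(ground))  # 10 * 5 * 5 = 250, as in the text

# The ADL precondition (from != to) rules out flights from an airport
# to itself, leaving 10 * 5 * 4 = 200 ground actions.
useful = [a for a in ground if a[2] != a[3]]
print(len(useful))  # 200
```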