CS182 Lecture 13: Planning Algorithms

Agenda
- Announcements
- Planning as logic and search: reasoning about plans using PDDL via forward (progression) and backward (regression) search
- Planning with planning graphs: Graphplan, SATplan

Acknowledgement: Slides taken or adapted from Dana Nau's lecture slides for Automated Planning, licensed under Creative Commons and associated with the text Automated Planning: Theory and Practice: http://projects.laas.fr/planning/

CS 182: Intelligent Systems: Reasoning, Actions, & Plans, Fall 2013

Announcements
- Readings:
  - Today: AIMA 3e, 10.3
  - Thursday, Oct 24: AIMA 3e, 11.2, skim 11.1; recommended: D. S. Nau, "Current trends in automated planning", AI Magazine 28(4):43–58, 2007
  - Background: Ghallab, Nau and Traverso, Automated Planning: Theory and Practice, Morgan Kaufmann, 2004

- Sections this week: planning representations and algorithms

- Assignment date changes:
  - Assignment 4: available Thursday, due November 5
  - Project proposals: now due Monday, November 4

Planning = Logic + Search
- Logic to express information about actions and states; logical representations let planners exploit action structure when reasoning about which actions to do when.

- Search to generate a sequence of actions that accomplishes the goal (i.e., reaches a state satisfying the goal propositions).

Factored Representations of Actions: PDDL
- Agent: one that acts
- Actions: simple and complex
- Beliefs: what an agent knows
- Goals (aka Desires): the states of the world an agent "prefers"
- Intentions: actions the agent is committed to
- Recipes: descriptions/representations of ways to do a complex action or accomplish a task
- Plans: recipes + beliefs + intentions

FOL Representations of States and Actions
- Motivation: richer representations to support reasoning
  - states are sets of features (e.g., ground atoms in FOL)
  - actions are represented by structured operators

- Represent relations among domain objects with ground atoms:
  - Ground expression: contains no variable symbols; e.g., in(c1,p3)
  - Unground expression: at least one variable symbol; e.g., in(c1,x)

- Substitution: θ = {x1 ↦ v1, x2 ↦ v2, …, xn ↦ vn}
  - Each xi is a variable symbol; each vi is a term
  - Instance of exp: result of applying a substitution θ to exp; replace the variables of exp simultaneously, not sequentially
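As a concrete illustration (not from the slides), applying a substitution to an atom represented as a tuple is a single-pass replacement, which is exactly what "simultaneously, not sequentially" amounts to; the tuple/dict representation is an assumption of this sketch:

```python
# Minimal sketch: atoms as tuples, variables as strings starting with '?',
# substitutions as dicts mapping variables to terms.

def substitute(atom, theta):
    """Apply theta to every argument in one pass (simultaneous replacement)."""
    pred, *args = atom
    return (pred, *(theta.get(t, t) for t in args))

print(substitute(("in", "c1", "?x"), {"?x": "p3"}))  # ('in', 'c1', 'p3')
```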

PDDL Operators
- Operators (schemas) are used to model (classes of) actions.
- An operator schema includes:
  - Header (action name and arguments): a(p1, …, pk)
  - Preconditions: conjunction of literals; what must be true (immediately) before the action is done
  - Effects: add and delete lists; a conjunction of positive and negative literals

Operators, More Formally
- Operator: a triple o = (name(o), precond(o), effects(o))
  - precond(o): preconditions, the literals that must be true in order to use the operator
  - effects(o): effects, the literals the operator will make true
  - name(o): a syntactic expression of the form n(x1, …, xk)
    - n is an operator symbol, unique for each operator
    - (x1, …, xk) is a list of every variable symbol (parameter) that appears in o

- The purpose of name(o) is to let us refer unambiguously to instances of o
- Rather than writing each operator as a triple, it is often written as follows:
  pack(agt,obj,cont,loc)  ;; agent agt packs object obj in container cont at location loc
    pre: at(agt,loc), at(obj,loc), at(cont,loc), empty(cont)
    eff: in(obj,cont), ¬empty(cont), ¬at(obj,loc)

- Example action: pack(Robbie, Books2, Carton1, RobRoom)
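To make the triple concrete, here is one possible Python encoding of the pack operator (a sketch; the field names, the '?'-variable convention, and the add/delete split are assumptions, not a required format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    """An operator as the triple (name, precond, effects), with effects
    split into add/delete sets to match the slides' effects+/effects-."""
    name: tuple
    precond: frozenset
    add: frozenset
    delete: frozenset

pack = Operator(
    name=("pack", "?agt", "?obj", "?cont", "?loc"),
    precond=frozenset({("at", "?agt", "?loc"), ("at", "?obj", "?loc"),
                       ("at", "?cont", "?loc"), ("empty", "?cont")}),
    add=frozenset({("in", "?obj", "?cont")}),
    delete=frozenset({("empty", "?cont"), ("at", "?obj", "?loc")}),
)
```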

PDDL Operators for Semester Abroad
- Pack(agt,obj,cont,loc)
  pre: at(agt,loc), at(obj,loc), at(cont,loc), empty(cont)
  eff: in(obj,cont), ¬empty(cont), ¬at(obj,loc)

- Walk(agt,loc1,loc2)
  pre: at(agt,loc1)
  eff: at(agt,loc2)

- BikeGo(agt,loc1,loc2,bike)
  pre: at(agt,loc1), at(bike,loc1)
  eff: at(agt,loc2), at(bike,loc2)

PDDL Operators for Semester Abroad (cont'd)
- ApproveStudyPlan(advisor,student,plan,loc)
  pre: at(advisor,loc), at(student,loc), finished(student,plan)
  eff: approved(student,plan)

- GetVaccine(agt,disease)
  pre: at(agt,clinic), ¬vaccinated(agt,disease)
  eff: vaccinated(agt,disease)

- PlaneTicket(agt,fromcity,tocity)
  pre: have(agt,fare)
  eff: haveticket(fromcity,tocity), ¬have(agt,fare)

PDDL Operators for Semester Abroad (cont'd)
- CleanOutRoom(agt,room)
  pre: at(agt,room)
  eff: empty(room), clean(room)

- AdjSuitcaseWeight(suitcase)
  pre: packed(suitcase)
  eff: weight(suitcase,OK)

- 3 groups to report!!

States and Actions
- State: a set s of ground atoms
  - The atoms represent the things that are true in the state
  - Only finitely many ground atoms, so only finitely many possible states

- An action is a ground instance (via substitution) of an operator
  - Let θ = {agt ↦ Robbie, obj ↦ RTextBs, cont ↦ Carton1, loc ↦ RRm}
  - Then θ(pack(agt,obj,cont,loc)) is the following action:
    pack(Robbie,RTextBs,Carton1,RRm)
      precond: at(Robbie,RRm), at(RTextBs,RRm), at(Carton1,RRm), empty(Carton1)
      effects: in(RTextBs,Carton1), ¬empty(Carton1), ¬at(RTextBs,RRm)
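In the sketch started earlier, grounding is just applying θ to every atom of the operator; `Operator` and the tuple atoms are the assumed encodings from the previous snippets:

```python
def ground(op, theta):
    """Return the ground instance (action) of op under substitution theta."""
    sub = lambda atom: tuple(theta.get(t, t) for t in atom)
    return Operator(name=sub(op.name),
                    precond=frozenset(map(sub, op.precond)),
                    add=frozenset(map(sub, op.add)),
                    delete=frozenset(map(sub, op.delete)))

theta = {"?agt": "Robbie", "?obj": "RTextBs",
         "?cont": "Carton1", "?loc": "RRm"}
action = ground(pack, theta)   # pack(Robbie, RTextBs, Carton1, RRm)
```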

Notation
- For S, a set of literals:
  - S+ = {atoms that appear positively in S}
  - S– = {atoms that appear negatively in S}

- For an action a:
  - precond+(a) = {atoms that appear positively in a's preconditions}
  - precond–(a) = {atoms that appear negatively in a's preconditions}
  - effects+(a) = {atoms that appear positively in a's effects}
  - effects–(a) = {atoms that appear negatively in a's effects}

Notation Example
- Pack(agt,obj,cont,loc)
  pre: at(agt,loc), at(obj,loc), at(cont,loc), empty(cont)
  eff: in(obj,cont), ¬empty(cont), ¬at(obj,loc)

- For the action pack(Robbie,RTextBs,Carton1,RRm):
  - effects+(pack(Robbie,RTextBs,Carton1,RRm)) = {in(RTextBs,Carton1)}
  - effects–(pack(Robbie,RTextBs,Carton1,RRm)) = {empty(Carton1), at(RTextBs,RRm)}

Applicability
- Let s be a state and a be an action
- a is applicable to (or executable in) s if s satisfies precond(a):
  - precond+(a) ⊆ s
  - precond–(a) ∩ s = ∅

- Executing an applicable action:
  - Remove a's negative effects and add a's positive effects
  - Notation: γ(s,a) = (s − effects–(a)) ∪ effects+(a)
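Both definitions are one-line set operations in the running sketch; since the example operators here have only positive preconditions, the precond–(a) ∩ s = ∅ test is omitted:

```python
def applicable(s, a):
    """a is applicable in s iff precond+(a) ⊆ s (these operators have no
    negative preconditions, so the precond- test is skipped)."""
    return a.precond <= s

def execute(s, a):
    """gamma(s, a) = (s - effects-(a)) ∪ effects+(a)."""
    assert applicable(s, a)
    return (s - a.delete) | a.add
```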

Planning Problems Formally
- Given a planning domain (language L, operators O)
- Statement of a planning problem: a triple P = (O, s0, g)
  - O is the collection of operators
  - s0 is a state (the initial state)
  - g is a set of literals (the goal formula)
- Planning problem: P = (Σ, s0, Sg)
  - s0 = initial state
  - Sg = set of goal states
  - Σ = (S, A, γ) is a state-transition system
    - S = {all sets of ground atoms in L}
    - A = {all ground instances of operators in O}
    - γ = the state-transition function determined by the operators

Plans and Solutions
- Let P = (O, s0, g) be a planning problem
- Plan: any sequence of actions π = ⟨a1, a2, …, an⟩ such that each ai is an instance of an operator in O

- π is a solution for P = (O, s0, g) if it is executable and achieves g; i.e., if there are states s0, s1, …, sn such that
  - γ(s0, a1) = s1
  - γ(s1, a2) = s2
  - …
  - γ(sn–1, an) = sn
  - sn satisfies g
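A plan checker is a direct transcription of this definition; this sketch reuses the `applicable`/`execute` helpers from the earlier snippet and assumes positive goal literals:

```python
def is_solution(plan, s0, g):
    """True iff plan is executable from s0 and the final state satisfies
    the (positive) goal literals g."""
    s = frozenset(s0)
    for a in plan:
        if not applicable(s, a):
            return False        # some ai is not applicable: not executable
        s = execute(s, a)
    return g <= s               # sn satisfies g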

Using the Structure
- The structure of the PDDL language can be exploited to search efficiently in both directions.

[Diagram: from the INITIAL STATE, preconditions determine that only some actions are relevant; toward the GOAL, effects determine that only some actions could put us here]

Planning via Search
- We search for a sequence of actions that takes us from the initial state to the goal. This yields a totally-ordered plan. Next time we'll look at partial plans.

- State-space planning: each node represents a state of the world, as represented in FOL.

- Forward or Progressive Search: start from the initial state and construct a tree of actions, using any complete search technique.

- Backward or Regressive Search: start from the goal and search for actions that can achieve each of the goals (cf. backward chaining in logic).

Two Ways to Use PDDL Operators
- Progression: search forward from the initial state looking for a goal state
  - Determine which operators apply in the current situation
  - Huge search space: domain-independent heuristics to the rescue (later)

- Regression: search backwards from the goal to the initial state
  - Pick a goal to satisfy and look for an operator effect (add list) that mentions it
  - Challenge: determine what must be true to apply an operator

Progressive (Forward) Planning
- Initial state: set of initial world literals
- Goal state: goal literals
- Search operator:
  - Choose an action A whose preconditions are satisfied
  - Construct the new state using the effects of A: add positive literals, remove negative literals
- Path cost: 1 per step (number of actions)
- State space is finite
- Methods: any search algorithm, but some are better than others (coming up)
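For instance, a breadth-first progression planner over pre-grounded actions might look like the sketch below (again building on the helpers above); breadth-first search returns a shortest plan under unit action costs:

```python
from collections import deque

def forward_search(s0, g, actions):
    """BFS progression planning with loop checking; returns a plan or None."""
    s0 = frozenset(s0)
    frontier, visited = deque([(s0, [])]), {s0}
    while frontier:
        s, plan = frontier.popleft()
        if g <= s:                         # s satisfies the goal
            return plan
        for a in actions:
            if applicable(s, a):
                s2 = execute(s, a)
                if s2 not in visited:      # loop checking
                    visited.add(s2)
                    frontier.append((s2, plan + [a]))
    return None                            # no solution exists
```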

Forward Search
- goal: in(RClothes,RSuitcase)
- si: at(Robbie,LR), at(RSuitcase,Attic), at(RClothes,RRm), at(RBike,LR)
- Branching from si: Walk(Robbie,LR,Attic), Walk(Robbie,LR,RRm), BikeGo(Robbie,LR,AdvOffice,RBike), …

Properties
- Forward search is sound: any plan returned by any of its nondeterministic traces is guaranteed to be a solution.

- Forward search is also complete: if a solution exists, then at least one of its nondeterministic traces will return a solution.

Deterministic Implementations
- Some deterministic implementations of forward search:
  - breadth-first search
  - depth-first search
  - best-first search (e.g., A*)
  - greedy search

[Diagram: search tree from s0 through actions a1…a5 and states s1…s5 toward goal state sg]

- Breadth-first and best-first search are sound and complete
  - But they usually aren't practical, because they require too much memory: the memory requirement is exponential in the length of the solution
- In practice, depth-first search or greedy search is more likely to be used
  - Worst-case memory requirement is linear in the length of the solution
  - In general, sound but not complete
  - But classical planning has only finitely many states, so depth-first search can be made complete by loop-checking



Regressive (Backward) Planning
- Initial search state: the goal literals
- Search operator:
  - Choose an action A for state X such that A is
    - Relevant: its effects include at least one literal of X
    - Consistent: its effects do not negate any literal in X
  - Then perform regression ("regress the goal through the action"):
    - New state = (X minus A's positive effects) plus A's preconditions

- Goal of the search: a state satisfied by the initial world literals
- Method: again, any search technique applies

Backward Search, Formally
- For forward search, we started at the initial state and computed state transitions:
  new state = γ(s,a)

- For backward search, we start at the goal and compute inverse state transitions:
  new set of subgoals = γ–1(g,a)

- To define γ–1(g,a), we must first define relevance:
  - An action a is relevant for a goal g if
    - a makes at least one of g's literals true: g ∩ effects(a) ≠ ∅
    - a does not make any of g's literals false: g+ ∩ effects–(a) = ∅ and g– ∩ effects+(a) = ∅

Blocks World Regression Example
- pickup(x):
  pre: ontable(x), clear(x), HE (i.e., hand empty)
  eff: holding(x), ¬ontable(x), ¬clear(x), ¬HE
- putdown(x):
  pre: holding(x)
  eff: ontable(x), clear(x), HE, ¬holding(x)
- stack(x,y):
  pre: holding(x), clear(y)
  eff: HE, on(x,y), clear(x), ¬holding(x), ¬clear(y)
- unstack(x,y): …

Simple Regression Examples
- Regress(on(A,B), pickup(C)) = on(A,B)
- Regress(on(A,B), stack(A,B)) = True
- Regress(HE, pickup(A)) = False

- If a goal regresses to True, it can be eliminated from the goal set.
- If a goal regresses to False, the goal set cannot be satisfied, so prune that branch.

Inverse State Transitions
- If a is relevant for g, then
  γ–1(g,a) = (g − effects(a)) ∪ precond(a)

- Otherwise γ–1(g,a) is undefined

- Example: suppose that
  - g = {on(b1,b2), on(b2,b3)}
  - a = stack(b1,b2)

- What is γ–1(g,a)?
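Answering that mechanically: a sketch of relevance and γ–1 for the positive-goal case, over the add/delete encoding used earlier; running it gives γ–1(g,a) = {on(b2,b3), holding(b1), clear(b2)}:

```python
def relevant(g, a):
    """a adds at least one goal atom and deletes none of them
    (the positive-goal case of the definition above)."""
    return bool(g & a.add) and not (g & a.delete)

def regress(g, a):
    """gamma^-1(g, a) = (g - effects(a)) ∪ precond(a)."""
    assert relevant(g, a)
    return (g - a.add) | a.precond

g = frozenset({("on", "b1", "b2"), ("on", "b2", "b3")})
a = Operator(name=("stack", "b1", "b2"),
             precond=frozenset({("holding", "b1"), ("clear", "b2")}),
             add=frozenset({("HE",), ("on", "b1", "b2"), ("clear", "b1")}),
             delete=frozenset({("holding", "b1"), ("clear", "b2")}))
print(regress(g, a))  # {('on','b2','b3'), ('holding','b1'), ('clear','b2')}
```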

[Diagram: backward search tree from goal g0 through subgoal sets g1…g5 via actions a1…a5 toward the initial state s0]

Efficiency of Backward Search
- Backward search can also have a very large branching factor
  - E.g., an operator o that is relevant for g may have many ground instances a1, a2, …, an such that each ai's input state might be unreachable from the initial state

- As before, deterministic implementations can waste lots of time trying all of them

[Diagram: a blocks-world instance with blocks b1, b2, b3, …, b50 in the initial state and a goal involving b1]

Lifting

[Diagram: regressing holding(b1) over ground actions: pickup(b1) yields subgoal ontable(b1); unstack(b1,b1), unstack(b1,b2), …, unstack(b1,b50) yield subgoals on(b1,b1), on(b1,b2), …, on(b1,b50)]

- Can reduce the branching factor of backward search if we partially instantiate the operators; this is called lifting (see the lifted diagram below)

[Diagram, lifted: pickup(b1) yields subgoal ontable(b1); the single lifted action unstack(b1,y) yields subgoal on(b1,y)]

Lifted Backward Search
- More complicated than ground backward search, because we have to keep track of the substitutions that were performed
- But it has a much smaller branching factor
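That bookkeeping is unification: matching a goal atom against a possibly unground effect yields the substitution to record. A minimal unifier for function-free atoms, as an illustrative sketch only (not the deck's algorithm):

```python
def deref(t, theta):
    """Follow variable bindings to their current value."""
    while isinstance(t, str) and t.startswith("?") and t in theta:
        t = theta[t]
    return t

def unify(a, b, theta=None):
    """Most general unifier of two function-free atoms, or None."""
    theta = dict(theta or {})
    if len(a) != len(b) or a[0] != b[0]:
        return None
    for s, t in zip(a[1:], b[1:]):
        s, t = deref(s, theta), deref(t, theta)
        if s == t:
            continue
        if isinstance(s, str) and s.startswith("?"):
            theta[s] = t
        elif isinstance(t, str) and t.startswith("?"):
            theta[t] = s
        else:
            return None          # two distinct constants: clash
    return theta

print(unify(("on", "b1", "?y"), ("on", "b1", "b2")))   # {'?y': 'b2'}
```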

Search Space Still Too Large
- Lifted backward search generates a smaller search space than ground backward search, but it can still be quite large
  - Suppose actions a, b, and c are independent, action d must precede all of them, and there's no path from s0 to d's input state
  - We'll try all possible orderings of a, b, and c before realizing there is no solution
  - More about this in Chapter 5 (Plan-Space Planning)

[Diagram: search tree between s0 and the goal enumerating all six orderings of a, b, and c, each branch requiring d first and failing]

Aside: STRIPS Challenge 1: The Sussman Anomaly

[Diagram: initial state has c on a, with a and b on the table; goal is a on b on c]

- On this problem, STRIPS can't produce an irredundant solution
- Try it and see

Aside: STRIPS Challenge 2: Register Assignment Problem
- Interchange the values stored in two registers
- State-variable formulation:
  - registers r1, r2, r3
  - s0: {value(r1)=3, value(r2)=5, value(r3)=0}
  - g: {value(r1)=5, value(r2)=3}
  - Operator: assign(r,v,r',v')
    precond: value(r)=v, value(r')=v'
    effects: value(r)=v'

- STRIPS cannot solve this problem at all

Summary: PDDL for Progression and Regression
- The PDDL language encoding can expose problem structure
- Key take-away ideas:
  - (a) Action schemas allow systematic search forwards and backwards
  - (b) BUT efficiency still requires smarts (which actions when?); we need heuristics
- Next: the action language lets you automatically generate heuristics!

[Diagram repeated: preconditions tell us only some actions are relevant; effects tell us only some actions could put us here]

Planning Graphs: An Alternative Representation for Search
- Motivation: a big source of inefficiency in search algorithms is the branching factor, i.e., the number of children of each node
  - a forward search may try operators that don't help
  - a backward search may try lots of actions that can't be reached from the initial state

- Planning graphs: a data structure that simplifies search; an incremental approach that starts with a "relaxed" problem

- The Graphplan algorithm:
  - constructing planning graphs
  - mutual exclusion
  - solution extraction

Graphplan

procedure Graphplan:
  for k = 0, 1, 2, …
    Graph expansion: create a "planning graph" that contains k "levels" (a relaxed problem)
    Check whether the planning graph satisfies a necessary (but insufficient) condition for plan existence
    If it does, then do solution extraction:
      backward search, modified to consider only the actions in the planning graph
      if we find a solution, then return it

The Planning Graph
- Search space for a relaxed version of the planning problem
- Alternating layers of ground literals and actions:
  - Nodes at action-level i: actions that might be possible to execute at time i
  - Nodes at state-level i: literals that might possibly be true at time i
  - Edges: preconditions and effects

[Diagram: state-level i−1, action-level i, state-level i; state-level 0 holds the literals true in s0; precondition edges enter actions and effect edges leave them; a maintenance (no-op) action carries each literal forward unchanged]

Example (from Dan Weld, U. Washington)
- Suppose you want to prepare dinner as a surprise for your sweetheart, who is asleep. The initial state and goal are:
  s0 = {garbage, cleanHands, quiet}
  g = {dinner, present, ¬garbage}

  Action    Preconditions   Effects
  cook()    cleanHands      dinner
  wrap()    quiet           present
  carry()   none            ¬garbage, ¬cleanHands
  dolly()   none            ¬garbage, ¬quiet

- Also have the maintenance actions: one for each literal

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1]

- state-level 0: {all atoms in s0} ∪ {negations of all atoms not in s0}
- action-level 1: {all actions whose preconditions are satisfied and non-mutex in s0}
- state-level 1: {all effects of all of the actions in action-level 1}
  (action table as above, plus the maintenance actions)

Mutual Exclusion
- Two actions at the same action-level are mutex if:
  - Inconsistent effects: an effect of one negates an effect of the other
  - Interference: one deletes a precondition of the other
  - Competing needs: they have mutually exclusive preconditions
- Otherwise they don't interfere with each other, and both may appear in a solution plan
- Two literals at the same state-level are mutex if:
  - Inconsistent support: one is the negation of the other, or all ways of achieving them are pairwise mutex
- Mutexes are propagated recursively up the graph
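The three action-mutex conditions translate directly into set tests over the (atom, sign) literal encoding from the earlier sketch; the literal mutexes at the preceding state-level are passed in, since inconsistent-support propagation is omitted here:

```python
def negated(lit):
    atom, sign = lit
    return (atom, not sign)

def inconsistent_effects(eff1, eff2):
    """An effect of one negates an effect of the other."""
    return any(negated(l) in eff2 for l in eff1)

def interference(pre1, eff1, pre2, eff2):
    """One action deletes a precondition of the other."""
    return (any(negated(l) in eff2 for l in pre1) or
            any(negated(l) in eff1 for l in pre2))

def competing_needs(pre1, pre2, literal_mutex):
    """Some pair of preconditions is mutex at the previous state-level."""
    return any(frozenset({p, q}) in literal_mutex for p in pre1 for q in pre2)

def actions_mutex(a1, a2, literal_mutex):
    (pre1, eff1), (pre2, eff2) = a1, a2
    return (inconsistent_effects(eff1, eff2) or
            interference(pre1, eff1, pre2, eff2) or
            competing_needs(pre1, pre2, literal_mutex))
```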

Example (continued)
- Augment the graph to indicate mutexes

[Diagram: state-level 0, action-level 1, state-level 1, with mutex links]

- carry is mutex with the maintenance action for garbage (inconsistent effects)
- dolly is mutex with wrap (interference)
- ¬quiet is mutex with present (inconsistent support)
- each of cook and wrap is mutex with a maintenance action
  (action table as above, plus the maintenance actions)

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1]

- Check to see whether there's a possible solution
- Recall that the goal is {¬garbage, dinner, present}
- In state-level 1:
  - All of the goal literals are present
  - None are mutex with each other
- Thus there's a chance that a plan exists, so try to find it: solution extraction

Solution Extraction

procedure Solution-extraction(g, j)
  ;; g = the set of goals we are trying to achieve; j = the level of state sj
  if j = 0 then return the solution
  for each literal l in g:
    nondeterministically choose an action (a real action or a maintenance action)
      to use in state s(j–1) to achieve l
  if any pair of chosen actions are mutex, then backtrack
  g' := {the preconditions of the chosen actions}
  Solution-extraction(g', j–1)
end Solution-extraction
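A deterministic (backtracking) version of the nondeterministic choice can be sketched as below; the `achievers`, `mutex`, and `pre` tables are assumed to have been built during graph expansion:

```python
from itertools import product

def solution_extraction(achievers, mutex, pre, g, j):
    """achievers[j][l]: actions at action-level j achieving literal l;
    mutex[j]: set of frozenset action pairs that are mutex at level j;
    pre[a]: preconditions of action a (maintenance actions included).
    Returns a list of action sets, one per level, or None."""
    if j == 0:
        return []
    for choice in product(*(achievers[j][l] for l in g)):
        acts = set(choice)
        if any(frozenset({a, b}) in mutex[j]
               for a in acts for b in acts if a != b):
            continue                    # a chosen pair is mutex: backtrack
        subgoals = frozenset().union(*(pre[a] for a in acts))
        rest = solution_extraction(achievers, mutex, pre, subgoals, j - 1)
        if rest is not None:
            return rest + [acts]
    return None
```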

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1, with the two candidate action sets highlighted]

- Two sets of actions could achieve the goals at state-level 1
- Neither of them works: both sets contain actions that are mutex

Recall what the algorithm does

procedure Graphplan:
  for k = 0, 1, 2, …
    Graph expansion: create a "planning graph" that contains k "levels"
    Check whether the planning graph satisfies a necessary (but insufficient) condition for plan existence
    If it does, then do solution extraction:
      backward search, modified to consider only the actions in the planning graph
      if we find a solution, then return it

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1, action-level 2, state-level 2]

- Go back and do more graph expansion
- Generate another action-level and another state-level

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1, action-level 2, state-level 2]

- Solution extraction: twelve combinations at state-level 2
  - Three ways to achieve ¬garbage
  - Two ways to achieve dinner
  - Two ways to achieve present

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1, action-level 2, state-level 2]

- Several of the combinations look OK at level 2
- Here's one of them

Example (continued)

[Diagram: state-level 0, action-level 1, state-level 1, action-level 2, state-level 2]

- Call Solution-extraction recursively at level 2
- It succeeds
- Solution whose parallel length is 2

Comparison with Plan-Space Planning
- Advantage:
  - The backward-search part of Graphplan (the hard part) only looks at the actions in the planning graph
  - Smaller search space, thus faster

- Disadvantage:
  - To generate the planning graph, Graphplan creates a huge number of ground atoms, many of which may be irrelevant
  - Can alleviate (but not eliminate) this problem by assigning data types to the variables and constants, instantiating variables only to terms of the same data type

- For classical planning, the advantage outweighs the disadvantage: Graphplan solves classical planning problems much faster than plan-space planning (PSP)

Planning Graph for Heuristic Search
- Given a planning problem P, create a relaxed planning problem P' and use Graphplan to solve it:
  - Convert to a set-theoretic representation with no negative literals; the goal is now a set of atoms
  - Remove the delete lists from the actions
  - Construct a planning graph until a layer is found that contains all of the goal atoms
    - The graph will contain no mutexes, because the delete lists were removed
  - Extract a plan π' from the planning graph
    - No mutexes means no backtracking, so extraction takes polynomial time

- |π'| is a lower bound on the length of the best solution to P

GraphPlan Heuristic
- Use reachability "level" as a heuristic for forward search
- Let P = (A, si, g) be a propositional planning problem and G = (N, E) the corresponding planning graph, with g = {g1, …, gn}
  - gk, k ∈ [1, n], is reachable from si if there is a proposition layer Pg such that gk ∈ Pg
  - For proposition layer Pm: if gk is not in Pm, then gk is not reachable in m steps
- Define the (admissible) heuristic hPG(gk) = m, where m is the smallest number of steps in which {gk} is reachable
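Under the delete relaxation these levels can be computed by a simple fixpoint; a sketch assuming the set-theoretic (all-positive) representation with delete lists already removed:

```python
def h_pg(s, goals, relaxed_actions):
    """First proposition layer containing each goal atom, or infinity if a
    goal is unreachable; each per-goal value is an admissible estimate.
    relaxed_actions: iterable of (precondition set, add set) pairs."""
    atoms, level, h = set(s), 0, {}
    while True:
        for g in goals:
            if g in atoms and g not in h:
                h[g] = level                    # first layer containing g
        if len(h) == len(goals):
            return h
        new = set(atoms)
        for pre, add in relaxed_actions:        # delete lists removed
            if pre <= atoms:
                new |= add
        if new == atoms:                        # fixpoint: rest unreachable
            return {**h, **{g: float("inf") for g in goals if g not in h}}
        atoms, level = new, level + 1
```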


Key Concepts in Modeling Plan-based Activities
- Agent: one that acts
- Actions: simple and complex
- Beliefs: what an agent knows
- Desires: the states of the world an agent "prefers"
- Intentions: actions the agent is committed to
- Recipes: descriptions/representations of ways to do a complex action or accomplish a task
- Plans: recipes + beliefs + intentions

Announcements
- Reading for Thursday, Oct 24: AIMA 3e, 11.2, skim 11.1
  - Also recommended: D. S. Nau, "Current trends in automated planning", AI Magazine 28(4):43–58, 2007
  - Background: Ghallab, Nau and Traverso, Automated Planning: Theory and Practice, Morgan Kaufmann, 2004
- Assignment date changes:
  - Assignment 4: available Thursday, due November 5
  - Project proposals: now due Sunday, November 3
- Sections this week: planning as logic + search
- Course mid-semester questionnaire (updated)