The Maximum Principle for Optimal Control

Stellenbosch University

Honours project 2012

The Maximum Principle for Optimal Control

Author: Paveshen Padayachee

Supervisor: Dr. P. Ouwehand

Abstract. An introductory look into control theory and optimal control is made, with the major goal being the statement and proof of Pontryagin's Maximum Principle. A minor portion is devoted to questions of controllability, that is, the problem of whether controls exist and of determining when they are optimal. In the proof of the Maximum Principle, the reachable set is approximated by cones of mappings of needle variations. A final look at dynamic programming and its use in determining the adjoint in the Maximum Principle is made.


Contents

Abstract
Chapter 1. Control Systems
  1. Introduction
  2. Definitions
  3. Examples
Chapter 2. Controllability
  1. Definitions
  2. Conditions for Controllability
  3. Examples
Chapter 3. The Maximum Principle
  1. Calculus of Variations
  2. Variations of Curves
  3. Reachability
  4. The Proof of The Maximum Principle
  5. Examples
Chapter 4. Dynamic Programming
  1. The Dynamic Programming Equation
  2. Link to The Maximum Principle
Appendix A. Things Topological and Geometric
  1. Hyperplanes
  2. Cones, Convex Sets, Constraint Sets and Edged Sets
  3. Analytical Results
Bibliography

CHAPTER 1

Control Systems

1. Introduction

Since recorded history, humans have always sought to manipulate the universe to their gain. It is in this natural setting that control theory exists. The theory is spurred by many real-world problems. The basic problem is this: given a dynamical system and an object in a starting state, can one 'control' the object so that it ends up in another, desired state? One can then consider the cost of moving and try to minimise that cost, and so ask whether there is an 'optimal' manner in which to reach the desired final state. This will be our guiding philosophy as we make these notions precise and prove a useful theorem that will aid us in this quest for optimal control of our universe.

2. Definitions

A control system Σ = (X, f, A) consists of
(1) an open set X ⊂ Rn,
(2) a set A ⊂ Rm,
(3) a map f : X × cl(A) −→ Rn,
such that
• f is continuous;
• for all a ∈ cl(A), f(·, a) is of class C¹, that is, for all x ∈ X and a ∈ A, (∂/∂x) f(x, a) exists and is continuous.
A is usually called the control space. Let I ⊂ R be an interval.

Definition 1. Let Σ = (X, f, A) be a control system. An admissible control is a measurable map µ : I −→ A such that, for all x ∈ X, f(x, µ(t)) is locally integrable in t.

The time interval I in this paper will always be a finite interval [t0, t1] for some fixed t0, t1; this case is called a fixed time control. The interval I in most cases should be compact, in which case there is an initial (starting) time and a terminal (stopping) time.

Definition 2. Let Σ = (X, f, A) be a control system. A trajectory in X is a map x : I −→ X such that

(2.1)

ẋ(t) = f(x(t), µ(t))

This differential equation describes the system and is called the dynamics equation. Since I is compact, the trajectory starts at the position x(t0) = x0. The solution to the dynamics equation then depends on the control, the starting point and the starting time.


Notation 1. The solution to the dynamics equation (2.1) will be represented as x(µ, x0, t0, t), though at times we may write x(µ, x0, t0, t) as x(t) for convenience; it should always be implicit that any trajectory defined from now on depends on µ, x0, t0, t. With this in mind, we shall refer to a controlled trajectory (x, µ), where x is a trajectory dependent on a control µ.

Since controls give rise to trajectories, we single out a subset of all admissible controls: let A(x0, t0, [t0, t1]) be the set of all admissible controls µ such that the solution to the I.V.P.

ẋ(t) = f(x(t), µ(t)),  x(t0) = x0

exists on [t0, t1].

We are usually interested in systems where the dynamics have a cost or payoff attached. The aim then is to minimise/maximise the cost/payoff. We do this with Lagrangian functions.

Definition 3. Let Σ = (X, f, A) be a control system. A Lagrangian L for Σ is a map L : X × cl(A) −→ R such that
• L is continuous;
• for all a ∈ cl(A), L(x, a) has continuous first order partial derivatives in x.
A controlled trajectory (x, µ) is called L-acceptable if L(x(t), µ(t)) is integrable in t.

The idea with the Lagrangian in this paper is that it should measure the instantaneous cost at time t. With this in mind we can now define the total cost.

Definition 4. The cost functional up to time t for a control system Σ = (X, f, A) with Lagrangian L is the map JΣ,L : X × A × [t0, t1] −→ R given by

JΣ,L(x, µ, t) = ∫_{t0}^{t} L(x(τ), µ(τ)) dτ + ϕ(x(t0))

for a trajectory x and control µ. The value ϕ(x(t0)) is the initial/start cost.

It is also possible, and sometimes convenient, to incorporate the cost into the control system to get a new control system. This technique lets us consider the cost of a controlled trajectory as a controlled trajectory in a new system. This will be useful later in showing that optimally controlled trajectories have certain types of paths in the extended system.

Definition 5. Let Σ = (X, f, A) be a control system with associated Lagrangian L. The extended control system Σ̂ = (X̂, f̂, A) is defined by
• X̂ = R × X;
• f̂((xl, x), a) = (L(x, a), f(x, a)).
Thus the new dynamics equation is

(2.2)  (d/dt) x̂(t) = (L(x(t), µ(t)), f(x(t), µ(t))) = (ẋl(t), ẋ(t)).

Thus, from the definition, the solution to the first component of the extended dynamics equation, considered as a first order O.D.E. with initial condition xl(t0) = ϕ(x(t0)), is the cost functional JΣ,L(x, µ, t).
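To make this augmentation concrete, here is a minimal numerical sketch (in Python; the scalar system f(x, a) = −x + a, running cost L(x, a) = x² + a² and control µ(t) = sin t are all hypothetical choices, not from the text) showing how the first component of the extended state accumulates the cost functional:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical example data: f(x, a) = -x + a, L(x, a) = x^2 + a^2,
# with a fixed admissible control mu(t) = sin(t).
def f(x, a):
    return -x + a

def L(x, a):
    return x**2 + a**2

def mu(t):
    return np.sin(t)

# Extended dynamics of Definition 5: d/dt (x_l, x) = (L(x, mu(t)), f(x, mu(t))).
def extended_dynamics(t, xhat):
    x_l, x = xhat
    a = mu(t)
    return [L(x, a), f(x, a)]

t0, t1, x0 = 0.0, 5.0, 1.0
phi0 = 0.0  # initial cost phi(x(t0)), taken to be zero here
sol = solve_ivp(extended_dynamics, (t0, t1), [phi0, x0])

# The first component of the extended state is the cost J(x, mu, t1).
print("cost J =", sol.y[0, -1], " final state x(t1) =", sol.y[1, -1])
```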


Notation 2. For a control system Σ, anything relating to the extended system will be represented with a 'hat'; so, for example, the system function f will be f̂ in the extended system Σ̂.

The next important function will play a role in the Maximum Principle and will be useful in determining optimal trajectories.

Definition 6. Let Σ = (X, f, A) be a control system with associated Lagrangian L. The Hamiltonian functions are:
(1) the Hamiltonian HΣ : X × Rn × A −→ R defined by HΣ(x, p, a) = p · f(x, a);
(2) the maximum Hamiltonian HΣ^max : X × Rn −→ R defined by HΣ^max(x, p) = sup_{a∈A} {HΣ(x, p, a)};
(3) the extended Hamiltonian HΣ,L : X × Rn × A −→ R defined by HΣ,L(x, p, a) = p · f(x, a) + L(x, a);
(4) the maximum extended Hamiltonian HΣ,L^max : X × Rn −→ R defined by HΣ,L^max(x, p) = sup_{a∈A} {HΣ,L(x, p, a)}.

With the definitions above, it is easy to see that the extended Hamiltonian and the Hamiltonian of the extended system are related as

HΣ̂((xl, x), (pl, p), a) = f̂((xl, x), a) · (pl, p) = (L(x, a), f(x, a)) · (pl, p) = f(x, a) · p + pl L(x, a) = HΣ,plL(x, p, a).

There is one more important function that plays a critical role in the Maximum Principle; a large part of this paper will be devoted to proving the existence of such a function.

Definition 7. Let Σ = (X, f, A) be a control system and (x, µ) a controlled trajectory. An adjoint response λ along (x, µ) is an absolutely continuous map λ : I −→ Rn such that

ẋ(t) = ∇λ HΣ(x(t), λ(t), µ(t)),
λ̇(t) = −∇x HΣ(x(t), λ(t), µ(t)).

There are corresponding equations for the extended Hamiltonian.

In order to simplify many of the definitions, lemmas and theorems leading up to the Maximum Principle, we introduce the following convenient notation. Let U, V be respectively k- and n-dimensional real Euclidean spaces and let f : U → V be a function of the variable x = (x1, . . . , xk) such that all the partial derivatives of f = (f1, . . . , fn) exist and are continuous, where the fi are the real-valued component functions. Then ∇x f denotes the n × k matrix whose (i, j) entry is ∂fi/∂xj. This is just the usual definition of the Jacobian matrix of a function, though in this paper we shall use the same notation for the case when V is a 1-dimensional space, where ∇x is the usual gradient of a function.


3. Examples

3.1. The Optimal Control Problem. In real-world situations, we usually want to steer a trajectory from one point to another in the most efficient way possible. This type of problem has been one of the primary guiding principles behind control theory. Stated more formally:

Definition 8. Let Σ = (X, f, A) be a control system, let S0, S1 ⊂ Rn, and let L be a Lagrangian for Σ. A trajectory x∗ : [t0, t1] → X with control µ∗, starting position x∗(t0) ∈ S0 and final position x∗(t1) ∈ S1 is optimal for the control problem if, for all t, JΣ,L(x∗, µ∗, t) ≤ JΣ,L(x, µ, t) for all other trajectories x and controls µ. We will denote by S(Σ, L, S0, S1, [t0, t1]) the set of all (x, µ) that solve the optimal control problem for initial set S0 and terminal set S1.

3.2. Linear Control Systems.

Definition 9. A linear control system Σ = (X, f, A) has X ⊂ Rn, A ⊂ Rm and has f defined by linear maps (matrices) M (n × n) and N (n × m), so that f(x, a) = Mx + Na.

3.3. Time optimal controls. For a control system Σ = (X, f, A), if L = 1 is the Lagrangian for Σ, then the optimal control problem above becomes a time-optimal control problem, since the cost functional is minimised precisely when the time taken to reach the terminal set S1 is minimised.

3.4. An Example. Let X = R³ and let A = [−1, 1]³, the cube of side length 2 centred at zero in R³. Let f be defined as

f((x, y, z), (a1, a2, a3)) = (y − a1, −z + a2, x − a3) = [0 1 0; 0 0 −1; 1 0 0](x, y, z)ᵀ + [−1 0 0; 0 1 0; 0 0 −1](a1, a2, a3)ᵀ.

Let L(x, a) = 1. The extended Hamiltonian is

HΣ,L(x, p, a) = f(x, a) · p + 1 = p1(y − a1) + p2(−z + a2) + p3(x − a3) + 1.

This system is linear and, by the theory in the following chapter, is controllable. The dynamics couple the coordinates to one another cyclically, and thus one would expect circular solutions. It is a time-optimal control we seek, taking the system from an initial point (x0, y0, z0) to (x1, y1, z1).
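As a preview of how the Hamiltonian of Definition 6 will be used, the following sketch (a sympy computation; the symbol names are illustrative) derives the adjoint equations λ̇ = −∇x HΣ,L for this example symbolically, anticipating the hand computation in Section 5 of Chapter 3:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
a1, a2, a3 = sp.symbols('a1 a2 a3')
p1, p2, p3 = sp.symbols('p1 p2 p3')

# Dynamics and Lagrangian of the example: f = (y - a1, -z + a2, x - a3), L = 1.
f = sp.Matrix([y - a1, -z + a2, x - a3])
p = sp.Matrix([p1, p2, p3])
H = (p.T * f)[0, 0] + 1  # extended Hamiltonian H = p . f + L

# Adjoint equations: lambda_dot = -grad_x H.
adjoint_rhs = sp.Matrix([-sp.diff(H, v) for v in (x, y, z)])
print(adjoint_rhs)  # expected: Matrix([[-p3], [-p1], [p2]])
```

The printed right-hand sides match the adjoint system λ̇1 = −λ3, λ̇2 = −λ1, λ̇3 = λ2 derived by hand later.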

CHAPTER 2

Controllability

1. Definitions

It has not yet been established whether we can in fact find a control so that a trajectory may be steered to some desired place. This is a problem of controllability. That is, given two points x0, x1 in X, is there a control µ and trajectory x defined on [t0, t1] such that x(µ, x0, t0, t0) = x0 and x(µ, x0, t0, t1) = x1? A more general and thorough treatment of the material in this chapter may be found in [2], the pioneering paper of Hermann and Krener (1977).

Definition 10. The reachable set from x0 at time t0 in time t1 − t0 is

R(x0, t0, t1) = {x(µ, x0, t0, t1) | µ ∈ A(x0, t0, [t0, t1])}.

The reachable set from x0 starting at time t0 in some finite time is R(x0, t0) = ∪_{t>t0} R(x0, t0, t). The reachable set from x0 consists of all points reachable from x0 in some finite time: R(x0) = ∪_{t0∈R} R(x0, t0).

Let S ⊂ X. If R(x0) = X for all x0 ∈ S, then we say that Σ is controllable in S, written ΣS is controllable. If Σ is controllable in X, then we say that the system Σ is controllable.

In describing the conditions for controllability for general systems, we will need an alternative description of control systems. In addition to the definition given in Chapter 1, let X be a smooth n-manifold and let the dynamics function f(x, a) be smooth (C∞). The case of linear systems has the nice property that all results are global. With non-linear systems, however, we will have to be content with local results. With this in mind, we will need a local version of reachability, which will enable us to define local controllability.

Definition 11. For x0 ∈ X, Σ is locally controllable at x0 if, whenever Ux0 is a neighbourhood of x0, R(x0) ∩ Ux0 is a neighbourhood of x0. ΣS is locally controllable if it is locally controllable at all x0 ∈ S.

In connected spaces local controllability leads to global controllability [2]. Now in general, if x1 is reachable from x0, it is not generally true that x0 is reachable from x1. It is for this reason that we define the following notion.

Definition 12. xn is weakly reachable from x0 if there is a sequence of points {xi}1≤i≤n such that for all i, xi ∈ R(xi−1) or xi−1 ∈ R(xi). The set of all weakly reachable points from x0 is denoted WR(x0). Σ is weakly controllable at x0 if WR(x0) = X.


So from the definition, x0 is weakly reachable from y0 iff y0 is weakly reachable from x0. This again leads to a local version.

Definition 13. For x0 ∈ X, Σ is locally weakly controllable at x0 if, whenever Ux0 is a neighbourhood of x0, WR(x0) ∩ Ux0 is a neighbourhood of x0.

Now we have four different types of controllability, yet they tie together in a nice way. Let S ⊂ X be connected; then we have that

ΣS locally controllable ⇔ ΣS controllable
        ⇓
ΣS weakly locally controllable ⇔ ΣS weakly controllable

The main purpose of this chapter is determining at which points a system is controllable; we thus seek a convenient algebraic method to carry out this determination, and with this in mind we define a few concepts from differential geometry.

Definition 14. For X ⊂ Rn, let f, g : X −→ Rn be two smooth vector fields and let x = (x1, . . . , xn) be a coordinate system for X. The first order Lie bracket of f and g is

(1.1)

[f, g] = (∇x g)f − (∇x f )g

and so [f, g] is a new vector field. We can now define higher order Lie brackets (ad_f^k, g) as

(ad_f^0, g) = g,  (ad_f^1, g) = [f, g],  (ad_f^2, g) = [f, [f, g]],  . . . ,  (ad_f^k, g) = [f, (ad_f^{k−1}, g)].

The Lie bracket operation gives rise to a Lie algebra; that is, for the vector space C∞(X, X) of all smooth vector fields on X with the bilinear operator [·, ·], the Lie bracket, we have that
• [f, f] = 0 for all f ∈ C∞(X, X);
• [[f, g], h] + [[g, h], f] + [[h, f], g] = 0 for all f, g, h ∈ C∞(X, X).

For fixed a∗ ∈ A, we can consider f(·, a∗) as a vector field on X, so let:

Definition 15. For a control system Σ = (X, f, A):
• F0 = {f(·, a) | a ∈ A} ⊂ C∞(X, X);
• F is the subalgebra generated by F0, that is, F is the smallest Lie algebra (set closed under [·, ·]) in C∞(X, X) that contains F0;
• F(x0) = {f(x0) | f ∈ F}, essentially x0 mapped using F.

A symbolic sketch of the bracket operation is given below.
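A direct symbolic implementation of (1.1) makes the bracket easy to experiment with. In the sketch below (the rotation and translation fields are illustrative choices, not taken from the text), rotating and translating the plane are seen not to commute:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = sp.Matrix([x, y])

def lie_bracket(f, g, coords):
    # [f, g] = (grad_x g) f - (grad_x f) g, exactly as in (1.1)
    return g.jacobian(coords) * f - f.jacobian(coords) * g

# Illustrative fields: an infinitesimal rotation and a translation.
f = sp.Matrix([-y, x])   # rotation about the origin
g = sp.Matrix([1, 0])    # translation in x

print(lie_bracket(f, g, coords))  # Matrix([[0], [-1]]): a translation in -y
```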


2. Conditions for Controllability

The theory is well known for linear systems, and the requirements are simple enough.

Definition 16. For a linear control system Σ = (X, f, A), where f(x, a) = Mx + Na, the controllability matrix is the n × (nm) matrix C(M, N) defined as

C(M, N) = [N, MN, M²N, . . . , M^{n−1}N],

where the (i, j = km + h) entry is the (i, h) entry of M^k N.

We now seek a general algebraic condition to determine just how controllable a system is. The following will provide a close enough local condition.

Definition 17. Let x0 ∈ X. If F(x0) has dimension n = dim(X), then Σ is said to satisfy the controllability rank condition at x0.

To show that this condition is satisfied, we only need to find n linearly independent vector fields. That is, since C∞(X, X) is a vector space over the reals, we seek n vector fields f1, . . . , fn in F such that

Σ_{i=1}^{n} αi fi(x) = 0 =⇒ αi = 0 for all i.

But this just says that the matrix

C = [f1(x) | · · · | fn(x)],

whose columns are the vector fields evaluated at x, must have rank n.

The controllability rank condition will be our required algebraic condition used to determine weak controllability. The condition however is not straightforward and will not be sufficient; it will be 'almost' sufficient. Proofs of the following theorems may be found in [2]. Let S ⊂ X.

Theorem 1. If ΣS satisfies the controllability rank condition at x0, then ΣS is locally weakly controllable at x0.

Theorem 2. If ΣS is locally weakly controllable, then ΣS satisfies the controllability rank condition on an open dense subset of X.

3. Examples

Let Σ = (R², f, R), with

f((x, y), a) = [1 0; 0 −1](x, y)ᵀ + (1, 1)ᵀ a = (x + a, −y + a)ᵀ.

Then the controllability matrix is

C = [N, MN] = [1 1; 1 −1],


which has det(C) = −2 and so has full rank; the system is controllable. Now let us change the dynamics function slightly so that f is non-linear:

f((x, y), a) = (x² + a, −y² + a).

Let f1 = f(·, 0) and f2 = f(·, 1). Then the vector fields f1 and [f1, f2] are in F and are linearly independent on a large set, since

[f1, f2] = [2x 0; 0 −2y](x², −y²)ᵀ − [2x 0; 0 −2y](x² + 1, −y² + 1)ᵀ = (2x, 2y)ᵀ.

Then if we calculate the determinant of

C = [x² 2x; −y² 2y],

we get det(C) = (2y)x² + (2x)y² = 2xy(y + x). So det(C) ≠ 0 on {(x, y) ∈ R² | y ≠ −x, y ≠ 0, x ≠ 0}. On the remaining part of the region y ≠ −x, where x = 0 or y = 0, the pair f(·, 0), f(·, 1) provides a basis, since there we would have det(C) ≠ 0 with these two fields as columns. So the system is locally weakly controllable on y ≠ −x.

The geometric intuition in this system is that any control affects the x and y coordinates equally; if we start in the region y > −x, the push in the y-direction is greater than the push in the x-direction. The result, since controls contribute equally in the y and x directions, is that it is impossible to reach the region y < −x. A similar argument applies when starting in the region y < −x.
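The linear rank test of Definition 16 is easy to mechanise. A minimal numpy sketch, checked against the linear example above (where det C = −2):

```python
import numpy as np

def controllability_matrix(M, N):
    # C(M, N) = [N, MN, M^2 N, ..., M^{n-1} N], as in Definition 16
    n = M.shape[0]
    blocks = [np.linalg.matrix_power(M, k) @ N for k in range(n)]
    return np.hstack(blocks)

M = np.array([[1.0, 0.0], [0.0, -1.0]])
N = np.array([[1.0], [1.0]])

C = controllability_matrix(M, N)
print(C)                          # [[1, 1], [1, -1]]
print(np.linalg.matrix_rank(C))   # 2: full rank, so the system is controllable
print(np.linalg.det(C))           # -2.0, as computed above
```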

CHAPTER 3

The Maximum Principle

We begin the main chapter of this paper with a look at the calculus of variations, the forerunner of the ideas behind the Maximum Principle. This chapter will focus on the intuition behind the Maximum Principle, on providing the machinery to prove it, and finally on proving it. We refer to Definition 33 in the appendix for the definition and intuition behind constraint sets. All that is needed here is that for a smooth constraint set S there is a map Ψ : X → Rk, 0 ≤ k ≤ n, such that S = Ker(Ψ); in this sense we think of Ψ as defining the set S.

Theorem 3. Let Σ = (X, f, A) be a control system with Lagrangian L. Let S0, S1 ⊂ X. If (x∗, µ∗) ∈ S(Σ, L, S0, S1, [t0, t1]), then there are λ∗0 ∈ {0, −1} and an absolutely continuous map λ∗ : [t0, t1] → Rn such that
(1) λ∗0 = −1 or λ∗(t0) ≠ 0;
(2) λ∗ is an adjoint response for (Σ, λ∗0 L) along (x∗, µ∗);
(3) HΣ,λ∗0L(x∗(t), λ∗(t), µ∗(t)) = HΣ,λ∗0L^max(x∗(t), λ∗(t)) for almost all t ∈ [t0, t1];
(4) if µ∗ is bounded, then HΣ,λ∗0L^max(x∗(t), λ∗(t)) = 0 for all t ∈ [t0, t1];
(5) if S0, S1 are respectively smooth constraint n0- and n1-sets defined by maps Ψ0, Ψ1, then λ∗ can be chosen such that λ∗(t0) ⊥ Ker(∇xΨ0(x∗(t0))) and λ∗(t1) ⊥ Ker(∇xΨ1(x∗(t1))).

The above conditions will provide us with enough information to determine our optimal controls and trajectories, as seen in the examples at the end of this chapter.

1. Calculus of Variations

We now provide some motivation for the Maximum Principle. When we say that the Maximum Principle provides necessary conditions for a trajectory, we mean the following: for A, B two logical formulae, B is a necessary condition for A if A ⇒ B. So in the theorem below, if a trajectory is optimal in the sense defined in Example 3.1 of Chapter 1, certain conditions must be satisfied. The theory below is different from our control theory but will bear enough resemblance to the control theory definitions. The calculus of variations was driven by a physical problem, namely the brachistochrone problem: the problem of finding the time-optimal trajectory of a particle falling in a gravitational field. The techniques used in this section will provide insight into some of the ideas in the Maximum Principle, particularly the use of trajectory variations in the proof of Theorem 4. The following results and detailed proofs may be found in [3] (Lewis, 2006).


We define the variations Lagrangian L : X × Rn → R to be continuously differentiable in both variables. Let

C²(x0, x1, [t0, t1]) = {x : [t0, t1] → X | x(t0) = x0, x(t1) = x1, and x has continuous derivatives in t up to order 2}.

The cost function for L is JL(x) = ∫_{t0}^{t1} L(x(τ), ẋ(τ)) dτ.

Definition 18. Let S(L, x0, x1, [t0, t1]) = {x∗ ∈ C²(x0, x1, [t0, t1]) | JL(x∗) ≤ JL(x) for all x ∈ C²(x0, x1, [t0, t1])}.

Theorem 4. Let L be a Lagrangian and x ∈ S(L, x0, x1, [t0, t1]). Then x satisfies the Euler-Lagrange equations

(1.1)  (d/dt)(∇ẋ L(x(t), ẋ(t))) = ∇x L(x(t), ẋ(t)).

Proof. Let x ∈ C²(x0, x1, [t0, t1]) and let ς : [t0, t1] → Rn be a C² function with ς(t0) = ς(t1) = 0. Since X is open, we may take ε small enough that the variation

xς : (−ε, ε) × [t0, t1] → X,  xς(s, t) = x(t) + sς(t)

is well defined, with ς(t) an infinitesimal variation of xς. Then

(d/ds) JL(xς) = ∫_{t0}^{t1} (d/ds) L(xς(s, t), ẋς(s, t)) dt
= ∫_{t0}^{t1} (∇xς L(xς(s, t), ẋς(s, t)) · (d/ds) xς(s, t) + ∇ẋς L(xς(s, t), ẋς(s, t)) · (d/ds) ẋς(s, t)) dt
= ∫_{t0}^{t1} (∇xς L(xς(s, t), ẋς(s, t)) − (d/dt) ∇ẋς L(xς(s, t), ẋς(s, t))) · ς(t) dt + ∇ẋς L(xς(s, t), ẋς(s, t)) · ς(t) |_{t0}^{t1}
= ∫_{t0}^{t1} (∇xς L(xς(s, t), ẋς(s, t)) − (d/dt) ∇ẋς L(xς(s, t), ẋς(s, t))) · ς(t) dt,

where integration by parts and Clairaut's theorem are used in the second last line, and the boundary term vanishes because ς(t0) = ς(t1) = 0.

Now suppose for some t∗ ∈ [t0, t1] we have that

∇x L(x(t∗), ẋ(t∗)) − (d/dt) ∇ẋ L(x(t∗), ẋ(t∗)) ≠ 0.

Then there must be some ς0 such that

(∇x L(x(t∗), ẋ(t∗)) − (d/dt) ∇ẋ L(x(t∗), ẋ(t∗))) · ς0 > 0.

Since L has continuous second order derivatives, there exists δ > 0 such that for all t ∈ [t∗ − δ, t∗ + δ] ⊂ [t0, t1],

(∇x L(x(t), ẋ(t)) − (d/dt) ∇ẋ L(x(t), ẋ(t))) · ς0 > 0.

Now let φ : [t0, t1] → R be a smooth non-negative function with φ(t∗) > 0 that vanishes outside [t∗ − δ, t∗ + δ], and let ς(t) = φ(t)ς0. Then we have

(d/ds) |_{s=0} JL(xς) > 0.


So s ↦ JL(xς(s, ·)) cannot have a minimum at s = 0; but JL(xς(0, ·)) = JL(x), so x ∉ S(L, x0, x1, [t0, t1]). □
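As a quick symbolic check of Theorem 4, the following sketch derives the Euler-Lagrange equation for the illustrative harmonic-oscillator Lagrangian L(x, ẋ) = ẋ²/2 − x²/2 (an assumed example, not one from the text):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
x = sp.Function('x')

# Illustrative Lagrangian: L(x, xdot) = xdot^2/2 - x^2/2 (harmonic oscillator).
L = sp.Derivative(x(t), t)**2 / 2 - x(t)**2 / 2

# Theorem 4: minimisers satisfy d/dt (dL/dxdot) = dL/dx.
print(euler_equations(L, [x(t)], [t]))
# [Eq(-x(t) - Derivative(x(t), (t, 2)), 0)], i.e. the expected xddot + x = 0
```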

The Euler-Lagrange equation is somewhat similar to the adjoint equation in part (2) of the Maximum Principle. As a necessary condition it provides us with information from which to calculate an optimal trajectory. The following theorem, due to Legendre, is similar to the condition in multivariable calculus that at a minimum point of a function the second derivative must be positive semi-definite.

Theorem 5. Let X ⊂ Rn be open, let L be a Lagrangian and let x ∈ S(L, x0, x1, [t0, t1]). Then for all t ∈ [t0, t1], ∇ẋ(∇ẋ L(x(t), ẋ(t))) is positive semi-definite, that is,

v · ∇ẋ(∇ẋ L(x(t), ẋ(t))) v ≥ 0 for all v ∈ Rn.

Definition 19. Let X ⊂ Rn be open and L the Lagrangian. The Weierstrass excess function is

EL : X × Rn × Rn → R,
EL(x, v, u) = L(x, u) − L(x, v) − ∇v L(x, v) · (u − v).

Theorem 6. Let X ⊂ Rn be open, let x0, x1 ∈ X and let L be the Lagrangian. Let x ∈ S(L, x0, x1, [t0, t1]). Then for all t ∈ [t0, t1] and all u ∈ Rn,

EL(x(t), ẋ(t), u) ≥ 0.

1.1. Hamiltonian Dynamics. The variations Hamiltonian is similar to the control theory Hamiltonian. Intuitively, the variations/classical Hamiltonian is the total energy of the system.

Definition 20. Let X ⊂ Rn be open and L the Lagrangian. Then the Hamiltonian HL is the C² function

HL : X × Rn × Rn → R,
HL(x, v, p) = p · v − L(x, v).

The following theorem relates the classical Hamiltonian equations to the Euler-Lagrange equation. It is a direct analogue of the adjoint equation in the Maximum Principle, and is useful because it provides a construction of an 'adjoint'; this will later be generalised in the chapter on dynamic programming. The proof follows the lines of [3] (Lewis, 2006). In the statement, λ : [t0, t1] −→ Rn is some function.

Theorem 7. Let x ∈ C²(x0, x1, [t0, t1]) and let L be the Lagrangian. Then
(1) x satisfies the Euler-Lagrange equations and the equation λ(t) = ∇ẋ L(x(t), ẋ(t))
if and only if
(2)
ẋ(t) = ∇λ HL(x(t), ẋ(t), λ(t)),
λ̇(t) = −∇x HL(x(t), ẋ(t), λ(t)),
0 = ∇ẋ HL(x(t), ẋ(t), λ(t)).


Proof. (1) ⇒ (2):

∇λ HL(x(t), ẋ(t), λ(t)) = ∇λ(λ(t) · ẋ(t)) = ẋ(t),
∇ẋ HL(x(t), ẋ(t), λ(t)) = ∇ẋ(λ(t) · ẋ(t)) − ∇ẋ L(x(t), ẋ(t)) = λ(t) − ∇ẋ L(x(t), ẋ(t)) = 0.

Similarly, from the definition of HL and the Euler-Lagrange equations,

λ̇(t) = (d/dt) ∇ẋ L(x(t), ẋ(t)) = ∇x L(x(t), ẋ(t)) = −∇x((ẋ(t) · λ(t)) − L(x(t), ẋ(t))) = −∇x HL(x(t), ẋ(t), λ(t)).

(1) ⇐ (2): From the last equation of (2) we have that

∇ẋ HL(x(t), ẋ(t), λ(t)) = λ(t) − ∇ẋ L(x(t), ẋ(t)) = 0.

Using this equation now yields

λ̇(t) = −∇x HL(x(t), ẋ(t), λ(t)) = ∇x L(x(t), ẋ(t)),

and since λ(t) = ∇ẋ L(x(t), ẋ(t)) gives λ̇(t) = (d/dt) ∇ẋ L(x(t), ẋ(t)), the Euler-Lagrange equations are satisfied. □

The final theorem of this section links our new 'adjoint' to maximisation of the Hamiltonian, provided the necessary conditions stated earlier are satisfied.

Theorem 8. x ∈ C²(x0, x1, [t0, t1]) satisfies the conclusions of Theorems 4, 5 and 6 if and only if there exists a differentiable function λ : [t0, t1] → Rn such that the following equations hold:

ẋ(t) = ∇λ HL(x(t), ẋ(t), λ(t)),
λ̇(t) = −∇x HL(x(t), ẋ(t), λ(t)),
HL(x(t), ẋ(t), λ(t)) = max_{u∈Rn} {HL(x(t), u, λ(t))}.
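For intuition, consider the illustrative classical choice L(x, v) = ½|v|² − U(x) for a potential U (an assumed example, not from the text). Then λ(t) = ∇ẋ L(x(t), ẋ(t)) = ẋ(t), and

HL(x, ẋ, λ) = λ · ẋ − ½|ẋ|² + U(x),

so the three equations of Theorem 7 read: ẋ = ∇λ HL = ẋ (trivially); λ̇ = −∇x HL = −∇U(x), which is Newton's second law; and 0 = ∇ẋ HL = λ − ẋ, which recovers the definition of λ. Evaluated along the curve, HL = ½|ẋ|² + U(x), the total energy, in line with the remark before Definition 20.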

2. Variations of Curves

We now start on the machinery for the proof of the Maximum Principle. The idea of a variation of a curve is to explore around the curve. We consider variations of both trajectories and controls; in particular we will be interested in certain types of variations of controls which induce variations of trajectories. The images of these maps will form nice geometric sets, for which we refer to Appendix A.2. These geometric sets will then be used later to approximate the reachable set R(x0, t0, t1) of Definition 10.

Definition 21. Let X ⊂ Rn be open, let x0, x1 ∈ X and let x ∈ C²(x0, x1, [t0, t1]). A variation of x(µ, x0, t0, t) is a map σ : J × [t0, t1] → X such that
(1) J is an interval of R with 0 ∈ int(J);


(2) σ(0, t) = x(µ, x0, t0, t) for all t ∈ [t0, t1];
(3) σ̇(s, t) = f(σ(s, t), µ(t)) for all s ∈ J;
(4) (d/ds) σ(s, t) exists and is continuous for all t ∈ [t0, t1].

For a variation σ of x, the infinitesimal variation is

ς(t) = δσ(t) = (d/ds) |_{s=0} σ(s, t).

Variations and the resultant infinitesimal variations arise naturally in control theory. With the following definition and theorem, we see that the time derivative of an infinitesimal variation is given by the Jacobian of the dynamics function with respect to x applied to the variation.

Definition 22. For Σ = (X, f, A) a control system and µ an admissible control:
(1) the variational equations for Σ and µ are

ẋ(t) = f(x(t), µ(t)),
ς̇(t) = ∇x f(x(t), µ(t)) · ς(t);

(2) the adjoint equations for Σ and µ are

ẋ(t) = f(x(t), µ(t)),
λ̇(t) = −∇x fᵀ(x(t), µ(t)) · λ(t).

Theorem 9. Let Σ = (X, f, A), let µ ∈ A(x0, t0, [t0, t1]) and let ς : [t0, t1] −→ Rn. Then ς is an infinitesimal variation of some variation σ of x(µ, x0, t0, ·), that is δσ = ς, if and only if (x(t), ς(t)) satisfies the variational equations for all t ∈ [t0, t1].

The proof follows the lines of [3] (Lewis, 2006).

Proof. ⇒: Let σ be the variation of x(µ, x0, t0, t) such that δσ = ς. Let γ(s) = σ(s, t0), so that σ(s, t) = x(µ, γ(s), t0, t). Since γ(0) = x0, we can write ς as

ς(t) = δσ(t) = (d/ds) |_{s=0} σ(s, t) = ∇x0 x(µ, x0, t0, t) ς(t0).

For x(µ, x0, t0, t) a solution to the dynamics equation, let Φ : I −→ Mat_{n×n}(R) be

Φ(t) = ∇x0 x(µ, x0, t0, t).

We can calculate Φ̇(t):

Φ̇(t) = (d/dt) ∇x0 x(µ, x0, t0, t) = ∇x ẋ(µ, x0, t0, t) ∇x0 x(µ, x0, t0, t) = ∇x f(x(µ, x0, t0, t), µ(t)) Φ(t).

Since x(µ, x0, t0, t0) = x0, Φ(t0) = In. Thus Φ(t) satisfies the n-dimensional matrix initial value problem

Φ̇(t) = ∇x f(x(µ, x0, t0, t), µ(t)) Φ(t),  Φ(t0) = In,

and since ς(t) = Φ(t)δσ(t0),

ς̇(t) = ∇x f(x(µ, x0, t0, t), µ(t)) ς(t),  ς(t0) = δσ(t0).


⇐: Let γ : [−ε, ε] → X be a C¹ map such that γ(0) = x0 and (d/ds) γ(s) |_{s=0} = ς(t0). Let σ(s, t) = x(µ, γ(s), t0, t); then σ(s, t) varies the starting position with s.

Claim: σ(s, t) is a well-defined variation of x. Since solutions to the dynamics equation (2.1) depend continuously on the initial condition and σ(0, t) = x(µ, x0, t0, t), σ(s, t) is continuous at s = 0; thus for small enough ε, σ(s, t) is a solution to the dynamics equation. Clearly σ satisfies properties (1) and (2) of our definition of a variation; it satisfies part (3) by the argument above and part (4) since we defined γ(s) to be of class C¹. □

The map Φ(t) used above was instrumental in the proof. This type of map has a certain geometric quality and is worth looking into.

Definition 23. Let Φ(µ, x0, t0, τ, t) denote the solution to the matrix initial value problem

Φ̇(t) = ∇x f(x(µ, x0, t0, t), µ(t)) Φ(t),  Φ(τ) = In.

The map Φ(µ, x0, t0, τ, t) maps the tangent space at x(µ, x0, t0, τ) to the tangent space at x(µ, x0, t0, t). It has some nice properties that we will use later:
(1) Φ(µ, x0, t0, τ, τ) = In;
(2) Φ(µ, x0, t0, τ1, τ2) = Φ(µ, x0, t0, τ2, τ1)⁻¹;
(3) Φ(µ, x0, t0, τ0, τ1)Φ(µ, x0, t0, τ1, τ2) = Φ(µ, x0, t0, τ0, τ2).

Lemma 1. Let Σ = (X, f, A) be a control system, µ ∈ A(x0, t0, [t0, t1]) and τ ∈ [t0, t1]. The solution to the adjoint I.V.P.

λ̇(t) = −∇x fᵀ(x(µ, x0, t0, t), µ(t)) λ(t),  λ(τ) = λτ,

is λ(t) = Φ(µ, x0, t0, t, τ)ᵀ λτ, where Φ is as in Definition 23.

Proof. Let Φ′(µ, x0, t0, τ, t) be the solution to the matrix I.V.P.

(d/dt) Φ′(µ, x0, t0, τ, t) = −∇x fᵀ(x(µ, x0, t0, t), µ(t)) Φ′(µ, x0, t0, τ, t),  Φ′(µ, x0, t0, τ, τ) = In,

so that the solution to the adjoint I.V.P. is Φ′(µ, x0, t0, τ, t)λτ. By the definition of Φ(µ, x0, t0, τ, t), Φ(µ, x0, t0, τ, t)ςτ is a solution to the variational I.V.P.

ς̇(t) = ∇x f(x(µ, x0, t0, t), µ(t)) ς(t),  ς(τ) = ςτ,

for all ςτ ∈ Rn. Thus

(d/dt)(λ(t) · ς(t)) = λ̇(t) · ς(t) + λ(t) · ς̇(t)
= −∇x fᵀ(x(µ, x0, t0, t), µ(t))λ(t) · ς(t) + λ(t) · ∇x f(x(µ, x0, t0, t), µ(t))ς(t)
= −(λ(t))ᵀ ∇x f(x(µ, x0, t0, t), µ(t)) ς(t) + (λ(t))ᵀ ∇x f(x(µ, x0, t0, t), µ(t)) ς(t)
= 0.


So for all λτ, ςτ ∈ Rn,

λ(t) · ς(t) = λ(τ) · ς(τ) = λτ · ςτ.

Therefore Φ′(µ, x0, t0, τ, t) = Φ(µ, x0, t0, τ, t)⁻ᵀ = Φ(µ, x0, t0, t, τ)ᵀ. □
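Lemma 1 and the variational equation lend themselves to a numerical sanity check. In the sketch below (an illustrative linear system, not one from the text), x, Φ and λ are integrated together, and the pairing λ(t) · ς(t), with ς(t) = Φ(t)ς(t0), is confirmed to be constant, exactly as in the proof above:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative linear system: f(x, a) = M x + N a, with control mu(t) = sin t.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
N = np.array([1.0, 1.0])

def rhs(t, y):
    # y packs x (2 entries), Phi (4 entries, row-major), lambda (2 entries).
    x, Phi, lam = y[:2], y[2:6].reshape(2, 2), y[6:]
    dx = M @ x + N * np.sin(t)   # dynamics
    dPhi = M @ Phi               # variational equation (grad_x f = M here)
    dlam = -M.T @ lam            # adjoint equation
    return np.concatenate([dx, dPhi.ravel(), dlam])

y0 = np.concatenate([[1.0, 0.0], np.eye(2).ravel(), [0.3, -0.7]])
sol = solve_ivp(rhs, (0.0, 5.0), y0, dense_output=True, rtol=1e-9)

# lambda(t) . varsigma(t) should be constant in t, as in the proof of Lemma 1.
varsigma0 = np.array([1.0, 2.0])
for t in (0.0, 2.5, 5.0):
    y = sol.sol(t)
    Phi, lam = y[2:6].reshape(2, 2), y[6:]
    print(t, lam @ (Phi @ varsigma0))   # prints -1.1 at every sampled time
```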

The above lemma will prove extremely useful later as a tool for proving the existence of an adjoint response: given a point λτ ∈ Rn, we can now form a function that takes on that value and satisfies the adjoint equation.

2.1. Needle Variations. With the idea of a variation of a curve, we will produce variations of trajectories and controls. This can be done through needle variations. A needle variation changes the value of the control instantaneously to a constant value over a closed interval of specified length. This will let us study how changing the control affects the system, by considering the resultant trajectory of a control variation. We then find that one can create certain convex sets using these variations, and in the next section we will be able to approximate the boundary of the reachable set.

Definition 24. Let Σ = (X, f, A) be a control system and µ ∈ A(x0, t0, [t0, t1]).
(1) A fixed interval multi-needle variation representative is an indexed set Θ = {θ1, . . . , θk} of ordered triples θi = (τi, li, αi) ∈ (t0, t1] × R≥0 × A such that the times τ1, . . . , τk are distinct.
(2) The control variation of µ associated to the fixed interval multi-needle variation representative Θ is the map µΘ : [0, s0] × [t0, t1] −→ A,

µΘ(s, t) = αi if t ∈ [τi − s li, τi], 1 ≤ i ≤ k;  µΘ(s, t) = µ(t) otherwise,

where s0 is chosen small enough that µΘ(s, ·) is a well-defined admissible control for all s ∈ [0, s0].
(3) The fixed interval multi-needle variation associated with µ and the representative Θ = {θ1, . . . , θk} is

ςΘ(t) = (d/ds) |_{s=0} x(µΘ(s, ·), x0, t0, t)

for times t > max{τi | 1 ≤ i ≤ k}, when the derivative exists.

In the case k = 1 we call θ a needle variation or singular needle variation.

Notation 3. By a needle variation we will mean a singular needle variation, and we will state explicitly when we are working with a multi-needle variation.

Definition 25. Let Σ = (X, f, A) be a control system and µ ∈ A(x0, t0, [t0, t1]). The set of Lebesgue points, denoted Leb(µ, x0, t0, t) ⊂ [t0, t], is the set of all times t∗ such that

lim_{ε→0} (1/2ε) ∫_{t∗−ε}^{t∗+ε} |f(x(µ, x0, t0, t), µ(t)) − f(x(µ, x0, t0, t∗), µ(t∗))| dt = 0.

Lemma 2. For a control system Σ = (X, f, A) and µ ∈ A(x0, t0, [t0, t1]): if Θ = {θ1, . . . , θk} is a fixed interval multi-needle variation representative with all τi ∈ Leb(µ, x0, t0, t), and λ = (λ1, . . . , λk) with λi > 0 for all i, we have that

ςλΘ(t) = Σ_{j=1}^{k} λj Φ(µ, x0, t0, τj, t) ςθj(τj),

and if Σ_{j=1}^{k} λj = 1 then the derivative (d/ds) |_{s=0} x(µλΘ, x0, t0, t) exists uniformly.


A proof may be found in [3] (Lewis, 2006).

3. Reachability

The main aim of this section is to approximate the boundary of the reachable set. From the theory in Chapter 2 we have conditions to determine whether the reachable set is nonempty; for this section to be worthwhile, we shall therefore always assume that the reachable set is in fact nonempty. We are now nearing the end of the results needed for the proof of the Maximum Principle's assertions. We want to approximate the boundary of the reachable set in order to show that optimal trajectories of the extended system lie on this boundary. This major result will be the first necessary condition we develop for optimal controls. Using needle variations and the tangent space isomorphisms of Definition 23, we shall generate tangent cones and show that the interior of such a cone lies in the reachable set.

Definition 26. The reachable set from x0 at time t0 in the time frame t1 − t0 is

R(x0, t0, t1) = {x(µ, x0, t0, t1) | µ ∈ A(x0, t0, [t0, t1])}.

3.1. Tangent Cones. We refer to Appendix A.2, where the definitions of cones and convex sets are made. A k-simplex is the convex hull of a set S = {x0, x1, . . . , xk} of affinely independent points of Rn. A k-simplex cone is the coned hull of a set S = {x1, . . . , xk} of linearly independent vectors of Rn.

Definition 27. Let Σ = (X, f, A) be a control system and µ ∈ A(x0, t0, [t0, t1]). For t ∈ [t0, t1], let the fixed interval tangent cone at t, K(µ, x0, t0, t), be the coned convex hull of the set

Ξ = {Φ(µ, x0, t0, τ, t)ς(τ) | τ ∈ Leb(µ, x0, t0, t) and ς a fixed interval needle variation at τ},

so that K(µ, x0, t0, t) = cone(conv(Ξ)).

Definition 28. For Σ = (X, f, A) a control system and µ ∈ A(x0, t0, [t0, t1]), a fixed interval tangent r-simplex cone representative is a set Θ′ = {Θ1, . . . , Θr} of fixed interval multi-needle variation representatives such that
• for θi,j = (τi,j, li,j, αi,j) in Θi, all the τi,j are distinct, for all 1 ≤ i ≤ r and all 1 ≤ j ≤ ki;
• the coned convex hull of the fixed interval multi-needle variations is an r-simplex cone, that is, cone(conv({ςΘ1(t), . . . , ςΘr(t)})) is an r-simplex cone.
The r-simplex cone defined by the fixed interval tangent representative is known as the fixed interval tangent r-simplex cone. Much as in Lemma 2, we have the following.

Lemma 3. For a control system Σ = (X, f, A), µ ∈ A(x0, t0, [t0, t1]), Θ′ = {Θ1, . . . , Θr} a tangent r-simplex cone representative and λ = {λ1, . . . , λr} a collection of positive real scalars: if we consider Θ′ as a multi-needle variation, the control variation µλΘ′(s, t) gives rise to a multi-needle variation ςλΘ′ such that


ςλΘ′(t) = (d/ds) |_{s=0} x(µλΘ′(s, ·), x0, t0, t) = Σ_{i=1}^{r} λi ςΘi(t).

A proof may be found in [3] (Lewis, 2006). The following lemma says that the fixed interval tangent cone at time t can be calculated from the fixed interval tangent cones at Lebesgue times in [t0, t1].

Lemma 4. For a control system Σ = (X, f, A) and µ ∈ A(x0, t0, [t0, t1]),

K(µ, x0, t0, t) = cl(∪ {Φ(µ, x0, t0, τ, t) K(µ, x0, t0, τ) | τ ∈ Leb(µ, x0, t0, [t0, t1])}).

A proof may be found in [3] (Lewis, 2006). Thus we only need to work with the fixed interval tangent cones at Lebesgue times.

3.2. Approximation to the reachable set.

Theorem 10. Let Σ = (X, f, A) be a control system and µ ∈ A(x0, t0, [t0, t1]). If, for t ∈ [t0, t1], ς0 ∈ int(K(µ, x0, t0, t)), then there is a cone K ⊂ K(µ, x0, t0, t) and r > 0 such that
• ς0 ∈ int(K);
• {x(µ, x0, t0, t) + ς | ς ∈ K, ‖ς‖ < r} ⊂ R(x0, t0, t).

The theorem says that, given a needle variation 'inside' the fixed interval tangent cone, there is a cone K such that for 'small enough' needle variations ς in this cone, there are reachable points within distance ‖ς‖ of the trajectory x(µ, x0, t0, t).

4. The Proof of The Maximum Principle

4.1. The Hamiltonian and Tangent Cones.

Lemma 5. Let Σ = (X, f, A) be a control system and let µ ∈ A(x0, t0, [t0, t1]). For all t ∈ [t0, t1], let Kt be a convex cone such that K(µ, x0, t0, t) ⊂ Kt, and let λ be the adjoint response for Σ along (x, µ). If there exists τ ∈ [t0, t1] such that λ(τ) · v ≤ 0 for all v ∈ Kτ, then

HΣ(x(µ, x0, t0, t), λ(t), µ(t)) = HΣ^max(x(µ, x0, t0, t), λ(t))

for all t ∈ Leb(µ, x0, t0, τ).

We now have a sufficient condition for maximisation of the Hamiltonian. The proof follows the lines of [3] (Lewis, 2006).

Proof. Let t ∈ Leb(µ, x0, t0, τ). For any α ∈ A, let θ = (t, 1, α) be a needle variation representative. By Lemma 4 we have that

Φ(µ, x0, t0, t, τ)(f(x(µ, x0, t0, t), α) − f(x(µ, x0, t0, t), µ(t))) ∈ K(µ, x0, t0, τ) ⊂ Kτ.

Hence

λ(τ) · Φ(µ, x0, t0, t, τ)(f(x(µ, x0, t0, t), α) − f(x(µ, x0, t0, t), µ(t))) ≤ 0
⇒ λ(τ) · Φ(µ, x0, t0, t, τ) f(x(µ, x0, t0, t), α) ≤ λ(τ) · Φ(µ, x0, t0, t, τ) f(x(µ, x0, t0, t), µ(t))
⇒ λ(t) · f(x(µ, x0, t0, t), α) ≤ λ(t) · f(x(µ, x0, t0, t), µ(t))
⇒ HΣ(x(µ, x0, t0, t), λ(t), α) ≤ HΣ(x(µ, x0, t0, t), λ(t), µ(t)) for all α ∈ A
⇒ HΣ^max(x(µ, x0, t0, t), λ(t)) ≤ HΣ(x(µ, x0, t0, t), λ(t), µ(t)),

and since the reverse inequality holds by the definition of HΣ^max, the two are equal. □

When is the Hamiltonian zero?


Corollary 1. If in addition we have that αf(x(µ, x0, t0, τ), µ(τ)) ∈ Kτ for all τ ∈ [t0, t1] and all α ∈ R, then

HΣ(x(µ, x0, t0, τ), λ(τ), µ(τ)) = 0.

Proof. From the hypothesis, both

λ(τ) · f(x(µ, x0, t0, τ), µ(τ)) ≤ 0 and −λ(τ) · f(x(µ, x0, t0, τ), µ(τ)) ≤ 0.

Hence λ(τ) · f(x(µ, x0, t0, τ), µ(τ)) = 0, so HΣ(x(µ, x0, t0, τ), λ(τ), µ(τ)) = 0. □

If an adjoint response maximises the Hamiltonian for almost all times, then the maximum Hamiltonian is in fact constant in time. The proof follows similar lines to [3] (Lewis, 2006).

Theorem 11. Let Σ = (X, f, A), let µ ∈ A(x0, t0, [t0, t1]) be a bounded function, and let λ be an adjoint response for Σ along (x(µ, x0, t0, ·), µ) such that

HΣ(x(µ, x0, t0, t), λ(t), µ(t)) = HΣ^max(x(µ, x0, t0, t), λ(t))

for almost all t ∈ [t0, t1]. Then

(∂/∂t) HΣ^max(x(µ, x0, t0, t), λ(t)) = 0 for all t ∈ [t0, t1].

The proof works by showing that the theorem holds when the Hamiltonian is restricted to a compact subset of the control space; this yields the result for almost all t ∈ [t0, t1]. We then show that HΣ^max is lower semi-continuous and derive the full result.

Proof. Let B = cl(Im(µ)); then, since µ is bounded, B is compact. Let

H̃Σ(x, p, a) = HΣ(x, p, a) for a ∈ B.

Then HΣ^max(x, p) ≥ H̃Σ^max(x, p) for all x, p ∈ Rn. Since Im(µ) has the same Lebesgue measure as its closure B, m(B \ Im(µ)) = 0, and so HΣ^max(x(t), λ(t)) = H̃Σ^max(x(t), λ(t)) for almost all t ∈ [t0, t1].

Since f(x, a) has continuous first order partial derivatives in x, HΣ(x, p, a) has continuous first order partial derivatives in x, p. Let W = Im(x) × Im(λ). Since [t0, t1] is a finite interval and x, λ are continuous on [t0, t1], W is bounded and cl(W) is compact. Now by Theorem 22 we have that, for some C > 0,

|HΣ(x1, p1, a) − HΣ(x2, p2, a)| < C ‖(x1, p1) − (x2, p2)‖

for (x1, p1), (x2, p2) ∈ cl(W). Pick a1, a2 ∈ B such that

H̃Σ^max(x1, p1) = HΣ(x1, p1, a1) and H̃Σ^max(x2, p2) = HΣ(x2, p2, a2);

such a1, a2 exist since B is compact and we are restricted to the set B. Hence we have the inequalities

HΣ(x1, p1, a2) = H̃Σ(x1, p1, a2) ≤ H̃Σ^max(x1, p1) = H̃Σ(x1, p1, a1) = HΣ(x1, p1, a1),
HΣ(x2, p2, a1) = H̃Σ(x2, p2, a1) ≤ H̃Σ^max(x2, p2) = H̃Σ(x2, p2, a2) = HΣ(x2, p2, a2).

Using the above inequalities we get that

−C ‖(x1, p1) − (x2, p2)‖ ≤ HΣ(x1, p1, a2) − HΣ(x2, p2, a2) ≤ H̃Σ^max(x1, p1) − H̃Σ^max(x2, p2) ≤ HΣ(x1, p1, a1) − HΣ(x2, p2, a1) ≤ C ‖(x1, p1) − (x2, p2)‖.


Hence we have that

|H̃Σ^max(x1, p1) − H̃Σ^max(x2, p2)| ≤ C ‖(x1, p1) − (x2, p2)‖,

so H̃Σ^max is a Lipschitz function. Now, to show that t ↦ H̃Σ^max(x(t), λ(t)) is absolutely continuous: let {(ai, bi) | 1 ≤ i ≤ k} be a collection of disjoint open intervals such that Σ_{i=1}^{k} |bi − ai| ≤ δ. Then, since H̃Σ^max is Lipschitz and x, λ are absolutely continuous, we have that

Σ_{i=1}^{k} |H̃Σ^max(x(bi), λ(bi)) − H̃Σ^max(x(ai), λ(ai))| ≤ Σ_{i=1}^{k} C ‖(x(bi) − x(ai), λ(bi) − λ(ai))‖,

which can be made arbitrarily small by choice of δ.

Let T be the (full measure) set of t ∈ [t0, t1] at which ẋ(t) and λ̇(t) exist and HΣ(x(t), λ(t), µ(t)) = H̃Σ^max(x(t), λ(t)). For t∗ ∈ T and t > t∗,

(H̃Σ^max(x(t), λ(t)) − H̃Σ^max(x(t∗), λ(t∗)))/(t − t∗)
≥ (HΣ(x(t), λ(t), µ(t∗)) − HΣ(x(t), λ(t∗), µ(t∗)))/(t − t∗) + (HΣ(x(t), λ(t∗), µ(t∗)) − HΣ(x(t∗), λ(t∗), µ(t∗)))/(t − t∗),

and for t < t∗,

(H̃Σ^max(x(t), λ(t)) − H̃Σ^max(x(t∗), λ(t∗)))/(t − t∗)
≤ (HΣ(x(t), λ(t), µ(t∗)) − HΣ(x(t), λ(t∗), µ(t∗)))/(t − t∗) + (HΣ(x(t), λ(t∗), µ(t∗)) − HΣ(x(t∗), λ(t∗), µ(t∗)))/(t − t∗).

The two right-hand terms converge, as t → t∗, to f(x(t∗), µ(t∗)) · λ̇(t∗) and ∇x HΣ(x(t∗), λ(t∗), µ(t∗)) · ẋ(t∗), whose sum is zero by the adjoint equation; hence, taking the limits t → t∗+ and t → t∗−, we get that (d/dt) H̃Σ^max(x(t∗), λ(t∗)) = 0. Since t∗ was arbitrary in T, H̃Σ^max(x(t), λ(t)) is constant on T; say H̃Σ^max(x(t), λ(t)) = C0 for almost all t ∈ [t0, t1]. Now, by Theorem 23, HΣ^max is lower semi-continuous.


So for all t∗ ∈ [t0, t1] and all ε > 0, there is δ > 0 such that |t − t∗| < δ implies HΣ^max(x(t∗), λ(t∗)) < HΣ^max(x(t), λ(t)) + ε. Since HΣ^max(x(t), λ(t)) = H̃Σ^max(x(t), λ(t)) = C0 for almost all t ∈ [t0, t1], and since the interval (t∗ − δ, t∗ + δ) has measure 2δ > 0, there are such t in (t∗ − δ, t∗ + δ), so that for all ε > 0,

HΣ^max(x(t∗), λ(t∗)) < C0 + ε, and so HΣ^max(x(t∗), λ(t∗)) ≤ C0.

Since t∗ was arbitrary, the above holds for all t ∈ [t0, t1]. From the definition of H̃Σ^max, HΣ^max(x(t∗), λ(t∗)) ≥ H̃Σ^max(x(t∗), λ(t∗)) = C0 for all t∗ ∈ [t0, t1], and so HΣ^max(x(t), λ(t)) is constant in t. □

Theorem 12. Let Σ = (X, f, A) be a control system and let µ∗ ∈ A(x0, t0, [t0, t1]). If x(µ∗, x0, t0, t1) ∈ bd(R(x0, t0, t1)), then there exists an adjoint response λ∗ for Σ along (x∗(µ∗, x0, t0, ·), µ∗) such that the Hamiltonian is maximised:
• HΣ(x∗(t), λ∗(t), µ∗(t)) = HΣ^max(x∗(t), λ∗(t));
• if µ∗ is bounded, then HΣ^max(x∗(t), λ∗(t)) is constant in t.

Proof. Let x(µ∗, x0, t0, t1) = F ∈ bd(R(x0, t0, t1)) = cl(R(x0, t0, t1)) ∩ cl(X \ R(x0, t0, t1)), and let (Fi)i∈N be a sequence in X \ cl(R(x0, t0, t1)) that converges to F. Let xi = (Fi − F)/‖Fi − F‖, so (xi)i∈N is a bounded sequence in X. By Bolzano-Weierstrass, (xi)i∈N has a convergent subsequence (xij)j∈N; say xij → x′.

Claim: x′ ∉ int(K(µ∗, x0, t0, t1)). Suppose the claim is false. Then there is N ∈ N such that j ≥ N implies xij ∈ int(K(µ∗, x0, t0, t1)), and by Theorem 10 there is N ∈ N such that j ≥ N implies Fij ∈ int(R(x0, t0, t1)). But this contradicts our choice of the Fi lying in X \ cl(R(x0, t0, t1)). Hence x′ ∈ cl(X \ K(µ∗, x0, t0, t1)).

Using Theorem 21, there is a separating hyperplane P with half spaces X1, X2 such that x′ ∈ X1 and K(µ∗, x0, t0, t1) ⊂ X2. Let λ∗(t1) ∈ X1 be a vector orthogonal to P. Then λ∗(t1) · x ≤ 0 for all x ∈ K(µ∗, x0, t0, t1). Let λ∗ be the adjoint response for Σ along (x∗, µ∗) equal to λ∗(t1) at t1; then using the previous theorems we get the result. □

The following theorem says that the reachable set is a sort of sink or attractor for trajectories.

Theorem 13. For Σ = (X, f, A) a control system, let µ ∈ A(x0, t0, [t0, t1]). If there exists τ ∈ [t0, t1] such that x(µ, x0, t0, τ) ∈ int(R(x0, t0, τ)), then x(µ, x0, t0, t) ∈ int(R(x0, t0, t)) for all t ∈ [τ, t1].

Proof. Let U be a neighbourhood of x(µ, x0, t0, τ) contained in R(x0, t0, τ). For x ∈ U, let µx ∈ A(x0, t0, [t0, τ]) be such that x(µx, x0, t0, τ) = x. For t′ ∈ [τ, t1], define a new control µ̄x by

µ̄x(t) = µx(t) if t ∈ [t0, τ];  µ̄x(t) = µ(t) if t ∈ (τ, t′].

Define a map h : U → R(x0, t0, t′) by h(x) = x(µ̄x, x0, t0, t′). Then Im(h) is an open set contained in R(x0, t0, t′), and x(µ, x0, t0, t′) ∈ Im(h). □

Optimal trajectories lie on the boundary of the reachable set of the extended control system; with the next theorem we have a necessary condition for optimal controls.


Theorem 14. Let Σ̂ = (X̂, f̂, A) be the extended control system for Σ = (X, f, A) with associated Lagrangian L. If (x∗, µ∗) ∈ S(Σ, L, S0, S1, [t0, t1]), then x̂∗(t1) ∈ bd(R̂(x̂(t0), t1)).

Proof. x̂(t1) = (c0(t1), x(t1)), where by definition c0(t1) = ∫_{t0}^{t1} L(x(τ), µ(τ)) dτ. Since (x, µ) is optimal, ∫_{t0}^{t1} L(x(τ), µ(τ)) dτ is minimal, and therefore

c0(t1) = inf{c ∈ R | (c, x(t1)) ∈ R̂(x̂(t0), t1)}.

A neighbourhood Û of x̂(t1) in X̂ will contain a set of the form (c0(t1) − ε, c0(t1) + ε) × U for U a neighbourhood of x(t1) in X; thus Û contains elements (c, x(t1)) with c < c0(t1) and with c > c0(t1). So Û ∩ R̂(x̂(t0), t1) ≠ ∅ and Û ∩ (X̂ \ R̂(x̂(t0), t1)) ≠ ∅. □

4.2. The Adjoint Response. We now have enough machinery to prove the assertions of the Maximum Principle. The proof follows the lines of [3] (Lewis, 2006).

Theorem 15. Let Σ = (X, f, A) be a control system, let L be a Lagrangian for Σ, and let S0, S1 be the initial and terminal sets. If (x∗, µ∗) ∈ S(Σ, L, S0, S1, [t0, t1]), then there is an absolutely continuous map λ∗ : [t0, t1] → Rn and a constant λ∗0 ∈ {−1, 0} such that
(1) λ∗0 = −1, or λ∗(t0) ≠ 0;
(2) λ∗ is an adjoint response for (Σ, λ∗0 L) along (x∗, µ∗);
(3) HΣ,λ∗0L(x∗(t), λ∗(t), µ∗(t)) = HΣ,λ∗0L^max(x∗(t), λ∗(t)) for almost all t ∈ [t0, t1].

Proof. (x∗, µ∗) is optimal, so by Theorem 14 there can be no points in R̂(x̂∗(t0), t0, t1) with cost lower than x∗l(t1). So (−1, 0) ∈ R ⊕ Rn cannot be in the interior of K̂(µ∗, x̂0, t0, t1).

By Theorem 21 there is a hyperplane P̂(t1) in R ⊕ Rn that divides X̂ into two closed half spaces X̂1(t1) and X̂2(t1), with (−1, 0) ∈ X̂1(t1) and K̂(µ∗, x̂0, t0, t1) ⊂ X̂2(t1). Let λ̂∗(t1) ∈ X̂1(t1) be such that λ̂∗(t1) ⊥ P̂(t1); then λ̂∗(t1) · (−1, 0) ≥ 0 and λ̂∗(t1) · v̂ ≤ 0 for all v̂ ∈ K̂(µ∗, x̂0, t0, t1), hence λ∗l(t1) ≤ 0.

Let λ̂∗(t) be the adjoint response equal to λ̂∗(t1) at time t1. Since f̂ is independent of xl, we have λ̇∗l(t) = 0, so λ∗l is a nonpositive constant. The extended adjoint is λ̂∗(t) = (λ∗l, λ∗(t)). If λ∗l ≠ 0, then, since we want λ̂∗(t) = (0, λ∗(t)) or (−1, λ∗(t)), we can rescale and use |λ∗l|⁻¹ λ̂∗(t) as the extended adjoint, which works since |λ∗l|⁻¹ λ∗(t) is an adjoint response for Σ along (x∗, µ∗).

We must have λ̂∗(t) ≠ 0 for all t ∈ [t0, t1]: since λ̂∗ satisfies the (linear) adjoint equation, if λ̂∗ were zero at one point it would be zero at all points, but we chose λ̂∗(t1) ≠ 0. Hence if λ∗l = 0 we must have λ∗(t) ≠ 0 for all t ∈ [t0, t1]. Now from Theorem 12,

HΣ̂(x̂∗(t), λ̂∗(t), µ∗(t)) = HΣ,λ∗lL(x∗(t), λ∗(t), µ∗(t)) = HΣ,λ∗lL^max(x∗(t), λ∗(t))

for almost all t ∈ [t0, t1]. □



Corollary 2. If, in addition to the hypotheses of Theorem 15, µ∗ is bounded, then the maximum Hamiltonian is constant in t:

(∂/∂t) HΣ,λ∗lL^max(x∗(t), λ∗(t)) = 0.


4.3. Transversality. Before we can prove the final assertion of the Maximum Principle, we need a few geometric concepts.

Definition 29.
• For an edged set E = φ(U) and x ∈ bs(E), let y ∈ φ⁻¹(x). The tangent half space to E at x is

(4.1)  Tx⁺E = ∇y φ(y)({(v1, . . . , vk) | vk ≥ 0}).

• Let S be a smooth constraint set represented by Ψ. The tangent space to S at a point x ∈ S is TxS = Ker(∇xΨ(x)).
• For xs ∈ S, µ ∈ A(x0, t0, [t0, t1]) and a Lebesgue time τ ∈ (t0, t1) ∩ Leb(µ, x0, t0, t1), the constraint-augmented tangent cone is

(4.2)  K̄(µ, x0, t0, τ) = cl(conv(cone(Φ(µ, x0, t0, t0, τ) Txs S ∪ K(µ, x0, t0, τ)))).

Theorem 16. Let Σ = (X, f, A) be a control system, let Ψ : X → Rk define a constraint set S, and let µ ∈ A(x0, t0, [t0, t1]). Pick τ ∈ (t0, t1) ∩ Leb(µ, x0, t0, t1) and let E be an edged set with x(µ, x0, t0, τ) ∈ bs(E). If T⁺_{x(τ)}E and K̄(µ, x0, t0, τ) are not separable, then there is x∗0 ∈ S such that (E \ bs(E)) ∩ R(x∗0, t0, t1) ≠ ∅.

Theorem 17. Let Σ = (X, f, A) be a control system with Lagrangian L, and let S0, S1 be constraint sets. If (x∗, µ∗) ∈ S(Σ, L, S0, S1, [t0, t1]), then K̄(µ̂∗, x̂0, t0, t1) and T⁺_{x̂∗(t1)}Ŝ1 are separable.

The proof follows the lines of [3] (Lewis, 2006).

Proof. Suppose, for contradiction, that K̄(µ̂∗, x̂0, t0, t1) and T⁺_{x̂∗(t1)}Ŝ1 are not separable. Since K̄(µ̂∗, x̂0, t0, t1) = ∪_{t∈(t0,t1)} K̄(µ̂∗, x̂0, t0, t), there is a Lebesgue time τ ∈ Leb(µ∗, x0, t0, t1) such that Φ(µ̂∗, x̂0, t0, τ, t1)K̄(µ̂∗, x̂0, t0, τ) and T⁺_{x̂∗(t1)}Ŝ1 are not separable. Now applying Φ(µ̂∗, x̂0, t0, t1, τ) to the previous two sets, we see that Φ(µ̂∗, x̂0, t0, t1, τ)T⁺_{x̂∗(t1)}Ŝ1 and K̄(µ̂∗, x̂0, t0, τ) are not separable. Moreover Φ(µ̂∗, x̂0, t0, t1, τ)T⁺_{x̂∗(t1)}Ŝ1 = T⁺_{x̂∗(τ)}Ŝτ, since Φ(µ̂∗, x̂0, t0, t1, τ) maps the tangent space at x̂(t1) to the tangent space at x̂(τ). By Theorem 16, there is a control µx(τ) : [t0, τ] → A and an x̂′0 ∈ Ŝ0 such that x(µx(τ)) ∈ Ŝτ \ bs(Ŝτ). µx(τ) extends to µ̄x(τ) defined on [t0, t1] and agreeing with µ∗ on [τ, t1], as in the proof of Theorem 13. Thus we now have a control µ̄x(τ) that steers x̂′0 ∈ Ŝ0 to Ŝ1 \ bs(Ŝ1). This contradicts the optimality of µ∗. □

We now have enough machinery to prove the final assertion of the Maximum Principle. The proof follows the lines of [3] (Lewis, 2006).

Theorem 18. For Σ = (X, f, A) with Lagrangian L, and S0 = Ker(Ψ0), S1 = Ker(Ψ1) smooth constraint sets, there exists an adjoint λ∗ as described in parts (1)-(4) of the Maximum Principle such that λ∗(t0) ⊥ Ker(∇xΨ0(x∗(t0))) and λ∗(t1) ⊥ Ker(∇xΨ1(x∗(t1))).


Proof. From the previous theorem we have that K̄(µ̂∗, x̂0, t0, t1) and T⁺_{x̂∗(t1)}Ŝ1 are separable. Let λ̂∗(t1) = (λl, λ∗(t1)) be such that

λ̂∗(t1) · v̂ ≤ 0 for v̂ ∈ K̄(µ̂∗, x̂0, t0, t1),
λ̂∗(t1) · t̂ ≥ 0 for t̂ ∈ T⁺_{x̂∗(t1)}Ŝ1.

Since K̂(µ̂∗, x̂0, t0, t1) ⊂ K̄(µ̂∗, x̂0, t0, t1), letting λ∗(t) be the adjoint response equal to λ∗(t1) at t1 satisfies the earlier statements of the Maximum Principle; and since T⁺_{x∗(t1)}S1 ⊂ T⁺_{x̂∗(t1)}Ŝ1, the orthogonality conditions follow. □

5. Examples

5.1. A circular example. If a function λ satisfies the adjoint equation

λ̇(t) = −∇x HΣ,L(x(t), λ(t), µ(t)),

then for the example system of Chapter 1, Section 3.4 this yields equations governing the motion of the system:

ẋ = y − a1,  ẏ = −z + a2,  ż = x − a3,
λ̇1(t) = −λ3(t),  λ̇2(t) = −λ1(t),  λ̇3(t) = λ2(t).

In order to solve for λ, note that

λ⃛1(t) = −λ̈2(t)... more directly, λ̇1 = −λ3, λ̈1 = −λ̇3 = −λ2 and λ⃛1 = −λ̇2 = λ1.

Using standard O.D.E. theory, a solution of the adjoint system is

λ1(t) = e^{−t/2}(C cos((√3/2)t + ω) + D sin((√3/2)t + ω)),
λ2(t) = e^{−t/2}((C/2 + (√3/2)D) cos((√3/2)t + ω) + (D/2 − (√3/2)C) sin((√3/2)t + ω)),
λ3(t) = e^{−t/2}((C/2 − (√3/2)D) cos((√3/2)t + ω) + (D/2 + (√3/2)C) sin((√3/2)t + ω)),

for constants C, D, ω. Now Part (3) of the Maximum Principle says that an optimal control and the resulting trajectory maximise the Hamiltonian, and since the control space is compact, µ∗ is bounded, so Part (4) says that the maximum Hamiltonian is 0:

HΣ,λ∗0L(x∗(t), λ∗(t), µ∗(t)) = f(x∗(t), µ∗(t)) · λ∗(t) + λ∗0
= (y, −z, x) · λ∗(t) + (−µ∗1(t), µ∗2(t), −µ∗3(t)) · λ∗(t) + λ∗0
= HΣ,λ∗0L^max(x∗(t), λ∗(t))
= (y, −z, x) · λ∗(t) + max_{a∈A}{(−a1, a2, −a3) · λ∗(t)} + λ∗0

⇒ (−µ∗1(t), µ∗2(t), −µ∗3(t)) · λ∗(t) = max_{a∈A}{(−a1, a2, −a3) · λ∗(t)}.

Thus, since A is a compact set bounded by 1, we get that

µ∗(t) = (µ∗1(t), µ∗2(t), µ∗3(t)) = (−sgn(λ∗1(t)), sgn(λ∗2(t)), −sgn(λ∗3(t))),

where sgn is the usual sign of a real-valued function: sgn(f(x)) = 1 if f(x) > 0, −1 if f(x) < 0, and 0 if f(x) = 0.


So we now have that, on intervals (t − ε, t + ε) where the adjoint components do not change sign, the dynamics equation will satisfy equations of the sort

ẋ∗(t) = y∗(t) ± 1,  ẏ∗(t) = −z∗(t) ± 1,  ż∗(t) = x∗(t) ± 1,

depending on the signs of the adjoint. Again note that x⃛∗(t) = ÿ∗(t) = −ż∗(t) = −x∗(t) ± 1, which has solutions of the form

x∗(t) = −C e^{−t/2} cos((√3/2)t + ν) − D e^{−t/2} sin((√3/2)t + ν) ± 1,
y∗(t) = (C/2 − (√3/2)D) e^{−t/2} cos((√3/2)t + ν) + ((√3/2)C + D/2) e^{−t/2} sin((√3/2)t + ν) ± 1,
z∗(t) = (C/2 + (√3/2)D) e^{−t/2} cos((√3/2)t + ν) + (−(√3/2)C + D/2) e^{−t/2} sin((√3/2)t + ν) ± 1.
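This bang-bang law is easy to simulate. The sketch below integrates the state and adjoint forward from an arbitrary, illustrative initial adjoint value (in practice one would shoot on the adjoint value to steer the state to the target point, which is not attempted here):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, w):
    # w = (x, y, z, l1, l2, l3): state and adjoint of the circular example.
    x, y, z, l1, l2, l3 = w
    # Bang-bang law from the Maximum Principle:
    a1, a2, a3 = -np.sign(l1), np.sign(l2), -np.sign(l3)
    return [y - a1, -z + a2, x - a3,   # dynamics
            -l3, -l1, l2]              # adjoint equations

w0 = [1.0, 0.0, 0.0, 0.5, -1.0, 0.25]  # illustrative initial state and adjoint
sol = solve_ivp(rhs, (0.0, 10.0), w0, max_step=0.01)
print(sol.y[:3, -1])  # state reached at t = 10 under this candidate control
```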

CHAPTER 4

Dynamic Programming

1. The Dynamic Programming Equation

The modified cost functional for a trajectory is

JΣ,L(x(t), µ(t), t) = ∫_{t}^{t1} L(x(τ), µ(τ)) dτ + ψ(x(t1)).

This varies the starting point and time. Now we want the best control, the one for which the payoff is maximised.

Definition 30. The value function V : X × I → R is

(1.1)  V(x, t) = sup_{µ∈A(x,t,[t,t1])} JΣ,L(x(t), µ(t), t).

We shall only consider L-acceptable trajectories, so that V(x, t) < ∞. Note that V(x, t1) = ψ(x) when there is a final time.

Lemma 6. For all x ∈ Rn, all a ∈ A and all t ∈ [t0, t1], and for t1 − t0 > δ > 0,

(1.2)  V(x, t) ≥ ∫_{t}^{t+δ} L(x(τ), a) dτ + V(x(t + δ), t + δ).

Proof. Let µ(t) be an optimal control and define a new control

µ̆(τ) = a if t ≤ τ < t + δ;  µ̆(τ) = µ(τ) otherwise.

The payoff for the time period [t, t + δ) is ∫_{t}^{t+δ} L(x(τ), a) dτ, and the best payoff for the period [t + δ, t1] is V(x(t + δ), t + δ), so the payoff obtainable through µ̆ is

∫_{t}^{t+δ} L(x(τ), a) dτ + V(x(t + δ), t + δ).

But the largest payoff over all controls is V(x, t) by definition; hence the inequality (1.2) holds. □
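Inequality (1.2) is the continuous-time form of the familiar discrete Bellman recursion V(x, t) = max_a {L(x, a)δ + V(x′, t + δ)}. A minimal sketch on an entirely illustrative discretised problem (scalar state, dynamics ẋ = a, payoff L = −(x² + a²)/2; none of these choices are from the text):

```python
import numpy as np

# Illustrative discretisation: scalar state on a grid, x' = x + a*dt,
# instantaneous payoff L(x, a) = -(x^2 + a^2)/2, terminal payoff psi(x) = -x^2/2.
xs = np.linspace(-2.0, 2.0, 81)
controls = np.linspace(-1.0, 1.0, 21)
dt, steps = 0.05, 100

V = -xs**2 / 2                         # V(x, t1) = psi(x)
for _ in range(steps):                 # sweep backward in time
    Q = np.empty((xs.size, controls.size))
    for j, a in enumerate(controls):
        xp = np.clip(xs + a * dt, xs[0], xs[-1])           # next state
        Q[:, j] = -(xs**2 + a**2) / 2 * dt + np.interp(xp, xs, V)
    V = Q.max(axis=1)                  # Bellman: V(x,t) = max_a [L dt + V(x',t+dt)]

print("V(0, t0) =", V[xs.size // 2])   # value at the grid point x = 0
```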


This inequality closely resembles the definition of a derivative. In fact, from the inequality,

(V(x(t + δ), t + δ) − V(x, t))/δ + (1/δ)∫_{t}^{t+δ} L(x(τ), a) dτ ≤ 0
⇒ (∀a ∈ A) lim_{δ→0} ((V(x(t + δ), t + δ) − V(x, t))/δ + (1/δ)∫_{t}^{t+δ} L(x(τ), a) dτ) ≤ 0
⇒ (∀a ∈ A) (∂/∂t) V(x, t) + ∇x V(x, t) · ẋ(t) + L(x(t), a) ≤ 0
⇒ (∀a ∈ A) (∂/∂t) V(x, t) + ∇x V(x, t) · f(x(t), a) + L(x(t), a) ≤ 0
⇔ (∂/∂t) V(x, t) + max_{a∈A}{∇x V(x, t) · f(x(t), a) + L(x(t), a)} ≤ 0
⇔ (∂/∂t) V(x, t) + max_{a∈A}{HΣ(x(t), ∇x V(x, t), a) + L(x(t), a)} ≤ 0
⇔ (∂/∂t) V(x, t) + max_{a∈A}{HΣ,L(x(t), ∇x V(x, t), a)} ≤ 0.

Corollary 3. If (x∗, µ∗) ∈ S(Σ, L, x(t), x(t1), [t, t1]), then

(1.3)  max_{a∈A}{(∂/∂t) V(x, t) + HΣ,L(x(t), ∇x V(x, t), a)} = 0.

Equation (1.3) is known as the Hamilton-Jacobi-Bellman equation.

2. Link to The Maximum Principle

In the proof of the Maximum Principle we showed the existence of an adjoint function by finding a vector orthogonal to the tangent cone at t1. Now we shall provide a method for calculating an adjoint using dynamic programming.

Theorem 19. Let Σ = (X, f, A) be a control system with Lagrangian L. If (x∗, µ∗) ∈ S(Σ, L, [t0, t1]) and the value function V(x, t) is of class C², then for t ∈ [t0, t1],

λ∗(t) = −∇x V(x∗(t), t)

is an adjoint response along (x∗, µ∗).

Proof. From our definition of λ∗(t), we have that

λ̇∗(t) = −((∂/∂t) ∇x V(x∗(t), t) + (∇x∇x V(x∗(t), t)) ẋ(t)).

Now, since V satisfies equation (1.3) and (x∗, µ∗) ∈ S(Σ, L, [t0, t1]), so that the maximum in the HJB equation is attained at µ∗(t), the function

x ↦ (∂/∂t) V(x, t) + ∇x V(x, t) · f(x, µ∗(t)) + L(x, µ∗(t))


is bounded above by 0 and achieves this bound at x = x∗(t). Hence its x-gradient vanishes there:

0 = ∇x((∂/∂t) V(x∗(t), t) + ∇x V(x∗(t), t) · f(x∗(t), µ∗(t)) + L(x∗(t), µ∗(t)))
= (∂/∂t) ∇x V(x∗(t), t) + (∇x∇x V(x∗(t), t)) f(x∗(t), µ∗(t)) + (∇x f(x∗(t), µ∗(t)))ᵀ ∇x V(x∗(t), t) + ∇x L(x∗(t), µ∗(t))
= −λ̇∗(t) − (∇x f(x∗(t), µ∗(t)))ᵀ λ∗(t) + ∇x L(x∗(t), µ∗(t)),

using the calculation of λ̇∗(t) above and the definition of λ∗(t). Rearranging,

λ̇∗(t) = −(∇x f(x∗(t), µ∗(t)))ᵀ λ∗(t) + ∇x L(x∗(t), µ∗(t))
= −∇x(f(x∗(t), µ∗(t)) · λ∗(t) − L(x∗(t), µ∗(t)))
= −∇x HΣ,−L(x∗(t), λ∗(t), µ∗(t)),

so λ∗(t) is an adjoint response for (Σ, −L) along (x∗, µ∗), corresponding to the case λ∗0 = −1 of the Maximum Principle. □

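As a worked illustration of Theorem 19 (an assumed scalar example, not from the text), take dynamics ẋ = a, Lagrangian L(x, a) = −(x² + a²)/2 (a payoff to be maximised) and terminal payoff ψ = 0. The HJB equation (1.3) reads

(∂V/∂t) + max_{a∈R} {a (∂V/∂x) − (x² + a²)/2} = 0,

and the inner maximum is attained at a∗ = ∂V/∂x, giving (∂V/∂t) + ½(∂V/∂x)² − x²/2 = 0. The quadratic ansatz V(x, t) = −P(t)x²/2 reduces this to the Riccati equation Ṗ = P² − 1 with P(t1) = 0, whose solution is P(t) = tanh(t1 − t). Theorem 19 then produces the adjoint directly from the value function, λ∗(t) = −(∂V/∂x)(x∗(t), t) = P(t)x∗(t), with no separating-hyperplane argument needed.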

APPENDIX A

Things Topological and Geometric

1. Hyperplanes

The use of separation theorems is prevalent throughout this paper, and indeed in many texts on control theory, suggesting that this particular geometric notion is of great importance and deeply connected with control theory. We now state a few results that were referenced in this paper.

Definition 31.
• A hyperplane Pλ,c in Rn, for some real number c and some non-zero vector λ, is defined as Pλ,c = {x ∈ Rn | λ · x = c}.
• A hyperplane Pλ,c separates Rn into two open half spaces in a natural way:

X⁻λ,c = {x ∈ Rn | λ · x < c},  X⁺λ,c = {x ∈ Rn | λ · x > c}.

Notation 4. When c = 0 we will refer to the hyperplane and half spaces as Pλ, X⁻λ, X⁺λ.

• For Y ⊂ Rn, a support hyperplane for Y is any hyperplane Pλ,c such that Y ⊂ X⁺λ,c ∪ Pλ,c. Note: if Pλ,c is a support hyperplane, then P−λ,−c is also a support hyperplane, but with the half spaces switched around.
• For subsets Y, Z ⊂ Rn, a separating hyperplane for Y, Z is any hyperplane Pλ,c such that Y ⊂ X⁺λ,c ∪ Pλ,c and Z ⊂ X⁻λ,c ∪ Pλ,c.

Theorem 20. If C is a proper convex subset of Rn, then there is a supporting hyperplane for C.

Theorem 21. Let C ⊂ Rn be a convex subset. If x0 ∉ int(C), then there exists a separating hyperplane for {x0} and C.

2. Cones, Convex Sets, Constraint Sets and Edged Sets

Definition 32. Let S ⊂ Rn with S ≠ ∅.
• A convex combination of elements v1, . . . , vk of S is any element Σ_{i=1}^{k} λi vi where all λi > 0 and Σ_{i=1}^{k} λi = 1.
• A coned convex combination of elements v1, . . . , vk of S is any element Σ_{i=1}^{k} λi vi where all λi > 0.
• An affine combination of elements v1, . . . , vk of S is any element Σ_{i=1}^{k} λi vi where the λi ∈ R.


• The convex hull of S, conv(S), is the smallest convex set in Rn that contains S.
• The coned hull of S, cone(S), is the smallest cone in Rn that contains S.
• The affine hull of S, aff(S), is the smallest affine set in Rn that contains S.

Definition 33. A smooth constraint set S of a control system Σ = (X, f, A) is a submanifold of X that can be represented as S = Ker(Ψ) = {x ∈ X | Ψ(x) = 0}, where Ψ : X → Rk is of class C¹ and (d/dx) Ψ(x) is surjective for all x ∈ Ker(Ψ). An example would be the circle S¹ as a subset of the sphere S², or Rk ⊂ Rn for k < n, in which case the map Ψ would be Ψ((x1, . . . , xk, . . . , xn)) = (xk+1, . . . , xn).

Definition 34. An edged set E is a subset of Rn such that E = φ(U), where
• U = U0 ∩ {(y1, . . . , yk) | yk ≥ 0}, for U0 a neighbourhood of 0 in Rk;
• φ : U0 −→ Rn is a homeomorphism onto φ(U0) and (d/dy) φ(y) is injective.
The base bs(E) of E is the set bs(E) = φ(U \ U0k), where U0k = {(y1, . . . , yk) | yk ≠ 0}. The dimension of an edged set is the dimension of the domain of φ: dim(dom(φ)) = k.

3. Analytical Results

Theorem 22. Let M be a compact subset of Rn, let x, y ∈ M, let Ux be a neighbourhood of x containing y, and let g be a C¹ map g : Ux → Rk. Then there exists C > 0 such that ‖g(y) − g(x)‖ ≤ C ‖y − x‖.

Theorem 23. For a topological space X, let {fj}j∈J be a family of continuous real-valued functions on X, and let f^max : X → R be defined as f^max(x) = sup_{j∈J} fj(x). Then for all x ∈ X and all ε > 0 there is a neighbourhood Ux of x such that f^max(x) − f^max(x′) < ε for all x′ ∈ Ux; that is, f^max is lower semi-continuous.

Proof. For x ∈ X and ε > 0, consider f^max(x) − ε. Since f^max(x) is the smallest upper bound for (fj(x))j∈J, f^max(x) − ε is not an upper bound of (fj(x))j∈J, so there is j0 ∈ J with fj0(x) > f^max(x) − ε. Now for all r ∈ R, (r, ∞] is open in the extended real line, so fj⁻¹((r, ∞]) is open in X for all j ∈ J. Moreover,

x ∈ ∪_{j∈J} fj⁻¹((r, ∞]) ⇔ r < fj(x) ≤ ∞ for some j ∈ J ⇔ r < f^max(x) ≤ ∞ ⇔ x ∈ (f^max)⁻¹((r, ∞]),

which says that (f^max)⁻¹((r, ∞]) is open, since it is a union of open sets. Hence (f^max)⁻¹((f^max(x) − ε, ∞]) is open and contains x, so certainly there is a neighbourhood Ux of x such that f^max(x) − f^max(x′) < ε for all x′ ∈ Ux. □

Bibliography

[1] W. H. Fleming and H. M. Soner: Controlled Markov Processes and Viscosity Solutions, second edition, Springer, 2006.
[2] R. Hermann and A. J. Krener: Nonlinear Controllability and Observability, IEEE Transactions on Automatic Control, Vol. 22, No. 5, 1977.
[3] A. D. Lewis: The Maximum Principle of Pontryagin in Control and in Optimal Control, course notes, available from http://www.mast.queensu.ca/~andrew/teaching/MP-course/, 2006.
[4] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko: The Mathematical Theory of Optimal Processes, Interscience Publishers, translated from the Russian by K. N. Trirogoff, 1962.
