Chapter 3

Unconstrained Optimization: Functions of Several Variables

Many of the concepts for functions of one variable can be extended to functions of several variables. For example, the gradient extends the notion of derivative. In this chapter, we review the notion of gradient, the formula for small changes, how to find extrema, and the notion of convexity.

3.1 Gradient

Given a function $f$ of $n$ variables $x_1, x_2, \ldots, x_n$, we define the partial derivative relative to variable $x_i$, written as $\frac{\partial f}{\partial x_i}$, to be the derivative of $f$ with respect to $x_i$ treating all variables except $x_i$ as constant.

Example 3.1.1 Compute the partial derivatives of $f(x_1, x_2) = (x_1 - 2)^2 + 2(x_2 - 1)^2$. The answer is:
\[
\frac{\partial f}{\partial x_1}(x_1, x_2) = 2(x_1 - 2), \qquad \frac{\partial f}{\partial x_2}(x_1, x_2) = 4(x_2 - 1).
\]
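
To check such calculations mechanically, here is a minimal sketch using the sympy library (our choice for illustration; the text itself does not prescribe any software) that reproduces the partial derivatives of Example 3.1.1.

```python
import sympy as sp

# Symbols for the two variables of Example 3.1.1
x1, x2 = sp.symbols('x1 x2')
f = (x1 - 2)**2 + 2*(x2 - 1)**2

# Partial derivatives with respect to x1 and x2
df_dx1 = sp.diff(f, x1)   # expected: 2*(x1 - 2)
df_dx2 = sp.diff(f, x2)   # expected: 4*(x2 - 1)

print(df_dx1, df_dx2)     # prints 2*x1 - 4 and 4*x2 - 4
```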

Let $x$ denote the vector $(x_1, x_2, \ldots, x_n)$. With this notation, $f(x) = f(x_1, x_2, \ldots, x_n)$, $\frac{\partial f}{\partial x_i}(x) = \frac{\partial f}{\partial x_i}(x_1, x_2, \ldots, x_n)$, etc. The gradient of $f$ at $x$, written $\nabla f(x)$, is the vector
\[
\nabla f(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{pmatrix}.
\]
The gradient vector $\nabla f(x)$ gives the direction of steepest ascent of the function $f$ at point $x$. The gradient acts like the derivative in that small changes around a given point $x$ can be estimated using the gradient:
\[
f(x + \Delta) \approx f(x) + \nabla f(x) \cdot \Delta
\]
where $\Delta = (\Delta_1, \ldots, \Delta_n)$ denotes the vector of changes.

Example 3.1.2 If $f(x_1, x_2) = x_1^2 - 3x_1x_2 + x_2^2$, then $f(1, 1) = -1$. What about $f(1.01, 1.01)$?

In this case, $x = (1, 1)$ and $\Delta = (0.01, 0.01)$. Since $\frac{\partial f}{\partial x_1}(x_1, x_2) = 2x_1 - 3x_2$ and $\frac{\partial f}{\partial x_2}(x_1, x_2) = -3x_1 + 2x_2$, we get
\[
\nabla f(1, 1) = \begin{pmatrix} -1 \\ -1 \end{pmatrix}.
\]
So $f(1.01, 1.01) = f((1, 1) + (0.01, 0.01)) \approx f(1, 1) + (0.01, 0.01) \cdot \nabla f(1, 1) = -1 + (0.01, 0.01) \cdot \begin{pmatrix} -1 \\ -1 \end{pmatrix} = -1.02$.
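
The approximation above is easy to sanity-check numerically. The following sketch, using numpy (an assumption on our part; any numeric library would do), compares the first-order estimate $f(x) + \nabla f(x) \cdot \Delta$ with the exact value of $f(1.01, 1.01)$.

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1^2 - 3*x1*x2 + x2^2  (Example 3.1.2)
    return x[0]**2 - 3*x[0]*x[1] + x[1]**2

def grad_f(x):
    # Gradient computed by hand above
    return np.array([2*x[0] - 3*x[1], -3*x[0] + 2*x[1]])

x = np.array([1.0, 1.0])
delta = np.array([0.01, 0.01])

estimate = f(x) + grad_f(x).dot(delta)   # -1.02
exact = f(x + delta)                     # -1.0201
print(estimate, exact)
```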

Example 3.1.3 Suppose that we want to put away a fishing pole in a closet having dimensions 3 by 5 by 1 feet. If the ends of the pole are placed at opposite corners, there is room for a pole of length
\[
f(x_1, x_2, x_3) = \sqrt{x_1^2 + x_2^2 + x_3^2}, \qquad f(3, 5, 1) = \sqrt{3^2 + 5^2 + 1^2} \approx 5.9 \text{ ft}.
\]
It turns out that the actual dimensions of the closet are $3 + \Delta_1$, $5 + \Delta_2$ and $1 + \Delta_3$ feet, where $\Delta_1$, $\Delta_2$ and $\Delta_3$ are small correction terms. What is the change in pole length, taking into account these corrections? By the formula for small changes, the change in pole length is
\[
f(3 + \Delta_1, 5 + \Delta_2, 1 + \Delta_3) - f(3, 5, 1) \approx (\Delta_1, \Delta_2, \Delta_3) \cdot \nabla f(3, 5, 1).
\]
So, we need to compute the partial derivatives of $f$. For $i = 1, 2, 3$,
\[
\frac{\partial f}{\partial x_i}(x_1, x_2, x_3) = \frac{x_i}{\sqrt{x_1^2 + x_2^2 + x_3^2}}.
\]
Now we get
\[
(\Delta_1, \Delta_2, \Delta_3) \cdot \nabla f(3, 5, 1) = (\Delta_1, \Delta_2, \Delta_3) \cdot \begin{pmatrix} 0.51 \\ 0.85 \\ 0.17 \end{pmatrix} = 0.51\,\Delta_1 + 0.85\,\Delta_2 + 0.17\,\Delta_3.
\]
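
As a quick check of the coefficients 0.51, 0.85 and 0.17, here is a sketch that evaluates the gradient of the pole-length function at $(3, 5, 1)$. The correction terms `delta` below are made-up values chosen purely for illustration.

```python
import numpy as np

def length(x):
    # Diagonal of a box with side lengths x = (x1, x2, x3)
    return np.sqrt(np.sum(np.asarray(x, dtype=float)**2))

def grad_length(x):
    x = np.asarray(x, dtype=float)
    return x / length(x)              # x_i / sqrt(x1^2 + x2^2 + x3^2)

x = np.array([3.0, 5.0, 1.0])
print(length(x))                      # about 5.916
print(grad_length(x))                 # about [0.51, 0.85, 0.17]

# Hypothetical correction terms, for illustration only
delta = np.array([0.1, -0.05, 0.02])
print(grad_length(x).dot(delta))      # linear estimate of the length change
print(length(x + delta) - length(x))  # actual change, for comparison
```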

Exercise 28 Consider the function $f(x_1, x_2) = x_1 \ln x_2$. (a) Compute the gradient of $f$. (b) Give the value of the function $f$ and give its gradient at the point $(3, 1)$. (c) Use the formula for small changes to obtain an approximate value of the function at the point $(2.99, 1.05)$.

Exercise 29 Consider a conical drinking cup with height $h$ and radius $r$ at the open end. The volume of the cup is $V(r, h) = \frac{\pi}{3} r^2 h$.

a) Suppose the cone is now 5 cm high with radius 2 cm. Compute its volume.
b) Compute the partial derivatives $\partial V / \partial r$ and $\partial V / \partial h$ at the current height and radius.
c) By about what fraction (i.e., percentage) would the volume change if the cone were lengthened 10%? (Use the partial derivatives.)
d) If the radius were increased 5%?
e) If both were done simultaneously?

Second partials $\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$ are obtained from $f(x)$ by taking the derivative relative to $x_i$ (this yields the first partial $\frac{\partial f}{\partial x_i}(x)$) and then by taking the derivative of $\frac{\partial f}{\partial x_i}(x)$ relative to $x_j$. So we can compute $\frac{\partial^2 f}{\partial x_1 \partial x_1}(x)$, $\frac{\partial^2 f}{\partial x_1 \partial x_2}(x)$ and so on. These values are arranged into the Hessian matrix
\[
H(x) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(x) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \frac{\partial^2 f}{\partial x_2 \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(x) & \frac{\partial^2 f}{\partial x_n \partial x_2}(x) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(x)
\end{pmatrix}
\]
The Hessian matrix is a symmetric matrix, that is, $\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x)$.

Example 3.1.1 (continued): Find the Hessian matrix of $f(x_1, x_2) = (x_1 - 2)^2 + 2(x_2 - 1)^2$. The answer is
\[
H(x) = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}.
\]
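
A computer algebra system can assemble the Hessian directly. The sketch below, again assuming sympy, reproduces the matrix of Example 3.1.1 (continued) with the built-in hessian helper.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = (x1 - 2)**2 + 2*(x2 - 1)**2

# sympy.hessian builds the matrix of second partials in the given variable order
H = sp.hessian(f, (x1, x2))
print(H)   # Matrix([[2, 0], [0, 4]])
```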

3.2 Maximum and Minimum

Optima can occur in three places:
1. at the boundary of the domain,
2. at a nondifferentiable point, or
3. at a point $x$ with $\nabla f(x) = 0$.

We will identify the first type of point with Kuhn-Tucker conditions (see next chapter). The second type is found only by ad hoc methods. The third type of point can be found by solving the gradient equations. In the remainder of this chapter, we discuss the important case where $\nabla f(x) = 0$. To identify whether a point $x$ with zero gradient is a local maximum or local minimum, check the Hessian.

- If $H(x)$ is positive definite, then $x$ is a local minimum.
- If $H(x)$ is negative definite, then $x$ is a local maximum.

Remember (Section 1.6) that these properties can be checked by computing the determinants of the principal minors.

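The principal-minor test of Section 1.6 is mechanical enough to automate. Below is a small numeric sketch, with function names of our own choosing, that classifies a symmetric matrix by the signs of its leading principal minors (numpy is assumed).

```python
import numpy as np

def leading_minors(H):
    """Determinants of the leading principal minors H_1, ..., H_n."""
    H = np.asarray(H, dtype=float)
    return [np.linalg.det(H[:k, :k]) for k in range(1, H.shape[0] + 1)]

def classify(H):
    """Return 'positive definite', 'negative definite', or 'indefinite or inconclusive'."""
    minors = leading_minors(H)
    if all(d > 0 for d in minors):
        return "positive definite"              # candidate local minimum
    if all(d > 0 if k % 2 == 0 else d < 0       # signs alternate, starting negative
           for k, d in enumerate(minors, start=1)):
        return "negative definite"              # candidate local maximum
    return "indefinite or inconclusive"

print(classify([[2, 0], [0, 4]]))    # positive definite
print(classify([[0, -3], [-3, 0]]))  # indefinite or inconclusive
```
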
Example 3.2.1 Find the local extrema of $f(x_1, x_2) = x_1^3 + x_2^3 - 3x_1x_2$.

This function is everywhere differentiable, so extrema can only occur at points $x$ such that $\nabla f(x) = 0$.
\[
\nabla f(x) = \begin{pmatrix} 3x_1^2 - 3x_2 \\ 3x_2^2 - 3x_1 \end{pmatrix}
\]
This equals 0 if and only if $(x_1, x_2) = (0, 0)$ or $(1, 1)$. The Hessian is
\[
H(x) = \begin{pmatrix} 6x_1 & -3 \\ -3 & 6x_2 \end{pmatrix}
\]
So,
\[
H(0, 0) = \begin{pmatrix} 0 & -3 \\ -3 & 0 \end{pmatrix}
\]
Let $H_1$ denote the first principal minor of $H(0, 0)$ and let $H_2$ denote its second principal minor (see Section 1.6). Then $\det(H_1) = 0$ and $\det(H_2) = -9$. Therefore $H(0, 0)$ is neither positive nor negative definite.
\[
H(1, 1) = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}
\]
Its first principal minor has $\det(H_1) = 6 > 0$ and its second principal minor has $\det(H_2) = 36 - 9 = 27 > 0$. Therefore $H(1, 1)$ is positive definite, which implies that $(1, 1)$ is a local minimum.
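
The whole calculation of Example 3.2.1, finding the critical points and then classifying them, can be reproduced symbolically. The sketch below assumes sympy and uses its solve and hessian helpers.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 + x2**3 - 3*x1*x2

grad = [sp.diff(f, v) for v in (x1, x2)]
# The two real critical points: (0, 0) and (1, 1)
critical_points = sp.solve(grad, (x1, x2), dict=True)

H = sp.hessian(f, (x1, x2))
for pt in critical_points:
    Hp = H.subs(pt)
    # Leading principal minors det(H_1), det(H_2)
    minors = [Hp[:k, :k].det() for k in (1, 2)]
    print(pt, minors)   # (0,0) gives [0, -9]; (1,1) gives [6, 27]
```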

Example 3.2.2 Jane and Jim invested $20,000 in the design and development of a new product. They can manufacture it for $2 per unit. For the next step, they hired marketing consultants XYZ. In a nutshell, XYZ's conclusions are the following: if Jane and Jim spend $a$ dollars on advertising and sell the product at price $p$ (per unit), they will sell
\[
2{,}000 + 4\sqrt{a} - 20p \ \text{ units.}
\]
Using this figure, express the profit that Jane and Jim will make as a function of $a$ and $p$. What price and level of advertising will maximize their profits?

The revenue from sales is $(2{,}000 + 4\sqrt{a} - 20p)\,p$. The production costs are $2\,(2{,}000 + 4\sqrt{a} - 20p)$, the development cost is $20,000 and the cost of advertising is $a$. Therefore, Jane and Jim's profit is
\[
f(p, a) = (2{,}000 + 4\sqrt{a} - 20p)(p - 2) - a - 20{,}000.
\]
To find the maximum profit, we compute the partial derivatives of $f$ and set them to 0:
\[
\frac{\partial f}{\partial p}(p, a) = 2{,}040 + 4\sqrt{a} - 40p = 0,
\]
\[
\frac{\partial f}{\partial a}(p, a) = 2(p - 2)/\sqrt{a} - 1 = 0.
\]
Solving this system of two equations yields
\[
p = 63.25, \qquad a = 15{,}006.25.
\]
We verify that this is a maximum by computing the Hessian.
\[
H(x) = \begin{pmatrix} -40 & 2/\sqrt{a} \\ 2/\sqrt{a} & -(p - 2)/(a\sqrt{a}) \end{pmatrix}
\]
$\det(H_1) = -40 < 0$ and $\det(H_2) = 40(p - 2)/(a\sqrt{a}) - 4/a > 0$ at the point $p = 63.25$, $a = 15{,}006.25$.

So, indeed, this solution maximizes profit.
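
To double-check the algebra of Example 3.2.2, one can verify the claimed optimum symbolically: both first-order conditions vanish at $p = 63.25$, $a = 15{,}006.25$, and the Hessian there is negative definite. The sketch below assumes sympy.

```python
import sympy as sp

p, a = sp.symbols('p a', positive=True)
f = (2000 + 4*sp.sqrt(a) - 20*p) * (p - 2) - a - 20000

dfdp, dfda = sp.diff(f, p), sp.diff(f, a)
point = {p: sp.Rational(253, 4), a: sp.Rational(60025, 4)}   # p = 63.25, a = 15006.25

# Both first-order conditions vanish at the claimed optimum
print(sp.simplify(dfdp.subs(point)), sp.simplify(dfda.subs(point)))   # 0 0

# Second-order check: H[0,0] < 0 and det(H) > 0, so the Hessian is negative definite
H = sp.hessian(f, (p, a)).subs(point)
print(H[0, 0], sp.simplify(H.det()))   # -40 and a positive number
```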

Example 3.2.3 Find the local extrema of $f(x_1, x_2, x_3) = x_1^2 + (x_1 + x_2)^2 + (x_1 + x_3)^2$.
\[
\frac{\partial f}{\partial x_1}(x) = 2x_1 + 2(x_1 + x_2) + 2(x_1 + x_3), \qquad \frac{\partial f}{\partial x_2}(x) = 2(x_1 + x_2), \qquad \frac{\partial f}{\partial x_3}(x) = 2(x_1 + x_3)
\]
Setting these partial derivatives to 0 yields the unique solution $x_1 = x_2 = x_3 = 0$. The Hessian matrix is
\[
H(0, 0, 0) = \begin{pmatrix} 6 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{pmatrix}
\]
The determinants of the principal minors are $\det(H_1) = 6 > 0$, $\det(H_2) = 12 - 4 = 8 > 0$ and $\det(H_3) = 24 - 8 - 8 = 8 > 0$. So $H(0, 0, 0)$ is positive definite and the solution $x_1 = x_2 = x_3 = 0$ is a minimum.
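
An equivalent way to confirm that the Hessian of Example 3.2.3 is positive definite is to check that all of its eigenvalues are strictly positive; a short numpy check is sketched below.

```python
import numpy as np

H = np.array([[6.0, 2.0, 2.0],
              [2.0, 2.0, 0.0],
              [2.0, 0.0, 2.0]])

# eigvalsh is numpy's eigenvalue routine for symmetric matrices
print(np.linalg.eigvalsh(H))                # all three eigenvalues are > 0
print(np.all(np.linalg.eigvalsh(H) > 0))    # True
```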

Exercise 30 Find maxima or minima of the following functions when possible.
(a) $f(x_1, x_2, x_3) = -x_1^2 - 3x_2^2 - 10x_3^2 + 4x_1 + 24x_2 + 20x_3$
(b) $f(x_1, x_2, x_3) = x_1x_2 + x_2x_3 + x_3x_1 - 2x_1 - 2x_2 - 2x_3$

Exercise 31 Consider the function of three variables given by
\[
f(x_1, x_2, x_3) = x_1^2 - x_1 - x_1x_2 + x_2^2 - x_2 + x_3^4 - 4x_3.
\]
(a) Compute the gradient $\nabla f(x_1, x_2, x_3)$.
(b) Compute the Hessian matrix $H(x_1, x_2, x_3)$.
(c) Use the gradient to find a local extremum of $f$. Hint: if $x_3^3 = 1$, then $x_3 = 1$.
(d) Compute the three principal minors of the Hessian matrix and use them to identify this extremum as a local minimum or a local maximum.

3.3 Global Optima

Finding global maxima and minima is harder. There is one case that is of interest. We say that a domain is convex if every line segment drawn between two points in the domain lies within the domain. We say that a function $f$ is convex if the line segment connecting any two points on its graph lies above the function. That is, for all $x, y$ in the domain and $0 < \lambda < 1$, we have $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$, as before (see Chapter 2).

- If a function is convex on a convex domain, then any local minimum is a global minimum.
- If a function is concave on a convex domain, then any local maximum is a global maximum.

To check that a function is convex on a domain, check that its Hessian matrix $H(x)$ is positive semidefinite for every point $x$ in the domain. To check that a function is concave, check that its Hessian is negative semidefinite for every point in the domain.

Example 3.3.1 Show that the function $f(x_1, x_2, x_3) = x_1^4 + (x_1 + x_2)^2 + (x_1 + x_3)^2$ is convex over $\mathbb{R}^3$.

The Hessian matrix is
\[
H(x_1, x_2, x_3) = \begin{pmatrix} 12x_1^2 + 4 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{pmatrix}
\]
The determinants of the principal minors are $\det(H_1) = 12x_1^2 + 4 > 0$, $\det(H_2) = 24x_1^2 + 4 > 0$ and $\det(H_3) = 48x_1^2 \ge 0$. So $H(x_1, x_2, x_3)$ is positive semidefinite for all $(x_1, x_2, x_3)$, and therefore $f$ is convex over $\mathbb{R}^3$.
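
The convexity argument of Example 3.3.1 can also be reproduced symbolically: compute the Hessian once and inspect the leading principal minors as functions of $x_1$. This sketch assumes sympy.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
f = x1**4 + (x1 + x2)**2 + (x1 + x3)**2

H = sp.hessian(f, (x1, x2, x3))
print(H)   # Matrix([[12*x1**2 + 4, 2, 2], [2, 2, 0], [2, 0, 2]])

# Leading principal minors as functions of x1
minors = [sp.expand(H[:k, :k].det()) for k in (1, 2, 3)]
print(minors)   # [12*x1**2 + 4, 24*x1**2 + 4, 48*x1**2] -- all nonnegative for every real x1
```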