MATRICES, PARTIAL DERIVATIVES AND THE CHAIN RULE

STEFAN GESCHKE

1. The dot-product in higher dimensions

The definition of the dot-product can be easily extended to dimensions > 3.

Definition 1.1. If x = (x_1, ..., x_n) and y = (y_1, ..., y_n) are vectors in R^n, then the dot-product x · y is defined by
\[
x \cdot y = (x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + \dots + x_n y_n.
\]
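The definition translates directly into code. Here is a minimal Python sketch (an illustration added for checking; the helper name dot is ours, not from the notes) that reproduces the computation of Example 1.2 below:

def dot(x, y):
    # Dot-product of Definition 1.1: the sum of the products x_i * y_i.
    assert len(x) == len(y), "vectors must have the same dimension"
    return sum(xi * yi for xi, yi in zip(x, y))

print(dot((1, 2, 3, 4), (1, 2, 0, -1)))   # prints 1, as in Example 1.2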

Note that the dot-product of two vectors is a real number.

Example 1.2. We compute the dot-product of two vectors in R^4:
\[
(1, 2, 3, 4) \cdot (1, 2, 0, -1) = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 0 + 4 \cdot (-1) = 1 + 4 + 0 - 4 = 1.
\]
The dot-product in dimension n behaves just as it does in dimension 3.

Theorem 1.3. Let x, y, z ∈ R^n and let λ ∈ R. Then the following hold:
(1) x · y = y · x
(2) x · (y + z) = x · y + x · z
(3) (x + y) · z = x · z + y · z
(4) (λx) · y = x · (λy) = λ(x · y)
(5) \vec{0} · x = x · \vec{0} = 0

2. Matrices

Definition 2.1. Let m and n be natural numbers (positive integers). An m-by-n matrix is an array
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\]
of real numbers with m rows and n columns. Each entry has two indices, the first denoting the row and the second the column. The matrix above is often denoted by (a_ij)_{1≤i≤m, 1≤j≤n}, or just (a_ij) if m and n are clear from the context. If A = (a_ij)_{1≤i≤m, 1≤j≤n} and B = (b_ij)_{1≤i≤m, 1≤j≤n} are matrices of the same format, then their sum A + B is defined componentwise, i.e., A + B = (a_ij + b_ij)_{1≤i≤m, 1≤j≤n}.

Example 2.2. a) The vector (1, 2, 3) is a 1-by-3 matrix. The array
\[
\begin{pmatrix}
1.5 & 2 \\
\pi & 4 \\
5 & e
\end{pmatrix}
\]
is a 3-by-2 matrix.

b)
\[
\begin{pmatrix}
1.5 & 2 \\
\pi & 4 \\
5 & e
\end{pmatrix}
+
\begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & e
\end{pmatrix}
=
\begin{pmatrix}
1.5 + 1 & 2 + 0 \\
\pi + 0 & 4 + 1 \\
5 + 1 & e + e
\end{pmatrix}
=
\begin{pmatrix}
2.5 & 2 \\
\pi & 5 \\
6 & 2e
\end{pmatrix}
\]

Using the dot-product, we can define products of matrices of suitable formats.

Definition 2.3. Let ℓ, m, n be natural numbers. If A = (a_ij)_{1≤i≤ℓ, 1≤j≤m} and B = (b_jk)_{1≤j≤m, 1≤k≤n} are matrices, then the product A · B is defined to be the ℓ-by-n matrix C = (c_ik)_{1≤i≤ℓ, 1≤k≤n} where
\[
c_{ik} = \sum_{j=1}^{m} a_{ij} b_{jk} = a_{i1} b_{1k} + \dots + a_{im} b_{mk}.
\]
In other words, A · B is the matrix whose entry in the i-th row and k-th column is the dot-product of the i-th row of A and the k-th column of B. Note that the product A · B can only be formed if A has as many columns as B has rows. Moreover, if A is a 1-by-n matrix (a_1 a_2 ... a_n) and B is an n-by-1 matrix
\[
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
\]
then A · B is simply the dot-product (a_1, ..., a_n) · (b_1, ..., b_n).

Example 2.4. a)
\[
\begin{pmatrix} e & \pi \\ 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}
=
\begin{pmatrix} e \cdot 0 + \pi \cdot 2 & e \cdot 1 + \pi \cdot 1 \\ 0 \cdot 0 + 1 \cdot 2 & 0 \cdot 1 + 1 \cdot 1 \end{pmatrix}
=
\begin{pmatrix} 2\pi & e + \pi \\ 2 & 1 \end{pmatrix}
\]

b)
\[
\begin{pmatrix} 0 & 1 \\ 2 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ 2e & 2\pi + 1 \end{pmatrix}
\]
Together with a), this shows that matrix multiplication is not commutative.

c)
\[
\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} e + 2 & \pi + 3 \end{pmatrix}
\]
d)
\[
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} e & \pi \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} e + 2 & \pi + 3 \\ e & \pi \\ 1 & 1 \end{pmatrix}
\]
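Definition 2.3 can be spelled out as a short program: the (i, k) entry of the product is the dot-product of the i-th row of A with the k-th column of B. The following Python sketch (the helper name matmul is ours, purely illustrative) reproduces the products of a) and b) and shows again that they differ:

from math import e, pi

def matmul(A, B):
    # Product of Definition 2.3: entry (i, k) is the dot-product of
    # the i-th row of A and the k-th column of B.
    m = len(B)   # A must have m columns and B must have m rows
    return [[sum(A[i][j] * B[j][k] for j in range(m))
             for k in range(len(B[0]))]
            for i in range(len(A))]

A = [[e, pi], [0, 1]]
B = [[0, 1], [2, 1]]
print(matmul(A, B))   # [[2*pi, e + pi], [2, 1]], as in a)
print(matmul(B, A))   # [[0, 1], [2*e, 2*pi + 1]], as in b): a different matrix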

  

Theorem 2.5. Let A and B be ℓ-by-m matrices and let C and D be m-by-n matrices. Then
\[
A \cdot (C + D) = A \cdot C + A \cdot D \quad \text{and} \quad (A + B) \cdot C = A \cdot C + B \cdot C.
\]
It follows that (A + B) · (C + D) = A · C + A · D + B · C + B · D.

3. Derivatives

Definition 3.1. Let f : R^n → R^m be a function and let f_1, ..., f_m : R^n → R be its component functions (coordinate functions). We assume that all the partial derivatives ∂f_i/∂x_j, 1 ≤ i ≤ m, 1 ≤ j ≤ n, exist and are continuous, at least on some open region U ⊆ R^n. Then for each (a_1, ..., a_n) ∈ U we define the derivative of f at (a_1, ..., a_n) to be the m-by-n matrix
\[
Df(a_1, \dots, a_n) =
\begin{pmatrix}
\frac{\partial f_1}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_1}{\partial x_2}(a_1, \dots, a_n) & \cdots & \frac{\partial f_1}{\partial x_n}(a_1, \dots, a_n) \\
\frac{\partial f_2}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_2}{\partial x_2}(a_1, \dots, a_n) & \cdots & \frac{\partial f_2}{\partial x_n}(a_1, \dots, a_n) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(a_1, \dots, a_n) & \frac{\partial f_m}{\partial x_2}(a_1, \dots, a_n) & \cdots & \frac{\partial f_m}{\partial x_n}(a_1, \dots, a_n)
\end{pmatrix}
\]

Example 3.2. a) Let f : R → R^2 be defined by f(t) = (cos t, sin t). Then for each a ∈ R we have
\[
Df(a) =
\begin{pmatrix} \frac{\partial \cos t}{\partial t}(a) \\ \frac{\partial \sin t}{\partial t}(a) \end{pmatrix}
=
\begin{pmatrix} -\sin a \\ \cos a \end{pmatrix}.
\]

Note that this differs in notation from the previously defined f'(a) = (−sin a, cos a). For the chain rule that we will discuss below, it is however important to pay attention to the fact that Df(a) is a 2-by-1 matrix, i.e., a vector written vertically, as opposed to a 1-by-2 matrix, i.e., a vector written horizontally.

b) Let f : R^3 → R be defined by f(x, y, z) = x^2 + y^2 + z^2. Then for all (a, b, c) ∈ R^3,
\[
Df(a, b, c) =
\begin{pmatrix} \frac{\partial f}{\partial x}(a, b, c) & \frac{\partial f}{\partial y}(a, b, c) & \frac{\partial f}{\partial z}(a, b, c) \end{pmatrix}
=
\begin{pmatrix} 2a & 2b & 2c \end{pmatrix}.
\]

c) Let f : R^2 → R^2 be defined by
\[
f(x, y) =
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} x + 2y \\ y \end{pmatrix}.
\]
The component functions are f_1(x, y) = x + 2y and f_2(x, y) = y. Now for all (a, b) ∈ R^2,
\[
Df(a, b) =
\begin{pmatrix} \frac{\partial f_1}{\partial x}(a, b) & \frac{\partial f_1}{\partial y}(a, b) \\ \frac{\partial f_2}{\partial x}(a, b) & \frac{\partial f_2}{\partial y}(a, b) \end{pmatrix}
=
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.
\]
What do you observe?

d) In the previous examples we considered functions of the form f(x_1, ..., x_n) and computed the derivative at a point (a_1, ..., a_n). This was to point out the distinction between the variables x_1, ..., x_n with respect to which we take partial derivatives and the points at which we compute the derivative. In the future we will not be as careful; see the following example.

Let f : R^3 → R^3 be defined by f(r, ϕ, z) = (r cos ϕ, r sin ϕ, z). Note that f converts the cylindrical coordinates of a point into its cartesian coordinates. We have
\[
Df(r, \varphi, z) =
\begin{pmatrix}
\cos \varphi & -r \sin \varphi & 0 \\
\sin \varphi & r \cos \varphi & 0 \\
0 & 0 & 1
\end{pmatrix}.
\]
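The derivative matrix of Definition 3.1 can also be approximated numerically by difference quotients, which is a convenient check on hand computations such as the one in d). The following Python sketch (the helper name jacobian and the step size h are our choices, not part of the notes) approximates Df for the cylindrical-coordinate map:

from math import cos, sin

def f(r, phi, z):
    # The map of Example 3.2 d): cylindrical to cartesian coordinates.
    return (r * cos(phi), r * sin(phi), z)

def jacobian(func, point, h=1e-6):
    # Forward-difference approximation of the matrix Df of Definition 3.1.
    base = func(*point)
    J = []
    for i in range(len(base)):          # one row per component function
        row = []
        for j in range(len(point)):     # one column per variable
            shifted = list(point)
            shifted[j] += h
            row.append((func(*shifted)[i] - base[i]) / h)
        J.append(row)
    return J

print(jacobian(f, (2.0, 0.5, 1.0)))
# approximately [[cos 0.5, -2 sin 0.5, 0], [sin 0.5, 2 cos 0.5, 0], [0, 0, 1]]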

Theorem 3.3. Let f, g : R^n → R^m be functions and assume that all the relevant derivatives exist. Then the following hold:
(1) If f is constant, then Df = 0, where 0 denotes the m-by-n matrix whose entries are all the real number 0.
(2) D(f + g) = Df + Dg, where the first + denotes the sum of two functions and the second + denotes the sum of two matrices.
(3) If f is of the form
\[
f(x_1, \dots, x_n) = A \cdot \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
for some m-by-n matrix A, then for all (x_1, ..., x_n) ∈ R^n, Df(x_1, ..., x_n) = A. (See Example 3.2 c).)

4. The chain rule in higher dimensions

Definition 4.1. Let f : R^n → R^m and g : R^m → R^ℓ be functions. Then their composition g ∘ f : R^n → R^ℓ is defined by
\[
(g \circ f)(x_1, \dots, x_n) = g(f(x_1, \dots, x_n)).
\]
Note that this is a reasonable definition because the range of f is contained in R^m and g is defined on R^m.

Example 4.2. a) h(t) = sin^2 t is the composition g ∘ f of the functions g(x) = x^2 and f(t) = sin t. Note that (f ∘ g)(x) = sin(x^2).

b) Let f(t) = (cos t, sin t) and let g(x, y) = x^2 + y^2. Then (g ∘ f)(t) = cos^2 t + sin^2 t = 1 for all t ∈ R.
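The composition in b) is easy to evaluate in code; in the small Python sketch below (illustrative, the names are ours), g ∘ f returns 1 for every input, up to rounding:

from math import cos, sin

def f(t):
    # f from Example 4.2 b)
    return (cos(t), sin(t))

def g(x, y):
    # g from Example 4.2 b)
    return x * x + y * y

def g_after_f(t):
    # the composition g ∘ f
    return g(*f(t))

print(g_after_f(0.7), g_after_f(2.3))   # both 1.0, since cos^2 t + sin^2 t = 1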

There is a close connection between matrix multiplication and composition of functions.

Theorem 4.3. If f : R^ℓ → R^m and g : R^m → R^n are functions such that there is an m-by-ℓ matrix A and an n-by-m matrix B such that
\[
f(x_1, \dots, x_\ell) = A \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_\ell \end{pmatrix}
\quad \text{and} \quad
g(y_1, \dots, y_m) = B \cdot \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},
\]
then
\[
(g \circ f)(x_1, \dots, x_\ell) = (B \cdot A) \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_\ell \end{pmatrix}.
\]
This theorem is just a special case of the fact that matrix multiplication satisfies the associative law: if A, B and C are matrices of suitable formats, then (A · B) · C = A · (B · C). More precisely, if f, g, A and B are as in the theorem above, then
\[
(g \circ f)(x_1, \dots, x_\ell) = B \cdot \left( A \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_\ell \end{pmatrix} \right) = (B \cdot A) \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_\ell \end{pmatrix}.
\]

Theorem 4.4 (Chain Rule). Let f : R^n → R^m and g : R^m → R^ℓ be functions, and assume that all the relevant partial derivatives exist and are continuous. Then for all (a_1, ..., a_n) ∈ R^n,
\[
D(g \circ f)(a_1, \dots, a_n) = Dg(f(a_1, \dots, a_n)) \cdot Df(a_1, \dots, a_n).
\]
Note that for functions from R to R this is just the usual 1-dimensional chain rule.

Example 4.5. a) Let f and g be as in Example 4.2 b). Since g ∘ f is constant, D(g ∘ f)(t) = (g ∘ f)'(t) = 0. On the other hand,
\[
D(g \circ f)(t) = Dg(f(t)) \cdot Df(t) = Dg(\cos t, \sin t) \cdot Df(t)
= \begin{pmatrix} 2\cos t & 2\sin t \end{pmatrix} \cdot \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}
= -2\cos t \sin t + 2\sin t \cos t = 0.
\]

b) Let f(x, y, z) = (x^2 + y − z, x − y^2 + 3z) and g(u, v) = (u + v, u − v). Then
\[
Dg(u, v) = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\]
and therefore
\[
D(g \circ f)(x, y, z) =
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2x & 1 & -1 \\ 1 & -2y & 3 \end{pmatrix}
=
\begin{pmatrix} 2x + 1 & 1 - 2y & 2 \\ 2x - 1 & 1 + 2y & -4 \end{pmatrix}.
\]
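The chain rule can be checked numerically as well: approximating D(g ∘ f) by difference quotients at a sample point should reproduce the matrix just computed. Here is a Python sketch (the sample point, step size and tolerance are arbitrary choices of ours):

from math import isclose

def g_after_f(x, y, z):
    # the composition g ∘ f from Example 4.5 b)
    u = x**2 + y - z
    v = x - y**2 + 3*z
    return (u + v, u - v)

def chain_rule_matrix(x, y, z):
    # D(g ∘ f) as computed above via the chain rule
    return [[2*x + 1, 1 - 2*y, 2], [2*x - 1, 1 + 2*y, -4]]

a = (1.5, -0.5, 2.0)   # an arbitrary sample point
h = 1e-6
base = g_after_f(*a)
numeric = [[(g_after_f(*[a[j] + (h if j == k else 0.0) for j in range(3)])[i] - base[i]) / h
            for k in range(3)]
           for i in range(2)]
expected = chain_rule_matrix(*a)
print(all(isclose(n, e, abs_tol=1e-4)
          for num_row, exp_row in zip(numeric, expected)
          for n, e in zip(num_row, exp_row)))   # True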

Definition 4.6. If f : R^n → R, then
\[
\left( \frac{\partial f}{\partial x_1}(x_1, \dots, x_n), \dots, \frac{\partial f}{\partial x_n}(x_1, \dots, x_n) \right)
\]
is called the gradient of f at (x_1, ..., x_n) and denoted by ∇f(x_1, ..., x_n). Note that ∇f(x_1, ..., x_n) is the vector with the same entries as the 1-by-n matrix Df(x_1, ..., x_n).

Example 4.7. Let f(x, y, z) = x^2 + 2y − z^3. Then ∇f(x, y, z) = (2x, 2, −3z^2).

Corollary 4.8. If f : R → R^m and g : R^m → R, then the chain rule reduces to
\[
D(g \circ f)(t) = \nabla g(f(t)) \cdot f'(t) = \frac{\partial g}{\partial x_1}(f(t)) \cdot f_1'(t) + \dots + \frac{\partial g}{\partial x_m}(f(t)) \cdot f_m'(t),
\]
where f_1, ..., f_m are the component functions of f. (See Example 4.5 a).)

Example 4.9. A typical application of this corollary is the following: f : R → R^3 describes the position of an airplane at time t, for instance f(t) = (100 cos t, 100 sin t, t). The function g : R^3 → R describes the temperature at a point (x, y, z), for instance g(x, y, z) = 70 − z. The gradient of g at (x, y, z) is ∇g(x, y, z) = (0, 0, −1). The derivative of f at t is f'(t) = (−100 sin t, 100 cos t, 1). Now
\[
D(g \circ f)(t) = (g \circ f)'(t) = \nabla g(100 \cos t, 100 \sin t, t) \cdot f'(t) = (0, 0, -1) \cdot (-100 \sin t, 100 \cos t, 1) = -1.
\]
The reason this is so simple in this particular case is that g(x, y, z) only depends on z. It is actually easier to compute the derivative of the composition by first computing the composition and then its derivative: we have (g ∘ f)(t) = 70 − t. If g is more complicated, the chain rule actually helps. Suppose now that g(x, y, z) = 70 + x^2/200 − z. Then ∇g(x, y, z) = (x/100, 0, −1). Hence
\[
D(g \circ f)(t) = (g \circ f)'(t) = \nabla g(100 \cos t, 100 \sin t, t) \cdot f'(t) = (\cos t, 0, -1) \cdot (-100 \sin t, 100 \cos t, 1) = -100 \cos t \sin t - 1.
\]
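Example 4.9 can be replayed in code. The following Python sketch (the function names and the sample time are our choices) evaluates the right-hand side of Corollary 4.8 for the second temperature function and compares it with the closed form −100 cos t sin t − 1:

from math import cos, sin, isclose

def f(t):
    # position of the airplane at time t, as in Example 4.9
    return (100 * cos(t), 100 * sin(t), t)

def fprime(t):
    # the derivative f'(t) computed in Example 4.9
    return (-100 * sin(t), 100 * cos(t), 1.0)

def grad_g(x, y, z):
    # gradient of g(x, y, z) = 70 + x^2/200 - z
    return (x / 100, 0.0, -1.0)

t = 0.8   # an arbitrary sample time
via_corollary = sum(gi * fi for gi, fi in zip(grad_g(*f(t)), fprime(t)))
print(isclose(via_corollary, -100 * cos(t) * sin(t) - 1))   # True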
