Chapter 5

Symmetric and Hermitian Matrices

In this chapter, we discuss the special classes of symmetric and Hermitian matrices. We will conclude the chapter with a few words about so-called normal matrices.

Before we begin, we mention one consequence of the last chapter that will be useful in a proof of the unitary diagonalization of Hermitian matrices. Let A be an $m \times n$ matrix with $m \geq n$, and assume (for the moment) that A has linearly independent columns. Then if the Gram-Schmidt process is applied to the columns of A, the result can be expressed in terms of a matrix factorization $A = \tilde{Q}\tilde{R}$, where the orthogonal vectors are the columns of $\tilde{Q}$, and $\tilde{R}$ is unit upper triangular $n \times n$ with the inner products as the entries in $\tilde{R}$.

For example, consider that

\[ q_i := a_i - \sum_{k=1}^{i-1} \frac{q_k^H a_i}{q_k^H q_k}\, q_k. \]

Rearranging the equation,

\[ a_i = q_i + \sum_{k=1}^{i-1} \frac{q_k^H a_i}{q_k^H q_k}\, q_k + \sum_{k=i+1}^{n} 0 \cdot q_k, \]

where the right-hand side is a linear combination of the first i columns (only!) of the matrix $\tilde{Q}$. So this equation can be rewritten

\[ a_i = \tilde{Q} \begin{pmatrix} \dfrac{q_1^H a_i}{q_1^H q_1} \\ \dfrac{q_2^H a_i}{q_2^H q_2} \\ \vdots \\ \dfrac{q_{i-1}^H a_i}{q_{i-1}^H q_{i-1}} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

Putting such equations together for $i = 1, \ldots, n$, we arrive at the desired result, $A = \tilde{Q}\tilde{R}$.


Now, we really would prefer that $\tilde{Q}$ have orthonormal columns. That means scaling each column of $\tilde{Q}$ by 1 divided by its length in the 2-norm. This column scaling corresponds to postmultiplication of $\tilde{Q}$ by a diagonal matrix $\tilde{D}$ whose i-th diagonal entry is $1/\|q_i\|_2$. Thus, $A = \tilde{Q}\tilde{R} = \tilde{Q}\tilde{D}\tilde{D}^{-1}\tilde{R} = QR$, where now Q has orthonormal columns and R is still upper triangular, but its rows have been scaled by the entries of the diagonal matrix $\tilde{D}^{-1}$.

The factorization A = QR is called the QR factorization of A. Although for ease of discussion we assumed in the preceding that A had linearly independent columns, in fact such a factorization exists for any matrix A; the fine details are omitted.
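To make the construction concrete, here is a minimal Python/NumPy sketch (the function name and test dimensions are ours, not from the text). It forms $\tilde{Q}$ and the unit upper triangular $\tilde{R}$ by classical Gram-Schmidt exactly as above, then rescales by $\tilde{D}$; like the discussion, it assumes A has linearly independent columns.

    import numpy as np

    def gram_schmidt_qr(A):
        # Build Q~ (orthogonal, unnormalized columns) and unit upper
        # triangular R~, column by column, as in the derivation above.
        m, n = A.shape
        Qt = np.zeros((m, n), dtype=complex)
        Rt = np.eye(n, dtype=complex)
        for i in range(n):
            q = A[:, i].astype(complex)
            for k in range(i):
                Rt[k, i] = (Qt[:, k].conj() @ A[:, i]) / (Qt[:, k].conj() @ Qt[:, k])
                q = q - Rt[k, i] * Qt[:, k]
            Qt[:, i] = q
        d = np.linalg.norm(Qt, axis=0)   # column lengths ||q_i||_2
        Q = Qt / d                       # Q = Q~ D~, with D~ = diag(1/||q_i||_2)
        R = d[:, None] * Rt              # R = D~^{-1} R~ (rows of R~ rescaled)
        return Q, R

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 3))
    Q, R = gram_schmidt_qr(A)
    print(np.allclose(Q @ R, A))                   # True: A = QR
    print(np.allclose(Q.conj().T @ Q, np.eye(3)))  # True: orthonormal columns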

5.1 Diagonalization of Hermitian Matrices

Definition 5.1. A matrix is said to be Hermitian if $A^H = A$, where the H superscript means Hermitian (i.e., conjugate) transpose. Some texts use an asterisk for the conjugate transpose; that is, $A^*$ means the same as $A^H$. If A is Hermitian, it means that $a_{ij} = \overline{a_{ji}}$ for every i, j pair. Thus, the diagonal of a Hermitian matrix must be real.

Definition 5.2. A matrix is said to be symmetric if $A^T = A$.

Clearly, if A is real, then $A^H = A^T$, so a real-valued Hermitian matrix is symmetric. However, if A has complex entries, symmetric and Hermitian have different meanings. There is such a thing as a complex-symmetric matrix ($a_{ij} = a_{ji}$); a complex-symmetric matrix need not have real diagonal entries. Here are a few examples.

Symmetric matrices:

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}; \quad A = \begin{pmatrix} 2-i & 3+4i \\ 3+4i & 8-7i \end{pmatrix}; \quad A = \begin{pmatrix} 0 & -2 & 4 \\ -2 & 7 & 5 \\ 4 & 5 & -8 \end{pmatrix}. \]

Hermitian matrices:

\[ A = \begin{pmatrix} 6 & 8+4i \\ 8-4i & 9 \end{pmatrix}; \quad A = \begin{pmatrix} 1 & -2+3i & 8 \\ -2-3i & 4 & 6-7i \\ 8 & 6+7i & 5 \end{pmatrix}; \quad A = \begin{pmatrix} 3 & 5 \\ 5 & 8 \end{pmatrix}. \]
As the examples show, the set of all real symmetric matrices is included within the set of all Hermitian matrices, since in the case that A is real-valued, $A^H = A^T$. On the other hand, the second symmetric example illustrates that complex-symmetric matrices need not be Hermitian.

Theorem 5.3. Suppose that A is Hermitian. Then all the eigenvalues of A are real.


Proof. Suppose that $Ax = \lambda x$ for $(\lambda, x)$ an eigenpair of A. (Recall that by definition of an eigenvector, $x \neq 0$.) Multiply both sides of the eigen-equation by $x^H$. Then we have

\[ x^H A x = x^H \lambda x = \lambda\, x^H x = \lambda \|x\|_2^2. \]

On the other hand, since $A = A^H$,

\[ x^H A x = x^H A^H x = (Ax)^H x = (\lambda x)^H x = \bar{\lambda} \|x\|_2^2. \]

Equating the two expressions gives $\lambda \|x\|_2^2 = \bar{\lambda} \|x\|_2^2$, so $\lambda = \bar{\lambda}$, and the last equality implies that $\lambda$ must be real.

Recall that A is diagonalizable (over $\mathbb{C}^n$) only when A has a set of n linearly independent eigenvectors. We will show that Hermitian matrices are always diagonalizable, and furthermore, that the eigenvectors have a very special relationship.

Theorem 5.4. If A is Hermitian, then any two eigenvectors from different eigenspaces are orthogonal in the standard inner product for $\mathbb{C}^n$ ($\mathbb{R}^n$, if A is real symmetric).

Proof. Let $v_1, v_2$ be two eigenvectors belonging to two distinct eigenvalues, say $\lambda_1, \lambda_2$, respectively. We need to show that $v_1^H v_2 = 0$. Since this is true iff $v_1^H(\lambda_2 v_2) = \lambda_2\, v_1^H v_2 = 0$, let us start there:

\[ v_1^H(\lambda_2 v_2) = v_1^H(A v_2) = v_1^H A^H v_2 = (A v_1)^H v_2 = (\lambda_1 v_1)^H v_2 = \bar{\lambda}_1\, v_1^H v_2 = \lambda_1\, v_1^H v_2, \]

where the last equality follows using the previous theorem. It follows that $(\lambda_2 - \lambda_1)\, v_1^H v_2 = 0$. But since we assumed $\lambda_2 \neq \lambda_1$, it must be the case that $v_1^H v_2 = 0$, as desired.
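The following small NumPy experiment (ours, not from the text) illustrates both theorems at once: a randomly generated Hermitian matrix has eigenvalues with vanishing imaginary parts, and even a general-purpose eigensolver returns (numerically) orthonormal eigenvectors, since the eigenvalues of a random Hermitian matrix are distinct with probability one.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                     # A^H = A by construction

    lam, V = np.linalg.eig(A)              # generic (non-Hermitian) solver
    print(np.max(np.abs(lam.imag)))        # ~1e-15: eigenvalues are real
    print(np.allclose(V.conj().T @ V, np.eye(4), atol=1e-8))  # orthonormal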

Definition 5.5. A real, square matrix A is said to be orthogonally diagonalizable if there exist an orthogonal matrix Q and diagonal matrix D such that $A = QDQ^T$.

Definition 5.6. A matrix $A \in \mathbb{C}^{n \times n}$ is called unitarily diagonalizable if there exist a unitary matrix U and diagonal matrix D such that $A = UDU^H$.

Theorem 5.7 (Spectral Theorem). Let A be Hermitian. Then A is unitarily diagonalizable.

Proof. Let A have Jordan decomposition $A = WJW^{-1}$. Since W is square, we can factor (see the beginning of this chapter) $W = QR$, where Q is unitary and R is upper triangular. Thus,

\[ A = QRJR^{-1}Q^H = QTQ^H, \]

where T is upper triangular because it is the product of upper triangular matrices¹³, and Q is unitary¹⁴. So A is unitarily similar to an upper triangular matrix T, and we may pre-multiply by $Q^H$ and post-multiply by Q to obtain $Q^H A Q = T$. Taking the conjugate transpose of both sides, $Q^H A^H Q = T^H$. However, $A = A^H$, and so we get $T = T^H$. But T was upper triangular, and this can only happen if T is diagonal. Thus $A = QDQ^H$, as desired.
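Footnote 14 identifies $A = QTQ^H$ as the Schur factorization. As a hedged numerical aside (using SciPy's schur routine, which is not discussed in the text), one can watch the triangular Schur factor collapse to a diagonal matrix exactly when the input is Hermitian:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                     # Hermitian input

    T, Q = schur(A, output='complex')      # A = Q T Q^H with Q unitary
    print(np.max(np.abs(T - np.diag(np.diag(T)))))  # ~1e-15: T is diagonal
    print(np.allclose(Q @ T @ Q.conj().T, A))       # True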

Corollary 5.8. In summary, if A is n × n Hermitian, it has the following properties:

• A has n real eigenvalues, counting multiplicities.
• The algebraic and geometric multiplicities of each distinct eigenvalue match.
• The eigenspaces are mutually orthogonal, in the sense that eigenvectors corresponding to different eigenvalues are orthogonal.
• A is unitarily diagonalizable.

Exercise 5.1. Let A and B both be orthogonally diagonalizable real matrices.
a) Show A and B are symmetric.
b) Show that if AB = BA, then AB is orthogonally diagonalizable.

¹³As mentioned elsewhere in the text, it is straightforward to show that the product of upper triangular matrices is upper triangular. It is likewise straightforward to show that the inverse of an upper triangular matrix is upper triangular, so the expression $RJR^{-1}$ is the product of three upper triangular matrices and is therefore upper triangular.
¹⁴This is in fact the Schur factorization.


5.1.1 Spectral Decomposition

Definition 5.9. The set of eigenvalues of a matrix A is sometimes called the spectrum of A. The spectral radius is the maximum of $|\lambda|$ over the eigenvalues $\lambda$ of A.

We know that if A is Hermitian, $A = QDQ^H$, so let us write the triple matrix product out explicitly:

\[ A = [q_1, \ldots, q_n]\, \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \begin{pmatrix} q_1^H \\ q_2^H \\ \vdots \\ q_n^H \end{pmatrix} = [\lambda_1 q_1, \ldots, \lambda_n q_n] \begin{pmatrix} q_1^H \\ q_2^H \\ \vdots \\ q_n^H \end{pmatrix} = \sum_{i=1}^{n} \lambda_i\, q_i q_i^H. \]

This expression for A is called the spectral decomposition of A. Note that each $q_i q_i^H$ is a rank-one matrix, and that each $q_i q_i^H$ is an orthogonal projection matrix onto $\mathrm{Span}(q_i)$.
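A short sketch (ours) of the spectral decomposition in NumPy: take the eigendecomposition from numpy.linalg.eigh and confirm that the rank-one pieces $\lambda_i q_i q_i^H$ sum back to A, and that each $q_i q_i^H$ behaves as an orthogonal projector.

    import numpy as np

    A = np.array([[6., 2.], [2., 3.]])     # an illustrative symmetric matrix
    lam, Q = np.linalg.eigh(A)             # columns of Q are orthonormal

    S = sum(lam[i] * np.outer(Q[:, i], Q[:, i].conj()) for i in range(len(lam)))
    print(np.allclose(S, A))               # True: the rank-one sum rebuilds A

    P = np.outer(Q[:, 0], Q[:, 0].conj())  # projector onto Span(q_1)
    print(np.allclose(P @ P, P), np.allclose(P.conj().T, P))  # True True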

5.1.2 Positive Definite, Negative Definite, Indefinite

Definition 5.10. Let A be a real symmetric matrix. We say that A is also positive definite if for every non-zero $x \in \mathbb{R}^n$, $x^T A x > 0$.

A similar definition holds for Hermitian matrices.

Definition 5.11. Let A be a complex Hermitian matrix. We say that A is also positive definite if for every non-zero $x \in \mathbb{C}^n$, $x^H A x > 0$.

A useful consequence for Hermitian positive definite (HPD) and symmetric positive definite (SPD) matrices is that their eigenvalues (which we already know are real due to the Hermitian property) must be strictly positive. Therefore, HPD (SPD) matrices must be invertible.

Theorem 5.12. A Hermitian (symmetric) matrix with all positive eigenvalues must be positive definite.

Proof. From the previous section we know the decomposition $A = QDQ^H$ exists, and from what we are given, the entries of D must be positive. Let $x \neq 0$. Then

\[ x^H A x = x^H Q D Q^H x = (Q^H x)^H D (Q^H x) = z^H D z = \sum_{i=1}^{n} \lambda_i |z_i|^2. \]

Since the λi > 0 and z := QH x cannot be zero (why?), the result follows.

Example 5.13. Let $A = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$. Compute the eigenvalues and observe that they are both positive. By the previous theorem, this matrix is SPD.

Exercise 5.2. Let A be HPD. Show that $\langle q, z \rangle := z^H A q$ defines a valid inner product on $\mathbb{C}^n$.

A close cousin is the positive semi-definite matrix.

Definition 5.14. A Hermitian (symmetric) matrix is positive semi-definite if for every non-zero $x \in \mathbb{C}^n$ ($x \in \mathbb{R}^n$), $x^H A x \geq 0$.

We also have the concept of negative-definite matrices.

Definition 5.15. If A is Hermitian, then it is negative definite if for every non-zero $x \in \mathbb{C}^n$, $x^H A x < 0$.

A negative definite Hermitian (symmetric) matrix must have all strictly negative eigenvalues. So it, too, is invertible. A symmetric (Hermitian) indefinite matrix is one that has some positive and some negative (and possibly zero) eigenvalues.
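These definitions suggest a simple computational test. The helper below (our naming; the tolerance choice is ad hoc) classifies a Hermitian matrix by the signs of its eigenvalues, using the matrix of Example 5.13 as one test case.

    import numpy as np

    def classify(A, tol=1e-12):
        lam = np.linalg.eigvalsh(A)        # real eigenvalues of a Hermitian A
        if np.all(lam > tol):
            return "positive definite"
        if np.all(lam >= -tol):
            return "positive semi-definite"
        if np.all(lam < -tol):
            return "negative definite"
        return "indefinite"               # (negative semi-definite omitted for brevity)

    print(classify(np.array([[4., 1.], [1., 2.]])))   # positive definite
    print(classify(np.array([[1., 0.], [0., -1.]])))  # indefinite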

5.2 Quadratic Forms

A motivating quote from the third edition of David Lay's Linear Algebra and Its Applications:

Quadratic forms occur frequently in applications of linear algebra to engineering (in design criteria and optimization) and signal processing (as output noise power). They also arise, for example, in physics (as potential and kinetic energy), differential geometry (as normal curvature of surfaces), economics (as utility functions), and statistics (in confidence ellipsoids).


In fact, you saw a quadratic form already in the definitions of the previous subsection.

Definition 5.16. A quadratic form on $\mathbb{R}^n$ ($\mathbb{C}^n$) is a function $Q : \mathbb{R}^n \to \mathbb{R}$ ($Q : \mathbb{C}^n \to \mathbb{R}$) defined as $Q(x) = x^T A x$ ($Q(x) = x^H A x$), where A is an n × n symmetric (Hermitian) matrix. Here, A is called the matrix of the quadratic form.

Example 5.17. Let $A = \begin{pmatrix} 5 & 0 \\ 0 & 4 \end{pmatrix}$. Compute the quadratic form: $x^T A x = 5x_1^2 + 4x_2^2$.

Example 5.18. Let $A = \begin{pmatrix} 5 & -1 \\ -1 & 4 \end{pmatrix}$. If we compute the quadratic form here, there are cross terms due to the presence of non-zero off-diagonal entries:

\[ x^T A x = 5x_1^2 - 2x_1 x_2 + 4x_2^2. \]

In the second example, it is difficult to see whether this quadratic form always gives something positive. However, there is an easy way to investigate the possibility (well, easy provided someone has handed you the eigendecomposition!):

Theorem 5.19. Let A be n × n Hermitian (symmetric). Then there is a unitary (orthogonal) change of variable of the form $x = Qy$ that transforms the quadratic form $x^H A x$ ($x^T A x$) into a quadratic form $y^H D y$ ($y^T D y$) that has no cross-product terms.
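A numerical sketch of Theorem 5.19 (ours), applied to the matrix of Example 5.18: after the change of variables $y = Q^T x$ supplied by numpy.linalg.eigh, the quadratic form is evaluated with no cross terms, yet gives the same value.

    import numpy as np

    A = np.array([[5., -1.], [-1., 4.]])   # Example 5.18
    lam, Q = np.linalg.eigh(A)             # A = Q D Q^T with D = diag(lam)

    x = np.array([0.7, -0.3])              # an arbitrary test point
    y = Q.T @ x                            # the change of variables
    print(x @ A @ x)                       # 5x1^2 - 2x1x2 + 4x2^2
    print(lam @ (y * y))                   # y^T D y: same value, no cross terms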

5.2.1 Geometry and Principal Axes

In $\mathbb{R}^2$, we can get some geometric intuition about the quadratic form for a symmetric A by looking at its level sets. Consider $f(x) = x^T A x$ as a map from $\mathbb{R}^2$ to $\mathbb{R}$. Now consider $W_c = \{x \in \mathbb{R}^2 \mid x^T A x = c\}$ for a fixed, real constant c. The set $W_c$ is called the c-level set of the quadratic function f(x). One of the following will occur:

• The c-level set will be an ellipse (or circle, if both semi-axes have the same length).
• The c-level set will be a hyperbola.
• The c-level set will be one or two lines, or a single point, or contain no points at all.


To see this, we start with A being a 2 × 2 real, diagonal matrix. Then $x^T A x = a_{11} x_1^2 + a_{22} x_2^2$. Let us assume that A is invertible (no zeros on the diagonal). Then

\[ a_{11} x_1^2 + a_{22} x_2^2 = c \;\Rightarrow\; \frac{a_{11}}{c} x_1^2 + \frac{a_{22}}{c} x_2^2 = 1. \tag{5.1} \]

Consider first the case that $a_{11} > 0$, $a_{22} > 0$. Note that in this case A is symmetric positive definite. Then if $c > 0$, $\frac{a_{11}}{c} = \frac{1}{\alpha^2}$ for some positive $\alpha$ and $\frac{a_{22}}{c} = \frac{1}{\beta^2}$ for some positive $\beta$. For example, if $c = 1$, $\alpha = \frac{1}{\sqrt{a_{11}}}$. Thus the rightmost equation in (5.1) is in fact the equation for an ellipse centered at the origin, with $\alpha$ the length of the semi-axis oriented along the $x_1$ axis, and $\beta$ the length of the semi-axis oriented in the vertical $x_2$ direction. On the other hand, if $c < 0$ and $a_{11}, a_{22}$ are both positive, there are no solutions to the equation; the $-2$ level set, for instance, would contain no points at all.

If A is negative definite, $a_{11} < 0$, $a_{22} < 0$. If you look at a level set for which $c < 0$, the same analysis as above goes through; the picture will be an ellipse.

Now, without loss of generality, assume that $a_{11} > 0$ but $a_{22} < 0$. This means that A is indefinite. Looking back at (5.1), it can be rewritten

\[ \frac{x_1^2}{\alpha^2} - \frac{x_2^2}{\beta^2} = 1, \qquad \alpha > 0,\ \beta > 0, \]

for $\alpha$ as above and $\beta^2 = \frac{c}{|a_{22}|}$. This is an equation for a hyperbola centered at the origin, opening left and right. To draw it, draw a rectangle centered at the origin extending to $(\alpha, 0)$ on the $x_1$ axis and $(-\alpha, 0)$ in the other direction, and to $(0, \beta)$ and $(0, -\beta)$ on the $x_2$ axis; the two diagonal lines through the origin and the corners of the rectangle give the asymptotes for the hyperbola. (Obviously, if $a_{11} < 0$ and $a_{22} > 0$ you will also get a hyperbola, opening up and down.)

And if one of the diagonal elements is zero (assume without loss of generality that $a_{22} = 0$), the level set equation reduces to $x_1^2 = \frac{c}{a_{11}}$. If $c = 0$, then the line $x_1 = 0$ is a solution. Take c to be non-zero and to have the same sign as $a_{11}$. Then the solution to the equation is $x_1 = \pm\sqrt{\frac{c}{a_{11}}}$, so we get two parallel lines for the level curves. If both diagonal elements are 0 and c is 0, the quadratic form is identically zero, and the level set is the entire plane.

Now the interesting question: what if A is not diagonal, but it is symmetric? The analysis proceeds easily if we use a change of coordinates provided by the eigenvectors of A. Since A is symmetric, $A = QDQ^T$. So $x^T A x = y^T D y$, where $y = Q^T x$. So if we use the $y_1, y_2$ coordinate system, the matrix is diagonal in that coordinate system, and shape-wise the analysis goes through as above. We want to convince ourselves that the level set picture (ellipse, hyperbola, etc.), when sketched in $x_1, x_2$ space, should be the same, just aligned with $q_1, q_2$ (which are orthonormal!) as the coordinate axes.

Now since $y = Q^T x$, we have $Qy = x$. Let $y = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, whose span represents the "horizontal" axis in $(y_1, y_2)$ space, which, by the analysis above, is one of the two principal axes for representing the level curves in that space. Then

\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = Qy = q_1 = \begin{pmatrix} q_{11} \\ q_{21} \end{pmatrix}, \]

so the corresponding axis in $(x_1, x_2)$ space must be $q_1$. Similarly, if $y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, the vertical direction in $(y_1, y_2)$ space, then $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = Qy = q_2$, so the corresponding axis in $(x_1, x_2)$ space must be $q_2$. See the boardwork from the in-class lecture for pictures.

Finally, now that we have level curves, we can draw a 3D picture by setting $z = f(x)$ and plotting $(x_1, x_2, z)$. Verify the following:

• When A is SPD, the surface is bowl shaped, concave up, with a unique minimum.
• When A is symmetric negative definite, the surface is bowl shaped, concave down, with a unique maximum.
• When A is symmetric indefinite (no zero eigenvalues), we get a saddle shape, concave up in one direction and down in the other.
• When A is only symmetric but singular, you get a degenerate paraboloid (an entire valley of points that are all maxima or minima).

We can now use these pictures and the intuition developed by looking at them to discuss optimization.

5.2.2 Constrained Optimization; Optimality Characterization of Eigenvalues

In Calculus III, you saw pictures of quadratic forms before, but they were developed without the matrix theory behind them. The test for existence of a global minimum is the same as determining whether A is SPD. For a global maximum, it is the same as checking for negative definiteness.

Often in practice, we are looking for a maximum or minimum value over a restricted set of points in $\mathbb{R}^n$. (We could do this for $\mathbb{C}^n$ too, of course, but the graphical intuition is for $\mathbb{R}^2$, $\mathbb{R}^3$, so we will constrain the discussion to the real case here.) We motivate our discussion with the following example.

Example 5.20. Find the maximum and minimum values of $f(x) = x^T D x$, with D a diagonal matrix with entries 7, 5, 4, respectively, subject to the constraint $\|x\|_2 = 1$. (That is, how large and how small can the quadratic form be over all vectors in $\mathbb{R}^3$ with unit length?) Note that this means D is symmetric positive definite. We have

\[ f(x) = 7x_1^2 + 5x_2^2 + 4x_3^2 \leq 7x_1^2 + 7x_2^2 + 7x_3^2 = 7(x_1^2 + x_2^2 + x_3^2) = 7, \]


where the last equality follows from the constraint. So $f(x) \leq 7$. Can it ever equal 7? Yes: set $x = (1, 0, 0)$ (admissible, since it has unit length), and the upper bound is achieved exactly. Therefore, 7 is the maximum value of the quadratic form when $\|x\|_2 = 1$. Similarly,

\[ f(x) = 7x_1^2 + 5x_2^2 + 4x_3^2 \geq 4x_1^2 + 4x_2^2 + 4x_3^2 = 4(x_1^2 + x_2^2 + x_3^2) = 4, \]

where again the last equality follows from the constraint. To see that this lower bound can be achieved, substitute x = (0, 0, 1) (admissible since it has unit length. The key observation is this: the max and min of the quadratic form over all unit length vectors correspond to the maximum and minimum eigenvalues of this positive definite matrix.. The following example can be illustrated graphically since we restrict ourselves to a 2-d matrix (2 variable quadratic form) and therefore a 3D picture.   6 0 Example 5.21 Let f (x) = xT Dx where D = . Define g(x) = kxk22 . 0 3 Now the set of all unit-length vectors satisfies g(x) = x21 + x22 = 1 (unit circle in 2D), which corresponds to the points on the surface (x1 , x2 , 1) (you get a cylinder in 3D). Finding the min and max of the quadratic subject to the unit length constraint is a problem of finding the highest and lowest points on the intersection curve of those two surfaces. Proceeding logically as in the preceding example, we find the max to be 6 and the min to be 3. The vectors for which the max and min values are achieved are ±(1, 0) (2 maxes) and ±(0, 1) (2 mins) respectively. A key take away is that since D was diagonal, the max and min values occur at the (normalized) eigenvector directions, respectively. These two examples illustrate something about symmetric matrices (not just SPD ones) that is true more generally. Theorem 5.22. Let A be a symmetric matrix. Define a = min{xT Ax|kxk2 = 1},

b = max{xT Ax|kxk2 = 1}.

Then every eigenvalue λ of A satisfies a ≤ λ ≤ b. Specifically, a is the minimum eigenvalue (label it λn ) of A and b is the maximum eigenvalue (label it λ1 ) of A. Moreover, b is attained for any (normalized) eigenvector corresponding to λ1 and a is attained for any (normalized) eigenvector corresponding to λn .
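A quick numerical check of the theorem (a sketch, not from the text): sample many random unit vectors and compare the sampled values of $x^T A x$ with the extreme eigenvalues.

    import numpy as np

    rng = np.random.default_rng(2)
    A = np.array([[6., 1., 0.], [1., 5., 2.], [0., 2., 3.]])  # symmetric
    lam = np.linalg.eigvalsh(A)            # ascending: lam[0] = a, lam[-1] = b

    X = rng.standard_normal((3, 100000))
    X /= np.linalg.norm(X, axis=0)         # columns are unit vectors
    vals = np.einsum('ij,ik,kj->j', X, A, X)  # x^T A x for every column
    print(lam[0], vals.min(), vals.max(), lam[-1])  # a <= min <= max <= b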


Proof. We will sketch the proof here. It is along the lines of the first two examples; we just need to change into the correct coordinate system, and show that doing so does not change the length of the vectors in question. Recall $x^T A x = y^T D y$, with $y = Q^T x$ (equivalently, $x = Qy$). Here, assume that the entries in D are ordered from largest to smallest (not in magnitude, but in actual value, including sign). Then

\[ \|x\|_2 = \sqrt{x^T x} = \sqrt{y^T Q^T Q y} = \sqrt{y^T y} = \|y\|_2, \]

since Q is an orthogonal matrix. Therefore, the quadratic form can be represented in either coordinate system, and it assumes the same set of values when x, y are allowed to vary over the set of all unit vectors. Now we are able to use the same trick as in the first example to deduce the max and min of the quadratic function as represented in the diagonal coordinate system. We deduce (along the lines of the second example) that the max and min are achieved when $y = \pm e_1$ and $y = \pm e_n$, respectively, due to the assumed ordering of the entries in D. But when $y = \pm e_1$, $x = Qy = \pm q_1$, and when $y = \pm e_n$, $x = Qy = \pm q_n$.

Let a, b be as defined above, and recall that $q_1, q_n$ are orthogonal. Let $c = (1-\alpha)b + \alpha a$ for $\alpha \in [0, 1]$. Verify first of all that $c \in [a, b]$. Now let $x = \sqrt{1-\alpha}\, q_1 + \sqrt{\alpha}\, q_n$. Then

\[ \|x\|_2^2 = x^T x = (1-\alpha)\, q_1^T q_1 + \alpha\, q_n^T q_n + 2\sqrt{1-\alpha}\sqrt{\alpha}\, q_1^T q_n = (1-\alpha) + \alpha + 0 = 1. \]

Also, $x^T A x = (1-\alpha)\lambda_1 + \alpha\lambda_n = c$. So for each number c between a and b, there is a unit vector x such that $c = x^T A x$, which shows that the set of all possible values of $x^T A x$ for $\|x\|_2 = 1$ is a closed interval on the real axis. Finally, to show that every $\lambda \in [a, b]$ is attained, it suffices to find x so that $\lambda = x^T A x$; but this is done by choosing x to be the corresponding normalized eigenvector, i.e., one of the other n − 2 eigenvectors.
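The interpolation step at the end of the proof can also be checked numerically; a sketch (ours), reusing the matrix from the previous sketch: the vector $x(\alpha)$ stays on the unit sphere while its quadratic form sweeps linearly from b down to a.

    import numpy as np

    A = np.array([[6., 1., 0.], [1., 5., 2.], [0., 2., 3.]])
    lam, Q = np.linalg.eigh(A)             # ascending order
    q1, qn = Q[:, -1], Q[:, 0]             # eigenvectors for b and a

    for alpha in np.linspace(0.0, 1.0, 5):
        x = np.sqrt(1 - alpha) * q1 + np.sqrt(alpha) * qn
        print(np.linalg.norm(x), x @ A @ x)  # length 1; (1-alpha)*b + alpha*a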

5.2.3 Motivation for Chapter 6....

Suppose that $B \in \mathbb{C}^{m \times n}$ ($\mathbb{R}^{m \times n}$). Certainly if $m \neq n$, it does not even make sense to talk about B being Hermitian (symmetric), let alone positive (semi-)definite. On the other hand, $B^H B$ (which is n × n) and $BB^H$ (which is m × m) are Hermitian (symmetric, in the real case); this is easy to establish. They are also at least positive semi-definite, if not positive definite; the proof is left as an exercise. And therefore each of these matrices is unitarily (orthogonally) diagonalizable, with real, non-negative eigenvalues.

We mention this here because it is an excellent motivation for considering whether one can use the above fact to derive a "nice" way to "diagonalize" the m × n matrix B (in quotes because the diagonal factor is not necessarily square if $m \neq n$). Read on to Chapter 6 for the answer....
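A quick sketch of the closing claim, for a random rectangular B (the variable names are ours): the Gram matrix $B^H B$ is Hermitian with non-negative eigenvalues.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
    G = B.conj().T @ B                     # the n x n Gram matrix

    print(np.allclose(G, G.conj().T))      # True: Hermitian
    print(np.linalg.eigvalsh(G).min() >= -1e-12)  # True: eigenvalues >= 0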


Chapter 6

Singular Value Decomposition

In this chapter, we again appeal to the fact that every linear transformation can be identified with a corresponding matrix of the transformation, A. So it suffices to study what is going on with these matrices.

6.1 Matrix 2-norm and Frobenius norm

Previously, we defined norms in terms of inner products on vector spaces. The space $\mathbb{C}^{m \times n}$ is also a vector space. Consider the following inner product definition:

\[ \langle A, B \rangle = \sum_{i=1}^{m} \sum_{j=1}^{n} \bar{B}_{ij} A_{ij} = \sum_{j=1}^{n} B_{:,j}^{*} A_{:,j}, \]

so it is basically the sum of the inner products of corresponding columns of the matrices A and B.

Exercise 6.1. Show that this is a valid inner product.

Exercise 6.2. The trace of a matrix is the sum of the diagonal entries in the matrix. Show that $\langle A, A \rangle = \mathrm{trace}(A^* A)$.

In particular, we have that $\langle A, A \rangle = \sum_i \sum_j |A_{ij}|^2$. From this, we define the Frobenius norm:

\[ \|A\|_F = \sqrt{\langle A, A \rangle} = \sqrt{\sum_i \sum_j |A_{ij}|^2}. \]

This is slightly annoying to compute when A is dense, but if the matrix is diagonal, the double sum collapses to a single sum.

It is also possible to use standard vector norms to induce matrix norms. In this class, we will consider only one such example, called the matrix 2-norm. Define $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2$. Since x has unit length, it is on the unit ball. Multiplication by A is a linear transformation. Thus, the 2-norm measures the maximum amount by which multiplication by A can stretch a vector on the unit ball.

This is slightly annoying when A is dense, but if the matrix is diagonal, the double sum collapses to a single sum. It is also possible to use standard vector norms to induce matrix norms. In this class, we will consider only one such example, called the matrix 2-norm. Define kAk2 = maxkxk2 =1 kAxk2 . Since x has unit length, it is on the unit ball. Multiplication by A is a linear transformation. Thus, the 2-norm measures 85

“notes2” 2013/4/9 page 85 i

Suggest Documents