Chapter 11. Eigensystems

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

11.0 Introduction

An N × N matrix A is said to have an eigenvector x and corresponding eigenvalue λ if

    A · x = λx                                                      (11.0.1)

Obviously any multiple of an eigenvector x will also be an eigenvector, but we won't consider such multiples as being distinct eigenvectors. (The zero vector is not considered to be an eigenvector at all.) Evidently (11.0.1) can hold only if

    det |A − λ1| = 0                                                (11.0.2)

which, if expanded out, is an Nth degree polynomial in λ whose roots are the eigenvalues. This proves that there are always N (not necessarily distinct) eigenvalues. Equal eigenvalues coming from multiple roots are called degenerate. Root-searching in the characteristic equation (11.0.2) is usually a very poor computational method for finding eigenvalues. We will learn much better ways in this chapter, as well as efficient ways for finding corresponding eigenvectors.

The above two equations also prove that every one of the N eigenvalues has a (not necessarily distinct) corresponding eigenvector: If λ is set to an eigenvalue, then the matrix A − λ1 is singular, and we know that every singular matrix has at least one nonzero vector in its nullspace (see §2.6 on singular value decomposition).
If you add τx to both sides of (11.0.1), you will easily see that the eigenvalues of any matrix can be changed or shifted by an additive constant τ by adding to the matrix that constant times the identity matrix. The eigenvectors are unchanged by this shift. Shifting, as we will see, is an important part of many algorithms for computing eigenvalues. We see also that there is no special significance to a zero eigenvalue. Any eigenvalue can be shifted to zero, or any zero eigenvalue can be shifted away from zero.
Definitions and Basic Facts

A matrix is called symmetric if it is equal to its transpose,

    A = A^T  or  a_ij = a_ji                                        (11.0.3)

It is called Hermitian or self-adjoint if it equals the complex conjugate of its transpose (its Hermitian conjugate, denoted by "†"),

    A = A^†  or  a_ij = a*_ji                                       (11.0.4)

It is termed orthogonal if its transpose equals its inverse,

    A^T · A = A · A^T = 1                                           (11.0.5)

and unitary if its Hermitian conjugate equals its inverse. Finally, a matrix is called normal if it commutes with its Hermitian conjugate,

    A · A^† = A^† · A                                               (11.0.6)

For real matrices, Hermitian means the same as symmetric, unitary means the same as orthogonal, and both of these distinct classes are normal.

The reason that "Hermitian" is an important concept has to do with eigenvalues. The eigenvalues of a Hermitian matrix are all real. In particular, the eigenvalues of a real symmetric matrix are all real. Contrariwise, the eigenvalues of a real nonsymmetric matrix may include real values, but may also include pairs of complex conjugate values; and the eigenvalues of a complex matrix that is not Hermitian will in general be complex.

The reason that "normal" is an important concept has to do with the eigenvectors. The eigenvectors of a normal matrix with nondegenerate (i.e., distinct) eigenvalues are complete and orthogonal, spanning the N-dimensional vector space.
For a normal matrix with degenerate eigenvalues, we have the additional freedom of replacing the eigenvectors corresponding to a degenerate eigenvalue by linear combinations of themselves. Using this freedom, we can always perform Gram-Schmidt orthogonalization (consult any linear algebra text) and find a set of eigenvectors that are complete and orthogonal, just as in the nondegenerate case. The matrix whose columns are an orthonormal set of eigenvectors is evidently unitary. A special case is that the matrix of eigenvectors of a real, symmetric matrix is orthogonal, since the eigenvectors of that matrix are all real.

When a matrix is not normal, as typified by any random, nonsymmetric, real matrix, then in general we cannot find any orthonormal set of eigenvectors, nor even any pairs of eigenvectors that are orthogonal (except perhaps by rare chance). While the N non-orthonormal eigenvectors will "usually" span the N-dimensional vector space, they do not always do so; that is, the eigenvectors are not always complete. Such a matrix is said to be defective.
Left and Right Eigenvectors

While the eigenvectors of a non-normal matrix are not particularly orthogonal among themselves, they do have an orthogonality relation with a different set of vectors, which we must now define. Up to now our eigenvectors have been column vectors that are multiplied to the right of a matrix A, as in (11.0.1). These, more explicitly, are termed right eigenvectors. We could also, however, try to find row vectors, which multiply A to the left and satisfy

    x · A = λx                                                      (11.0.7)

These are called left eigenvectors. By taking the transpose of equation (11.0.7), we see that every left eigenvector is the transpose of a right eigenvector of the transpose of A. Now by comparing to (11.0.2), and using the fact that the determinant of a matrix equals the determinant of its transpose, we also see that the left and right eigenvalues of A are identical.

If the matrix A is symmetric, then the left and right eigenvectors are just transposes of each other, that is, have the same numerical values as components. Likewise, if the matrix is self-adjoint, the left and right eigenvectors are Hermitian conjugates of each other.
For the general nonnormal case, however, we have the following calculation: Let X_R be the matrix formed by columns from the right eigenvectors, and X_L be the matrix formed by rows from the left eigenvectors. Then (11.0.1) and (11.0.7) can be rewritten as

    A · X_R = X_R · diag(λ_1 ... λ_N)     X_L · A = diag(λ_1 ... λ_N) · X_L    (11.0.8)

Multiplying the first of these equations on the left by X_L, the second on the right by X_R, and subtracting the two, gives

    (X_L · X_R) · diag(λ_1 ... λ_N) = diag(λ_1 ... λ_N) · (X_L · X_R)          (11.0.9)

This says that the matrix of dot products of the left and right eigenvectors commutes with the diagonal matrix of eigenvalues. But the only matrices that commute with a diagonal matrix of distinct elements are themselves diagonal. Thus, if the eigenvalues are nondegenerate, each left eigenvector is orthogonal to all right eigenvectors except its corresponding one, and vice versa. By choice of normalization, the dot products of corresponding left and right eigenvectors can always be made unity for any matrix with nondegenerate eigenvalues.

If some eigenvalues are degenerate, then either the left or the right eigenvectors corresponding to a degenerate eigenvalue must be linearly combined among themselves to achieve orthogonality with the right or left ones, respectively. This can always be done by a procedure akin to Gram-Schmidt orthogonalization. The normalization can then be adjusted to give unity for the nonzero dot products between corresponding left and right eigenvectors. If the dot product of corresponding left and right eigenvectors is zero at this stage, then you have a case where the eigenvectors are incomplete! Note that incomplete eigenvectors can occur only where there are degenerate eigenvalues, but do not always occur in such cases (in fact, never occur for the class of "normal" matrices). See [1] for a clear discussion.
In both the degenerate and nondegenerate cases, the final normalization to unity of all nonzero dot products produces the result: The matrix whose rows are left eigenvectors is the inverse matrix of the matrix whose columns are right eigenvectors, if the inverse exists.

Diagonalization of a Matrix

Multiplying the first equation in (11.0.8) by X_L, and using the fact that X_L and X_R are matrix inverses, we get

    X_R^{-1} · A · X_R = diag(λ_1 ... λ_N)                          (11.0.10)

This is a particular case of a similarity transform of the matrix A,

    A → Z^{-1} · A · Z                                              (11.0.11)

for some transformation matrix Z. Similarity transformations play a crucial role in the computation of eigenvalues, because they leave the eigenvalues of a matrix unchanged. This is easily seen from

    det |Z^{-1} · A · Z − λ1| = det |Z^{-1} · (A − λ1) · Z|
                              = det |Z^{-1}| det |A − λ1| det |Z|
                              = det |A − λ1|                        (11.0.12)

Equation (11.0.10) shows that any matrix with complete eigenvectors (which includes all normal matrices and "most" random nonnormal ones) can be diagonalized by a similarity transformation, that the columns of the transformation matrix that effects the diagonalization are the right eigenvectors, and that the rows of its inverse are the left eigenvectors.

For real, symmetric matrices, the eigenvectors are real and orthonormal, so the transformation matrix is orthogonal.
The similarity transformation is then also an orthogonal transformation of the form

    A → Z^T · A · Z                                                 (11.0.13)

While real nonsymmetric matrices can be diagonalized in their usual case of complete eigenvectors, the transformation matrix is not necessarily real. It turns out, however, that a real similarity transformation can "almost" do the job. It can reduce the matrix down to a form with little two-by-two blocks along the diagonal, all other elements zero. Each two-by-two block corresponds to a complex-conjugate pair of complex eigenvalues. We will see this idea exploited in some routines given later in the chapter.

The "grand strategy" of virtually all modern eigensystem routines is to nudge the matrix A towards diagonal form by a sequence of similarity transformations,

    A → P_1^{-1} · A · P_1 → P_2^{-1} · P_1^{-1} · A · P_1 · P_2
      → P_3^{-1} · P_2^{-1} · P_1^{-1} · A · P_1 · P_2 · P_3 → etc.  (11.0.14)
If we get all the way to diagonal form, then the eigenvectors are the columns of the accumulated transformation

    X_R = P_1 · P_2 · P_3 · ...                                     (11.0.15)

Sometimes we do not want to go all the way to diagonal form. For example, if we are interested only in eigenvalues, not eigenvectors, it is enough to transform the matrix A to be triangular, with all elements below (or above) the diagonal zero. In this case the diagonal elements are already the eigenvalues, as you can see by mentally evaluating (11.0.2) using expansion by minors.

There are two rather different sets of techniques for implementing the grand strategy (11.0.14). It turns out that they work rather well in combination, so most modern eigensystem routines use both. The first set of techniques constructs individual P_i's as explicit "atomic" transformations designed to perform specific tasks, for example zeroing a particular off-diagonal element (Jacobi transformation, §11.1), or a whole particular row or column (Householder transformation, §11.2; elimination method, §11.5). In general, a finite sequence of these simple transformations cannot completely diagonalize a matrix.
There are then two choices: either use the finite sequence of transformations to go most of the way (e.g., to some special form like tridiagonal or Hessenberg, see §11.2 and §11.5 below) and follow up with the second set of techniques about to be mentioned; or else iterate the finite sequence of simple transformations over and over until the deviation of the matrix from diagonal is negligibly small. This latter approach is conceptually simplest, so we will discuss it in the next section; however, for N greater than ~10, it is computationally inefficient by a roughly constant factor ~5.

The second set of techniques, called factorization methods, is more subtle. Suppose that the matrix A can be factored into a left factor F_L and a right factor F_R. Then

    A = F_L · F_R   or equivalently   F_L^{-1} · A = F_R            (11.0.16)

If we now multiply back together the factors in the reverse order, and use the second equation in (11.0.16), we get

    F_R · F_L = F_L^{-1} · A · F_L                                  (11.0.17)

which we recognize as having effected a similarity transformation on A with the transformation matrix being F_L! In §11.3 and §11.6 we will discuss the QR method, which exploits this idea.

Factorization methods also do not converge exactly in a finite number of transformations. But the better ones do converge rapidly and reliably, and, when following an appropriate initial reduction by simple similarity transformations, they are the methods of choice.