Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

2.6 Singular Value Decomposition

There exists a very powerful set of techniques for dealing with sets of equations or matrices that are either singular or else numerically very close to singular. In many cases where Gaussian elimination and LU decomposition fail to give satisfactory results, this set of techniques, known as singular value decomposition, or SVD, will diagnose for you precisely what the problem is. In some cases, SVD will not only diagnose the problem, it will also solve it, in the sense of giving you a useful numerical answer, although, as we shall see, not necessarily “the” answer that you thought you should get.

SVD is also the method of choice for solving most linear least-squares problems. We will outline the relevant theory in this section, but defer detailed discussion of the use of SVD in this application to Chapter 15, whose subject is the parametric modeling of data.

SVD methods are based on the following theorem of linear algebra, whose proof is beyond our scope: Any M × N matrix A whose number of rows M is greater than or equal to its number of columns N can be written as the product of an M × N column-orthogonal matrix U, an N × N diagonal matrix W with positive or zero elements (the singular values), and the transpose of an N × N orthogonal matrix V.
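The various shapes of these matrices will be made clearer by the following tableau:

$$
\mathbf{A} = \mathbf{U} \cdot
\begin{pmatrix}
w_1 & & & \\
 & w_2 & & \\
 & & \ddots & \\
 & & & w_N
\end{pmatrix}
\cdot \mathbf{V}^T
\tag{2.6.1}
$$

The matrices U and V are each orthogonal in the sense that their columns are orthonormal,

$$
\sum_{i=1}^{M} U_{ik} U_{in} = \delta_{kn}, \qquad 1 \le k \le N, \quad 1 \le n \le N
\tag{2.6.2}
$$

$$
\sum_{j=1}^{N} V_{jk} V_{jn} = \delta_{kn}, \qquad 1 \le k \le N, \quad 1 \le n \le N
\tag{2.6.3}
$$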
or as a tableau,

$$
\mathbf{U}^T \cdot \mathbf{U} = \mathbf{V}^T \cdot \mathbf{V} = \mathbf{1}
\tag{2.6.4}
$$

Since V is square, it is also row-orthonormal, $\mathbf{V} \cdot \mathbf{V}^T = \mathbf{1}$.

The SVD decomposition can also be carried out when M < N. In this case the singular values $w_j$ for j = M + 1, ..., N are all zero, and the corresponding columns of U are also zero. Equation (2.6.2) then holds only for k, n ≤ M.

The decomposition (2.6.1) can always be done, no matter how singular the matrix is, and it is “almost” unique. That is to say, it is unique up to (i) making the same permutation of the columns of U, elements of W, and columns of V (or rows of $\mathbf{V}^T$), or (ii) forming linear combinations of any columns of U and V whose corresponding elements of W happen to be exactly equal. An important consequence of the permutation freedom is that for the case M < N, a numerical algorithm for the decomposition need not return zero $w_j$'s for j = M + 1, ..., N; the N − M zero singular values can be scattered among all positions j = 1, 2, ..., N.

At the end of this section, we give a routine, svdcmp, that performs SVD on an arbitrary matrix A, replacing it by U (they are the same shape) and giving back W and V separately. The routine svdcmp is based on a routine by Forsythe et al. [1], which is in turn based on the original routine of Golub and Reinsch, found, in various forms, in [2-4] and elsewhere.
These references include extensive discussion of the algorithm used. As much as we dislike the use of black-box routines, we are going to ask you to accept this one, since it would take us too far afield to cover its necessary background material here. Suffice it to say that the algorithm is very stable, and that it is very unusual for it ever to misbehave. Most of the concepts that enter the algorithm (Householder reduction to bidiagonal form, diagonalization by QR procedure with shifts) will be discussed further in Chapter 11.

If you are as suspicious of black boxes as we are, you will want to verify yourself that svdcmp does what we say it does. That is very easy to do: Generate an arbitrary matrix A, call the routine, and then verify by matrix multiplication that (2.6.1) and (2.6.4) are satisfied. Since these two equations are the only defining requirements for SVD, this procedure is (for the chosen A) a complete end-to-end check.

Now let us find out what SVD is good for.
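
The end-to-end check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not call svdcmp, but substitutes a simple one-sided Jacobi iteration (a different algorithm from the Golub-Reinsch method on which svdcmp is based). The names jacobi_svd, matmul, transpose, and max_abs_diff, the tolerance eps, the sweep limit, and the 4 × 3 test matrix are all hypothetical choices made for this sketch.

```python
import math

def jacobi_svd(a, eps=1e-14, max_sweeps=60):
    """One-sided Jacobi SVD of an M x N list-of-lists matrix, M >= N.

    Stand-in for svdcmp (NOT the Golub-Reinsch algorithm).
    Returns (u, w, v) such that a = u * diag(w) * v^T.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]          # working copy; ends up as U * diag(w)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= eps * math.sqrt(alpha * beta):
                    continue           # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for mat in (u, v):     # apply the same plane rotation to both
                    for row in mat:
                        xp, xq = row[p], row[q]
                        row[p] = c * xp - s * xq
                        row[q] = s * xp + c * xq
        if not rotated:
            break
    # Singular values are the column lengths; normalize the columns of U.
    w = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    for j in range(n):
        if w[j] > 0.0:                 # a zero w_j leaves a zero column in U
            for i in range(m):
                u[i][j] /= w[j]
    return u, w, v

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))

# An arbitrary full-rank 4 x 3 test matrix (illustrative values only).
a = [[2.0, -1.0, 0.5], [1.0, 3.0, -2.0], [0.0, 1.5, 4.0], [-3.0, 0.5, 1.0]]
u, w, v = jacobi_svd(a)
n = len(w)
identity = [[float(i == j) for j in range(n)] for i in range(n)]
w_diag = [[w[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

err_261 = max_abs_diff(matmul(matmul(u, w_diag), transpose(v)), a)  # (2.6.1)
err_utu = max_abs_diff(matmul(transpose(u), u), identity)           # (2.6.4)
err_vtv = max_abs_diff(matmul(transpose(v), v), identity)           # (2.6.4)
print(err_261, err_utu, err_vtv)
```

All three printed numbers should be at the level of roundoff. The same three comparisons apply unchanged to the output of any other SVD routine, which is the point of the check: it tests the defining equations, not the algorithm that produced the factors.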
SVD of a Square Matrix

If the matrix A is square, N × N say, then U, V, and W are all square matrices of the same size. Their inverses are also trivial to compute: U and V are orthogonal, so their inverses are equal to their transposes; W is diagonal, so its inverse is the diagonal matrix whose elements are the reciprocals of the elements $w_j$. From (2.6.1) it now follows immediately that the inverse of A is

$$
\mathbf{A}^{-1} = \mathbf{V} \cdot \left[\operatorname{diag}(1/w_j)\right] \cdot \mathbf{U}^T
\tag{2.6.5}
$$

The only thing that can go wrong with this construction is for one of the $w_j$'s to be zero, or (numerically) for it to be so small that its value is dominated by roundoff error and therefore unknowable. If more than one of the $w_j$'s have this problem, then the matrix is even more singular. So, first of all, SVD gives you a clear diagnosis of the situation.

Formally, the condition number of a matrix is defined as the ratio of the largest (in magnitude) of the $w_j$'s to the smallest of the $w_j$'s. A matrix is singular if its condition number is infinite, and it is ill-conditioned if its condition number is too large, that is, if its reciprocal approaches the machine's floating-point precision (for example, less than $10^{-6}$ for single precision or $10^{-12}$ for double).

For singular matrices, the concepts of nullspace and range are important.
Consider the familiar set of simultaneous equations

$$
\mathbf{A} \cdot \mathbf{x} = \mathbf{b}
\tag{2.6.6}
$$

where A is a square matrix, b and x are vectors. Equation (2.6.6) defines A as a linear mapping from the vector space x to the vector space b. If A is singular, then there is some subspace of x, called the nullspace, that is mapped to zero, $\mathbf{A} \cdot \mathbf{x} = 0$. The dimension of the nullspace (the number of linearly independent vectors x that can be found in it) is called the nullity of A.

Now, there is also some subspace of b that can be “reached” by A, in the sense that there exists some x which is mapped there. This subspace of b is called the range of A. The dimension of the range is called the rank of A. If A is nonsingular, then its range will be all of the vector space b, so its rank is N. If A is singular, then the rank will be less than N. In fact, the relevant theorem is “rank plus nullity equals N.”

What has this to do with SVD? SVD explicitly constructs orthonormal bases for the nullspace and range of a matrix. Specifically, the columns of U whose same-numbered elements $w_j$ are nonzero are an orthonormal set of basis vectors that span the range; the columns of V whose same-numbered elements $w_j$ are zero are an orthonormal basis for the nullspace.

Now let's have another look at solving the set of simultaneous linear equations (2.6.6) in the case that A is singular. First, the set of homogeneous equations, where b = 0, is solved immediately by SVD: Any column of V whose corresponding $w_j$ is zero yields a solution.

When the vector b on the right-hand side is not zero, the important question is whether it lies in the range of A or not. If it does, then the singular set of equations does have a solution x; in fact it has more than one solution, since any vector in the nullspace (any column of V with a corresponding zero $w_j$) can be added to x in any linear combination.
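
To make the diagnosis concrete, here is a small Python sketch that reads the condition number, rank, nullity, and the two bases off a given set of SVD factors. Everything specific in it is an assumption chosen for illustration: the singular 2 × 2 matrix, its factors (written down by hand, which is possible here because the matrix is symmetric and of rank one), the helper names column, matvec, and svd_diagnosis, and the relative threshold rel_tol used to decide which $w_j$ count as zero.

```python
import math

def column(mat, j):
    """Return column j of a list-of-lists matrix as a list."""
    return [row[j] for row in mat]

def matvec(mat, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in mat]

def svd_diagnosis(u, w, v, rel_tol=1e-12):
    """Condition number plus bases for the range and nullspace.

    rel_tol is a hypothetical threshold: w_j <= rel_tol * max(w) is
    treated as zero.
    """
    w_max, w_min = max(w), min(w)
    cond = math.inf if w_min == 0.0 else w_max / w_min
    nonzero = [wj > rel_tol * w_max for wj in w]
    # Columns of U with nonzero w_j span the range.
    range_basis = [column(u, j) for j in range(len(w)) if nonzero[j]]
    # Columns of V with zero w_j span the nullspace.
    null_basis = [column(v, j) for j in range(len(w)) if not nonzero[j]]
    return cond, range_basis, null_basis

# Singular example A = [[1, 2], [2, 4]] with hand-derived factors:
# A = U * diag(5, 0) * V^T, where U = V.
r5 = math.sqrt(5.0)
a = [[1.0, 2.0], [2.0, 4.0]]
u = [[1.0 / r5, -2.0 / r5], [2.0 / r5, 1.0 / r5]]
w = [5.0, 0.0]
v = [row[:] for row in u]

cond, range_basis, null_basis = svd_diagnosis(u, w, v)
rank, nullity = len(range_basis), len(null_basis)
image_of_null = matvec(a, null_basis[0])   # should vanish up to roundoff

print("condition number:", cond)
print("rank, nullity:", rank, nullity)
print("A applied to the nullspace vector:", image_of_null)
```

For this matrix the rank and nullity are each one, so “rank plus nullity equals N” holds with N = 2, and the single nullspace vector solves the homogeneous equations.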
If we want to single out one particular member of this solution-set of vectors as a representative, we might want to pick the one with the smallest length $|\mathbf{x}|^2$. Here is how to find that vector using SVD: Simply replace $1/w_j$ by zero if $w_j = 0$. (It is not very often that one gets to set ∞ = 0!) Then compute (working from right to left)

$$
\mathbf{x} = \mathbf{V} \cdot \left[\operatorname{diag}(1/w_j)\right] \cdot (\mathbf{U}^T \cdot \mathbf{b})
\tag{2.6.7}
$$

This will be the solution vector of smallest length; the columns of V that are in the nullspace complete the specification of the solution set.

Proof: Consider $|\mathbf{x} + \mathbf{x}'|$, where $\mathbf{x}'$ lies in the nullspace. Then, if $\mathbf{W}^{-1}$ denotes the modified inverse of W with some elements zeroed,

$$
\begin{aligned}
|\mathbf{x} + \mathbf{x}'| &= \left|\mathbf{V} \cdot \mathbf{W}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{b} + \mathbf{x}'\right| \\
&= \left|\mathbf{V} \cdot \left(\mathbf{W}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{b} + \mathbf{V}^T \cdot \mathbf{x}'\right)\right| \\
&= \left|\mathbf{W}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{b} + \mathbf{V}^T \cdot \mathbf{x}'\right|
\end{aligned}
\tag{2.6.8}
$$

Here the first equality follows from (2.6.7), the second and third from the orthonormality of V. If you now examine the two terms that make up the sum on the right-hand side, you will see that the first one has nonzero j components only where $w_j \ne 0$, while the second one, since $\mathbf{x}'$ is in the nullspace, has nonzero j components only where $w_j = 0$. Therefore the minimum length obtains for $\mathbf{x}' = 0$, q.e.d.

If b is not in the range of the singular matrix A, then the set of equations (2.6.6) has no solution.
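But here is some good news: If b is not in the range of A, then equation (2.6.7) can still be used to construct a “solution” vector x. This vector x will not exactly solve $\mathbf{A} \cdot \mathbf{x} = \mathbf{b}$. But, among all possible vectors x, it will do the closest possible job in the least squares sense. In other words (2.6.7) finds x which minimizes

$$
r \equiv |\mathbf{A} \cdot \mathbf{x} - \mathbf{b}|
\tag{2.6.9}
$$

The number r is called the residual of the solution.

The proof is similar to (2.6.8): Suppose we modify x by adding some arbitrary $\mathbf{x}'$. Then $\mathbf{A} \cdot \mathbf{x} - \mathbf{b}$ is modified by adding some $\mathbf{b}' \equiv \mathbf{A} \cdot \mathbf{x}'$. Obviously $\mathbf{b}'$ is in the range of A. We then have

$$
\begin{aligned}
\left|\mathbf{A} \cdot \mathbf{x} - \mathbf{b} + \mathbf{b}'\right|
&= \left|(\mathbf{U} \cdot \mathbf{W} \cdot \mathbf{V}^T) \cdot (\mathbf{V} \cdot \mathbf{W}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{b}) - \mathbf{b} + \mathbf{b}'\right| \\
&= \left|(\mathbf{U} \cdot \mathbf{W} \cdot \mathbf{W}^{-1} \cdot \mathbf{U}^T - \mathbf{1}) \cdot \mathbf{b} + \mathbf{b}'\right| \\
&= \left|\mathbf{U} \cdot \left[(\mathbf{W} \cdot \mathbf{W}^{-1} - \mathbf{1}) \cdot \mathbf{U}^T \cdot \mathbf{b} + \mathbf{U}^T \cdot \mathbf{b}'\right]\right| \\
&= \left|(\mathbf{W} \cdot \mathbf{W}^{-1} - \mathbf{1}) \cdot \mathbf{U}^T \cdot \mathbf{b} + \mathbf{U}^T \cdot \mathbf{b}'\right|
\end{aligned}
\tag{2.6.10}
$$

Now, $(\mathbf{W} \cdot \mathbf{W}^{-1} - \mathbf{1})$ is a diagonal matrix which has nonzero j components only for $w_j = 0$, while $\mathbf{U}^T \mathbf{b}'$ has nonzero j components only for $w_j \ne 0$, since $\mathbf{b}'$ lies in the range of A. Therefore the minimum obtains for $\mathbf{b}' = 0$, q.e.d.

Figure 2.6.1 summarizes our discussion of SVD thus far.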
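
Equation (2.6.7), with the reciprocals of zero singular values replaced by zero, can be sketched in Python as follows. This is an illustration under assumptions, not the book's own back-substitution routine: the function names svd_solve, residual, and norm, the relative threshold rel_tol, and the example matrix with its hand-derived factors (the symmetric rank-one matrix with rows (1, 2) and (2, 4)) are all hypothetical choices. The right-hand side d lies in the range of A; the right-hand side c does not.

```python
import math

def svd_solve(u, w, v, b, rel_tol=1e-12):
    """x = V * diag(1/w_j, or 0 where w_j is negligible) * (U^T b).

    rel_tol is a hypothetical threshold relative to max(w).
    """
    w_max = max(w)
    n = len(w)
    tmp = []
    for j in range(n):                 # work right to left: U^T b first
        utb_j = sum(u[i][j] * b[i] for i in range(len(b)))
        tmp.append(utb_j / w[j] if w[j] > rel_tol * w_max else 0.0)
    return [sum(v[i][j] * tmp[j] for j in range(n)) for i in range(n)]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def residual(a, x, b):
    """r = |A x - b| as in (2.6.9)."""
    ax = [sum(r * xi for r, xi in zip(row, x)) for row in a]
    return norm([p - q for p, q in zip(ax, b)])

# A = U * diag(5, 0) * V^T with U = V (factors derived by hand).
r5 = math.sqrt(5.0)
a = [[1.0, 2.0], [2.0, 4.0]]
u = [[1.0 / r5, -2.0 / r5], [2.0 / r5, 1.0 / r5]]
w = [5.0, 0.0]
v = [row[:] for row in u]
null_vec = [v[0][1], v[1][1]]          # column of V belonging to w_2 = 0

# Case 1: d is in the range of A, so exact solutions exist.
d = [1.0, 2.0]
x_d = svd_solve(u, w, v, d)            # analytically (1/5, 2/5)
x_other = [xi + 0.5 * ni for xi, ni in zip(x_d, null_vec)]  # also a solution

# Case 2: c is outside the range, so only a least-squares answer exists.
c = [1.0, 0.0]
x_c = svd_solve(u, w, v, c)            # analytically (1/25, 2/25)
r_c = residual(a, x_c, c)
x_pert = [x_c[0] + 0.1, x_c[1]]        # any other x gives a larger residual

print("lengths:", norm(x_d), norm(x_other))
print("residuals:", r_c, residual(a, x_pert, c))
```

In the first case both vectors solve the equations, but the SVD result is the shorter one, as (2.6.8) predicts. In the second case no vector solves the equations; the SVD result attains the residual $\sqrt{0.8}$, and the perturbed vector does worse, as (2.6.10) predicts.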
[Figure 2.6.1: panel (a) shows A mapping x to b, labeled A · x = b; panel (b) shows the null space of A, the range of A, the points d, c, and c′, the solutions of A · x = d and of A · x = c′, the SVD solution of A · x = d, and the SVD “solution” of A · x = c.]

Figure 2.6.1. (a) A nonsingular matrix A maps a vector space into one of the same dimension. The vector x is mapped into b, so that x satisfies the equation A · x = b. (b) A singular matrix A maps a vector space into one of lower dimensionality, here a plane into a line, called the “range” of A. The “nullspace” of A is mapped to zero. The solutions of A · x = d consist of any one particular solution plus any vector in the nullspace, here forming a line parallel to the nullspace. Singular value decomposition (SVD) selects the particular solution closest to zero, as shown. The point c lies outside of the range of A, so A · x = c has no solution. SVD finds the least-squares best compromise solution, namely a solution of A · x = c′, as shown.

In the discussion since equation (2.6.6), we have been pretending that a matrix either is singular or else isn't. That is of course true analytically. Numerically, however, the far more common situation is that some of the $w_j$'s are very small but nonzero, so that the matrix is ill-conditioned.
In that case, the direct solution methods of LU decomposition or Gaussian elimination may actually give a formal solution to the set of equations (that is, a zero pivot may not be encountered); but the solution vector may have wildly large components whose algebraic cancellation, when multiplying by the matrix A, may give a very poor approximation to the right-hand vector b. In such cases, the solution vector x obtained by zeroing the