THE DECOMPOSITIONAL APPROACH TO MATRIX COMPUTATION

The introduction of matrix decomposition into numerical linear algebra revolutionized matrix computations. This article outlines the decompositional approach, comments on its history, and surveys the six most widely used decompositions.

G.W. STEWART
University of Maryland

In 1951, Paul S. Dwyer published Linear Computations, perhaps the first book devoted entirely to numerical linear algebra [1]. Digital computing was in its infancy, and Dwyer focused on computation with mechanical calculators. Nonetheless, the book was state of the art. Figure 1 reproduces a page of the book dealing with Gaussian elimination.

In 1954, Alston S. Householder published Principles of Numerical Analysis [2], one of the first modern treatments of high-speed digital computation. Figure 2 reproduces a page from this book, also dealing with Gaussian elimination.

The contrast between these two pages is striking. The most obvious difference is that Dwyer used scalar equations whereas Householder used partitioned matrices. But a deeper difference is that while Dwyer started from a system of equations, Householder worked with a (block) LU decomposition—the factorization of a matrix into the product of lower and upper triangular matrices.

Generally speaking, a decomposition is a factorization of a matrix into simpler factors. The underlying principle of the decompositional approach to matrix computation is that it is not the business of the matrix algorithmists to solve particular problems but to construct computational platforms from which a variety of problems can be solved. This approach, which was in full swing by the mid-1960s, has revolutionized matrix computation.

To illustrate the nature of the decompositional approach and its consequences, I begin with a discussion of the Cholesky decomposition and the solution of linear systems—essentially the decomposition that Gauss computed in his elimination algorithm.
Figure 1. This page from Linear Computations shows that Paul Dwyer's approach begins with a system of scalar equations. Courtesy of John Wiley & Sons.

Figure 2. On this page from Principles of Numerical Analysis, Alston Householder uses partitioned matrices and LU decomposition. Courtesy of McGraw-Hill.

This article also provides a tour of the five other major matrix decompositions, including the pivoted LU decomposition, the QR decomposition, the spectral decomposition, the Schur decomposition, and the singular value decomposition.

A disclaimer is in order. This article deals primarily with dense matrix computations. Although the decompositional approach has greatly influenced iterative and direct methods for sparse matrices, the ways in which it has affected them are different from what I describe here.

The Cholesky decomposition and linear systems

We can use the decompositional approach to solve the system

    Ax = b    (1)

where A is positive definite. It is well known that A can be factored in the form

    A = RᵀR    (2)

where R is an upper triangular matrix. The factorization is called the Cholesky decomposition of A.

The factorization in Equation 2 can be used to solve linear systems as follows. If we write the system in the form RᵀRx = b and set y = R⁻ᵀb, then x is the solution of the triangular system Rx = y. However, by definition y is the solution of the system Rᵀy = b. Consequently, we have reduced the problem to the solution of two triangular systems, as illustrated in the following algorithm:

    1. Solve the system Rᵀy = b.
    2. Solve the system Rx = y.    (3)
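To make Algorithm 3 concrete, here is a minimal sketch in Python with NumPy and SciPy; the 3×3 system is a hypothetical example invented for illustration, and scipy.linalg.cholesky is used because it returns the upper-triangular factor R of Equation 2 by default.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Hypothetical positive definite system, used only to illustrate Algorithm 3.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([2.0, 1.0, 4.0])

R = cholesky(A)                        # upper triangular, A = R.T @ R
y = solve_triangular(R, b, trans='T')  # step 1: solve R^T y = b
x = solve_triangular(R, y)             # step 2: solve R x = y

print(np.allclose(A @ x, b))           # True: x solves the original system
```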
Because triangular systems are easy to solve, the introduction of the Cholesky decomposition has transformed our problem into one for which the solution can be readily computed.

We can use such decompositions to solve more than one problem. For example, the following algorithm solves the system Aᵀx = b:

    1. Solve the system Ry = b.
    2. Solve the system Rᵀx = y.    (4)

Again, in many statistical applications we want to compute the quantity ρ = xᵀA⁻¹x. Because

    xᵀA⁻¹x = xᵀ(RᵀR)⁻¹x = (R⁻ᵀx)ᵀ(R⁻ᵀx)    (5)

we can compute ρ as follows:

    1. Solve the system Rᵀy = x.
    2. ρ = yᵀy.    (6)

The decompositional approach can also save computation. For example, the Cholesky decomposition requires O(n³) operations to compute, whereas the solution of triangular systems requires only O(n²) operations. Thus, if you recognize that a Cholesky decomposition is being used to solve a system at one point in a computation, you can reuse the decomposition to do the same thing later without having to recompute it.

Historically, Gaussian elimination and its variants (including Cholesky's algorithm) have solved the system in Equation 1 by reducing it to an equivalent triangular system. This mixes the computation of the decomposition with the solution of the first triangular system in Equation 3, and it is not obvious how to reuse the elimination when a new right-hand side presents itself. A naive programmer is in danger of performing the reduction from the beginning, thus repeating the lion's share of the work. On the other hand, a program that knows a decomposition is in the background can reuse it as needed.

(By the way, the problem of recomputing decompositions has not gone away. Some matrix packages hide the fact that they repeatedly compute a decomposition by providing drivers to solve linear systems with a call to a single routine. If the program calls the routine again with the same matrix, it recomputes the decomposition—unnecessarily. Interpretive matrix systems such as Matlab and Mathematica have the same problem: they hide decompositions behind operators and function calls. Such are the consequences of not stressing the decompositional approach to the consumers of matrix algorithms.)
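As a rough sketch of this factor-once, reuse-often pattern, the example below assumes SciPy's cho_factor/cho_solve interface; the random test matrix, the right-hand sides, and the vector used for ρ are all invented for illustration. The factorization is paid for once, and each later solve, including the statistical quantity of Equations 5 and 6, costs only O(n²).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

# Hypothetical data; the point is that the O(n^3) factorization happens once.
rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # a positive definite test matrix
b1, b2, x = rng.standard_normal((3, n))

factor = cho_factor(A)               # factor once: O(n^3)

# Each subsequent solve reuses the factor and costs only O(n^2).
x1 = cho_solve(factor, b1)
x2 = cho_solve(factor, b2)

# rho = x^T A^{-1} x via Equations 5 and 6: one triangular solve, then a dot product.
R = factor[0]                        # upper triangle holds R with A = R^T R
y = solve_triangular(R, x, trans='T')
rho = y @ y
print(np.isclose(rho, x @ np.linalg.solve(A, x)))   # True
```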
Another advantage of working with decompositions is unity. There are different ways of organizing the operations involved in solving linear systems by Gaussian elimination in general and Cholesky's algorithm in particular. Figure 3 illustrates some of these arrangements: a white area contains elements from the original matrix, a dark area contains the factors, a light gray area contains partially processed elements, and the boundary strips contain elements about to be processed. Most of these variants were originally presented in scalar form as new algorithms. Once you recognize that a decomposition is involved, it is easy to see the essential unity of the various algorithms.

Figure 3. These varieties of Gaussian elimination are all numerically equivalent.

All the variants in Figure 3 are numerically equivalent. This means that one rounding-error analysis serves all. For example, the Cholesky algorithm, in whatever guise, is backward stable: the computed factor R satisfies

    A + E = RᵀR    (7)

where E is of the size of the rounding unit relative to A. Establishing this backward stability is usually the most difficult part of an analysis of the use of a decomposition to solve a problem. For example, once Equation 7 has been established, the rounding errors involved in the solutions of the triangular systems in Equation 3 can be incorporated in E with relative ease. Thus, another advantage of the decompositional approach is that it concentrates the most difficult aspects of rounding-error analysis in one place.

In general, if you change the elements of a positive definite matrix, you must recompute its Cholesky decomposition from scratch. However, if the change is structured, it may be possible to compute the new decomposition directly from the old—a process known as updating. For example, you can compute the Cholesky decomposition of A + xxᵀ from that of A in O(n²) operations, an enormous savings over the ab initio computation of the decomposition.

Finally, the decompositional approach has greatly affected the development of software for matrix computation. Instead of wasting energy developing code for a variety of specific applications, the producers of a matrix package can concentrate on the decompositions themselves, perhaps with a few auxiliary routines to handle the most important applications. This approach has informed the major public-domain packages: the Handbook series [3], Eispack [4], Linpack [5], and Lapack [6]. A consequence of this emphasis on decompositions is that software developers have found that most algorithms have broad computational features in common—features that can be relegated to basic linear-algebra subprograms (such as the Blas), which can then be optimized for specific machines [7-9]. For easy reference, the sidebar "Benefits of the decompositional approach" summarizes the advantages of decomposition.

History

All the widely used decompositions had made their appearance by 1909, when Schur introduced the decomposition that now bears his name. However, with the exception of the Schur decomposition, they were not cast in the language of matrices (in spite of the fact that matrices had been introduced in 1858 [10]). I provide some historical background for the individual decompositions later, but it is instructive here to consider how the originators proceeded in the absence of matrices.

Gauss, who worked with positive definite systems defined by the normal equations for least squares, described his elimination procedure as the reduction of a quadratic form ϕ(x) = xᵀAx (I am simplifying a little here). In terms of the Cholesky factorization A = RᵀR, Gauss wrote ϕ(x) in the form

    ϕ(x) = (r₁ᵀx)² + (r₂ᵀx)² + ⋯ + (rₙᵀx)² ≡ ρ₁²(x) + ρ₂²(x) + ⋯ + ρₙ²(x)    (8)

where rᵢᵀ is the ith row of R. Thus Gauss reduced ϕ(x) to a sum of squares of linear functions ρᵢ. Because R is upper triangular, the function ρᵢ(x) depends only on the components xᵢ, …, xₙ of x. Since the coefficients in the linear forms ρᵢ are the elements of R, Gauss, by showing how to compute the ρᵢ, effectively computed the Cholesky decomposition of A.
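Equation 8 is easy to check numerically. The short sketch below, with a hypothetical 3×3 matrix chosen only for illustration, forms the ρᵢ from the rows of the Cholesky factor and compares the sum of their squares with the quadratic form.

```python
import numpy as np

# Hypothetical positive definite matrix and vector, used only for illustration.
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
x = np.array([1.0, -2.0, 0.5])

# numpy returns a lower-triangular factor with A = L @ L.T; its transpose is
# the upper-triangular R of Equation 2, so the rows r_i of R are L's columns.
R = np.linalg.cholesky(A).T

phi = x @ A @ x                          # quadratic form x^T A x
rho = R @ x                              # rho_i(x) = r_i^T x
print(np.isclose(phi, np.sum(rho**2)))   # True: phi(x) = sum_i rho_i(x)^2
```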
Other decompositions were introduced in other ways. For example, Jacobi introduced the LU decomposition as a decomposition of a bilinear form into a sum of products of linear functions having an appropriate triangularity with respect to the variables. The singular value decomposition made its appearance as an orthogonal change of variables that diagonalized a bilinear form. Eventually, all these decompositions found expressions as factorizations of matrices [11].

The process by which decomposition became so important to matrix computations was slow and incremental. Gauss certainly had the spirit. He used his decomposition to perform many tasks, such as computing variances, and even used it to update least-squares solutions. But Gauss never regarded his decomposition as a matrix factorization, and it would be anachronistic to consider him the father of the decompositional approach.

Benefits of the decompositional approach

• A matrix decomposition solves not one but many problems.
• A matrix decomposition, which is generally expensive to compute, can be reused to solve new problems involving the original matrix.
• The decompositional approach often shows that apparently different algorithms are actually computing the same object.
• The decompositional approach facilitates rounding-error analysis.
• Many matrix decompositions can be updated, sometimes with great savings in computation.
• By focusing on a few decompositions instead of a host of specific problems, software developers have been able to produce highly effective matrix packages.
In the 1940s, awareness grew that the usual algorithms for solving linear systems involved matrix factorization [12, 13]. John von Neumann and H.H. Goldstine, in their ground-breaking error analysis of the solution of linear systems, pointed out the division of labor between computing a factorization and solving the system [14]:

    We may therefore interpret the elimination method as one which bases the inverting of an arbitrary matrix A on the combination of two tricks: First it decomposes A into the product of two semi-diagonal matrices C, B′ …, and consequently the inverse of A obtains immediately from those of C and B′. Second it forms their inverses by a simple, explicit, inductive process.

In the 1950s and early 1960s, Householder systematically explored the relation between various algorithms in matrix terms. His book The Theory of Matrices in Numerical Analysis is the mathematical epitome of the decompositional approach [15].

In 1954, Givens showed how to reduce a symmetric matrix A to tridiagonal form by orthogonal transformation [16]. The reduction was merely a way station to the computation of the eigenvalues of A, and at the time no one thought of it as a decomposition. However, it and other intermediate forms have proven useful in their own right and have become a staple of the decompositional approach.

In 1961, James Wilkinson gave the first backward rounding-error analysis of the solutions of linear systems [17]. Here, the division of labor is complete. He gives one analysis of the computation of the LU decomposition and another of the solution of triangular systems and then combines the two. Wilkinson continued analyzing various algorithms for computing decompositions, introducing uniform techniques for dealing with the transformations used in the computations. By the time his book The Algebraic Eigenvalue Problem appeared in 1965 [18], the decompositional approach was firmly established.

The big six

There are many matrix decompositions, old and new, and the list of the latter seems to grow daily. Nonetheless, six decompositions hold the center. The reason is that they are useful and stable—they have important applications and the algorithms that compute them have a satisfactory backward rounding-error analysis (see Equation 7). In this brief tour, I provide references only for details that cannot be found in the many excellent texts and monographs on numerical linear algebra [18-26], the Handbook series [3], or the LINPACK Users' Guide [5].

The Cholesky decomposition

Description. Given a positive definite matrix A, there is a unique upper triangular matrix R with positive diagonal elements such that A = RᵀR. In this form, the decomposition is known as the Cholesky decomposition. It is often written in the form

    A = LDLᵀ

where D is diagonal and L is unit lower triangular (that is, L is lower triangular with ones on the diagonal).

Applications. The Cholesky decomposition is used primarily to solve positive definite linear systems, as in Equations 3 and 6. It can also be employed to compute quantities useful in statistics, as in Equation 4.

Algorithms. A Cholesky decomposition can be computed using any of the variants of Gaussian elimination (see Figure 3), modified, of course, to take advantage of symmetry. All these algorithms take approximately n³/6 floating-point additions and multiplications. The algorithm Cholesky proposed corresponds to the diagram in the lower right of Figure 3.

Updating. Given a Cholesky decomposition A = RᵀR, you can calculate the Cholesky decomposition of A + xxᵀ from R and x in O(n²) floating-point additions and multiplications. The Cholesky decomposition of A − xxᵀ can be calculated with the same number of operations. The latter process, which is called downdating, is numerically less stable than updating.
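The article does not give code for the update, but one standard scheme sweeps a sequence of plane rotations through the rows of R. The following is a minimal NumPy sketch under that assumption; the function name chol_update and the small test matrix are invented for illustration, and a careful implementation would also handle downdating and guard against a nonpositive pivot.

```python
import numpy as np

def chol_update(R, x):
    """Return an upper-triangular R1 with R1.T @ R1 = R.T @ R + np.outer(x, x).

    R is the upper-triangular Cholesky factor of A; the sweep costs O(n^2).
    """
    R = R.copy()
    x = x.astype(float).copy()
    n = x.size
    for k in range(n):
        r = np.hypot(R[k, k], x[k])          # rotate x[k] into the pivot R[k, k]
        c, s = r / R[k, k], x[k] / R[k, k]
        R[k, k] = r
        if k + 1 < n:
            R[k, k+1:] = (R[k, k+1:] + s * x[k+1:]) / c
            x[k+1:] = c * x[k+1:] - s * R[k, k+1:]
    return R

# Illustrative check against refactoring from scratch.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
x = np.array([1.0, 2.0])
R = np.linalg.cholesky(A).T                         # upper factor, A = R.T @ R
R1 = chol_update(R, x)
print(np.allclose(R1.T @ R1, A + np.outer(x, x)))   # True
```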
The pivoted Cholesky decomposition. If P is a permutation matrix and A is positive definite, then PᵀAP is said to be a diagonal permutation of A (among other things, it permutes the diagonals of A). Any diagonal permutation of A is positive definite and has a Cholesky factor. Such a factorization is called a pivoted Cholesky factorization. There are many ways to pivot a Cholesky decomposition, but the most common one produces a factor satisfying