THE DECOMPOSITIONAL APPROACH TO MATRIX COMPUTATION

The introduction of matrix decomposition into numerical linear algebra revolutionized matrix computations. This article outlines the decompositional approach, comments on its history, and surveys the six most widely used decompositions.

G.W. STEWART
University of Maryland

In 1951, Paul S. Dwyer published Linear Computations, perhaps the first book devoted entirely to numerical linear algebra [1]. Digital computing was in its infancy, and Dwyer focused on computation with mechanical calculators. Nonetheless, the book was state of the art. Figure 1 reproduces a page of the book dealing with Gaussian elimination.

In 1954, Alston S. Householder published Principles of Numerical Analysis [2], one of the first modern treatments of high-speed digital computation. Figure 2 reproduces a page from this book, also dealing with Gaussian elimination.

The contrast between these two pages is striking. The most obvious difference is that Dwyer used scalar equations whereas Householder used partitioned matrices. But a deeper difference is that while Dwyer started from a system of equations, Householder worked with a (block) LU decomposition—the factorization of a matrix into the product of lower and upper triangular matrices.

Generally speaking, a decomposition is a factorization of a matrix into simpler factors. The underlying principle of the decompositional approach to matrix computation is that it is not the business of the matrix algorithmists to solve particular problems but to construct computational platforms from which a variety of problems can be solved. This approach, which was in full swing by the mid-1960s, has revolutionized matrix computation.

To illustrate the nature of the decompositional approach and its consequences, I begin with a discussion of the Cholesky decomposition and the solution of linear systems—essentially the decomposition that Gauss computed in his elimination algorithm.
Figure 1. This page from Linear Computations shows that Paul Dwyer's approach begins with a system of scalar equations. Courtesy of John Wiley & Sons.

Figure 2. On this page from Principles of Numerical Analysis, Alston Householder uses partitioned matrices and LU decomposition. Courtesy of McGraw-Hill.

This article also provides a tour of the five other major matrix decompositions, including the pivoted LU decomposition, the QR decomposition, the spectral decomposition, the Schur decomposition, and the singular value decomposition.

A disclaimer is in order. This article deals primarily with dense matrix computations. Although the decompositional approach has greatly influenced iterative and direct methods for sparse matrices, the ways in which it has affected them are different from what I describe here.

The Cholesky decomposition and linear systems

We can use the decompositional approach to solve the system

    Ax = b    (1)

where A is positive definite. It is well known that A can be factored in the form

    A = RᵀR    (2)

where R is an upper triangular matrix. The factorization is called the Cholesky decomposition of A.

The factorization in Equation 2 can be used to solve linear systems as follows. If we write the system in the form RᵀRx = b and set y = R⁻ᵀb, then x is the solution of the triangular system Rx = y. However, by definition y is the solution of the system Rᵀy = b. Consequently, we have reduced the problem to the solution of two triangular systems, as illustrated in the following algorithm:

    1. Solve the system Rᵀy = b.
    2. Solve the system Rx = y.    (3)
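To make Algorithm 3 concrete, here is a minimal sketch in Python with NumPy and SciPy; the 3×3 system is a hypothetical example invented for illustration, and scipy.linalg.cholesky is used because it returns the upper-triangular factor R of Equation 2 by default.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Hypothetical positive definite system, used only to illustrate Algorithm 3.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([2.0, 1.0, 4.0])

R = cholesky(A)                        # upper triangular, A = R.T @ R
y = solve_triangular(R, b, trans='T')  # step 1: solve R^T y = b
x = solve_triangular(R, y)             # step 2: solve R x = y

print(np.allclose(A @ x, b))           # True: x solves the original system
```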
Because triangular systems are easy to solve, the introduction of the Cholesky decomposition has transformed our problem into one for which the solution can be readily computed.

We can use such decompositions to solve more than one problem. For example, the following algorithm solves the system Aᵀx = b:

    1. Solve the system Ry = b.
    2. Solve the system Rᵀx = y.    (4)

Again, in many statistical applications we want to compute the quantity ρ = xᵀA⁻¹x. Because

    xᵀA⁻¹x = xᵀ(RᵀR)⁻¹x = (R⁻ᵀx)ᵀ(R⁻ᵀx)    (5)

we can compute ρ as follows:

    1. Solve the system Rᵀy = x.
    2. ρ = yᵀy.    (6)

The decompositional approach can also save computation. For example, the Cholesky decomposition requires O(n³) operations to compute, whereas the solution of triangular systems requires only O(n²) operations. Thus, if you recognize that a Cholesky decomposition is being used to solve a system at one point in a computation, you can reuse the decomposition to do the same thing later without having to recompute it.

Historically, Gaussian elimination and its variants (including Cholesky's algorithm) have solved the system in Equation 1 by reducing it to an equivalent triangular system. This mixes the computation of the decomposition with the solution of the first triangular system in Equation 3, and it is not obvious how to reuse the elimination when a new right-hand side presents itself. A naive programmer is in danger of performing the reduction from the beginning, thus repeating the lion's share of the work. On the other hand, a program that knows a decomposition is in the background can reuse it as needed.

(By the way, the problem of recomputing decompositions has not gone away. Some matrix packages hide the fact that they repeatedly compute a decomposition by providing drivers to solve linear systems with a call to a single routine. If the program calls the routine again with the same matrix, it recomputes the decomposition—unnecessarily. Interpretive matrix systems such as Matlab and Mathematica have the same problem: they hide decompositions behind operators and function calls. Such are the consequences of not stressing the decompositional approach to the consumers of matrix algorithms.)
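As a rough sketch of this factor-once, reuse-often pattern, the example below assumes SciPy's cho_factor/cho_solve interface; the random test matrix, the right-hand sides, and the vector used for ρ are all invented for illustration. The factorization is paid for once, and each later solve, including the statistical quantity of Equations 5 and 6, costs only O(n²).

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

# Hypothetical data; the point is that the O(n^3) factorization happens once.
rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # a positive definite test matrix
b1, b2, x = rng.standard_normal((3, n))

factor = cho_factor(A)               # factor once: O(n^3)

# Each subsequent solve reuses the factor and costs only O(n^2).
x1 = cho_solve(factor, b1)
x2 = cho_solve(factor, b2)

# rho = x^T A^{-1} x via Equations 5 and 6: one triangular solve, then a dot product.
R = factor[0]                        # upper triangle holds R with A = R^T R
y = solve_triangular(R, x, trans='T')
rho = y @ y
print(np.isclose(rho, x @ np.linalg.solve(A, x)))   # True
```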
Another advantage of working with decompositions is unity. There are different ways of organizing the operations involved in solving linear systems by Gaussian elimination in general and Cholesky's algorithm in particular. Figure 3 illustrates some of these arrangements: a white area contains elements from the original matrix, a dark area contains the factors, a light gray area contains partially processed elements, and the boundary strips contain elements about to be processed. Most of these variants were originally presented in scalar form as new algorithms. Once you recognize that a decomposition is involved, it is easy to see the essential unity of the various algorithms.

Figure 3. These varieties of Gaussian elimination are all numerically equivalent.

All the variants in Figure 3 are numerically equivalent. This means that one rounding-error analysis serves all. For example, the Cholesky algorithm, in whatever guise, is backward stable: the computed factor R satisfies

    A + E = RᵀR    (7)

where E is of the size of the rounding unit relative to A. Establishing this backward stability is usually the most difficult part of an analysis of the use of a decomposition to solve a problem. For example, once Equation 7 has been established, the rounding errors involved in the solutions of the triangular systems in Equation 3 can be incorporated in E with relative ease. Thus, another advantage of the decompositional approach is that it concentrates the most difficult aspects of rounding-error analysis in one place.

In general, if you change the elements of a positive definite matrix, you must recompute its Cholesky decomposition from scratch. However, if the change is structured, it may be possible to compute the new decomposition directly from the old—a process known as updating. For example, you can compute the Cholesky decomposition of A + xxᵀ from that of A in O(n²) operations, an enormous savings over the ab initio computation of the decomposition.

Finally, the decompositional approach has greatly affected the development of software for matrix computation. Instead of wasting energy developing code for a variety of specific applications, the producers of a matrix package can concentrate on the decompositions themselves, perhaps with a few auxiliary routines to handle the most important applications. This approach has informed the major public-domain packages: the Handbook series [3], Eispack [4], Linpack [5], and Lapack [6]. A consequence of this emphasis on decompositions is that software developers have found that most algorithms have broad computational features in common—features that can be relegated to basic linear-algebra subprograms (such as the Blas), which can then be optimized for specific machines [7-9]. For easy reference, the sidebar "Benefits of the decompositional approach" summarizes the advantages of decomposition.

History

All the widely used decompositions had made their appearance by 1909, when Schur introduced the decomposition that now bears his name. However, with the exception of the Schur decomposition, they were not cast in the language of matrices (in spite of the fact that matrices had been introduced in 1858 [10]). I provide some historical background for the individual decompositions later, but it is instructive here to consider how the originators proceeded in the absence of matrices.

Gauss, who worked with positive definite systems defined by the normal equations for least squares, described his elimination procedure as the reduction of a quadratic form ϕ(x) = xᵀAx (I am simplifying a little here). In terms of the Cholesky factorization A = RᵀR, Gauss wrote ϕ(x) in the form

    ϕ(x) = (r₁ᵀx)² + (r₂ᵀx)² + ⋯ + (rₙᵀx)² ≡ ρ₁²(x) + ρ₂²(x) + ⋯ + ρₙ²(x)    (8)

where rᵢᵀ is the ith row of R. Thus Gauss reduced ϕ(x) to a sum of squares of linear functions ρᵢ. Because R is upper triangular, the function ρᵢ(x) depends only on the components xᵢ, …, xₙ of x. Since the coefficients in the linear forms ρᵢ are the elements of R, Gauss, by showing how to compute the ρᵢ, effectively computed the Cholesky decomposition of A.
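Equation 8 is easy to check numerically. The short sketch below, with a hypothetical 3×3 matrix chosen only for illustration, forms the ρᵢ from the rows of the Cholesky factor and compares the sum of their squares with the quadratic form.

```python
import numpy as np

# Hypothetical positive definite matrix and vector, used only for illustration.
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
x = np.array([1.0, -2.0, 0.5])

# numpy returns a lower-triangular factor with A = L @ L.T; its transpose is
# the upper-triangular R of Equation 2, so the rows r_i of R are L's columns.
R = np.linalg.cholesky(A).T

phi = x @ A @ x                          # quadratic form x^T A x
rho = R @ x                              # rho_i(x) = r_i^T x
print(np.isclose(phi, np.sum(rho**2)))   # True: phi(x) = sum_i rho_i(x)^2
```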
Other decompositions were introduced in other ways. For example, Jacobi introduced the LU decomposition as a decomposition of a bilinear form into a sum of products of linear functions having an appropriate triangularity with respect to the variables. The singular value decomposition made its appearance as an orthogonal change of variables that diagonalized a bilinear form. Eventually, all these decompositions found expressions as factorizations of matrices [11].

The process by which decomposition became so important to matrix computations was slow and incremental. Gauss certainly had the spirit. He used his decomposition to perform many tasks, such as computing variances, and even used it to update least-squares solutions. But Gauss never regarded his decomposition as a matrix factorization, and it would be anachronistic to consider him the father of the decompositional approach.

Benefits of the decompositional approach

• A matrix decomposition solves not one but many problems.
• A matrix decomposition, which is generally expensive to compute, can be reused to solve new problems involving the original matrix.
• The decompositional approach often shows that apparently different algorithms are actually computing the same object.
• The decompositional approach facilitates rounding-error analysis.
• Many matrix decompositions can be updated, sometimes with great savings in computation.
• By focusing on a few decompositions instead of a host of specific problems, software developers have been able to produce highly effective matrix packages.
In the 1940s, awareness grew that the usual algorithms for solving linear systems involved matrix factorization [12, 13]. John von Neumann and H.H. Goldstine, in their ground-breaking error analysis of the solution of linear systems, pointed out the division of labor between computing a factorization and solving the system [14]:

    We may therefore interpret the elimination method as one which bases the inverting of an arbitrary matrix A on the combination of two tricks: First it decomposes A into the product of two semi-diagonal matrices C, B′ …, and consequently the inverse of A obtains immediately from those of C and B′. Second it forms their inverses by a simple, explicit, inductive process.

In the 1950s and early 1960s, Householder systematically explored the relation between various algorithms in matrix terms. His book The Theory of Matrices in Numerical Analysis is the mathematical epitome of the decompositional approach [15].

In 1954, Givens showed how to reduce a symmetric matrix A to tridiagonal form by orthogonal transformation [16]. The reduction was merely a way station to the computation of the eigenvalues of A, and at the time no one thought of it as a decomposition. However, it and other intermediate forms have proven useful in their own right and have become a staple of the decompositional approach.

In 1961, James Wilkinson gave the first backward rounding-error analysis of the solutions of linear systems [17]. Here, the division of labor is complete. He gives one analysis of the computation of the LU decomposition and another of the solution of triangular systems and then combines the two. Wilkinson continued analyzing various algorithms for computing decompositions, introducing uniform techniques for dealing with the transformations used in the computations. By the time his book The Algebraic Eigenvalue Problem appeared in 1965 [18], the decompositional approach was firmly established.

The big six

There are many matrix decompositions, old and new, and the list of the latter seems to grow daily. Nonetheless, six decompositions hold the center. The reason is that they are useful and stable—they have important applications and the algorithms that compute them have a satisfactory backward rounding-error analysis (see Equation 7). In this brief tour, I provide references only for details that cannot be found in the many excellent texts and monographs on numerical linear algebra [18-26], the Handbook series [3], or the LINPACK Users' Guide [5].

The Cholesky decomposition

Description. Given a positive definite matrix A, there is a unique upper triangular matrix R with positive diagonal elements such that A = RᵀR. In this form, the decomposition is known as the Cholesky decomposition. It is often written in the form

    A = LDLᵀ

where D is diagonal and L is unit lower triangular (that is, L is lower triangular with ones on the diagonal).

Applications. The Cholesky decomposition is used primarily to solve positive definite linear systems, as in Equations 3 and 6. It can also be employed to compute quantities useful in statistics, as in Equation 4.

Algorithms. A Cholesky decomposition can be computed using any of the variants of Gaussian elimination (see Figure 3), modified, of course, to take advantage of symmetry. All these algorithms take approximately n³/6 floating-point additions and multiplications. The algorithm Cholesky proposed corresponds to the diagram in the lower right of Figure 3.

Updating. Given a Cholesky decomposition A = RᵀR, you can calculate the Cholesky decomposition of A + xxᵀ from R and x in O(n²) floating-point additions and multiplications. The Cholesky decomposition of A − xxᵀ can be calculated with the same number of operations. The latter process, which is called downdating, is numerically less stable than updating.
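The article does not give code for the update, but one standard scheme sweeps a sequence of plane rotations through the rows of R. The following is a minimal NumPy sketch under that assumption; the function name chol_update and the small test matrix are invented for illustration, and a careful implementation would also handle downdating and guard against a nonpositive pivot.

```python
import numpy as np

def chol_update(R, x):
    """Return an upper-triangular R1 with R1.T @ R1 = R.T @ R + np.outer(x, x).

    R is the upper-triangular Cholesky factor of A; the sweep costs O(n^2).
    """
    R = R.copy()
    x = x.astype(float).copy()
    n = x.size
    for k in range(n):
        r = np.hypot(R[k, k], x[k])          # rotate x[k] into the pivot R[k, k]
        c, s = r / R[k, k], x[k] / R[k, k]
        R[k, k] = r
        if k + 1 < n:
            R[k, k+1:] = (R[k, k+1:] + s * x[k+1:]) / c
            x[k+1:] = c * x[k+1:] - s * R[k, k+1:]
    return R

# Illustrative check against refactoring from scratch.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
x = np.array([1.0, 2.0])
R = np.linalg.cholesky(A).T                         # upper factor, A = R.T @ R
R1 = chol_update(R, x)
print(np.allclose(R1.T @ R1, A + np.outer(x, x)))   # True
```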
The pivoted Cholesky decomposition. If P is a permutation matrix and A is positive definite, then PᵀAP is said to be a diagonal permutation of A (among other things, it permutes the diagonals of A). Any diagonal permutation of A is positive definite and has a Cholesky factor. Such a factorization is called a pivoted Cholesky factorization. There are many ways to pivot a Cholesky decomposition, but the most common one produces a factor satisfying