Applied Mathematics and Computation 144 (2003) 441–455
www.elsevier.com/locate/amc
doi:10.1016/S0096-3003(02)00419-8

Analysis of peaks and plateaus in a Galerkin/minimal residual pair of methods for solving Ax = b ☆

Yuming Shen a,b, Jinxi Zhao a,*

a State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, PR China
b Department of Mathematics, Nanjing University, Nanjing 210093, PR China

☆ Supported by the State 863-plan High Science and Technology of China.
* Corresponding author. Address: Department of Computer Science and Technology, 210008 Nanjing, China. E-mail address: jxzhao@nju.edu.cn (J. Zhao).

Abstract

Irregular peaks often appear when Galerkin methods are used to solve linear systems of equations Ax = b. These peaks make it difficult to identify convergence. To remedy this disadvantage one can instead use norm minimizing methods, at the cost of more work and memory; even then, plateaus cannot be avoided. In this paper we give a necessary and sufficient condition for the occurrence of peaks, and we present some factors related to this behavior.
© 2002 Elsevier Inc. All rights reserved.

1. Introduction and definitions

For solving linear systems of equations

$$Ax = b \qquad (1.1)$$

there are two classes of iterative methods in common use. One is the Galerkin methods, such as Lanczos [10], BCG [5] and FOM [11]. The other is the norm minimizing methods, such as MINRES [12], QMR [6] and GMRES [13].
The convergence of any iterative method is said to have occurred at iteration k if, for some specified convergence tolerance $\epsilon$,

$$\|r_k\| / \|r_0\| \le \epsilon, \qquad (1.2)$$

where $r_0$ is the initial residual, $r_k = b - Ax_k$, and $x_k$ is the kth iterate. Without any loss of generality we assume that A is real and nonsingular and that the initial guess is $x_0 = 0$, so that $r_0 = b$. In this paper we focus on the Lanczos/MINRES pair of methods for solving Eq. (1.1) when A is an n × n symmetric matrix. See [2] for results on the pairs GMRES/FOM and QMR/BCG, and for details of theorems and proofs not included in this paper. It has been shown that when Galerkin methods are used to solve the linear system (1.1), with A either real symmetric or nonsymmetric, the residual norms $\|r_k\|$, k = 1, 2, ..., are not always monotonically decreasing as a function of the iteration number. Irregular peaks can appear in such curves, making it difficult to identify convergence and making the user feel insecure about using the method. See for example Fig. 1.

Fig. 1. The convergence curves of Lanczos.

One way of coping with this problem is to use norm minimizing methods. Since the Krylov subspaces generated are nested, the residual norm $\|r_k\|$ is then a monotonically non-increasing function of the iteration number k. However, these methods require a great deal more work and memory.
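To make the stopping test (1.2) and the monotone MINRES residual curve concrete, the following short Python sketch records relative residual norms for SciPy's MINRES on a small symmetric test problem. The matrix, right-hand side and tolerance are illustrative choices, not those used in the experiments of Section 4.

```python
# A minimal sketch (not the authors' code): track ||r_k|| / ||r_0|| for MINRES
# on an assumed symmetric test problem and apply the stopping test (1.2).
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import minres

n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()  # illustrative SPD matrix
b = np.ones(n)                       # with x0 = 0 the initial residual is r0 = b
r0_norm = np.linalg.norm(b)
history = []                         # relative residual norms ||r_k|| / ||r_0||

def record(xk):
    # MINRES calls this once per iteration with the current iterate x_k
    history.append(np.linalg.norm(b - A @ xk) / r0_norm)

x, info = minres(A, b, callback=record)

eps = 1e-5                           # tolerance in (1.2), an arbitrary choice here
converged_at = next((k + 1 for k, r in enumerate(history) if r <= eps), None)
# For MINRES the recorded history is non-increasing; a Galerkin (Lanczos)
# iteration on the same problem can instead show irregular peaks.
print(info, converged_at)
```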
Moreover, plateaus can appear in such plots: intervals of iterations over which the norm of the residual decreases at an unacceptably slow rate. See for example Fig. 2.

Fig. 2. The convergence curves of MINRES.

In this paper we examine, both experimentally and theoretically, peak and plateau formation in the Lanczos/MINRES pair. In Section 2 we present relationships between peaks and plateaus. In Section 3 we identify some factors which initiate peak formation in the Lanczos residual norm plots. In Section 4 we give some numerical experiments to examine our conclusions.

The norm $\|\cdot\|$ is the Euclidean two-norm, or spectral norm. $r_k^{LR}$ and $r_k^{MR}$ denote the Lanczos residual and the MINRES residual, respectively. The Krylov subspace is defined by

$$\mathcal{K}_k \equiv \mathcal{K}_k(A, r_0) \equiv \mathrm{span}\{r_0, Ar_0, \ldots, A^{k-1} r_0\}, \qquad k \ge 1,$$

and the corresponding Krylov matrix is

$$K_k \equiv (r_0, Ar_0, \ldots, A^{k-1} r_0), \qquad k \ge 1.$$
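The Krylov matrix $K_k$ can be formed column by column; the brief sketch below (illustrative code, with names of our own choosing) does so and notes the nesting $\mathcal{K}_k \subseteq \mathcal{K}_{k+1}$ that forces the MINRES residual norms to be non-increasing.

```python
# Illustrative sketch: build the Krylov matrix K_k = (r0, A r0, ..., A^{k-1} r0).
# Function and variable names are ours, not from the paper.
import numpy as np

def krylov_matrix(A, r0, k):
    """Return the n-by-k matrix whose columns are r0, A r0, ..., A^(k-1) r0."""
    K = np.empty((r0.shape[0], k))
    K[:, 0] = r0
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]   # each new column is A times the previous one
    return K

# K_k(A, r0) is the column space of krylov_matrix(A, r0, k); since the first k
# columns of K_{k+1} are exactly K_k, the subspaces are nested.
```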
Throughout the paper we refer to peaks and plateaus in residual norm plots as follows.
Definition 1.1 (Cullum, 1995 [1]). A peak is any consecutive section of a residual norm plot during which the residual norms increase to a local maximum and then decrease to a local minimum.

Definition 1.2 (Cullum, 1995 [1]). A plateau is any consecutive section of a residual norm plot during which the norm of the residual decreases at an unacceptably slow rate.
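As a concrete reading of Definition 1.1, the following sketch (our illustration; the original paper contains no such code) scans a residual norm history for peaks, i.e. maximal stretches over which the norms rise to a local maximum and then fall back to a local minimum. Definition 1.2 is inherently subjective, since what counts as an "unacceptably slow" decrease depends on the user.

```python
# Illustrative sketch of Definition 1.1: locate peaks in a residual norm history.
def find_peaks(res_norms):
    """Return (start, top, end) index triples: the norms rise from start to a
    local maximum at top and then fall back to a local minimum at end."""
    peaks, k, n = [], 0, len(res_norms)
    while k + 1 < n:
        if res_norms[k + 1] > res_norms[k]:          # a rise begins: start of a peak
            start = k
            while k + 1 < n and res_norms[k + 1] > res_norms[k]:
                k += 1                               # climb to the local maximum
            top = k
            while k + 1 < n and res_norms[k + 1] < res_norms[k]:
                k += 1                               # descend to the local minimum
            peaks.append((start, top, k))
        else:
            k += 1
    return peaks

# Example: a peak starting at iteration 1, topping at 3, back to a minimum at 5.
print(find_peaks([1.0, 0.5, 0.9, 1.3, 0.4, 0.3]))    # -> [(1, 3, 5)]
```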
2. Peaks, plateaus and angles between subspaces

Cullum and Greenbaum [1,2] point out a correlation between peaks and plateaus: whenever a peak occurs there is a plateau under it. The converse, however, need not hold; it is possible for a plateau to occur in a MINRES residual norm plot without a visible corresponding peak in the corresponding Lanczos residual norm plot. They also observe that whenever the residual norm plot for MINRES is decreasing rapidly, the corresponding residual norm plot for the Lanczos iterates is also decreasing rapidly; the two residual norm plots appear to track each other. In this section we consider the same problem in another way. We recall that in many MINRES implementations a least squares problem is solved at each iteration by reducing a Hessenberg matrix to upper triangular form via Givens rotations [11]. At iteration k a Givens rotation

$$G_k = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & c_k & s_k & & \\ & & -s_k & c_k & & \\ & & & & \ddots & \\ & & & & & 1 \end{pmatrix} \qquad (2.1)$$

is generated to eliminate the trailing element of the Hessenberg matrix. Notice that these $s_k$ and $c_k$ are not merely artifacts of the computational scheme: they are the sines and cosines of the angles between $A\mathcal{K}_k$ and $\mathcal{K}_k$.

The following relations are fundamental for our later investigations (see [4, Section 2]).

Theorem 2.1 (Eiermann and Ernst, 2001 [4]). For k = 1, 2, ..., L − 1, there holds

$$s_k = \sin \angle(\mathcal{K}_k, A\mathcal{K}_k) \quad \text{and} \quad c_k = \cos \angle(\mathcal{K}_k, A\mathcal{K}_k), \qquad (2.2)$$

where $\angle(\mathcal{K}_k, A\mathcal{K}_k)$ denotes the largest canonical angle between the spaces $\mathcal{K}_k$ and $A\mathcal{K}_k$.¹ The quantities $c_k$ and $s_k$ are those given by the Givens rotations $G_k$ of (2.1), and $L := \min\{k : x_k^{MR} = A^{-1}b\} = \min\{k : x_k^{LR} = A^{-1}b\}$.
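Theorem 2.1 can be checked numerically. The sketch below (an illustration, not the authors' code) computes the largest canonical angle between $\mathcal{K}_k$ and $A\mathcal{K}_k$ from orthonormal bases, using the singular-value characterization recalled in footnote 1; its sine and cosine are the $s_k$ and $c_k$ of (2.2). The random test matrix and vector are assumptions.

```python
# Illustrative check of Theorem 2.1: the cosines of the canonical angles between
# two subspaces are the singular values of Qv^T Qw, where Qv, Qw are orthonormal
# bases (cf. footnote 1); the largest angle matches the smallest singular value.
import numpy as np

def largest_canonical_angle(V, W):
    """V, W: matrices whose columns span two subspaces of equal dimension.
    Returns the largest canonical angle between the column spaces."""
    Qv, _ = np.linalg.qr(V)                 # orthonormal basis of span(V)
    Qw, _ = np.linalg.qr(W)                 # orthonormal basis of span(W)
    sigma = np.linalg.svd(Qv.T @ Qw, compute_uv=False)
    cos_max_angle = np.clip(sigma.min(), -1.0, 1.0)
    return np.arccos(cos_max_angle)

# Assumed symmetric A and starting residual r0:
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50)); A = B + B.T   # random symmetric test matrix
r0 = rng.standard_normal(50)
k = 5
Kk = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
theta = largest_canonical_angle(Kk, A @ Kk)
print(np.sin(theta), np.cos(theta))              # the s_k and c_k of (2.2)
```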
Theorem 2.2 (Eiermann and Ernst, 2001 [4]). With $s_k = \sin \angle(\mathcal{K}_k, A\mathcal{K}_k)$ and $c_k = \cos \angle(\mathcal{K}_k, A\mathcal{K}_k)$, the MINRES and Lanczos residuals with respect to the Krylov subspaces $\mathcal{K}_k$ satisfy

$$\|r_k^{MR}\| = c_k \|r_k^{LR}\|, \qquad (2.3)$$

$$\|r_k^{MR}\| = s_k \|r_{k-1}^{MR}\| = s_1 s_2 \cdots s_k \|r_0\|, \qquad (2.4)$$

$$r_k^{MR} = s_k^2\, r_{k-1}^{MR} + c_k^2\, r_k^{LR}. \qquad (2.5)$$

In view of $s_k = \|r_k^{MR}\| / \|r_{k-1}^{MR}\|$, i.e. $c_k^2 = 1 - \|r_k^{MR}\|^2 / \|r_{k-1}^{MR}\|^2$, we obtain the following theorem.

Theorem 2.3 (Eiermann and Ernst, 2001 [4]). In exact arithmetic, if $c_k \ne 0$ at each iteration k, then the Lanczos residuals and MINRES residuals are related by

$$\|r_k^{LR}\| = \|r_k^{MR}\| \Big/ \sqrt{1 - \|r_k^{MR}\|^2 / \|r_{k-1}^{MR}\|^2}. \qquad (2.6)$$

Using the expression $\sin \angle(\mathcal{K}_k, A\mathcal{K}_k) = \|r_k^{MR}\| / \|r_{k-1}^{MR}\|$, we can rewrite (2.6) as

$$\|r_k^{LR}\| = \|r_k^{MR}\| \Big/ \sqrt{1 - \big(\sin \angle(\mathcal{K}_k, A\mathcal{K}_k)\big)^2}. \qquad (2.7)$$

Relation (2.7) shows that if $\angle(\mathcal{K}_k, A\mathcal{K}_k)$ is reduced by a significant factor at step k, then the Lanczos residual norm will be approximately equal to the MINRES residual norm at step k, since the denominator on the right-hand side of (2.7) will be close to 1. If, however, $\angle(\mathcal{K}_k, A\mathcal{K}_k)$ remains close to $\pi/2$, then the denominator on the right-hand side of (2.7) is close to 0 and the Lanczos residual norm will be much larger.
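Relation (2.6) also provides a cheap way to recover the Lanczos residual norms from a recorded MINRES residual history, which is one way to observe the peak/plateau correspondence numerically. The sketch below applies (2.6) to an assumed MINRES history; the numbers are illustrative only.

```python
# Illustrative use of (2.6): reconstruct Lanczos residual norms ||r_k^LR|| from
# MINRES residual norms ||r_k^MR||. Plateaus in the MINRES history (ratios near 1,
# i.e. angles near pi/2) turn into peaks in the reconstructed Lanczos history.
import numpy as np

def lanczos_from_minres(mr_norms, r0_norm):
    """mr_norms[k] = ||r_{k+1}^MR||; returns the corresponding ||r_{k+1}^LR||."""
    lr_norms, prev = [], r0_norm
    for mr in mr_norms:
        ratio = mr / prev                      # this is s_k by (2.4)
        ck2 = 1.0 - ratio**2                   # c_k^2, cf. Theorem 2.3
        lr_norms.append(np.inf if ck2 <= 0.0 else mr / np.sqrt(ck2))  # (2.6)
        prev = mr
    return lr_norms

# Assumed MINRES history with a plateau between iterations 2 and 4:
mr = [0.9, 0.5, 0.49, 0.488, 0.1, 0.01]
print(lanczos_from_minres(mr, r0_norm=1.0))
# The nearly stagnant steps (0.5 -> 0.49 -> 0.488) yield large Lanczos residual
# norms, i.e. a peak sitting over the MINRES plateau.
```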
As shown above, the behavior of the angles $\angle(\mathcal{K}_k, A\mathcal{K}_k)$ as k increases plays a crucial role in the convergence of the MR and LR approximations. If these angles tend to zero rapidly then, in view of (2.4), the convergence is superlinear. Based on Theorem 2.3 we can prove the following two propositions.

¹ Given orthonormal bases $\{v_j\}_{j=1}^m$ and $\{w_j\}_{j=1}^m$ of two m-dimensional subspaces V and W, the cosines of the canonical angles between V and W are the singular values of the matrix of inner products $[(v_j, w_k)] \in \mathbb{R}^{m \times m}$. We remark that the sine of the largest canonical angle between spaces V and W of equal dimension is given by $\|(I - P_V) P_W\|$ [14, Theorem 4.37].