附录C高性能计算-科学计算软件介绍 C.1科学计算发展 本节内容摘自Predictions for scientific computing50 years from now,byN.Trefethen in200 ·Before1940 -Newton's method -Gaussian elimination Gauss quadrature -Least-squares fitting -Adams and Runge-Kutta formulas -Richardson extrapolation ·1940-1970 -Floating point arithmetic -Fortran -Finite differences -Finite elements Monte Carlo -Orthogonal linear algebra -Splines -FFT ·1970-2000 -Quasi-Newton iterations Adaptivity -Stiff ODE solvers -Software libraries -MATΠIAB -Multigrid Sparse and iterative linear algebra -Spectral methods -Interior point method -Wavelets ·2000- 277
附录 C 高性能计算 – 科学计算软件介绍 C.1 科学计算发展 本节内容摘自 Predictions for scientific computing 50 years from now, by N. Trefethen in 2000. • Before 1940 – Newton’s method – Gaussian elimination – Gauss quadrature – Least-squares fitting – Adams and Runge-Kutta formulas – Richardson extrapolation • 1940 – 1970 – Floating point arithmetic – Fortran – Finite differences – Finite elements – Simplex algorithm – Monte Carlo – Orthogonal linear algebra – Splines – FFT • 1970 – 2000 – Quasi-Newton iterations – Adaptivity – Stiff ODE solvers – Software libraries – MATLAB – Multigrid – Sparse and iterative linear algebra – Spectral methods – Interior point method – Wavelets • 2000 – 277
.278. 附录C高性能计算-科学计算软件介绍 -Multipole methods -Breakthroughs in preconditioners,spectral methods,time stepping for PDE C.1.1数值分析经典论文 这里是Trefethen建议的I3篇论文阅读列表: Classic Papers in Numerical Analysis,by N.Trefethen,1993) (1)Cooley Tukey(1965)the Fast Fourier Transform James W.Cooley and John W.Tukey."An algorithm for the machine calculation of comple Fourier series, Mathematics of Computation,19(1965),297-301. (2)Courant,Friedrichs&Lewy(1928)finite difference methods for PDE R.Courant,K.O.Friedrichs and H.Lewy,"Ueber die partiellen Differenzengleichungen der mathema- tischen Physik,Mathematische Annalen,100(198),32-74.Translated as:"On the partial difference equations of mathematical physics,"IBM Journal of Resarch and Development,11(1967).215-234. (3)Houscholder(1958) QR factorization of matrices A.S.Householder,"Unitary triangularization of a nonsymmetric matrix,"Journal of the Association of Computing Machinery,5(1958),339-342. (4)Curtiss Hirschfelder(1952)stiffness of ODEs:BD formulas C.F Curtiss and J.O.Hirschfelder,"Integration of stiff equations,"Proceedings of the National Academy of Sciences.38(1952).235243. (5)de Boor(1972)calculations with B-splines C.de Boor,"On calculating with B-splines,"Journal of Approximation Theory.6(1972),50-62. (6)Courant(1943)finite element methods for PDE R.Courant,"Variational methods for the solution of problems of equilibrium and vibrations,"Bulletin of the American Mathematical Society,49(1943),1-23. (7)Golub&Kahan (1965)the singular value decomposition G.Golub and W.Kahan,"Calculating the singular values and pseudo-inverse of a matrix,"SIAM Journal on Numerical Analysis,(965),205-224. (8)Brandt(l977))→multigrid algorithms A.Brandt,"Multi-level adaptive solutionsto boundary-value problems,"Mathematics of Computation 31(1977).333-390. (9)Hestenes&Stiefel (195)the conjugate gradient iteration Magnus R.Hestenes and Eduard Stiefel,"Methods of conjugate gradients for solving linear systems,"Jour- nal of Research of the National Bureau of Standards,49(1952),409-436. (10)Fletcher Powell(1963)optimization via quasi-Newton updates R.Fletcher and M.J.D.Powell,"A rapidly convergent descent method for minimization,"Compute Journal,,6(1963),163-168. (11)Wanner,Hairer&Norset(1978)order stars and applications to ODE G.Wanner,E.Hairer and S.P.Norsett,"Order stars and stability theorems,"BIT,18(1974),475-489
· 278 · 附录 C 高性能计算 – 科学计算软件介绍 – Multipole methods – Breakthroughs in preconditioners, spectral methods, time stepping for PDE – · · · · · · C.1.1 数值分析经典论文 这里是 Trefethen 建议的 13 篇论文阅读列表: (见 Classic Papers in Numerical Analysis, by N. Trefethen, 1993) (1) Cooley & Tukey (1965) → the Fast Fourier Transform James W. Cooley and John W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, 19 (1965), 297–301. (2) Courant, Friedrichs & Lewy (1928) → finite difference methods for PDE R. Courant, K. O. Friedrichs and H. Lewy, “Ueber die partiellen Differenzengleichungen der mathematischen Physik,” Mathematische Annalen, 100 (1928), 32–74. Translated as: “On the partial difference equations of mathematical physics,” IBM Journal of Resarch and Development, 11 (1967), 215–234. (3) Householder (1958) → QR factorization of matrices A. S. Householder, “Unitary triangularization of a nonsymmetric matrix,” Journal of the Association of Computing Machinery, 5 (1958), 339–342. (4) Curtiss & Hirschfelder (1952) → stiffness of ODEs; BD formulas C. F. Curtiss and J. O. Hirschfelder, “Integration of stiff equations,” Proceedings of the National Academy of Sciences, 38 (1952), 235–243. (5) de Boor (1972) → calculations with B-splines C. de Boor, “On calculating with B-splines,” Journal of Approximation Theory, 6 (1972), 50–62. (6) Courant (1943) → finite element methods for PDE R. Courant, “Variational methods for the solution of problems of equilibrium and vibrations,” Bulletin of the American Mathematical Society, 49 (1943), 1–23. (7) Golub & Kahan (1965) → the singular value decomposition G. Golub and W. Kahan, “Calculating the singular values and pseudo-inverse of a matrix,” SIAM Journal on Numerical Analysis, 2 (1965), 205–224. (8) Brandt (1977) → multigrid algorithms A. Brandt, “Multi-level adaptive solutions to boundary-value problems,” Mathematics of Computation, 31 (1977), 333–390. (9) Hestenes & Stiefel (1952) → the conjugate gradient iteration Magnus R. Hestenes and Eduard Stiefel, “Methods of conjugate gradients for solving linear systems,” Journal of Research of the National Bureau of Standards, 49 (1952), 409–436. (10) Fletcher & Powell (1963) → optimization via quasi-Newton updates R. Fletcher and M. J. D. Powell, “A rapidly convergent descent method for minimization,” Computer Journal, 6 (1963), 163–168. (11) Wanner, Hairer & Norsett (1978) → order stars and applications to ODE G. Wanner, E. Hairer and S. P. Norsett, “Order stars and stability theorems,” BIT, 18 (1974), 475–489
C1科学计算发展 ·279 (12)Karmarkar(1984)interior point methods for linear programming N.Karmarkar,A new polynomial-time algorithm for linear programming."Combinatorica,4(1984). 373-395. (13)Greengard Rokhlin(1987)multipole methods for particles L.Greengard and V.Rokhlin,"A fast algorithm for particle simulations,"Journal of Computational Physics, 72(1987),325-348. C.1.2 Longer list of papers 一个更长的阅读论文列表N.Trefethen,1993) LINEAR ALGEBRA-SYSTEMS OF EQUATIONS AND LEAST-SQUARES -Frankel (1950)optimal omega for SOR iteration Hestenes&Sticfel (1952)the conjugate gradient iteratior -Young (1954)theory of classical iterative methods -Householder(1958)QR decomposition -Wilkinson(1961)error analysis for systems of eqs -Golub(1965)least-squares problems Strassen(19)Gaussian eimination is not optimal George (1973)nested dissection Gill,Golub,Murray Saunders (1974)updating matrix factorizations -Concus,Golub&OLeary (197)preconditioned conjugate gradicents -Meijerink van der Vorst(1977)incomplete LU preconditioning Skeel(180)iterative refinement and stability Saad Schultz(1986)GMRES for nonsymmetric systems LINEAR ALGEBRA-EIGENVALUES AND SVD -Jacobi(1846)Jacobi's method for matrix cigenvalues -Henrici(1958)convergence of the Jacobi method -Rutishauser(1958)the LR algorithm -Kublanovskaya(1961)the QR algorithm -Francis(1961)the QR algorithm -Golub&Kahan(1965)computation of the SVD -Moler Stewart(1973)QZ algorithm for gen'd eigenvalues -Cuppen(1981)divide and conquer for eigenvalues ·OPTIMIZATION -Dantzig(1951)simplex method for linear programming -Davidon (1959)variable metric methods -Fletcher&Powell (1963)DFP quasi-Newton update formul -Broyden,Fletcher,Goldfarb Shanno(1970)BFGS quasi-Newton update formula -Karmarkar(1984)interior pt methods for linear prog
C.1 科学计算发展 · 279 · (12) Karmarkar (1984) → interior point methods for linear programming N. Karmarkar, “A new polynomial-time algorithm for linear programming,” Combinatorica, 4 (1984), 373–395. (13) Greengard & Rokhlin (1987) → multipole methods for particles L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” Journal of Computational Physics, 72 (1987), 325–348. C.1.2 Longer list of papers 一个更长的阅读论文列表 (N. Trefethen, 1993) • LINEAR ALGEBRA – SYSTEMS OF EQUATIONS AND LEAST-SQUARES – Frankel (1950) optimal omega for SOR iteration – Hestenes & Stiefel (1952) the conjugate gradient iteration – Young (1954) theory of classical iterative methods – Householder (1958) QR decomposition – Wilkinson (1961) error analysis for systems of eqs. – Golub (1965) least-squares problems – Strassen (1969) Gaussian elimination is not optimal – George (1973) nested dissection – Gill, Golub, Murray & Saunders (1974) updating matrix factorizations – Concus, Golub & O’Leary (1976) preconditioned conjugate gradients – Meijerink & van der Vorst (1977) incomplete LU preconditioning – Skeel (1980) iterative refinement and stability – Saad & Schultz (1986) GMRES for nonsymmetric systems • LINEAR ALGEBRA - EIGENVALUES AND SVD – Jacobi (1846) Jacobi’s method for matrix eigenvalues – Henrici (1958) convergence of the Jacobi method – Rutishauser (1958) the LR algorithm – Kublanovskaya (1961) the QR algorithm – Francis (1961) the QR algorithm – Golub & Kahan (1965) computation of the SVD – Moler & Stewart (1973) QZ algorithm for gen’d eigenvalues – Cuppen (1981) divide and conquer for eigenvalues • OPTIMIZATION – Dantzig (1951) simplex method for linear programming – Davidon (1959) variable metric methods – Fletcher & Powell (1963) DFP quasi-Newton update formula – Broyden, Fletcher, Goldfarb & Shanno (1970) BFGS quasi-Newton update formula – Karmarkar (1984) interior pt methods for linear prog
·280· 附录C高性能计算-科学计算软件介绍 ·NTEGRATION -Golub Welsch (1969)Gauss quadrature rules -de Boor(1971)adaptive quadrature algorithms ·APPROXIMATION -Remes(1934)Remes algorithm for Chebyshev approx. -Schoenberg (1946)splines -Powell(1967)near-optimality of Chebyshev interp. -Reinsch(1967)smoothing with splines -Cox(1972)calculation with B-splines -de Boor(1972)calculation with B-splines ·OTHER -Aitken(193)Aitken extrapolation -Cooley Tukey (1965)the fast Fourier transform -Greengard&Rokhlin (1987)fast multipole methods ODEs -Curtiss Hirschfelder(1952)stiffness and BD formulas -Dahlquist (156)stability and convergence -Dahlquist(1963)A-stability -Butcher(1965)Runge-Kutta methods Gear(1969)stiff ODEs -Wanner,Hairer Norsett(1978)order stars and stability theorems ·ELLIPTIC PDEs -Peaceman Rachford (1955)ADI Douglas(1955)ADI -Strang(1971 or 1973)finite elements and approx.theory -Buzbee,Golub&Nielsen(1970)fast Poisson via cyclic reduction -Hockney(1965)fast Poisson via FFT -Fedorenko(1961)multigrid methods -Brandt(1977)multigrid methods PARABOLIC AND HYPERBOLIC PDEs -Courant,Friedrichs&Lewy (1928)the CFL condition -Crank Nicolson (1947)finite differences for parabolic PDE O'Brien,Hyman Kaplan(1951)Von Neumann stability analysis -Lax&Richtmyer(196)general stability theory -Lax Wendroff (1960,1962,1964)methods for solving conservation laws -Kreiss(1962)more general stability theory -Orszag (1971)spectral methods -Kreiss Oliger(1972)spectral methods
· 280 · 附录 C 高性能计算 – 科学计算软件介绍 • INTEGRATION – Golub & Welsch (1969) Gauss quadrature rules – de Boor (1971) adaptive quadrature algorithms • APPROXIMATION – Remes (1934) Remes algorithm for Chebyshev approx. – Schoenberg (1946) splines – Powell (1967) near-optimality of Chebyshev interp. – Reinsch (1967) smoothing with splines – Cox (1972) calculation with B-splines – de Boor (1972) calculation with B-splines • OTHER – Aitken (1932) Aitken extrapolation – Cooley & Tukey (1965) the fast Fourier transform – Greengard & Rokhlin (1987) fast multipole methods • ODEs – Curtiss & Hirschfelder (1952) stiffness and BD formulas – Dahlquist (1956) stability and convergence – Dahlquist (1963) A-stability – Butcher (1965) Runge-Kutta methods – Gear (1969) stiff ODEs – Wanner, Hairer & Norsett (1978) order stars and stability theorems • ELLIPTIC PDEs – Peaceman & Rachford (1955) ADI – Douglas (1955) ADI – Strang (1971 or 1973) finite elements and approx. theory – Buzbee, Golub & Nielsen (1970) fast Poisson via cyclic reduction – Hockney (1965) fast Poisson via FFT – Fedorenko (1961) multigrid methods – Brandt (1977) multigrid methods • PARABOLIC AND HYPERBOLIC PDEs – Courant, Friedrichs & Lewy (1928) the CFL condition – Crank & Nicolson (1947) finite differences for parabolic PDE – O’Brien, Hyman & Kaplan (1951) Von Neumann stability analysis – Lax & Richtmyer (1956) general stability theory – Lax & Wendroff (1960,1962,1964) methods for solving conservation laws – Kreiss (1962) more general stability theory – Orszag (1971) spectral methods – Kreiss & Oliger (1972) spectral methods
C.2矩阵运算的复杂度 ·281 -Gustafsson,Kreiss&Sundstrom(1972)stability of boundary conditions -Chorin(1973)vortex methods for CFD -Engquist Majda(1977)absorbing boundary conditions C.2矩阵运算的复杂度 算法的执行效率依赖于算法中所涉及的运算类型和运算次数。一般来说,数值算法所涉及的运算主 要是加减乘除,以及开根号运算.开根号运算依赖于具体的实现算法,一般最终也可以归结为加减乘除 运算,而且开根号运算次数往往比加减乘除运算具有更低的数量级.因此,评价算法的一个重要标准就 是加诚乘除运算的次数 一般情况下,加减运算次数与乘法运算次数具有相同的数量级,而除法运算次数往往比乘法运算具 有更低的数量级。 常见数值运算的计算量 0(n)量级 0(m2)量级 O(n3)量级 x←0x x←Ax B←-aAB y←y+ax u←-aAx+Bu C←aAB+BC s←x(内积 x←Alx(A三角矩阵 s←-lx2 A←A+axy(秩1修正) s←zl A←A+ay°+ayr*(对称秩2修正) 向量的交换,复制,比较 ↑为了提高程序执行效率,节约程序开发成本,建议尽可能地使用现有的高性能优秀程序库,如BLAS, LAPACK等 C.3数值线性代数程序库 BLAS,LAPACK,ARPACK是当前数值计算中优秀程序库的典型代表,许多科学计算软件都是基于 这些程序库,如著名的MATLAB. C.3.1 BLAS BLAS(Basic Linear Algebra Subprograms))是一组高质量的子程序,用于实现基本的向量和矩阵运算, 最初发布于1979年.在BLS中,所有的子程序都经过精心的优化以确保其高效性,同时,程序的书写 和语句的选择都很规范,以便于移植。 长期以来,专家们一直呼吁使用BLAS来建立和开发自己的线性代数软件包.这样做,不仅可以缩 短开发周期,而且便于形成统一的调用接口.在高性能计算领域,BLAS被广泛使用.合理的调用BLAS 子程序,可以大大提高程序的性能, 目前BLAS库有多种不同的优化实现,如BLAS,Goto BLAS,.ATLAS等.为提高性能,各软硬件厂商 都会对其产品的BLAS接口实现进行高度优化
C.2 矩阵运算的复杂度 · 281 · – Gustafsson, Kreiss & Sundstrom (1972) stability of boundary conditions – Chorin (1973) vortex methods for CFD – Engquist & Majda (1977) absorbing boundary conditions C.2 矩阵运算的复杂度 算法的执行效率依赖于算法中所涉及的运算类型和运算次数. 一般来说, 数值算法所涉及的运算主 要是加减乘除, 以及开根号运算. 开根号运算依赖于具体的实现算法, 一般最终也可以归结为加减乘除 运算, 而且开根号运算次数往往比加减乘除运算具有更低的数量级. 因此, 评价算法的一个重要标准就 是加减乘除运算的次数. 一般情况下, 加减运算次数与乘法运算次数具有相同的数量级, 而除法运算次数往往比乘法运算具 有更低的数量级. 常见数值运算的计算量: O(n) 量级 O(n 2 ) 量级 O(n 3 ) 量级 x ← αx x ← Ax B ← αAB y ← y + αx y ← αAx + βy C ← αAB + βC s ← y ∗x (内积) x ← A−1x (A 三角矩阵) s ← ∥x∥2 A ← A + αxy∗ (秩 1 修正) s ← ∥x∥1 A ← A + αxy∗ + αyx∗ (对称秩 2 修正) 向量的交换, 复制, 比较 † 为了提高程序执行效率, 节约程序开发成本, 建议尽可能地使用现有的高性能优秀程序库, 如BLAS, LAPACK 等. C.3 数值线性代数程序库 BLAS, LAPACK, ARPACK 是当前数值计算中优秀程序库的典型代表, 许多科学计算软件都是基于 这些程序库, 如著名的 MATLAB. C.3.1 BLAS BLAS (Basic Linear Algebra Subprograms) 是一组高质量的子程序, 用于实现基本的向量和矩阵运算, 最初发布于 1979 年. 在 BLAS 中, 所有的子程序都经过精心的优化以确保其高效性, 同时, 程序的书写 和语句的选择都很规范, 以便于移植. 长期以来, 专家们一直呼吁使用 BLAS 来建立和开发自己的线性代数软件包. 这样做, 不仅可以缩 短开发周期, 而且便于形成统一的调用接口. 在高性能计算领域, BLAS 被广泛使用. 合理的调用 BLAS 子程序, 可以大大提高程序的性能. 目前 BLAS 库有多种不同的优化实现, 如 BLAS, Goto BLAS, ATLAS 等. 为提高性能, 各软硬件厂商 都会对其产品的 BLAS 接口实现进行高度优化