《数字信号处理》教学参考资料（Numerical Recipes in C，The Art of Scientific Computing Second Edition）Chapter 20.6.pdf

20.6 Arithmetic at Arbitrary Precision 915 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). CITED REFERENCES AND FURTHER READING: Bell, T.C., Cleary, J.G., and Witten, I.H. 1990, Text Compression (Englewood Cliffs, NJ: PrenticeHall). Nelson, M. 1991, The Data Compression Book (Redwood City, CA: M&T Books). Witten, I.H., Neal, R.M., and Cleary, J.G. 1987, Communications of the ACM, vol. 30, pp. 520– 540. [1] 20.6 Arithmetic at Arbitrary Precision Let’s compute the number π to a couple of thousand decimal places. In doing so, we’ll learn some things about multiple precision arithmetic on computers and meet quite an unusual application of the fast Fourier transform (FFT). We’ll also develop a set of routines that you can use for other calculations at any desired level of arithmetic precision. To start with, we need an analytic algorithm for π. Useful algorithms are quadratically convergent, i.e., they double the number of significant digits at each iteration. Quadratically convergent algorithms for π are based on the AGM (arithmetic geometric mean) method, which also finds application to the calculation of elliptic integrals (cf. §6.11) and in advanced implementations of the ADI method for elliptic partial differential equations (§19.5). Borwein and Borwein [1] treat this subject, which is beyond our scope here. One of their algorithms for π starts with the initializations X0 = √ 2 π0 =2+ √ 2 Y0 = √4 2 (20.6.1) and then, for i = 0, 1,... , repeats the iteration Xi+1 = 1 2 Xi + 1 √Xi πi+1 = πi Xi+1 + 1 Yi + 1 Yi+1 = Yi Xi+1 + 1 Xi+1 Yi + 1 (20.6.2) The value π emerges as the limit π∞. Now, to the question of how to do arithmetic to arbitrary precision: In a high-level language like C, a natural choice is to work in radix (base) 256, so that character arrays can be directly interpreted as strings of digits. At the very end of our calculation, we will want to convert our answer to radix 10, but that is essentially a frill for the benefit of human ears, accustomed to the familiar chant, “three point

916 Chapter 20.Less-Numerical Algorithms one four one five nine...."For any less frivolous calculation,we would likely never leave base 256(or the thence trivially reachable hexadecimal,octal,or binary bases). We will adopt the convention of storing digit strings in the "human"ordering, that is,with the first stored digit in an array being most significant,the last stored digit being least significant.The opposite convention would,of course,also be possible. "Carries."where we need to partition a number larger than 255 into a low-order byte and a high-order carry,present a minor programming annoyance,solved,in the routines below,by the use of the macros LOBYTE and HIBYTE. It is easy at this point,following Knuth [2].to write a routine for the "fast" arithmetic operations:short addition (adding a single byte to a string),addition, 8 subtraction,short multiplication (multiplying a string by a single byte),short division,ones-complement negation;and a couple of utility operations,copying and left-shifting strings.(On the diskette,these functions are all in the single file mpops.c.) #define LOBYTE(x)((unsigned char)((x)&Oxff)) #define HIBYTE(x)((unsigned char)((x)>>8&Oxff)) Multiple precision arithmetic operations done on character strings,interpreted as radix 256 令 numbers. This set of routines collects the simpler operations. void mpadd(unsigned char w[],unsigned char u[],unsigned char v[],int n) Press. Adds the unsigned radix 256 integers u[1..n]and v[1..n]yielding the unsigned integer ART w[1..n+1] 9 Programs int j; unsigned short ireg=0; for(j=n;j>=1;j--){ ireg-u[j]+v[j]+HIBYTE(ireg); w[j+1]=LOBYTE(ireg); to dir w[1]=H肛BYTE(1reg); 2 19881992 OF SCIENTIFIC COMPUTING(ISBN void mpsub(int is,unsigned char w[],unsigned char u[],unsigned char v], int n) v@cam Subtracts the unsigned radix 256 integer v[1..n]from u[1..n]yielding the unsigned integer Numerical Recipes 10-621 w[1..n].If the result is negative (wraps around),is is returned as-1;otherwise it is returned as 0. 43108 int j; unsigned short ireg=256; (outside 膜 for (j=n;j>=1;j--){ ireg=255+u[i]-v[i]+HIBYTE(ireg); North Software. w[j]=LOBYTE(ireg); *is=HIBYTE(ireg)-1; void mpsad(unsigned char w[],unsigned char u[],int n,int iv) Short addition:the integer iv (in the range 0<iv 255)is added to the unsigned radix 256 integer u[1..n],yielding w[1..n+1]. int j; unsigned short ireg; 1reg=256*1v;

916 Chapter 20. Less-Numerical Algorithms Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). one four one five nine... .” For any less frivolous calculation, we would likely never leave base 256 (or the thence trivially reachable hexadecimal, octal, or binary bases). We will adopt the convention of storing digit strings in the “human” ordering, that is, with the first stored digit in an array being most significant, the last stored digit being least significant. The opposite convention would, of course, also be possible. “Carries,” where we need to partition a number larger than 255 into a low-order byte and a high-order carry, present a minor programming annoyance, solved, in the routines below, by the use of the macros LOBYTE and HIBYTE. It is easy at this point, following Knuth [2], to write a routine for the “fast” arithmetic operations: short addition (adding a single byte to a string), addition, subtraction, short multiplication (multiplying a string by a single byte), short division, ones-complement negation; and a couple of utility operations, copying and left-shifting strings. (On the diskette, these functions are all in the single file mpops.c.) #define LOBYTE(x) ((unsigned char) ((x) & 0xff)) #define HIBYTE(x) ((unsigned char) ((x) >> 8 & 0xff)) Multiple precision arithmetic operations done on character strings, interpreted as radix 256 numbers. This set of routines collects the simpler operations. void mpadd(unsigned char w[], unsigned char u[], unsigned char v[], int n) Adds the unsigned radix 256 integers u[1..n] and v[1..n] yielding the unsigned integer w[1..n+1]. { int j; unsigned short ireg=0; for (j=n;j>=1;j--) { ireg=u[j]+v[j]+HIBYTE(ireg); w[j+1]=LOBYTE(ireg); } w[1]=HIBYTE(ireg); } void mpsub(int *is, unsigned char w[], unsigned char u[], unsigned char v[], int n) Subtracts the unsigned radix 256 integer v[1..n] from u[1..n] yielding the unsigned integer w[1..n]. If the result is negative (wraps around), is is returned as −1; otherwise it is returned as 0. { int j; unsigned short ireg=256; for (j=n;j>=1;j--) { ireg=255+u[j]-v[j]+HIBYTE(ireg); w[j]=LOBYTE(ireg); } *is=HIBYTE(ireg)-1; } void mpsad(unsigned char w[], unsigned char u[], int n, int iv) Short addition: the integer iv (in the range 0 ≤ iv ≤ 255) is added to the unsigned radix 256 integer u[1..n], yielding w[1..n+1]. { int j; unsigned short ireg; ireg=256*iv;

918 Chapter 20.Less-Numerical Algorithms Full multiplication of two digit strings,if done by the traditional hand method. is not a fast operation:In multiplying two strings of length N,the multiplicand would be short-multiplied in turn by each byte of the multiplier,requiring O(N2) operations in all.We will see,however,that all the arithmetic operations on numbers of length N can in fact be done in O(N x log N x log log N)operations. The trick is to recognize that multiplication is essentially a comolution(813.1) of the digits of the multiplicand and multiplier,followed by some kind of carry operation.Consider,for example,two ways of writing the calculation 456 x 789: 456 456 ×789 ×789 4104 364554 3648 324048 3192 359784 283542 28671189354 359784 The tableau on the left shows the conventional method of multiplication,in which three separate short multiplications of the full multiplicand(by 9,8,and 7)are added to obtain the final result.The tableau on the right shows a different method (sometimes taught for mental arithmetic),where the single-digit cross products are 9」 all computed (e.g.8 x 6 =48).then added in columns to obtain an incompletely carried result (here,the list 28,67,118,93,54).The final step is a single pass from OF SCIENTIFIC right to left,recording the single least-significant digit and carrying the higher digit or digits into the total to the left (e.g.93+5=98,record the 8,carry 9). You can see immediately that the column sums in the right-hand method are components of the convolution of the digit strings,for example 118=4 x 9+5x 8+6 x 7.In 813.1 we learned how to compute the convolution of two vectors by the fast Fourier transform(FFT):Each vector is FFT'd,the two complex transforms 9 are multiplied,and the result is inverse-FFT'd.Since the transforms are done with floating arithmetic,we need sufficient precision so that the exact integer value of each component of the result is discernible in the presence of roundoff error.We Recipes Numerica 10621 should therefore allow a(conservative)few times log2(log2 N)bits for roundoff 431 in the FFT.A number of length N bytes in radix 256 can generate convolution Recipes components as large as the order of(256)2N,thus requiring 16+log2 N bits of 腿 precision for exact storage.If it is the number of bits in the floating mantissa (cf.$20.1),we obtain the condition North 16+log2 N+few x log2 log2 N<it (20.6.3) We see that single precision,say with it =24,is inadequate for any interesting value of N,while double precision,say with it =53,allows N to be greater than 106,corresponding to some millions of decimal digits.The following routine therefore presumes double precision versions of realft (812.3)and four1(812.2). here called drealft and dfour1.(These routines are included on the Numerical Recipes diskettes.)

918 Chapter 20. Less-Numerical Algorithms Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Full multiplication of two digit strings, if done by the traditional hand method, is not a fast operation: In multiplying two strings of length N, the multiplicand would be short-multiplied in turn by each byte of the multiplier, requiring O(N 2) operations in all. We will see, however, that all the arithmetic operations on numbers of length N can in fact be done in O(N × log N × log log N) operations. The trick is to recognize that multiplication is essentially a convolution (§13.1) of the digits of the multiplicand and multiplier, followed by some kind of carry operation. Consider, for example, two ways of writing the calculation 456 × 789: 456 × 789 4104 3648 3192 359784 456 × 789 36 45 54 32 40 48 28 35 42 28 67 118 93 54 359 784 The tableau on the left shows the conventional method of multiplication, in which three separate short multiplications of the full multiplicand (by 9, 8, and 7) are added to obtain the final result. The tableau on the right shows a different method (sometimes taught for mental arithmetic), where the single-digit cross products are all computed (e.g. 8 × 6 = 48), then added in columns to obtain an incompletely carried result (here, the list 28, 67, 118, 93, 54). The final step is a single pass from right to left, recording the single least-significant digit and carrying the higher digit or digits into the total to the left (e.g. 93 + 5 = 98, record the 8, carry 9). You can see immediately that the column sums in the right-hand method are components of the convolution of the digit strings, for example 118 = 4 × 9+5 × 8+6 × 7. In §13.1 we learned how to compute the convolution of two vectors by the fast Fourier transform (FFT): Each vector is FFT’d, the two complex transforms are multiplied, and the result is inverse-FFT’d. Since the transforms are done with floating arithmetic, we need sufficient precision so that the exact integer value of each component of the result is discernible in the presence of roundoff error. We should therefore allow a (conservative) few times log 2(log2 N) bits for roundoff in the FFT. A number of length N bytes in radix 256 can generate convolution components as large as the order of (256)2N, thus requiring 16 + log2 N bits of precision for exact storage. If it is the number of bits in the floating mantissa (cf. §20.1), we obtain the condition 16 + log2 N + few × log2 log2 N < it (20.6.3) We see that single precision, say with it = 24, is inadequate for any interesting value of N, while double precision, say with it = 53, allows N to be greater than 106, corresponding to some millions of decimal digits. The following routine therefore presumes double precision versions of realft (§12.3) and four1 (§12.2), here called drealft and dfour1. (These routines are included on the Numerical Recipes diskettes.)

20.6 Arithmetic at Arbitrary Precision 919 #include "nrutil.h" #define RX 256.0 void mpmul(unsigned char w[],unsigned char u[],unsigned char v[],int n, int m) Uses Fast Fourier Transform to multiply the unsigned radix 256 integers u [1..n]and v[1..m], yielding a product w[1..n+m]. void drealft(double data[],unsigned long n,int isign); double version of realft. int j,mn,nn=1; double cy,t,*a,*b; mn=IMAX (m,n); while (nn mn)nn <<1; Find the smallest usable power of two for the trans- nn<<=1; form. a=dvector(1,nn); 19881992 b=dvector(1,nn); for (j=1;j<=n;j++) Move U to a double precision floating array. a[j]=(double)u[j]; for(j=n+1;j<nn;j++)a[j]=0.0; for （j=1;j<=m;j+） Move V to a double precision floating array. from NUMERICAL RECIPES I b[j]=(double)v[]; for(j=m+1;j<=nn;j++)b[j]=0.0; drealft(a,nn,1); Perform the convolution:First,the two Fourier trans- (Nort server drealft(b,nn,1); forms. b[1]*=a[1]: Then multiply the complex results (real and imagi- b[2]*=a[2]; nary parts). Americ computer, for(j=3;j<=nn;j+=2){ ART b[j]=(t=b[j])*a[j]-b[j+1]*a[j+1]; b[j+1]=t*a[j+1]+b[j+1]*a[j]; Programs drealft(b,nn,-1); Then do the inverse Fourier transform. cy=0.0; Make a final pass to do all the carries. for (i=nn;i>=1;--){ 1CIYP ic t=b[]/(nn>>1)+cy+0.5; The 0.5 allows for roundoff error. cy=(unsigned long)(t/RX); to dir b[j]=t-cy*RX; if (cy >=RX)nrerror("cannot happen in fftmul"); OF SCIENTIFIC COMPUTING(ISBN w[1]=(unsigned char)cy; Copy answer to output for (j=2;j<=n+m;j++) w[j]=(unsigned char)b[j-1]; free_dvector(b,1,nn); v@cambri 1988-1992 by Numerical Recipes 10-:6211 free_dvector(a,1,nn); 43108 With multiplication thus a "fast"operation,division is best performed by (outside multiplying the dividend by the reciprocal of the divisor.The reciprocal of a value V is calculated by iteration of Newton's rule, North Software. U:+1=U(2-VU) (20.6.4) visit website machine which results in the quadratic convergence of 0 to 1/V,as you can easily prove.(Many supercomputers and RISC machines actually use this iteration to perform divisions.)We can now see where the operations count Nlog N log log N. mentioned above,originates:Nlog N is in the Fourier transform,with the iteration to converge Newton's rule giving an additional factor of log log N

20.6 Arithmetic at Arbitrary Precision 919 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). #include "nrutil.h" #define RX 256.0 void mpmul(unsigned char w[], unsigned char u[], unsigned char v[], int n, int m) Uses Fast Fourier Transform to multiply the unsigned radix 256 integers u[1..n] and v[1..m], yielding a product w[1..n+m]. { void drealft(double data[], unsigned long n, int isign); double version of realft. int j,mn,nn=1; double cy,t,*a,*b; mn=IMAX(m,n); while (nn < mn) nn <<= 1; Find the smallest usable power of two for the transnn <<= 1; form. a=dvector(1,nn); b=dvector(1,nn); for (j=1;j<=n;j++) Move U to a double precision floating array. a[j]=(double)u[j]; for (j=n+1;j<=nn;j++) a[j]=0.0; for (j=1;j<=m;j++) Move V to a double precision floating array. b[j]=(double)v[j]; for (j=m+1;j<=nn;j++) b[j]=0.0; drealft(a,nn,1); Perform the convolution: First, the two Fourier transdrealft(b,nn,1); forms. b[1] *= a[1]; Then multiply the complex results (real and imagib[2] *= a[2]; nary parts). for (j=3;j<=nn;j+=2) { b[j]=(t=b[j])*a[j]-b[j+1]*a[j+1]; b[j+1]=t*a[j+1]+b[j+1]*a[j]; } drealft(b,nn,-1); Then do the inverse Fourier transform. cy=0.0; Make a final pass to do all the carries. for (j=nn;j>=1;j--) { t=b[j]/(nn>>1)+cy+0.5; The 0.5 allows for roundoff error. cy=(unsigned long) (t/RX); b[j]=t-cy*RX; } if (cy >= RX) nrerror("cannot happen in fftmul"); w[1]=(unsigned char) cy; Copy answer to output. for (j=2;j<=n+m;j++) w[j]=(unsigned char) b[j-1]; free_dvector(b,1,nn); free_dvector(a,1,nn); } With multiplication thus a “fast” operation, division is best performed by multiplying the dividend by the reciprocal of the divisor. The reciprocal of a value V is calculated by iteration of Newton’s rule, Ui+1 = Ui(2 − V Ui) (20.6.4) which results in the quadratic convergence of U∞ to 1/V , as you can easily prove. (Many supercomputers and RISC machines actually use this iteration to perform divisions.) We can now see where the operations count N log N log log N, mentioned above, originates: N log N is in the Fourier transform, with the iteration to converge Newton’s rule giving an additional factor of log log N