15.4 General Linear Least Squares

An immediate generalization of §15.2 is to fit a set of data points (x_i, y_i) to a model that is not just a linear combination of 1 and x (namely a + bx), but rather a linear combination of any M specified functions of x. For example, the functions could be 1, x, x^2, ..., x^{M-1}, in which case their general linear combination,

    y(x) = a_1 + a_2 x + a_3 x^2 + \cdots + a_M x^{M-1}                                   (15.4.1)

is a polynomial of degree M − 1. Or, the functions could be sines and cosines, in which case their general linear combination is a harmonic series.

The general form of this kind of model is

    y(x) = \sum_{k=1}^{M} a_k X_k(x)                                                      (15.4.2)

where X_1(x), ..., X_M(x) are arbitrary fixed functions of x, called the basis functions. Note that the functions X_k(x) can be wildly nonlinear functions of x. In this discussion "linear" refers only to the model's dependence on its parameters a_k.

For these linear models we generalize the discussion of the previous section by defining a merit function

    \chi^2 = \sum_{i=1}^{N} \left[ \frac{y_i - \sum_{k=1}^{M} a_k X_k(x_i)}{\sigma_i} \right]^2        (15.4.3)

As before, σ_i is the measurement error (standard deviation) of the ith data point, presumed to be known. If the measurement errors are not known, they may all (as discussed at the end of §15.1) be set to the constant value σ = 1.

Once again, we will pick as best parameters those that minimize χ². There are several different techniques available for finding this minimum. Two are particularly useful, and we will discuss both in this section. To introduce them and elucidate their relationship, we need some notation.

Let A be a matrix whose N × M components are constructed from the M basis functions evaluated at the N abscissas x_i, and from the N measurement errors σ_i, by the prescription

    A_{ij} = \frac{X_j(x_i)}{\sigma_i}                                                    (15.4.4)

The matrix A is called the design matrix of the fitting problem. Notice that in general A has more rows than columns, N ≥ M, since there must be more data points than model parameters to be solved for. (You can fit a straight line to two points, but not a very meaningful quintic!) The design matrix is shown schematically in Figure 15.4.1.

Also define a vector b of length N by

    b_i = \frac{y_i}{\sigma_i}                                                            (15.4.5)

and denote the M vector whose components are the parameters to be fitted, a_1, ..., a_M, by a.
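As a concrete illustration of the basis-function idea, and of the funcs(x,afunc,ma) interface expected by the routine lfit later in this section, here is a minimal sketch of a routine that fills an array with the polynomial basis 1, x, x^2, ..., x^{ma-1} of equation (15.4.1). The name poly_basis is our own illustrative choice, not part of the book's library.

/* Illustrative sketch, not part of the book's code: fill afunc[1..ma]
   with the polynomial basis functions 1, x, x^2, ..., x^(ma-1) of
   equation (15.4.1), evaluated at x.  The unit-offset array convention
   matches the routines given later in this section. */
void poly_basis(float x, float afunc[], int ma)
{
    int j;

    afunc[1]=1.0;
    for (j=2;j<=ma;j++) afunc[j]=afunc[j-1]*x;
}

A routine for sines and cosines, or for any other fixed set of functions, would have exactly the same calling sequence.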
[Figure 15.4.1 shows the N × M design matrix: its rows correspond to the data points x_1, ..., x_N, its columns to the basis functions X_1, ..., X_M, and its entries are X_j(x_i)/σ_i.]

Figure 15.4.1. Design matrix for the least-squares fit of a linear combination of M basis functions to N data points. The matrix elements involve the basis functions evaluated at the values of the independent variable at which measurements are made, and the standard deviations of the measured dependent variable. The measured values of the dependent variable do not enter the design matrix.

Solution by Use of the Normal Equations

The minimum of (15.4.3) occurs where the derivative of χ² with respect to all M parameters a_k vanishes. Specializing equation (15.1.7) to the case of the model (15.4.2), this condition yields the M equations

    0 = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ y_i - \sum_{j=1}^{M} a_j X_j(x_i) \right] X_k(x_i) ,  \qquad k = 1, \ldots, M        (15.4.6)

Interchanging the order of summations, we can write (15.4.6) as the matrix equation

    \sum_{j=1}^{M} \alpha_{kj} a_j = \beta_k                                              (15.4.7)

where

    \alpha_{kj} = \sum_{i=1}^{N} \frac{X_j(x_i) X_k(x_i)}{\sigma_i^2}   \quad \text{or equivalently} \quad [\alpha] = A^T \cdot A        (15.4.8)

an M × M matrix, and

    \beta_k = \sum_{i=1}^{N} \frac{y_i X_k(x_i)}{\sigma_i^2}   \quad \text{or equivalently} \quad [\beta] = A^T \cdot b                  (15.4.9)

a vector of length M.
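The sums in (15.4.8) and (15.4.9) are exactly what a fitting routine accumulates in a single loop over the data. The following fragment is a minimal sketch of that accumulation; the helper name accumulate_normal is our own, and the sketch assumes the unit-offset vector/matrix allocators of nrutil.h and a basis routine with the same interface as funcs in the routine lfit given later in this section (which performs the same accumulation with extra bookkeeping for frozen parameters).

#include "nrutil.h"

/* Sketch, not part of the book's code: accumulate the normal equations
   (15.4.8)-(15.4.9) directly from the data.  The caller allocates
   alpha[1..ma][1..ma] and beta[1..ma]. */
void accumulate_normal(float x[], float y[], float sig[], int ndat,
    int ma, float **alpha, float beta[],
    void (*funcs)(float, float [], int))
{
    int i,j,k;
    float sig2i,*afunc;

    afunc=vector(1,ma);
    for (j=1;j<=ma;j++) {                        /* zero the accumulators */
        beta[j]=0.0;
        for (k=1;k<=ma;k++) alpha[j][k]=0.0;
    }
    for (i=1;i<=ndat;i++) {                      /* one pass over the data */
        (*funcs)(x[i],afunc,ma);
        sig2i=1.0/(sig[i]*sig[i]);
        for (j=1;j<=ma;j++) {
            for (k=1;k<=ma;k++)
                alpha[j][k] += afunc[j]*afunc[k]*sig2i;   /* equation (15.4.8) */
            beta[j] += y[i]*afunc[j]*sig2i;               /* equation (15.4.9) */
        }
    }
    free_vector(afunc,1,ma);
}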
The equations (15.4.6) or (15.4.7) are called the normal equations of the least-squares problem. They can be solved for the vector of parameters a by the standard methods of Chapter 2, notably LU decomposition and backsubstitution, Cholesky decomposition, or Gauss-Jordan elimination. In matrix form, the normal equations can be written as either

    [\alpha] \cdot a = [\beta]   \quad \text{or as} \quad (A^T \cdot A) \cdot a = A^T \cdot b                  (15.4.10)

The inverse matrix C_{jk} ≡ [α]^{-1}_{jk} is closely related to the probable (or, more precisely, standard) uncertainties of the estimated parameters a. To estimate these uncertainties, consider that

    a_j = \sum_{k=1}^{M} [\alpha]^{-1}_{jk} \beta_k = \sum_{k=1}^{M} C_{jk} \left[ \sum_{i=1}^{N} \frac{y_i X_k(x_i)}{\sigma_i^2} \right]        (15.4.11)

and that the variance associated with the estimate a_j can be found as in (15.2.7) from

    \sigma^2(a_j) = \sum_{i=1}^{N} \sigma_i^2 \left( \frac{\partial a_j}{\partial y_i} \right)^2                (15.4.12)

Note that α_{jk} is independent of y_i, so that

    \frac{\partial a_j}{\partial y_i} = \sum_{k=1}^{M} C_{jk} X_k(x_i) / \sigma_i^2                             (15.4.13)

Consequently, we find that

    \sigma^2(a_j) = \sum_{k=1}^{M} \sum_{l=1}^{M} C_{jk} C_{jl} \left[ \sum_{i=1}^{N} \frac{X_k(x_i) X_l(x_i)}{\sigma_i^2} \right]        (15.4.14)

The final term in brackets is just the matrix [α]. Since this is the matrix inverse of [C], (15.4.14) reduces immediately to

    \sigma^2(a_j) = C_{jj}                                                                (15.4.15)

In other words, the diagonal elements of [C] are the variances (squared uncertainties) of the fitted parameters a. It should not surprise you to learn that the off-diagonal elements C_{jk} are the covariances between a_j and a_k (cf. 15.2.10); but we shall defer discussion of these to §15.6.

We will now give a routine that implements the above formulas for the general linear least-squares problem, by the method of normal equations. Since we wish to compute not only the solution vector a but also the covariance matrix [C], it is most convenient to use Gauss-Jordan elimination (routine gaussj of §2.1) to perform the linear algebra. The operation count, in this application, is no larger than that for LU decomposition. If you have no need for the covariance matrix, however, you can save a factor of 3 on the linear algebra by switching to LU decomposition, without computation of the matrix inverse. In theory, since A^T·A is positive definite, Cholesky decomposition is the most efficient way to solve the normal equations. However, in practice most of the computing time is spent in looping over the data to form the equations, and Gauss-Jordan is quite adequate.
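Whichever linear-algebra route is taken, the practical payoff of equation (15.4.15) is that the standard uncertainty of each fitted parameter is just the square root of a diagonal element of the returned covariance matrix. The fragment below is our own illustration of that step; the helper name and the unit-offset array convention are assumptions chosen to match the routines that follow, not part of the book's code.

#include <math.h>

/* Illustrative sketch, not part of the book's code: given the ma x ma
   covariance matrix covar (unit-offset, as returned by lfit below),
   fill sigma_a[1..ma] with the standard uncertainties of the fitted
   parameters, sigma(a_j) = sqrt(C_jj), per equation (15.4.15). */
void param_uncertainties(float **covar, int ma, float sigma_a[])
{
    int j;

    for (j=1;j<=ma;j++) sigma_a[j]=sqrt(covar[j][j]);
}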
We need to warn you that the solution of a least-squares problem directly from the normal equations is rather susceptible to roundoff error. An alternative, and preferred, technique involves QR decomposition (§2.10, §11.3, and §11.6) of the design matrix A. This is essentially what we did at the end of §15.2 for fitting data to a straight line, but without invoking all the machinery of QR to derive the necessary formulas. Later in this section, we will discuss other difficulties in the least-squares problem, for which the cure is singular value decomposition (SVD), of which we give an implementation. It turns out that SVD also fixes the roundoff problem, so it is our recommended technique for all but "easy" least-squares problems. It is for these easy problems that the following routine, which solves the normal equations, is intended.

The routine below introduces one bookkeeping trick that is quite useful in practical work. Frequently it is a matter of "art" to decide which parameters a_k in a model should be fit from the data set, and which should be held constant at fixed values, for example values predicted by a theory or measured in a previous experiment. One wants, therefore, to have a convenient means for "freezing" and "unfreezing" the parameters a_k. In the following routine the total number of parameters a_k is denoted ma (called M above). As input to the routine, you supply an array ia[1..ma], whose components are either zero or nonzero (e.g., 1). Zeros indicate that you want the corresponding elements of the parameter vector a[1..ma] to be held fixed at their input values. Nonzeros indicate parameters that should be fitted for. On output, any frozen parameters will have their variances, and all their covariances, set to zero in the covariance matrix.

#include "nrutil.h"

void lfit(float x[], float y[], float sig[], int ndat, float a[], int ia[],
    int ma, float **covar, float *chisq,
    void (*funcs)(float, float [], int))
/* Given a set of data points x[1..ndat], y[1..ndat] with individual standard
   deviations sig[1..ndat], use chi-square minimization to fit for some or all
   of the coefficients a[1..ma] of a function that depends linearly on a,
   y = sum_i a_i * afunc_i(x).  The input array ia[1..ma] indicates by nonzero
   entries those components of a that should be fitted for, and by zero entries
   those components that should be held fixed at their input values.  The
   program returns values for a[1..ma], chi-square = chisq, and the covariance
   matrix covar[1..ma][1..ma].  (Parameters held fixed will return zero
   covariances.)  The user supplies a routine funcs(x,afunc,ma) that returns
   the ma basis functions evaluated at x in the array afunc[1..ma]. */
{
    void covsrt(float **covar, int ma, int ia[], int mfit);
    void gaussj(float **a, int n, float **b, int m);
    int i,j,k,l,m,mfit=0;
    float ym,wt,sum,sig2i,**beta,*afunc;

    beta=matrix(1,ma,1,1);
    afunc=vector(1,ma);
    for (j=1;j<=ma;j++)
        if (ia[j]) mfit++;
    if (mfit == 0) nrerror("lfit: no parameters to be fitted");
    for (j=1;j<=mfit;j++) {              /* Initialize the (symmetric) matrix. */
        for (k=1;k<=mfit;k++) covar[j][k]=0.0;
        beta[j][1]=0.0;
    }
    for (i=1;i<=ndat;i++) {              /* Loop over data to accumulate coefficients of the normal equations. */
        (*funcs)(x[i],afunc,ma);
        ym=y[i];
        if (mfit < ma) {                 /* Subtract off dependences on known pieces of the fitting function. */
            for (j=1;j<=ma;j++)
                if (!ia[j]) ym -= a[j]*afunc[j];
        }
        sig2i=1.0/SQR(sig[i]);
        for (j=0,l=1;l<=ma;l++) {
            if (ia[l]) {
                wt=afunc[l]*sig2i;
                for (j++,k=0,m=1;m<=l;m++)
                    if (ia[m]) covar[j][++k] += wt*afunc[m];
                beta[j][1] += ym*wt;
            }
        }
    }
    for (j=2;j<=mfit;j++)                /* Fill in above the diagonal from symmetry. */
        for (k=1;k<j;k++)
            covar[k][j]=covar[j][k];
    gaussj(covar,mfit,beta,1);           /* Matrix solution. */
    for (j=0,l=1;l<=ma;l++)
        if (ia[l]) a[l]=beta[++j][1];    /* Partition solution to appropriate coefficients a. */
    *chisq=0.0;
    for (i=1;i<=ndat;i++) {              /* Evaluate chi-square of the fit. */
        (*funcs)(x[i],afunc,ma);
        for (sum=0.0,j=1;j<=ma;j++) sum += a[j]*afunc[j];
        *chisq += SQR((y[i]-sum)/sig[i]);
    }
    covsrt(covar,ma,ia,mfit);            /* Sort covariance matrix to true order of fitting coefficients. */
    free_vector(afunc,1,ma);
    free_matrix(beta,1,ma,1,1);
}

That last call to a function covsrt is only for the purpose of spreading the covariances back into the full ma × ma covariance matrix, in the proper rows and columns and with zero variances and covariances set for variables which were held frozen.

The function covsrt is as follows.

#define SWAP(a,b) {swap=(a);(a)=(b);(b)=swap;}

void covsrt(float **covar, int ma, int ia[], int mfit)
/* Expand in storage the covariance matrix covar, so as to take into account
   parameters that are being held fixed.  (For the latter, return zero
   covariances.) */
{
    int i,j,k;
    float swap;

    for (i=mfit+1;i<=ma;i++)
        for (j=1;j<=i;j++) covar[i][j]=covar[j][i]=0.0;
    k=mfit;
    for (j=ma;j>=1;j--) {
        if (ia[j]) {
            for (i=1;i<=ma;i++) SWAP(covar[i][k],covar[i][j])
            for (i=1;i<=ma;i++) SWAP(covar[k][i],covar[j][i])
            k--;
        }
    }
}
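For concreteness, here is a hypothetical driver showing how lfit might be called with the polynomial basis routine poly_basis sketched near the start of this section. The data arrays are fabricated placeholders for illustration only; in practice x, y, and sig would hold real measurements and their standard deviations.

#include <stdio.h>
#include <math.h>
#include "nrutil.h"

/* Prototypes for the routines used below; lfit and covsrt are given in this
   section, poly_basis is the illustrative basis routine sketched earlier. */
void lfit(float x[], float y[], float sig[], int ndat, float a[], int ia[],
    int ma, float **covar, float *chisq,
    void (*funcs)(float, float [], int));
void poly_basis(float x, float afunc[], int ma);

#define NDAT 10
#define MA 3

int main(void)
{
    int i,ia[MA+1];
    float chisq,*x,*y,*sig,*a,**covar;

    x=vector(1,NDAT); y=vector(1,NDAT); sig=vector(1,NDAT);
    a=vector(1,MA);
    covar=matrix(1,MA,1,MA);
    for (i=1;i<=NDAT;i++) {                      /* fabricated sample data, illustration only */
        x[i]=(float) i;
        y[i]=1.0+2.0*x[i]+0.5*x[i]*x[i];
        sig[i]=1.0;
    }
    for (i=1;i<=MA;i++) { a[i]=0.0; ia[i]=1; }   /* fit all three coefficients */
    lfit(x,y,sig,NDAT,a,ia,MA,covar,&chisq,poly_basis);
    for (i=1;i<=MA;i++)                          /* report a_j +/- sqrt(C_jj), cf. (15.4.15) */
        printf("a[%d] = %g +/- %g\n",i,a[i],sqrt(covar[i][i]));
    printf("chi-squared = %g\n",chisq);
    free_matrix(covar,1,MA,1,MA);
    free_vector(a,1,MA);
    free_vector(sig,1,NDAT); free_vector(y,1,NDAT); free_vector(x,1,NDAT);
    return 0;
}

To hold, say, the constant term fixed at a supplied value, one would set a[1] to that value and ia[1] to zero before the call; covsrt then guarantees that the corresponding row and column of covar are returned as zeros.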