316 Chapter 7. Random Numbers

Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

7.8 Adaptive and Recursive Monte Carlo Methods

This section discusses more advanced techniques of Monte Carlo integration. As examples of the use of these techniques, we include two rather different, fairly sophisticated, multidimensional Monte Carlo codes: vegas [1,2], and miser [3]. The techniques that we discuss all fall under the general rubric of reduction of variance (§7.6), but are otherwise quite distinct.

Importance Sampling

The use of importance sampling was already implicit in equations (7.6.6) and (7.6.7). We now return to it in a slightly more formal way. Suppose that an integrand f can be written as the product of a function h that is almost constant times another, positive, function g. Then its integral over a multidimensional volume V is

$$\int f\,dV = \int \frac{f}{g}\,g\,dV = \int h\,g\,dV \tag{7.8.1}$$

In equation (7.6.7) we interpreted equation (7.8.1) as suggesting a change of variable to G, the indefinite integral of g. That made g dV a perfect differential. We then proceeded to use the basic theorem of Monte Carlo integration, equation (7.6.1). A more general interpretation of equation (7.8.1) is that we can integrate f by instead sampling h: not, however, with uniform probability density dV, but rather with nonuniform density g dV.
In this second interpretation, the first interpretation follows as the special case in which the nonuniform sampling of g dV is generated by the transformation method, using the indefinite integral G (see §7.2).

More directly, one can go back and generalize the basic theorem (7.6.1) to the case of nonuniform sampling: Suppose that points $x_i$ are chosen within the volume V with a probability density p satisfying

$$\int p\,dV = 1 \tag{7.8.2}$$

The generalized fundamental theorem is that the integral of any function f is estimated, using N sample points $x_1, \ldots, x_N$, by

$$I \equiv \int f\,dV = \int \frac{f}{p}\,p\,dV \approx \left\langle \frac{f}{p} \right\rangle \pm \sqrt{\frac{\langle f^2/p^2 \rangle - \langle f/p \rangle^2}{N}} \tag{7.8.3}$$

where angle brackets denote arithmetic means over the N points, exactly as in equation (7.6.2). As in equation (7.6.1), the "plus-or-minus" term is a one standard deviation error estimate. Notice that equation (7.6.1) is in fact the special case of equation (7.8.3), with p = constant = 1/V.

What is the best choice for the sampling density p? Intuitively, we have already seen that the idea is to make h = f/p as close to constant as possible. We can be more rigorous by focusing on the numerator inside the square root in equation (7.8.3), which is the variance per sample point. Both angle brackets are themselves Monte Carlo estimators of integrals, so we can write

$$S \equiv \left\langle \frac{f^2}{p^2} \right\rangle - \left\langle \frac{f}{p} \right\rangle^2 \approx \int \frac{f^2}{p^2}\,p\,dV - \left[\int \frac{f}{p}\,p\,dV\right]^2 = \int \frac{f^2}{p}\,dV - \left[\int f\,dV\right]^2 \tag{7.8.4}$$

We now find the optimal p subject to the constraint equation (7.8.2) by the functional variation

$$0 = \frac{\delta}{\delta p}\left(\int \frac{f^2}{p}\,dV - \left[\int f\,dV\right]^2 + \lambda \int p\,dV\right) \tag{7.8.5}$$
with λ a Lagrange multiplier. Note that the middle term does not depend on p. The variation (which comes inside the integrals) gives

$$0 = -\frac{f^2}{p^2} + \lambda \quad\text{or}\quad p = \frac{|f|}{\sqrt{\lambda}} = \frac{|f|}{\int |f|\,dV} \tag{7.8.6}$$

where λ has been chosen to enforce the constraint (7.8.2).

If f has one sign in the region of integration, then we get the obvious result that the optimal choice of p (if one can figure out a practical way of effecting the sampling) is that it be proportional to |f|. Then the variance is reduced to zero. Not so obvious, but seen to be true, is the fact that p ∝ |f| is optimal even if f takes on both signs. In that case the variance per sample point (from equations 7.8.4 and 7.8.6) is

$$S = S_{\text{optimal}} = \left(\int |f|\,dV\right)^2 - \left(\int f\,dV\right)^2 \tag{7.8.7}$$

One curiosity is that one can add a constant to the integrand to make it all of one sign, since this changes the integral by a known amount, constant × V. Then, the optimal choice of p always gives zero variance, that is, a perfectly accurate integral! The resolution of this seeming paradox (already mentioned at the end of §7.6) is that perfect knowledge of p in equation (7.8.6) requires perfect knowledge of $\int |f|\,dV$, which is tantamount to already knowing the integral you are trying to compute!
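As a small worked check of equations (7.8.4) and (7.8.7), consider the hypothetical integrand f(x) = x − 1/4 on V = [0, 1], chosen because it changes sign at x = 1/4. The per-point variances for uniform sampling (p = 1) and for the optimal p ∝ |f| are:

```latex
\begin{aligned}
\int_0^1 f\,dx &= \tfrac14, \qquad
\int_0^1 |f|\,dx = \int_0^{1/4}\!\bigl(\tfrac14 - x\bigr)\,dx + \int_{1/4}^{1}\!\bigl(x - \tfrac14\bigr)\,dx
  = \tfrac{1}{32} + \tfrac{9}{32} = \tfrac{5}{16},\\
S_{p=1} &= \int_0^1 f^2\,dx - \Bigl(\int_0^1 f\,dx\Bigr)^2 = \tfrac{7}{48} - \tfrac{1}{16} = \tfrac{1}{12} \approx 0.0833,\\
S_{\text{optimal}} &= \Bigl(\tfrac{5}{16}\Bigr)^2 - \Bigl(\tfrac14\Bigr)^2 = \tfrac{25 - 16}{256} = \tfrac{9}{256} \approx 0.0352.
\end{aligned}
```

So p ∝ |f| lowers the variance per point by a factor of about 2.4, but not to zero, because f takes both signs. Adding the constant 1/4 gives the one-signed integrand x, for which p = 2x makes h = f/p = 1/2 exactly constant. Writing down that normalization, however, already required knowing ∫₀¹ x dx = 1/2, which is the paradox just described.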
If your function f takes on a known constant value in most of the volume V, it is certainly a good idea to add a constant so as to make that value zero. Having done that, the accuracy attainable by importance sampling depends in practice not on how small equation (7.8.7) is, but rather on how small equation (7.8.4) is for an implementable p, likely only a crude approximation to the ideal.

Stratified Sampling

The idea of stratified sampling is quite different from importance sampling. Let us expand our notation slightly and let $\langle\!\langle f \rangle\!\rangle$ denote the true average of the function f over the volume V (namely the integral divided by V), while $\langle f \rangle$ denotes as before the simplest (uniformly sampled) Monte Carlo estimator of that average:

$$\langle\!\langle f \rangle\!\rangle \equiv \frac{1}{V}\int f\,dV \qquad \langle f \rangle \equiv \frac{1}{N}\sum_i f(x_i) \tag{7.8.8}$$

The variance of the estimator, $\operatorname{Var}(\langle f \rangle)$, which measures the square of the error of the Monte Carlo integration, is asymptotically related to the variance of the function, $\operatorname{Var}(f) \equiv \langle\!\langle f^2 \rangle\!\rangle - \langle\!\langle f \rangle\!\rangle^2$, by the relation

$$\operatorname{Var}(\langle f \rangle) = \frac{\operatorname{Var}(f)}{N} \tag{7.8.9}$$

(compare equation 7.6.1).

Suppose we divide the volume V into two equal, disjoint subvolumes, denoted a and b, and sample N/2 points in each subvolume. Then another estimator for $\langle\!\langle f \rangle\!\rangle$, different from equation (7.8.8), which we denote $\langle f \rangle'$, is

$$\langle f \rangle' \equiv \frac{1}{2}\left(\langle f \rangle_a + \langle f \rangle_b\right) \tag{7.8.10}$$

in other words, the mean of the sample averages in the two half-regions. The variance of estimator (7.8.10) is given by

$$\operatorname{Var}(\langle f \rangle') = \frac{1}{4}\left[\operatorname{Var}(\langle f \rangle_a) + \operatorname{Var}(\langle f \rangle_b)\right] = \frac{1}{4}\left[\frac{\operatorname{Var}_a(f)}{N/2} + \frac{\operatorname{Var}_b(f)}{N/2}\right] = \frac{1}{2N}\left[\operatorname{Var}_a(f) + \operatorname{Var}_b(f)\right] \tag{7.8.11}$$
Here $\operatorname{Var}_a(f)$ denotes the variance of f in subregion a, that is, $\langle\!\langle f^2 \rangle\!\rangle_a - \langle\!\langle f \rangle\!\rangle_a^2$, and correspondingly for b.

From the definitions already given, it is not difficult to prove the relation

$$\operatorname{Var}(f) = \frac{1}{2}\left[\operatorname{Var}_a(f) + \operatorname{Var}_b(f)\right] + \frac{1}{4}\left(\langle\!\langle f \rangle\!\rangle_a - \langle\!\langle f \rangle\!\rangle_b\right)^2 \tag{7.8.12}$$

(In physics, this formula for combining second moments is the "parallel axis theorem.") Comparing equations (7.8.9), (7.8.11), and (7.8.12), one sees that the stratified (into two subvolumes) sampling gives a variance that is never larger than the simple Monte Carlo case, and smaller whenever the means of the stratified samples, $\langle\!\langle f \rangle\!\rangle_a$ and $\langle\!\langle f \rangle\!\rangle_b$, are different.

We have not yet exploited the possibility of sampling the two subvolumes with different numbers of points, say $N_a$ in subregion a and $N_b \equiv N - N_a$ in subregion b. Let us do so now. Then the variance of the estimator is

$$\operatorname{Var}(\langle f \rangle') = \frac{1}{4}\left[\frac{\operatorname{Var}_a(f)}{N_a} + \frac{\operatorname{Var}_b(f)}{N - N_a}\right] \tag{7.8.13}$$

which is minimized (one can easily verify) when

$$\frac{N_a}{N} = \frac{\sigma_a}{\sigma_a + \sigma_b} \tag{7.8.14}$$

Here we have adopted the shorthand notation $\sigma_a \equiv [\operatorname{Var}_a(f)]^{1/2}$, and correspondingly for b.

If $N_a$ satisfies equation (7.8.14), then equation (7.8.13) reduces to

$$\operatorname{Var}(\langle f \rangle') = \frac{(\sigma_a + \sigma_b)^2}{4N} \tag{7.8.15}$$

Equation (7.8.15) reduces to equation (7.8.9) if $\operatorname{Var}(f) = \operatorname{Var}_a(f) = \operatorname{Var}_b(f)$, in which case stratifying the sample makes no difference.
A standard way to generalize the above result is to consider the volume V divided into more than two equal subregions. One can readily obtain the result that the optimal allocation of sample points among the regions is to have the number of points in each region j proportional to $\sigma_j$ (that is, the square root of the variance of the function f in that subregion). In spaces of high dimensionality (say $d \gtrsim 4$) this is not in practice very useful, however. Dividing a volume into K segments along each dimension implies $K^d$ subvolumes, typically much too large a number when one contemplates estimating all the corresponding $\sigma_j$'s.

Mixed Strategies

Importance sampling and stratified sampling seem, at first sight, inconsistent with each other. The former concentrates sample points where the magnitude of the integrand |f| is largest, the latter where the variance of f is largest. How can both be right?

The answer is that (like so much else in life) it all depends on what you know and how well you know it. Importance sampling depends on already knowing some approximation to your integral, so that you are able to generate random points $x_i$ with the desired probability density p. To the extent that your p is not ideal, you are left with an error that decreases only as $N^{-1/2}$. Things are particularly bad if your p is far from ideal in a region where the integrand f is changing rapidly, since then the sampled function h = f/p will have a large variance. Importance sampling works by smoothing the values of the sampled function h, and is effective only to the extent that you succeed in this.

Stratified sampling, by contrast, does not necessarily require that you know anything about f. Stratified sampling works by smoothing out the fluctuations of the number of points in subregions, not by smoothing the values of the points.
The simplest stratified strategy, dividing V into N equal subregions and choosing one point randomly in each subregion, already gives a method whose error decreases asymptotically as $N^{-1}$, much faster than $N^{-1/2}$. (Note that quasi-random numbers, §7.7, are another way of smoothing fluctuations in the density of points, giving nearly as good a result as the "blind" stratification strategy.)

However, "asymptotically" is an important caveat: For example, if the integrand is negligible in all but a single subregion, then the resulting one-sample integration is all but
useless. Information, even very crude, allowing importance sampling to put many points in the active subregion would be much better than blind stratified sampling.

Stratified sampling really comes into its own if you have some way of estimating the variances, so that you can put unequal numbers of points in different subregions, according to (7.8.14) or its generalizations, and if you can find a way of dividing a region into a practical number of subregions (notably not $K^d$ with large dimension d), while yet significantly reducing the variance of the function in each subregion compared to its variance in the full volume. Doing this requires a lot of knowledge about f, though different knowledge from what is required for importance sampling.

In practice, importance sampling and stratified sampling are not incompatible. In many, if not most, cases of interest, the integrand f is small everywhere in V except for a small fractional volume of "active regions." In these regions the magnitude of |f| and the standard deviation $\sigma = [\operatorname{Var}(f)]^{1/2}$ are comparable in size, so both techniques will give about the same concentration of points. In more sophisticated implementations, it is also possible to "nest" the two techniques, so that (e.g.)
importance sampling on a crude grid is followed by stratification within each grid cell.

Adaptive Monte Carlo: VEGAS

The VEGAS algorithm, invented by Peter Lepage [1,2], is widely used for multidimensional integrals that occur in elementary particle physics. VEGAS is primarily based on importance sampling, but it also does some stratified sampling if the dimension d is small enough to avoid $K^d$ explosion (specifically, if $(K/2)^d < N/2$, with N the number of sample points). The basic technique for importance sampling in VEGAS is to construct, adaptively, a multidimensional weight function g that is separable,

$$p \propto g(x, y, z, \ldots) = g_x(x)\,g_y(y)\,g_z(z)\cdots \tag{7.8.16}$$

Such a function avoids the $K^d$ explosion in two ways: (i) It can be stored in the computer as d separate one-dimensional functions, each defined by K tabulated values, say, so that $K \times d$ replaces $K^d$. (ii) It can be sampled as a probability density by consecutively sampling the d one-dimensional functions to obtain coordinate vector components (x, y, z, ...).

The optimal separable weight function can be shown to be [1]

$$g_x(x) \propto \left[\int dy \int dz \cdots \frac{f^2(x, y, z, \ldots)}{g_y(y)\,g_z(z)\cdots}\right]^{1/2} \tag{7.8.17}$$

(and correspondingly for y, z, ...). Notice that this reduces to $g \propto |f|$ (7.8.6) in one dimension. Equation (7.8.17) immediately suggests VEGAS' adaptive strategy: Given a set of g-functions (initially all constant, say), one samples the function f, accumulating not only the overall estimator of the integral, but also the $K \times d$ estimators (K subdivisions of the independent variable in each of d dimensions) of the right-hand side of equation (7.8.17). These then determine improved g functions for the next iteration.

When the integrand f is concentrated in one, or at most a few, regions in d-space, then the weight function g's quickly become large at coordinate values that are the projections of these regions onto the coordinate axes.
When f is concentrated in this way, the accuracy of the Monte Carlo integration is enormously enhanced over what simple Monte Carlo would give.

The weakness of VEGAS is the obvious one: To the extent that the projection of the function f onto individual coordinate directions is uniform, VEGAS gives no concentration of sample points in those dimensions. The worst case for VEGAS is, e.g., an integrand that is concentrated close to a body diagonal line, such as one from (0, 0, 0, ...) to (1, 1, 1, ...). Since this geometry is completely nonseparable, VEGAS can give no advantage at all. More generally, VEGAS may not do well when the integrand is concentrated in one-dimensional (or higher) curved trajectories (or hypersurfaces), unless these happen to be oriented close to the coordinate directions.

The routine vegas that follows is essentially Lepage's standard version, minimally modified to conform to our conventions. (We thank Lepage for permission to reproduce the program here.) For consistency with other versions of the VEGAS algorithm in circulation
we have preserved original variable names. The parameter NDMX is what we have called $K$, the maximum number of increments along each axis; MXDIM is the maximum value of $d$; some other parameters are explained in the comments.

Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

The vegas routine performs $m$ (= itmx) statistically independent evaluations of the desired integral, each with $N$ (= ncall) function evaluations. Although statistically independent, these iterations do assist each other, since each one is used to refine the sampling grid for the next one. The results of all iterations are combined into a single best answer, and its estimated error, by the relations

$$
I_{\rm best} = \frac{\displaystyle\sum_{i=1}^{m} \frac{I_i}{\sigma_i^2}}{\displaystyle\sum_{i=1}^{m} \frac{1}{\sigma_i^2}}, \qquad
\sigma_{\rm best} = \left( \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \right)^{-1/2}
\tag{7.8.18}
$$

where $I_i$ and $\sigma_i$ are the integral estimate and its standard deviation from iteration $i$. Also returned is the quantity

$$
\chi^2/m \equiv \frac{1}{m-1} \sum_{i=1}^{m} \frac{(I_i - I_{\rm best})^2}{\sigma_i^2}
\tag{7.8.19}
$$

If this is significantly larger than 1, then the results of the iterations are statistically inconsistent, and the answers are suspect.

The input flag init can be used to advantage. One might have a call with init=0, ncall=1000, itmx=5 immediately followed by a call with init=1, ncall=100000, itmx=1. The effect would be to develop a sampling grid over 5 iterations of a small number of samples, then to do a single high-accuracy integration on the optimized grid.
Note that the user-supplied integrand function, fxn, has an argument wgt in addition to the expected evaluation point x. In most applications you ignore wgt inside the function. Occasionally, however, you may want to integrate some additional function or functions along with the principal function $f$. The integral of any such function $g$ can be estimated by

$$
I_g = \sum_i w_i\, g(\mathbf{x}_i)
\tag{7.8.20}
$$

where the $w_i$'s and $\mathbf{x}_i$'s are the arguments wgt and x, respectively. It is straightforward to accumulate this sum inside your function fxn, and to pass the answer back to your main program via global variables. Of course, $g(\mathbf{x})$ had better resemble the principal function $f$ to some degree, since the sampling will be optimized for $f$.

#include <stdio.h>
#include <math.h>
#include "nrutil.h"
#define ALPH 1.5
#define NDMX 50
#define MXDIM 10
#define TINY 1.0e-30

extern long idum;    /* For random number initialization in main. */

void vegas(float regn[], int ndim, float (*fxn)(float [], float), int init,
    unsigned long ncall, int itmx, int nprn, float *tgral, float *sd,
    float *chi2a)
/* Performs Monte Carlo integration of a user-supplied ndim-dimensional function fxn over a
   rectangular volume specified by regn[1..2*ndim], a vector consisting of ndim "lower left"
   coordinates of the region followed by ndim "upper right" coordinates. The integration consists
   of itmx iterations, each with approximately ncall calls to the function. After each iteration
   the grid is refined; more than 5 or 10 iterations are rarely useful. The input flag init signals
   whether this call is a new start, or a subsequent call for additional iterations (see comments
   below). The input flag nprn (normally 0) controls the amount of diagnostic output. Returned
   answers are tgral (the best estimate of the integral), sd (its standard deviation), and chi2a
   (chi^2 per degree of freedom, an indicator of whether consistent results are being obtained).
   See text for further details. */
{
    float ran2(long *idum);