formula (4.9) of Section 4, is moderately large. Suppose we think we have moved 1.645 standard errors to the right of $\hat\phi$, to

$\tilde\phi = \hat\phi + 1.645\,\sigma_{\hat\phi}$.

Actually though, with $a = 0.105$,

$\sigma_{\tilde\phi} = (1 + 1.645a)\,\sigma_{\hat\phi} = 1.173\,\sigma_{\hat\phi}$,

according to (2.5). For calculating a confidence level, $\tilde\phi$ is really only $1.645/1.173 = 1.40$ standard errors to the right of $\hat\phi$, considerably less than 1.645. Formula (2.3) automatically corrects for an accelerating standard error. The next section gives a geometrical interpretation of $a$, and also of the BCa formula (2.3).

The peculiar-looking formula (2.3) for the BCa endpoints is designed to give exactly the right answer in situation (2.5), and to give it automatically in terms of the bootstrap distribution of $\hat\theta^*$. Notice, for instance, that the normalizing transformation $\hat\phi = m(\hat\theta)$ is not required in (2.3). By comparison, the standard interval works perfectly only under the more restrictive assumption that

(2.9)  $\hat\theta \sim N(\theta, \sigma^2)$,

with $\sigma^2$ constant. In practice we do not expect either (2.9) or (2.5) to hold exactly, but the broader assumptions (2.5) are likely to be a better approximation to the truth. They produce intervals that are an order of magnitude more accurate, as shown in Section 8.

Formula (2.5) generalizes (2.9) in three ways, by allowing bias, nonconstant standard error and a normalizing transformation. These three extensions are necessary and sufficient to give second-order accuracy,

(2.10)  $\mathrm{Prob}\{\theta < \hat\theta_{\mathrm{BCa}}[\alpha]\} = \alpha + O(1/n)$,

compared with $\mathrm{Prob}\{\theta < \hat\theta_{\mathrm{STAN}}[\alpha]\} = \alpha + O(1/\sqrt{n})$, where $n$ is the sample size in an i.i.d. sampling situation. This result is stated more carefully in Section 8, which also shows the second-order correctness of the BCa intervals. Hall (1988) was the first to establish (2.10).

The BCa intervals are transformation invariant. If we change the parameter of interest from $\theta$ to some monotone function of $\theta$, $\phi = m(\theta)$, likewise changing $\hat\theta$ to $\hat\phi = m(\hat\theta)$ and $\hat\theta^*$ to $\hat\phi^* = m(\hat\theta^*)$, then the $\alpha$-level BCa endpoints change in the same way,

(2.11)  $\hat\phi_{\mathrm{BCa}}[\alpha] = m(\hat\theta_{\mathrm{BCa}}[\alpha])$.

The standard intervals are not transformation invariant, and this accounts for some of their practical difficulties. It is well known, for instance, that normal-theory standard intervals for the correlation coefficient are much more accurate if constructed on the scale $\phi = \tanh^{-1}(\theta)$ and then transformed back to give an interval for $\theta$ itself. Transformation invariance means that the BCa intervals cannot be fooled by a bad choice of scale. To put it another way, the statistician does not have to search for a transformation like $\tanh^{-1}$ in applying the BCa method.
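As an illustration of the scale effect just described, here is a minimal sketch (added here, assuming Python with SciPy; not part of the original exposition) of the classical construction for a sample correlation: the normal-theory interval is formed on the $\phi = \tanh^{-1}\theta$ scale, where the standard error is roughly constant, and then mapped back. The function name and the familiar $1/\sqrt{n-3}$ standard-error approximation for $\tanh^{-1}(r)$ under bivariate normal sampling are our additions.

```python
import numpy as np
from scipy.stats import norm

def corr_interval_on_tanh_scale(r, n, coverage=0.90):
    """Normal-theory central interval for a correlation coefficient,
    built on the phi = arctanh(theta) scale and transformed back.
    Assumes the usual approximation arctanh(r) ~ N(arctanh(rho), 1/(n-3))
    for bivariate normal samples of size n."""
    z = norm.ppf(0.5 + coverage / 2.0)       # 1.645 for 90% central coverage
    phi_hat = np.arctanh(r)
    se_phi = 1.0 / np.sqrt(n - 3)
    lo, hi = phi_hat - z * se_phi, phi_hat + z * se_phi
    return np.tanh(lo), np.tanh(hi)          # back to the theta (correlation) scale

# e.g. corr_interval_on_tanh_scale(0.7, 20) returns a 90% central interval for theta
```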
In summary, BCa produces confidence intervals for $\theta$ from the bootstrap distribution of $\hat\theta^*$, requiring on the order of 2,000 bootstrap replications $\hat\theta^*$. These intervals are transformation invariant and exactly correct under the normal transformation model (2.5); in general they are second-order accurate and correct.

3. THE ACCELERATION a

The acceleration parameter $a$ appearing in the BCa formula (2.3) looks mysterious. Its definition in (2.5) involves an idealized transformation to normality which will not be known in practice. Fortunately $a$ enjoys a simple relationship with Fisher's score function which makes it easy to estimate. This section describes the relationship in the context of one-parameter families. In doing so it also allows us better motivation for the peculiar-looking BCa formula (2.3).

Suppose then that we have a one-parameter family of c.d.f.'s $G_\theta(\hat\theta)$ on the real line, with $\hat\theta$ being an estimate of $\theta$. In the relationships below we assume that $\hat\theta$ behaves asymptotically like a maximum likelihood estimator, with respect to a notional sample size $n$, as made explicit in (5.3) of Efron (1987). As a particular example, we will consider the case

(3.1)  $\hat\theta \sim \theta\,\mathrm{Gamma}_n/n$,  $n = 10$,

where $\mathrm{Gamma}_n$ indicates a standard gamma variate with density $t^{n-1}e^{-t}/\Gamma(n)$ for $t > 0$.

Having observed $\hat\theta$, we wonder with what confidence we can reject a trial value $\theta_0$ of the parameter $\theta$. In the gamma example (3.1) we might have

(3.2)  $\hat\theta = 1$ and $\theta_0 = 1.5$.

The easy answer from the bootstrap point of view is given in terms of the bootstrap c.d.f. $\hat G(c) = G_{\hat\theta}(c)$. We can define the bootstrap confidence value to be

(3.3)  $\tilde\alpha = \hat G(\theta_0) = G_{\hat\theta}(\theta_0)$.

However, this will usually not agree with the more familiar hypothesis-testing confidence level for a one-parameter problem, say

(3.4)  $\hat\alpha = 1 - G_{\theta_0}(\hat\theta)$,

the probability under $\theta_0$ of getting a less extreme observation than $\hat\theta$. (For convenience these definitions assume $\hat\theta < \theta_0$.) In the case of (3.1)-(3.2) we have $\tilde\alpha = 0.930$ while $\hat\alpha = 0.863$.
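These two numbers can be checked directly from the gamma c.d.f. The short sketch below (our own, assuming Python with SciPy; not from the original paper) evaluates (3.3) and (3.4) for the setup (3.1)-(3.2).

```python
from scipy.stats import gamma

n, theta_hat, theta_0 = 10, 1.0, 1.5

def G(theta, c):
    """C.d.f. of theta * Gamma_n / n evaluated at c, as in (3.1)."""
    return gamma.cdf(n * c / theta, a=n)

alpha_tilde = G(theta_hat, theta_0)        # (3.3): bootstrap confidence value
alpha_hat = 1.0 - G(theta_0, theta_hat)    # (3.4): hypothesis-testing level

print(round(alpha_tilde, 3), round(alpha_hat, 3))   # 0.93 0.863
```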
The BCa formula (2.3) amounts to a rule for converting bootstrap confidence values $\tilde\alpha$ into hypothesis-testing confidence levels $\hat\alpha$. This becomes crucial as soon as we try to use the bootstrap on problems more complicated than one-parameter families. Define

(3.5)  $\tilde z = \Phi^{-1}(\tilde\alpha)$ and $\hat z = \Phi^{-1}(\hat\alpha)$.

For a given value of $\theta_0$ and $\hat\alpha$ above, let $\alpha = \hat\alpha$ and $\hat\theta_{\mathrm{BCa}}[\alpha] = \theta_0$ in (2.3). If (2.3) works perfectly, then we have

(3.6)  $\Phi^{-1}(\hat G(\theta_0)) = \tilde z = z_0 + \frac{z_0 + \hat z}{1 - a(z_0 + \hat z)}$,

or

(3.7)  $\hat z = \frac{\tilde z - z_0}{1 + a(\tilde z - z_0)} - z_0$.

The fact that the BCa intervals are second-order accurate implies that the conversion formula (3.7) itself must be quite accurate.

To use (3.7), or (2.3), we first must estimate the two parameters $z_0$ and $a$. The bias-correction $z_0$ is estimated by

(3.8)  $\hat z_0 = \Phi^{-1}(\hat G(\hat\theta)) = \Phi^{-1}(G_{\hat\theta}(\hat\theta))$,

as in (2.8). The acceleration $a$ is estimated in terms of the skewness of the score function

(3.9)  $\dot\ell_\theta(\hat\theta) = \frac{\partial}{\partial\theta}\log\{g_\theta(\hat\theta)\}$,

where $g_\theta(\hat\theta)$ is the density $\partial G_\theta(\hat\theta)/\partial\hat\theta$. Section 10 of Efron (1987) shows that one-sixth the skewness of $\dot\ell_\theta(\hat\theta)$ evaluated at $\theta = \hat\theta$,

(3.10)  $\hat a = \mathrm{SKEW}_{\theta=\hat\theta}\{\dot\ell_\theta(\hat\theta)\}/6$,

is an excellent estimate of $a$.

Both $z_0$ and $a$ are of order $O(1/\sqrt{n})$, with the estimates $\hat z_0$ and $\hat a$ erring by $O(1/n)$. For the gamma problem (3.1) it is easy to calculate that

(3.11)  $\hat z_0 = 0.106$ and $\hat a = 0.105$.

If $\hat\theta$ is the MLE in a one-parameter family (but not in general), then $\hat z_0$ and $\hat a$ are nearly the same, as is the case here.

The usable form of (3.7) is

(3.12)  $\hat z = \frac{\tilde z - \hat z_0}{1 + \hat a(\tilde z - \hat z_0)} - \hat z_0$.

We can list three important properties of the $(\tilde z, \hat z)$ curve (3.12) near $\tilde z = \hat z_0$:

(3.13)  $(\tilde z, \hat z) = (\hat z_0, -\hat z_0)$ at $\tilde z = \hat z_0$;

(3.14)  $d\hat z/d\tilde z = 1$ at $\tilde z = \hat z_0$;

(3.15)  $d^2\hat z/d\tilde z^2 = -2\hat a$ at $\tilde z = \hat z_0$.

The last of these relationships is of special interest here. It says that the curvature of the $(\tilde z, \hat z)$ curve at $\hat z_0$ is directly proportional to the acceleration $\hat a$.

In any given one-parameter problem we can find the actual $(\tilde z, \hat z)$ curve, at least in theory. This is obtained by keeping $\hat\theta$ fixed and varying the trial point $\theta_0$ in (3.3)-(3.5). Figure 3 shows the $(\tilde z, \hat z)$ curve for the gamma problem, with $\hat\theta$ any fixed value, say $\hat\theta = 1$. In this case the BCa approximation formula (3.12) matches the actual $(\tilde z, \hat z)$ curve to three decimal places over most of the range of the graph. At $\hat\theta = 1$, $\theta_0 = 1.5$, for example, $\hat z$ equals 1.092 both actually and from (3.12).

[Figure 3. Plot of $\hat z$ versus $\tilde z$ in the gamma problem (3.1); the BCa approximation (3.12), or (2.3), matches the actual curve to three decimal places. The central curvature of the $(\tilde z, \hat z)$ plot is proportional to the acceleration $\hat a$.]

The fact that the BCa formula (2.3) is second-order accurate implies that the conversion formula (3.12) errs only by $O(1/n)$. This means that relationships (3.13)-(3.15) must have the same order of accuracy, even in quite general problems. In particular, the curvature of the actual $(\tilde z, \hat z)$ plot, if it were possible to compute it, would nearly equal $-2\hat a$, with $\hat a$ given by the skewness definition (3.10).
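The conversion (3.12) is easy to check numerically for the gamma numbers above. The sketch below (our own, assuming SciPy; not from the original paper) reproduces the value $\hat z \doteq 1.092$ quoted for $\hat\theta = 1$, $\theta_0 = 1.5$.

```python
from scipy.stats import norm

z0_hat, a_hat = 0.106, 0.105          # (3.11), gamma problem
alpha_tilde = 0.930                   # bootstrap confidence value, (3.3)

z_tilde = norm.ppf(alpha_tilde)       # (3.5)
z_hat = (z_tilde - z0_hat) / (1 + a_hat * (z_tilde - z0_hat)) - z0_hat   # (3.12)

print(round(z_hat, 3))                # about 1.092
print(round(norm.ppf(0.863), 2))      # direct value Phi^{-1}(alpha_hat), about 1.09
```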
None of this is special to one-parameter families except for the skewness definition (3.10), which does not allow for nuisance parameters. The next section shows how to extend the skewness definition of $\hat a$ to multiparameter situations. This gives an estimate that is easy to evaluate, especially in exponential families, and that behaves well in practice. In fact $a$ is usually easier to estimate than $z_0$, despite the latter's simpler definition.

4. THE ABC METHOD

We now leave one-parameter families and return to the more complicated situations that bootstrap methods are intended to deal with. In many such situations it is possible to approximate the BCa interval endpoints analytically, entirely dispensing with Monte Carlo simulations. This reduces the computational burden by an enormous factor, and also makes it easier to understand how BCa improves upon the standard intervals. The ABC method ("ABC" standing for approximate bootstrap confidence intervals) is an analytic version of BCa applying to smoothly defined parameters in exponential families. It also applies to smoothly defined nonparametric problems, as shown in Section 6. DiCiccio and Efron (1992) introduced the ABC method, which is also discussed in Efron and Tibshirani (1993).

The BCa endpoints (2.3) depend on the bootstrap c.d.f. $\hat G$ and estimates of the two parameters $a$ and $z_0$. The ABC method requires one further estimate, of the nonlinearity parameter $c_q$, but it does not involve $\hat G$.

The standard interval (1.1) depends only on the two quantities $(\hat\theta, \hat\sigma)$. The ABC intervals depend on the five quantities $(\hat\theta, \hat\sigma, \hat a, \hat z_0, \hat c_q)$. Each of the three extra numbers $(\hat a, \hat z_0, \hat c_q)$ corrects a deficiency of the standard method, making the ABC intervals second-order accurate as well as second-order correct.

The ABC system applies within multiparameter exponential families, which are briefly reviewed below. This framework includes most familiar parametric situations: normal, binomial, Poisson, gamma, multinomial, ANOVA, logistic regression, contingency tables, log-linear models, multivariate normal problems, Markov chains and also nonparametric situations as discussed in Section 6.

The density function for a $p$-parameter exponential family can be written as

(4.1)  $g_\mu(x) = \exp[\eta' y - \psi(\eta)]$,

where $x$ is the observed data and $y = Y(x)$ is a $p$-dimensional vector of sufficient statistics; $\eta$ is the $p$-dimensional natural parameter vector; $\mu$ is the expectation parameter $\mu = E_\mu\{y\}$; and $\psi(\eta)$, the cumulant generating function, is a normalizing factor that makes $g_\mu(x)$ integrate to 1.

The vectors $\mu$ and $\eta$ are in one-to-one correspondence, so that either can be used to index functions of interest. In (4.1), for example, we used $\mu$ to index the densities $g$, but $\eta$ to index $\psi$. The ABC algorithm involves the mapping from $\eta$ to $\mu$, say

(4.2)  $\mu = \mathrm{mu}(\eta)$,

which, fortunately, has a simple form in all of the common exponential families. Section 3 of DiCiccio and Efron (1992) gives function (4.2) for several families, as well as specifying the other inputs necessary for using the ABC algorithm.
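To make the mapping (4.2) concrete, here are two textbook instances, written as a small sketch (our own illustrations, not drawn from the paper), using the usual natural parameterizations of the Poisson and binomial families.

```python
import numpy as np

def mu_poisson(eta):
    """Poisson with y = x: eta = log(lambda), psi(eta) = exp(eta),
    so the expectation parameter is mu = E{y} = exp(eta)."""
    return np.exp(eta)

def mu_binomial(eta, n_trials):
    """Binomial(n, pi) with y = x: eta = logit(pi),
    so mu = E{y} = n * pi = n / (1 + exp(-eta))."""
    return n_trials / (1.0 + np.exp(-eta))
```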
The MLE of $\mu$ in (4.1) is $\hat\mu = y$, so that the MLE of a real-valued parameter of interest $\theta = t(\mu)$ is

(4.3)  $\hat\theta = t(\hat\mu) = t(y)$.

As an example consider the bivariate normal model (1.2). Here $x = ((B_1, A_1), (B_2, A_2), \ldots, (B_{20}, A_{20}))$ and $y = \sum_{i=1}^{20}(B_i, A_i, B_i^2, B_iA_i, A_i^2)'/20$. The bivariate normal is a five-parameter exponential family with

(4.4)  $\mu = (\lambda_1, \lambda_2, \lambda_1^2 + \Gamma_{11}, \lambda_1\lambda_2 + \Gamma_{12}, \lambda_2^2 + \Gamma_{22})'$.

Thus the correlation coefficient is the function $t(\mu)$ given by

(4.5)  $\theta = \frac{\mu_4 - \mu_1\mu_2}{[(\mu_3 - \mu_1^2)(\mu_5 - \mu_2^2)]^{1/2}}$;

$\hat\theta = t(\hat\mu)$ is seen to be the usual sample correlation coefficient.

We denote the $p \times p$ covariance matrix of $y$ by $\Sigma(\mu) = \mathrm{cov}_\mu\{y\}$, and let $\hat\Sigma = \Sigma(\hat\mu)$, the MLE of $\Sigma$. The delta-method estimate of standard error for $\hat\theta = t(\hat\mu)$ depends on $\hat\Sigma$. Let $\dot t$ denote the gradient vector of $\theta = t(\mu)$ at $\mu = \hat\mu$,

(4.6)  $\dot t = \left(\ldots, \frac{\partial t}{\partial\mu_i}, \ldots\right)'_{\mu=\hat\mu}$.

Then

(4.7)  $\hat\sigma = (\dot t'\hat\Sigma\dot t)^{1/2}$

is the parametric delta-method estimate of standard error, and it is also the usual Fisher information standard error estimate.

The $\hat\sigma$ values for the standard intervals in Tables 2 and 3 were found by numerical differentiation, using

(4.8)  $\frac{\partial t}{\partial\mu_i}\Big|_{\hat\mu} \doteq \frac{t(\hat\mu + \varepsilon e_i) - t(\hat\mu - \varepsilon e_i)}{2\varepsilon}$

for a small value of $\varepsilon$, with $e_i$ the $i$th coordinate vector.
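A compact sketch of (4.5)-(4.8) follows (our own, assuming NumPy; the function names and the step size are illustrative, not the values used for the tables): the correlation parameter as a function of $\mu$, a central-difference gradient, and the delta-method standard error.

```python
import numpy as np

def t_corr(mu):
    """Correlation coefficient as a function of the expectation vector mu, (4.5)."""
    return (mu[3] - mu[0] * mu[1]) / np.sqrt((mu[2] - mu[0] ** 2) * (mu[4] - mu[1] ** 2))

def delta_method_se(t, mu_hat, Sigma_hat, eps=1e-5):
    """Parametric delta-method standard error (4.6)-(4.8):
    central-difference gradient t_dot, then sigma_hat = sqrt(t_dot' Sigma_hat t_dot)."""
    p = len(mu_hat)
    t_dot = np.empty(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = eps
        t_dot[i] = (t(mu_hat + e) - t(mu_hat - e)) / (2.0 * eps)   # (4.8)
    sigma_hat = float(np.sqrt(t_dot @ Sigma_hat @ t_dot))          # (4.7)
    return sigma_hat, t_dot
```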
The covariance matrix $\hat\Sigma$ is simple to calculate in most of the familiar examples, as shown in DiCiccio and Efron (1992, Section 3), giving $\hat\sigma$ from (4.7). This assumes that $t(\mu)$ is differentiable. In fact we need $t(\mu)$ to be twice differentiable in order to carry out the ABC computations.

The ABC algorithm begins by computing $\hat\sigma$ from (4.7)-(4.8). Then the parameters $(a, z_0, c_q)$ are estimated by computing $p + 2$ numerical second derivatives. The first of these is

(4.9)  $\hat a = \frac{\partial^2}{\partial\varepsilon^2}\,\dot t'\,\mathrm{mu}(\hat\eta + \varepsilon\dot t)\Big|_{\varepsilon=0}\Big/ (6\hat\sigma^3)$,

where $\hat\eta$ is the MLE of the natural parameter vector $\eta$. This turns out to be the same as the skewness definition of $\hat a$, (3.10), in the one-parameter family obtained from Stein's least favorable family construction [see Efron, 1987, (6.7)]. Formula (4.9) uses exponential family relationships to compute the skewness from a second derivative.

The second ABC numerical derivative is

(4.10)  $\hat c_q = \frac{\partial^2}{\partial\varepsilon^2}\,t\Big(\hat\mu + \varepsilon\,\frac{\hat\Sigma\dot t}{\hat\sigma}\Big)\Big|_{\varepsilon=0}\Big/ (2\hat\sigma)$;

$\hat c_q$ measures how nonlinear the parameter of interest $\theta$ is, as a function of $\mu$.

The final $p$ second derivatives are required for the bias-correction parameter $z_0$. The parametric delta-method estimate of bias for $\hat\theta = t(\hat\mu)$ can be expressed as

(4.11)  $\hat b = \frac{1}{2}\sum_{i=1}^{p}\frac{\partial^2}{\partial\varepsilon^2}\,t(\hat\mu + \varepsilon d_i^{1/2}\gamma_i)\Big|_{\varepsilon=0}$,

where $d_i$ is the $i$th eigenvalue and $\gamma_i$ is the $i$th eigenvector of $\hat\Sigma$. Then

(4.12)  $\hat z_0 = \Phi^{-1}\big(2\,\Phi(\hat a)\,\Phi(\hat c_q - \hat b/\hat\sigma)\big) \doteq \hat a + \hat c_q - \hat b/\hat\sigma$.

This involves terms other than $\hat b$ because $z_0$ relates to median bias. For the kind of smooth exponential family problems considered here, (4.12) is usually more accurate than the direct estimate (2.8).

The simplest form of the ABC intervals, called ABCquadratic or ABCq, gives the $\alpha$-level endpoint directly as a function of the five numbers $(\hat\theta, \hat\sigma, \hat a, \hat z_0, \hat c_q)$:

(4.13)  $\alpha \;\to\; w \equiv \hat z_0 + z^{(\alpha)} \;\to\; \lambda \equiv \frac{w}{(1 - \hat a w)^2} \;\to\; \xi \equiv \lambda + \hat c_q\lambda^2 \;\to\; \hat\theta_{\mathrm{ABCq}}[\alpha] = \hat\theta + \hat\sigma\xi$.

The original ABC endpoint, denoted $\hat\theta_{\mathrm{ABC}}[\alpha]$, requires one more recomputation of the function $t(\cdot)$:

(4.14)  $\alpha \;\to\; w = \hat z_0 + z^{(\alpha)} \;\to\; \lambda = \frac{w}{(1 - \hat a w)^2} \;\to\; \hat\theta_{\mathrm{ABC}}[\alpha] = t\Big(\hat\mu + \lambda\,\frac{\hat\Sigma\dot t}{\hat\sigma}\Big)$.

Notice that $\hat c_q$ is still required here, to estimate $\hat z_0$ in (4.12).

Formula (4.14) is the one used in Tables 2 and 3. It has the advantage of being transformation invariant, (2.11), and is sometimes more accurate than (4.13). However, (4.13) is local: all of the recomputations of $t(\mu)$ involved in (4.8)-(4.13) take place infinitesimally near $\hat\mu = y$. In this sense ABCq is like the standard method. Nonlocality occasionally causes computational difficulties with boundary violations. In fact (4.13) is a simple quadratic approximation to (4.14), so ABC and ABCq usually agree reasonably well.

The main point of this article is that highly accurate approximate confidence intervals can now be calculated on a routine basis. The ABC intervals are implemented by a short computer algorithm. [The ABC intervals in Tables 2 and 3 were produced by the parametric and nonparametric ABC algorithms "abcpar" and "abcnon." These and the BCa program are available in the language S: send electronic mail to statlib@lib.stat.cmu.edu with the one-line message: send bootstrap.funs from S.] There are five inputs to the algorithm: $\hat\mu$, $\hat\Sigma$, $\hat\eta$ and the functions $t(\cdot)$ and $\mathrm{mu}(\cdot)$. The outputs include $\hat\theta_{\mathrm{STAN}}[\alpha]$, $\hat\theta_{\mathrm{ABC}}[\alpha]$ and $\hat\theta_{\mathrm{ABCq}}[\alpha]$. Computational effort for the ABC intervals is two or three times that required for the standard intervals.
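As a minimal sketch of the two endpoint maps, the functions below implement (4.13) and (4.14) given the five constants and, for the ABC form, the function $t(\cdot)$ and the direction $\hat\Sigma\dot t/\hat\sigma$. The names and the SciPy dependency are our own; the S programs "abcpar" and "abcnon" cited above are the reference implementations.

```python
import numpy as np
from scipy.stats import norm

def abcq_endpoint(alpha, theta_hat, sigma_hat, a_hat, z0_hat, cq_hat):
    """ABCq endpoint, equation (4.13)."""
    w = z0_hat + norm.ppf(alpha)
    lam = w / (1.0 - a_hat * w) ** 2
    xi = lam + cq_hat * lam ** 2
    return theta_hat + sigma_hat * xi

def abc_endpoint(alpha, t, mu_hat, Sigma_hat, t_dot, sigma_hat, a_hat, z0_hat):
    """ABC endpoint, equation (4.14): one extra recomputation of t(.)."""
    w = z0_hat + norm.ppf(alpha)
    lam = w / (1.0 - a_hat * w) ** 2
    return t(mu_hat + lam * (Sigma_hat @ t_dot) / sigma_hat)
```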
The ABC intervals can be useful even in very simple situations. Suppose that the data consists of a single observation $x$ from a Poisson distribution with unknown expectation $\theta$. In this case $\hat\theta = t(x) = x$ and $\hat\sigma = \sqrt{\hat\theta}$. Carrying through definitions (4.9)-(4.14) gives $\hat a = \hat z_0 = 1/(6\hat\theta^{1/2})$, $\hat c_q = 0$, and so

$\hat\theta_{\mathrm{ABC}}[\alpha] = \hat\theta + \frac{w}{(1 - \hat a w)^2}\sqrt{\hat\theta}$,  $w = \hat z_0 + z^{(\alpha)}$.

For $x = 7$, the interval $(\hat\theta_{\mathrm{ABC}}[0.05], \hat\theta_{\mathrm{ABC}}[0.95])$ equals $(3.54, 12.67)$. This compares with the exact interval $(3.57, 12.58)$ for $\theta$, splitting the atom of probability at $x = 7$, and with the standard interval $(2.65, 11.35)$.
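A quick numeric check of this Poisson case (our own sketch, assuming SciPy; not from the original paper) reproduces the interval quoted above.

```python
from scipy.stats import norm

x = 7                                   # single Poisson observation
theta_hat, sigma_hat = float(x), x ** 0.5
a_hat = z0_hat = 1.0 / (6.0 * theta_hat ** 0.5)

def abc(alpha):
    """ABC endpoint for the Poisson case: (4.13)-(4.14) with c_q = 0."""
    w = z0_hat + norm.ppf(alpha)
    lam = w / (1.0 - a_hat * w) ** 2
    return theta_hat + sigma_hat * lam

print(round(abc(0.05), 2), round(abc(0.95), 2))   # 3.54 12.67
```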
Here is a more realistic example of the ABC algorithm, used in a logistic regression context. Table 4 shows the data from an experiment concerning mammalian cell growth. The goal of this experiment was to quantify the effects of two factors on the success of a culture. Factor "r" measures the ratio of two key constituents of the culture plate, while factor "d" measures how many days were allowed for culture maturation. A total of 1,843 independent cultures were prepared, investigating 25 different $(r_i, d_j)$ combinations. The table lists $s_{ij}$ and $n_{ij}$ for each combination, the number of successful cultures and the number of cultures attempted.

Table 4. Cell data: 1,843 cell cultures were prepared, varying two factors, r (the ratio of two key constituents) and d (the number of days of culturing). Data shown are $s_{ij}/n_{ij}$, the number of successful cultures over the number of cultures attempted, at the ith level of r and the jth level of d.

            d1        d2        d3        d4        d5       Total
  r1      5/31      3/28     20/45     24/47     29/35      81/186
  r2     15/77     36/78     43/71     56/71     66/74     216/371
  r3     48/126    68/116   145/171    98/119   114/129    473/661
  r4     29/92     35/52     57/85     38/50     72/77     231/356
  r5     11/53     20/52     20/48     40/55     52/61     143/269
  Total  108/379   162/326   285/420   256/342   333/376  1144/1843

We suppose that the number of successful cultures is an independent binomial variate,

(4.15)  $s_{ij} \sim \mathrm{binomial}(n_{ij}, \pi_{ij})$, independently for $i, j = 1, 2, 3, 4, 5$,

with an additive logistic regression model for the unknown probabilities $\pi_{ij}$,

(4.16)  $\log\frac{\pi_{ij}}{1 - \pi_{ij}} = \mu + \alpha_i + \beta_j$,  $\sum_{i=1}^{5}\alpha_i = \sum_{j=1}^{5}\beta_j = 0$.

For the example here we take the parameter of interest to be

(4.17)  $\theta = \frac{\pi_{15}}{\pi_{51}}$,

the success probability for the lowest r and highest d divided by the success probability for the highest r and lowest d. This typifies the kind of problem traditionally handled by the standard method.

A logistic regression program calculated maximum likelihood estimates $(\hat\mu, \hat\alpha_i, \hat\beta_j)$, from which we obtained

(4.18)  $\hat\theta = \frac{1 + \exp[-(\hat\mu + \hat\alpha_5 + \hat\beta_1)]}{1 + \exp[-(\hat\mu + \hat\alpha_1 + \hat\beta_5)]} = 4.16$.

The output of the logistic regression program provided $\hat\mu$, $\hat\Sigma$ and $\hat\eta$ for the ABC algorithm. Section 3 of DiCiccio and Efron (1992) gives the exact specification for an ABC analysis of a logistic regression problem. Applied here, the algorithm gave standard and ABC 0.90 central intervals for $\theta$,

(4.19)  $(\hat\theta_{\mathrm{STAN}}[0.05], \hat\theta_{\mathrm{STAN}}[0.95]) = (3.06, 5.26)$,
        $(\hat\theta_{\mathrm{ABC}}[0.05], \hat\theta_{\mathrm{ABC}}[0.95]) = (3.20, 5.43)$.

The ABC limits are shifted moderately upwards relative to the standard limits, enough to make the shape (1.6) equal 1.32. The standard intervals are not too bad in this case, although better performance might have been expected with n = 1,843 data points. In fact it is very difficult to guess a priori what constitutes a large enough sample size for adequate standard-interval performance.

The ABC formulas (4.13)-(4.14) were derived as second-order approximations to the BCa endpoints by DiCiccio and Efron (1992). They showed that these formulas give second-order accuracy as in (2.10), and also second-order correctness. Section 8 reviews some of these results. There are many other expressions for ABC-like interval endpoints that enjoy equivalent second-order properties in theory, although they may be less dependable in practice. A particularly simple formula is

(4.20)  $\hat\theta_{\mathrm{ABC}}[\alpha] \doteq \hat\theta_{\mathrm{STAN}}[\alpha] + \hat\sigma\big[\hat z_0 + (2\hat a + \hat c_q)\,z^{(\alpha)2}\big]$.

This shows that the ABC endpoints are not just a translation of $\hat\theta_{\mathrm{STAN}}[\alpha]$.

In repeated sampling situations the estimated constants $(\hat a, \hat z_0, \hat c_q)$ are of stochastic order $1/\sqrt{n}$ in the sample size, the same as $\hat\sigma$. They multiply $\hat\sigma$ in (4.20), resulting in corrections of order $\hat\sigma/\sqrt{n}$ to $\hat\theta_{\mathrm{STAN}}[\alpha]$. If there were only 1/4 as much cell data, n = 461, but with the same proportion of successes in every cell of Table 4, then $(\hat a, \hat z_0, \hat c_q)$ would be twice as large. This would double the relative difference $(\hat\theta_{\mathrm{ABC}}[\alpha] - \hat\theta_{\mathrm{STAN}}[\alpha])/\hat\sigma$ according to (4.20), rendering $\hat\theta_{\mathrm{STAN}}[\alpha]$ quite inaccurate.
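To see the size of the (4.20) correction for the cell data, the sketch below (our own, assuming SciPy) plugs in the constants $(\hat a, \hat z_0, \hat c_q) = (-0.006, -0.025, 0.105)$ quoted in the next paragraph and backs an approximate $\hat\sigma$ out of the reported standard interval (4.19); the resulting endpoints land close to, though not exactly on, the ABC interval (3.20, 5.43), as expected of an approximation.

```python
from scipy.stats import norm

theta_hat = 4.16
stan_05, stan_95 = 3.06, 5.26                        # (4.19), standard interval
sigma_hat = (stan_95 - theta_hat) / norm.ppf(0.95)   # roughly 0.67, backed out of (4.19)
a_hat, z0_hat, cq_hat = -0.006, -0.025, 0.105        # cell-data constants quoted in the text

def abc_approx(alpha, stan_endpoint):
    """Approximate ABC endpoint via the simple formula (4.20)."""
    z = norm.ppf(alpha)
    return stan_endpoint + sigma_hat * (z0_hat + (2 * a_hat + cq_hat) * z ** 2)

print(round(abc_approx(0.05, stan_05), 2), round(abc_approx(0.95, stan_95), 2))
# roughly 3.21 and 5.41, versus the exact ABC values (3.20, 5.43)
```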
Both $\hat a$ and $\hat z_0$ are transformation invariant, retaining the same numerical value under monotone parameter transformations $\phi = m(\theta)$. The nonlinearity constant $\hat c_q$ is not invariant, and it can be reduced by transformations that make $\phi$ more linear as a function of $\mu$. Changing parameters from $\theta = \pi_{15}/\pi_{51}$ to $\phi = \log(\theta)$ changes $(\hat a, \hat z_0, \hat c_q)$ from $(-0.006, -0.025, 0.105)$ to $(-0.006, -0.025, 0.025)$ for the cell data. The standard intervals are nearly correct on the $\phi$ scale. The ABC and BCa methods automate this kind of data-analytic trick.

We can visualize the relationship between the BCa and ABC intervals in terms of Figure 3. The