MIL-HDBK-17-1F Volume 1, Chapter 8 Statistical Methods

8.3.4.5.2 The Hanson-Koopmans method

The following procedure (References 8.3.4.5.2(a) and (b)) can be a useful method for obtaining a B-basis value for sample sizes not exceeding 28. This procedure requires the assumption that the observations are a random sample from a population for which the logarithm of the cumulative distribution function is concave, an assumption satisfied by a large class of probability distributions. There is substantial empirical evidence suggesting that composite strength data satisfy this assumption; consequently, this procedure can usually be recommended for use when n is less than 29. However, in view of the required assumption, this is not an unconditional recommendation.

The Hanson-Koopmans B-basis value is

    B = x(r) [x(1) / x(r)]^k        8.3.4.5.2(a)

where x(1) is the smallest and x(r) is the rth largest data value. The values of r and k depend on n and are tabulated in Table 8.5.14. This equation for the B-basis value should not be employed if x(r) = x(1). The example problem in Section 8.3.7.5 demonstrates these procedures.

The Hanson-Koopmans method can be used to calculate A-basis values for n less than 299. Find the value kA corresponding to the sample size n in Table 8.5.15. Let x(n) and x(1) be the largest and smallest data values. The A-basis value is

    A = x(n) [x(1) / x(n)]^kA        8.3.4.5.2(b)

8.3.5 Basis values for structured data

Where possible, it is advantageous to reduce structured data to unstructured cases as discussed in Section 8.3.2. The analysis of unstructured data is possible for distributions other than a normal probability model, which is assumed by the procedures for structured data. Where the data are structured and cannot be combined according to the test in Section 8.3.2.2, the procedures in this section should be used. These procedures for basis value calculations for structured data assume a normal probability model.
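Returning to the Hanson-Koopmans equations of Section 8.3.4.5.2 above, the computation can be sketched in Python as follows. This is an illustrative sketch only: the sample values are hypothetical, r and k must in practice be read from Table 8.5.14 (or Table 8.5.15 for kA), and x(r) is taken here as the rth order statistic in ascending order.

```python
def hanson_koopmans_basis(data, r, k):
    """Hanson-Koopmans basis value: x_(r) * (x_(1) / x_(r)) ** k,
    where x_(1) is the smallest value and x_(r) is the rth order statistic.
    In practice r and k come from Table 8.5.14 (B-basis) or Table 8.5.15
    (A-basis, with r = n and k = kA)."""
    xs = sorted(data)          # ascending order statistics x_(1) <= ... <= x_(n)
    x1, xr = xs[0], xs[r - 1]
    if xr == x1:
        raise ValueError("equation not applicable when x_(r) = x_(1)")
    return xr * (x1 / xr) ** k

# Ten hypothetical strength values; r = 9 and k = 1.5 are placeholders for
# illustration, not the tabulated Hanson-Koopmans factors for n = 10.
strengths = [95.0, 100.0, 102.0, 105.0, 107.0, 108.0, 110.0, 112.0, 115.0, 118.0]
b_basis = hanson_koopmans_basis(strengths, r=9, k=1.5)
```

Since k exceeds one here, the computed basis value falls below the smallest observation, as one would expect of a B-basis value.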
All of these procedures can be considered in terms of regression analysis. A general description of regression analysis of linear statistical models is provided in Section 8.3.5.1. Included in this section is a discussion of checking the required assumptions. Analysis of variance is a special case with one random effect and no fixed effects (Section 8.3.5.2). A case of one fixed effect and no random effects is simple linear regression (Section 8.3.5.3).

8.3.5.1 Regression analysis of linear statistical models

The objective of a regression analysis for material basis properties is to obtain basis values for a particular response (for example, tensile strength) as functions of fixed factors (such as temperature, lay-up, and humidity). The measured response values will be called observations, and the values which describe the conditions corresponding to these observations will be referred to as covariates. For example, if a linear relationship is assumed between tensile strength and temperature, then the mean strength at a temperature Ti is, in the limit of infinitely many observations at this temperature, equal to θ0 + θ1Ti. The constants θ0 and θ1 are generally unknown and must be estimated from the data. The values that these constants multiply, here 1 and Ti, are covariates; together they describe the fixed conditions under which the ith strength observation was made. Linear regression refers to a method for the analysis of relationships which are linear functions of unknown parameters (here θ0 and θ1). These relationships need not be linear in the covariates. For example, a quadratic model in which squared temperature (Ti²) is introduced as an additional covariate can be analyzed using linear regression.
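As a small illustration of fitting a model that is linear in its parameters, the sketch below estimates θ0 and θ1 by ordinary least squares; the temperature and strength values are hypothetical, chosen only to show the estimation of the two unknown constants.

```python
def fit_line(T, x):
    """Least-squares estimates of (theta0, theta1) in x = theta0 + theta1 * T."""
    n = len(T)
    T_bar = sum(T) / n
    x_bar = sum(x) / n
    sxy = sum((t - T_bar) * (xi - x_bar) for t, xi in zip(T, x))
    sxx = sum((t - T_bar) ** 2 for t in T)
    theta1 = sxy / sxx                  # slope
    theta0 = x_bar - theta1 * T_bar     # intercept
    return theta0, theta1

# Hypothetical strengths generated exactly from the line x = 200 - 0.4 * T
temps = [75.0, 100.0, 125.0, 150.0, 175.0]
strengths = [200.0 - 0.4 * t for t in temps]
theta0, theta1 = fit_line(temps, strengths)
```

A quadratic model remains linear in the parameters: one would simply carry both T and T² as covariates and solve the corresponding normal equations.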
Assume that the data being analyzed consist of n observations at ℓ fixed conditions (or levels), and number these conditions 1, 2, ..., ℓ. In the example of linear regression on temperature, there are ℓ temperatures, and ℓ corresponding sets of covariates: (1, T1), (1, T2), ..., (1, Tℓ). It is necessary to indicate which fixed condition corresponds to each observation (recall the subscript i in Equation 8.2.3), so let the fixed condition for observation s be p(s). Also, each observation is made on a specimen from one of m batches. These batches are numbered 1, 2, ..., m, and q(s) indicates the batch corresponding to the sth observation. Denote the observations by xs, for s = 1, 2, ..., n, where the sth value comes from fixed level p(s) and from batch q(s).

Assume that the {xs} represent a sample from a normal distribution with mean

    µp(s) = θ1 zp(s),1 + θ2 zp(s),2 + ... + θr zp(s),r        8.3.5.1(a)

where the {zp(s),u}, for 1 ≤ p(s) ≤ ℓ and u = 1, ..., r, are known constants and the {θu} are parameters to be estimated. For example, if mean strength is assumed to vary linearly with temperature, and if condition p(s) = 1 corresponds to 75 degrees, then

    µ1 = θ1 + θ2·75        8.3.5.1(b)

so r = 2, z11 = 1, and z12 = 75. Recall that the covariates zp(s),u are not required to be linear. For example, a quadratic relationship between strength and temperature would have covariates 1, Ti, and Ti².

The means µp(s) can never be observed, but must be estimated from limited data. Each data value consists of the sum of µp(s) plus a random quantity bq(s) + es, where bq(s) takes on a different value for each batch q(s) and es takes on a different value for each observation. The random variables {bq(s)} and {es} are assumed to be random samples from normal populations with means zero and variances σb² and σe². The variance σb² is the between-batch variance, and σe² is referred to as the within-batch (or error) variance. (For a more elementary discussion of these ideas, see Section 8.2.3.)

The model for the data can now be written as

    xs = µp(s) + bq(s) + es = θ1 zp(s),1 + ... + θr zp(s),r + bq(s) + es        8.3.5.1(c)

where the {zp(s),u} are known, the {θu} are unknown fixed quantities, and the {bq(s)} and {es} are random quantities with unknown variances. Equation 8.3.5.1(c) is called a regression model. Every regression analysis begins with the choice of a regression model.

Special cases of Equation 8.3.5.1(c) are frequently useful. If the levels correspond to data groups, with the covariates indicating which group is associated with each observation, then the regression model is an analysis of variance (ANOVA) (Section 8.3.5.2). This case is most frequently used to calculate basis values when there is significant batch-to-batch variability. When there is one continuous covariate, the case is called the simple linear regression model (Section 8.3.5.3). Details of the analysis are provided for these special cases in the following sections. The analysis of the more general case is beyond the scope of this handbook; however, the RECIPE software is available to perform the analysis and examples are shown in Sections 8.3.7.6 - 8.3.7.9.

The power gained by using regression models for basis values is obtained at the expense of additional assumptions. A residual is defined to be the difference between a data point and its fitted value. Using the residuals, the following assumptions need to be checked:
1. Check the validity of the assumed curvilinear relation between property and predictor variables, for example, straight line, quadratic, or other assumed relationship;
2. Check homogeneity of variance (variances are assumed constant over the range of predictor variables);
3. Check normality of regression residuals; and
4. Check for independence of residuals.

Also, one should not extrapolate beyond the range of the predictor variables without good cause.

A detailed discussion of the validation of a regression model is beyond the scope of this handbook; however, it is discussed at length in most elementary texts, including References 8.3.5.1(a) - (d). Some elaboration at this point, though, might be helpful.

If a model fits well, then the residuals should be as likely to be positive as negative, and so they will alternate in sign every few values. They will have no apparent structure, and ideally will look like 'white noise'. If a model fits poorly, then there will often be long sequences of residuals that have the same sign, and curved patterns will typically be apparent in the residuals.

If the variance is high for a group of residuals, then these values will appear more scattered, and conversely for the case of low variability. This behavior can often be detected by examining residual plots. For example, if a simple linear regression has been performed of strength of specimens as a function of temperature, and if strength becomes more variable as temperature increases, then a plot of residuals against temperature might have a 'megaphone' shape.

There are also graphical procedures for checking the normality assumption for residuals. These can be found in most textbooks. It is also possible to apply the Anderson-Darling goodness-of-fit test for normality (Section 8.3.4.3) to the ratio of residuals to the standard deviation about the regression line (that is, ei/sy). A justification for this procedure can be found in Reference 8.3.5.1(e).

It is difficult to test for independence graphically. One possibility is to plot the odd-numbered residuals against the even-numbered ones, and to see if a trend is apparent. Further discussion can be found in the referenced textbooks. One form of lack of independence, 'clustering' due to batch effects, is addressed in the example in Section 8.3.7.9.

8.3.5.2 Analysis of variance

This section contains a discussion of one-way analysis of variance (ANOVA) procedures. Although these models can be written using the general notation of Equation 8.3.5.1(c), for the present discussion it is simpler to write the one-way ANOVA model as

    xij = µ + bi + eij,   i = 1, ..., k;   j = 1, ..., ni        8.3.5.2

where ni is the number of values in the ith group, and xij represents the jth observation in the ith of k groups. The overall average of the population is µ, bi is the effect attributed to the ith group, and eij is a random error term representing unexplained sources of variation. The error terms, eij, are assumed to be independently distributed normal random variables with mean zero and variance σe² (the within-group variance). The bi may be regarded as fixed (unknown) constants, or else they may be modeled as realizations of a random variable, which is generally taken to be normally distributed with mean zero and variance σb² (the between-group variance).
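The F statistic of Equation 8.3.5.2.2, used later in this section both for the equality-of-means test and (applied to the transformed data of Section 8.3.5.2.1) for Levene's test, can be sketched as follows. The data values here are hypothetical and serve only to exercise the formula.

```python
def one_way_f_statistic(groups):
    """F of Equation 8.3.5.2.2:
    [sum_i n_i (xbar_i - xbar)^2 / (k - 1)] / [sum_ij (x_ij - xbar_i)^2 / (n - k)]."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    msb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means)) / (k - 1)
    mse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g) / (n - k)
    return msb / mse

# Three hypothetical groups of three observations each
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [9.0, 10.0, 11.0]]
f_stat = one_way_f_statistic(groups)
```

The statistic would then be compared with the 1 - α quantile of the F-distribution with k - 1 and n - k degrees of freedom (Table 8.5.1 for α = 0.05).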
The case of fixed bi is called a fixed-effects analysis of variance, and it is appropriate for situations where the group means µ + bi are not to be considered as samples from a population of means. For example, the groups might consist of strength measurements on composite material specimens having different numbers of plies. If the groups differ substantially in mean strength, one might consider determining basis values for the various numbers of plies. However, it clearly makes no sense to consider hypothetical random populations of specimens with different numbers of plies, and to regard the k groups which appear in the data as a random sample from such a population.

If the group means µ + bi are considered to be a sample from a population of group means, then the model is a random-effects analysis of variance. For example, the data might come from k batches. In this case, one would typically be concerned as much with future batches as with those represented in the data. If one intends to use future batches in fabrication, then it does not make much sense to calculate basis values for each of the k observed batches. Rather, one might choose to determine basis values based on the population of a random observation from an as yet unobtained batch. In this way, protection against batch-to-batch variability can be incorporated into design values. Reference 8.3.5.2(a) provides more information on analysis of variance procedures. The effect of sample size on an analysis of this type should be considered in test program design (Section 2.2.5.2).

The following calculations address batch-to-batch variability. In other words, the only grouping is due to batches and the compatibility test (Section 8.3.2) indicates that unstructured data methods should not be used. The method is based on the one-way analysis of variance (ANOVA) random-effects model and the procedure is documented in Reference 8.3.5.2(b).
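To make the random-effects structure concrete, data following the one-way random-effects model can be simulated as below. The parameter values (overall mean, σb, σe, and the group sizes) are arbitrary choices for illustration only.

```python
import random

def simulate_batches(mu, sigma_b, sigma_e, k, n_per_batch, seed=0):
    """Draw x_ij = mu + b_i + e_ij, with b_i ~ N(0, sigma_b^2) shared within
    batch i and e_ij ~ N(0, sigma_e^2) independent for each specimen."""
    rng = random.Random(seed)
    groups = []
    for _ in range(k):
        b_i = rng.gauss(0.0, sigma_b)   # batch effect, common to the whole batch
        groups.append([mu + b_i + rng.gauss(0.0, sigma_e)
                       for _ in range(n_per_batch)])
    return groups

# Illustrative values only: 50 batches of 10 specimens, sigma_b = 4, sigma_e = 2
groups = simulate_batches(mu=100.0, sigma_b=4.0, sigma_e=2.0, k=50, n_per_batch=10)
```

The pooled variance of such data approaches σb² + σe² as the number of batches grows, illustrating how batch-to-batch variability inflates the total spread beyond the within-batch variance alone.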
The assumptions are that:

1. The data from each batch are normally distributed,
2. The within-batch variance is the same from batch to batch, and
3. The batch means are normally distributed.

There is no test available for the first assumption. Simulation studies, however, suggest that moderate violation of this assumption does not have an adverse effect on the properties of the ANOVA method. The second assumption should be validated by performing the test described in Section 8.3.5.2.1. This test is currently recommended as a diagnostic, since extensive simulation suggests that violation of this assumption will likely result in conservatism, although non-conservatism can arise in some situations. There is no useful test for the third assumption unless data from many (twenty or more) batches are available.

In this analysis, all batches are treated the same (for example, no distinction is made between batches from different fabricators). If the batches are not from a single fabricator, then the approach shown in Section 8.3.7.9 should be used.

The organization of this subsection is as follows. The test for equality of variance is documented in the first two subsections. The next three subsections present computational procedures for statistics used in the ANOVA procedures. Next, a method for three or more batches, which should cover most cases of practical importance, is presented. The case of two batches is discussed separately.

8.3.5.2.1 Levene's test for equality of variances

The ANOVA method is derived under the assumption that the variances within each batch are equal. This section describes a widely-used test suggested by Levene (References 8.3.5.2.1(a) - (c)) for determining whether the sample variances for k groups differ significantly. This test is nonparametric; that is, it does not require strong assumptions about the form of the underlying populations.

To perform this test, form the transformed data
    wij = |xij - x̃i|        8.3.5.2.1

where x̃i is the median of the ni values in the ith group. Then perform an F-test on these transformed data (Section 8.3.5.2.2). If the test statistic is greater than or equal to the tabulated F-distribution quantile, then the variances are declared to be significantly different. If the statistic is less than the tabulated value, then the hypothesis of equality of variance is not rejected.

If the test does reject the hypothesis that the variances are equal, it is recommended that an investigation of the reason for the unequal variances be carried out. This may reveal problems in the generation of the data or in the fabrication of the material. Basis values calculated using the ANOVA method are likely to be conservative if the variances differ substantially.

8.3.5.2.2 The F-test for equality of means

To test the assumption that the populations from which the k samples were drawn have the same mean, calculate the following F statistic:

    F = [Σ(i=1 to k) ni (x̄i - x̄)² / (k-1)] / [Σ(i=1 to k) Σ(j=1 to ni) (xij - x̄i)² / (n-k)]        8.3.5.2.2

where x̄i is the average of the ni values in the ith group, and x̄ is the average of all n observations. If Equation 8.3.5.2.2 is greater than the 1 - α quantile of the F-distribution having k - 1 numerator and n - k denominator degrees of freedom, then one concludes (with a five percent risk of making an error) that the k population means are not all equal. For α = 0.05, the required F quantiles are tabulated in Table 8.5.1. This test is based on the assumption that the data are normally distributed; however, it is well known to be relatively insensitive to departures from this assumption.

8.3.5.2.3 One-way ANOVA computations based on individual measurements

When all of the observations in a sample are available, the first step is to compute the means

    x̄ = Σ(i=1 to k) Σ(j=1 to ni) xij / n        8.3.5.2.3(a)

and

    x̄i = Σ(j=1 to ni) xij / ni,   for i = 1, ..., k        8.3.5.2.3(b)

where

    n = Σ(i=1 to k) ni        8.3.5.2.3(c)

is the total sample size. The required sums of squares can now be computed. The between-batch sum of squares is computed as

    SSB = Σ(i=1 to k) ni x̄i² - n x̄²        8.3.5.2.3(d)

and the total sum of squares is