MIL-HDBK-17-1F Volume 1, Chapter 8 Statistical Methods

The approach for structured data divides the grouping of data according to fixed and random effects. A fixed effect is one in which an independent variable is set or measured. An example of a fixed effect is data obtained, by design or by chance, at different measured test temperatures. A random effect is the result of variability for which the cause is unknown or unmeasurable. An example of a random effect is data obtained from several batches with significant batch-to-batch variability. (See definitions in Section 8.1.4.)

Data sets with random effects, fixed effects, or combinations of fixed and random effects require a basic understanding of linear models for regression and the analysis of variance. While a detailed exposition of this topic is beyond the scope of the handbook, an introduction with elementary references is provided in Section 8.3.5.1. The simplest case of structured data is where the only grouping is by a random effect, such as batches or panels. For this situation, basis values should be calculated by the analysis of variance (ANOVA) procedure (Section 8.3.5.2). Before basis values are calculated, a diagnostic test for equality of variances should be applied. Note that there is a special approach for determining basis values when the data consist of only two groups.

The case of one fixed effect and no random effects is linear regression (Section 8.3.5.3). For cases with no or one random effect and an arbitrary number of fixed effects, basis values from regression models can be calculated using the computer program RECIPE. A method for pooling small data sets from multiple environmental conditions is described in Section 8.3.5.4.

8.3.2 Subpopulation compatibility - structured or unstructured

Expected and unexpected behavior should be considered in determining whether there are natural or logical groupings of the data. Data for which natural groupings exist, or for which responses of interest could vary systematically with respect to known factors, are structured data. For example, measurements made from each of several batches could reasonably be grouped according to batch, and measurements made at various known temperatures could be modeled using linear regression (Section 8.3.5); hence both can be regarded as structured data. In many ways, it is easier to analyze data which are unstructured; hence, it is often desirable to be able to show that a natural grouping of data has no significant effect. Data are considered unstructured if all relevant information is contained in the response measurements themselves. This could be because these measurements are all that is known, or else because one is able to ignore potential structure in the data. For example, data measurements that have been grouped by batch and demonstrated to have negligible batch-to-batch variability may be considered unstructured. An unstructured data set is a simple random sample.

The following section describes the k-sample Anderson-Darling test for showing that the subpopulations are compatible, that is, that the natural groupings have no significant effect. Compatible groups may be treated as part of the same population. Thus, a structured data set, with a natural grouping identified, can become an unstructured data set by showing that the natural grouping has no significant effect using the k-sample Anderson-Darling test.
For composite materials, it is recommended that batches (and panels where possible) be treated as natural groupings and tested for compatibility. Other groupings may result from expected behavior. Ply count might have a significant effect on a ±45 shear test; thus specimens with different ply counts naturally fall into groupings for this test. The decision regarding grouping the data may also be affected by the purpose of the test program. As an example, consider the influence of strain rate on material properties. A test program may be designed to evaluate the effects of strain rate on a given property. That program would obtain data at selected and controlled values of strain rate. These would provide the natural grouping for the data. A subpopulation compatibility test could be used to determine whether there was a significant effect; or a structured data approach, such as linear regression, could be used.

8.3.2.1 Notation for grouped data

For structured data, each data value belongs to a particular group, and there will generally be more than one value within each group. Therefore, double subscripts will be used to identify the observations. Let the data be denoted by $x_{ij}$ for $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$, where $i$ is the group and $j$ is the observation within that group. There are $n_i$ data values in the $i$th of $k$ groups. Then the total number of observations is
$n = n_1 + n_2 + \cdots + n_k$

The distinct values in the combined data set, ordered from smallest to largest, are denoted $z_{(1)}, z_{(2)}, \ldots, z_{(L)}$, where L will be less than n if there are tied observations.

8.3.2.2 The k-sample Anderson-Darling test

The k-sample Anderson-Darling test is a nonparametric statistical procedure that tests the hypothesis that the populations from which two or more groups of data were drawn are identical. The test requires that each group be an independent random sample from a population. For more information on this procedure, see Reference 8.3.2.2.

The k-sample Anderson-Darling statistic is

$$ADK = \frac{n-1}{n^2(k-1)} \sum_{i=1}^{k} \frac{1}{n_i} \sum_{j=1}^{L} h_j \, \frac{\left( n F_{ij} - n_i H_j \right)^2}{H_j \left( n - H_j \right) - n h_j / 4}$$        8.3.2.2(a)

where

  $h_j$ = the number of values in the combined samples equal to $z_{(j)}$,
  $H_j$ = the number of values in the combined samples less than $z_{(j)}$ plus one half the number of values in the combined samples equal to $z_{(j)}$, and
  $F_{ij}$ = the number of values in the $i$th group which are less than $z_{(j)}$ plus one half the number of values in this group which are equal to $z_{(j)}$.

Under the hypothesis of no difference in the populations, the mean and variance of ADK are approximately 1 and

$$\sigma_n^2 = \mathrm{Var}(ADK) = \frac{a n^3 + b n^2 + c n + d}{(n-1)(n-2)(n-3)(k-1)^2}$$        8.3.2.2(b)

with

$$a = (4g - 6)(k - 1) + (10 - 6g)S$$        8.3.2.2(c)

$$b = (2g - 4)k^2 + 8Tk + (2g - 14T - 4)S - 8T + 4g - 6$$        8.3.2.2(d)

$$c = (6T + 2g - 2)k^2 + (4T - 4g + 6)k + (2T - 6)S + 4T$$        8.3.2.2(e)

$$d = (2T + 6)k^2 - 4Tk$$        8.3.2.2(f)

where

$$S = \sum_{i=1}^{k} \frac{1}{n_i}$$        8.3.2.2(g)

$$T = \sum_{i=1}^{n-1} \frac{1}{i}$$        8.3.2.2(h)

and

$$g = \sum_{i=1}^{n-2} \sum_{j=i+1}^{n-1} \frac{1}{(n-i)\,j}$$        8.3.2.2(i)

If the critical value

$$ADC = 1 + \sigma_n \left[ 1.645 + \frac{0.678}{\sqrt{k-1}} - \frac{0.362}{k-1} \right]$$        8.3.2.2(j)

is less than the test statistic in Equation 8.3.2.2(a), then one can conclude (with a five percent risk of being in error) that the groups were drawn from different populations. Otherwise, the hypothesis that the groups were selected from identical populations is not rejected, and the data may be considered unstructured with respect to the random or fixed effect in question. Table 8.5.6 contains the critical values (Equation 8.3.2.2(j)) for the case where all of the $n_i$ are equal. The example problem in Section 8.3.7.1, Step 2 demonstrates this procedure.
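For orientation, a minimal Python sketch of Equations 8.3.2.2(a) through 8.3.2.2(j) follows. It is not part of the handbook, which supplies FORTRAN source elsewhere; the routine name adk_test and the use of NumPy are illustrative assumptions.

    import numpy as np

    def adk_test(groups):
        # k-sample Anderson-Darling test, Equations 8.3.2.2(a)-(j).
        # groups: list of k >= 2 one-dimensional arrays (one per group); n >= 4.
        # Returns (ADK, ADC); ADK > ADC indicates incompatible groups (5% risk).
        k = len(groups)
        ni = np.array([len(grp) for grp in groups], dtype=float)
        pooled = np.concatenate(groups)
        n = pooled.size
        z = np.unique(pooled)            # distinct ordered values z(1), ..., z(L)

        # Test statistic, Equation 8.3.2.2(a)
        adk = 0.0
        for i, grp in enumerate(groups):
            grp = np.asarray(grp, dtype=float)
            inner = 0.0
            for zj in z:
                hj = np.sum(pooled == zj)                  # ties at z(j), combined
                Hj = np.sum(pooled < zj) + 0.5 * hj
                Fij = np.sum(grp < zj) + 0.5 * np.sum(grp == zj)
                inner += hj * (n * Fij - ni[i] * Hj) ** 2 / (Hj * (n - Hj) - n * hj / 4.0)
            adk += inner / ni[i]
        adk *= (n - 1.0) / (n * n * (k - 1.0))

        # Approximate variance of ADK, Equations 8.3.2.2(b)-(i)
        S = np.sum(1.0 / ni)
        T = np.sum(1.0 / np.arange(1, n))
        g = sum(1.0 / ((n - i) * j) for i in range(1, n - 1) for j in range(i + 1, n))
        a = (4 * g - 6) * (k - 1) + (10 - 6 * g) * S
        b = (2 * g - 4) * k**2 + 8 * T * k + (2 * g - 14 * T - 4) * S - 8 * T + 4 * g - 6
        c = (6 * T + 2 * g - 2) * k**2 + (4 * T - 4 * g + 6) * k + (2 * T - 6) * S + 4 * T
        d = (2 * T + 6) * k**2 - 4 * T * k
        var = (a * n**3 + b * n**2 + c * n + d) / ((n - 1.0) * (n - 2.0) * (n - 3.0) * (k - 1.0) ** 2)

        # Five percent critical value, Equation 8.3.2.2(j)
        adc = 1.0 + np.sqrt(var) * (1.645 + 0.678 / np.sqrt(k - 1.0) - 0.362 / (k - 1.0))
        return adk, adc

Calling adk_test with a list of per-batch arrays returns the pair (ADK, ADC); the groups are judged to have been drawn from different populations, at the five percent level, when ADK exceeds ADC.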
8.3.3 Detecting outliers

An outlier is an observation that is much lower or much higher than most other observations in a data set. Often outliers are erroneous values, perhaps due to clerical error, to the incorrect setting of environmental conditions during testing, or to a defective test specimen. Data should routinely be screened for outliers, since these values can have a substantial influence on the statistical analysis. In addition to the quantitative screening for outliers (Section 8.3.3.1), the data should also be examined visually, since no statistical procedure can be completely reliable for outlier detection.

The Maximum Normed Residual (MNR) method is used for quantitative screening for outliers. This test screens for outliers in an unstructured data set. If the data can be grouped naturally into subgroups (due to batches, manufacturers, temperatures, and so on), then one should form the smallest subgroups possible and screen each of these separately. Data from compatible subgroups, based on the previous section, should be combined and the screening test performed on the larger group. Of course, data should only be pooled when it makes sense to do so. For example, batches of data for the same property and environmental condition can be combined, but tension and compression data should never be pooled.

All values identified as outliers should be investigated. Those values for which a cause can be determined should be corrected if possible, and otherwise discarded. When errors in data collection or recording are discovered, all data should be examined to determine whether similar errors occurred; these values should also be corrected or discarded. If no cause can be found for an outlier, it should be retained in the data set. If an outlier is clearly erroneous, it can be removed after careful consideration, provided that the subjective decision to remove a value is documented as part of the data analysis. If any observations are corrected or discarded, both the statistical outlier test and the visual inspection should be repeated.

8.3.3.1 The maximum normed residual

The maximum normed residual (MNR) test is a screening procedure for identifying an outlier in an unstructured set of data. A value is declared to be an outlier by this method if it has an absolute deviation from the sample mean which, when compared to the sample standard deviation, is too large to be due to chance. This procedure assumes that observations which are not outliers can be regarded as a random sample from a normal population. The MNR method can only detect one outlier at a time; hence the significance level pertains to a single decision. Additional information on this procedure can be found in References 8.3.3.1(a) and (b).

Let $x_1, x_2, \ldots, x_n$ denote the data values in the sample of size n, and let $\bar{x}$ and $s$ be the sample mean and sample standard deviation, defined in Section 8.1.4. The MNR statistic is the maximum absolute deviation from the sample mean, divided by the sample standard deviation:

$$MNR = \max_{i} \frac{\left| x_i - \bar{x} \right|}{s}, \quad i = 1, 2, \ldots, n$$        8.3.3.1(a)

The value of Equation 8.3.3.1(a) is compared to the critical value for the sample size n from Table 8.5.7. These critical values are computed from the following formula:

$$C = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2}{n - 2 + t^2}}$$        8.3.3.1(b)

where t is the $[1 - \alpha/(2n)]$ quantile of the t-distribution with n - 2 degrees of freedom and α is the significance level. The recommended significance level for this test is α = 0.05.
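The screening procedure, including the repeated application described in the text that follows, can be sketched in Python as below. This is an illustration rather than the handbook's method of computation: the routine name mnr_screen is hypothetical, and SciPy's t quantile replaces the tabulated critical values of Table 8.5.7.

    import numpy as np
    from scipy.stats import t as t_dist

    def mnr_screen(values, alpha=0.05):
        # Repeated MNR screening, Equations 8.3.3.1(a) and (b).
        # Returns (retained values, flagged outliers).
        x = [float(v) for v in values]
        outliers = []
        while len(x) > 2:                      # need n - 2 > 0 degrees of freedom
            n = len(x)
            arr = np.array(x)
            dev = np.abs(arr - arr.mean())
            mnr = dev.max() / arr.std(ddof=1)  # Equation 8.3.3.1(a)
            t = t_dist.ppf(1.0 - alpha / (2.0 * n), n - 2)
            C = (n - 1.0) / np.sqrt(n) * np.sqrt(t**2 / (n - 2.0 + t**2))  # Eq. 8.3.3.1(b)
            if mnr <= C:
                break                          # no outlier detected; stop screening
            outliers.append(x.pop(int(dev.argmax())))  # drop flagged value, rescreen
        return x, outliers

Each pass uses the current sample size, so the jth screening computes the mean, standard deviation, and critical value from the reduced sample, as noted below.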
If MNR is smaller than the critical value, then no outliers are detected in the sample; otherwise the data value associated with the largest value of $\left| x_i - \bar{x} \right|$ is declared to be an outlier.

If an outlier is detected, this value is omitted from the calculations and the MNR procedure is applied again. This process is repeated until no outliers are detected. Note that the jth time that a sample is screened for an outlier, the mean, standard deviation, and critical value are computed using a sample size of n - j + 1. It should be noted that for small samples, for example a batch containing five or six data values, this procedure may identify most of the data as outliers, particularly if two or more of the values are identical. The example problem in Section 8.3.7.1, Step 1 demonstrates this procedure.

8.3.4 Basis values for unstructured data

The method employed in calculating basis values for unstructured data depends on the distributional form which is assumed. Section 8.3.4 contains procedures for performing a goodness-of-fit test for the Weibull, normal, and lognormal distributions.

As shown in Figure 8.3.1, it is recommended that the Weibull model be used if it adequately fits the data, even if other models apparently fit the data better. This preference for the Weibull distribution is based on two factors:

1. Theory suggests that the Weibull distribution is appropriate for the strength distribution of brittle materials such as composite fibers (see, for example, Reference 8.3.4(a)).

2. The "Chain-of-Bundles" model for the strength of two- and three-dimensional unidirectional composites suggests that the Weibull model is appropriate for the strength distribution of such composites. This result is stated in References 8.3.4(b) and (c).

If the Weibull model cannot be shown to adequately fit the data, then the normal and lognormal tests are performed in succession. If none of these three population models can be demonstrated to adequately fit the data, then nonparametric procedures should be used to compute basis values.

The exploratory data analysis (EDA) techniques of Section 8.3.6 should also be used to graphically display the data, highlighting potential difficulties and providing graphical evidence of goodness-of-fit to support the quantitative conclusions of the tests in this section.

8.3.4.1 Goodness-of-fit tests

Each distribution is considered using the Anderson-Darling test statistic, which is sensitive to discrepancies in the tail regions. The Anderson-Darling test compares the cumulative distribution function for the distribution of interest with the cumulative distribution function of the data. The data are first converted to a common representation for the distribution under consideration. For example, for a normal distribution, the data are normalized to a mean of 0 and a standard deviation of 1. An observed significance level (OSL) based on the Anderson-Darling test statistic is computed for each test. The OSL is the probability of obtaining a value of the test statistic at least as large as that observed if the hypothesis that the data are actually from the distribution being tested is true.
If the OSL is less than or equal to 0.05, the hypothesis is rejected (with at most a five percent risk of being in error) and one proceeds as if the data are not from the distribution being tested.

In what follows, unless otherwise noted, the sample size is denoted by n, the sample observations by $x_1, \ldots, x_n$, and the sample observations ordered from least to greatest by $x_{(1)}, \ldots, x_{(n)}$.
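As a schematic illustration for the normal case, the Python sketch below computes the Anderson-Darling statistic for a sample normalized to mean 0 and standard deviation 1, then converts it to an OSL. The small-sample adjustment and logistic approximation are assumptions patterned on the handbook's normal-distribution test; the constants, like the routine name osl_normal, are placeholders to be verified against the full text before use.

    import numpy as np
    from scipy.stats import norm

    def osl_normal(values):
        # Anderson-Darling statistic for a normal fit, plus an approximate OSL.
        x = np.sort(np.asarray(values, dtype=float))
        n = x.size
        z = (x - x.mean()) / x.std(ddof=1)          # normalize to mean 0, std dev 1
        F = np.clip(norm.cdf(z), 1e-12, 1 - 1e-12)  # fitted CDF, guarded against log(0)
        i = np.arange(1, n + 1)
        ad = -n - np.sum((2 * i - 1) / n * (np.log(F) + np.log(1.0 - F[::-1])))
        # Adjustment and OSL approximation: ASSUMED constants patterned on the
        # handbook's normal-distribution test; verify against the handbook.
        ad_star = (1.0 + 4.0 / n - 25.0 / n**2) * ad
        osl = 1.0 / (1.0 + np.exp(-0.48 + 0.78 * np.log(ad_star) + 4.58 * ad_star))
        return ad, osl

An OSL at or below 0.05 would lead to rejection of the normal model, as described above.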
8.3.4.2 Two-parameter Weibull distribution

In order to compute a basis value for a two-parameter Weibull population, it is first necessary to obtain estimates of the population shape and scale parameters. Section 8.3.4.2.1 contains a step-by-step procedure for calculating maximum likelihood estimates of these parameters. Calculations specific to the goodness-of-fit test for the Weibull distribution are provided in Section 8.3.4.2.2. The computational procedure for calculating basis values using these estimates is outlined in Section 8.3.4.2.3. The example problem in Section 8.3.7.1 demonstrates these procedures. For further information on these procedures, see Reference 8.3.4.2.

8.3.4.2.1 Estimating the shape and scale parameters of a Weibull distribution

This section describes the maximum likelihood method for estimating the parameters of the two-parameter Weibull distribution. The maximum-likelihood estimates of the shape and scale parameters are denoted $\hat{\beta}$ and $\hat{\alpha}$. The estimates are the solution to the pair of equations:

$$\frac{n\hat{\beta}}{\hat{\alpha}} - \frac{\hat{\beta}}{\hat{\alpha}^{\hat{\beta}+1}} \sum_{i=1}^{n} x_i^{\hat{\beta}} = 0$$        8.3.4.2.1(a)

and

$$\frac{n}{\hat{\beta}} - n \ln\hat{\alpha} + \sum_{i=1}^{n} \ln x_i - \sum_{i=1}^{n} \left( \frac{x_i}{\hat{\alpha}} \right)^{\hat{\beta}} \left( \ln x_i - \ln\hat{\alpha} \right) = 0$$        8.3.4.2.1(b)

Equation 8.3.4.2.1(a) can be rewritten as

$$\hat{\alpha} = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^{\hat{\beta}} \right)^{1/\hat{\beta}}$$        8.3.4.2.1(c)

By substituting Equation 8.3.4.2.1(c) into Equation 8.3.4.2.1(b), the following equation is obtained:

$$\frac{n}{\hat{\beta}} + \sum_{i=1}^{n} \ln x_i - \frac{n \sum_{i=1}^{n} x_i^{\hat{\beta}} \ln x_i}{\sum_{i=1}^{n} x_i^{\hat{\beta}}} = 0$$        8.3.4.2.1(d)

Equation 8.3.4.2.1(d) can be solved numerically for $\hat{\beta}$, which can then be substituted into Equation 8.3.4.2.1(c) to obtain $\hat{\alpha}$.

Figure 8.3.4.2.1 shows FORTRAN source code for three routines which compute the estimates of $\hat{\alpha}$ and $\hat{\beta}$ by the method described above. WBLEST is a subroutine which returns the estimates of the parameters, $\hat{\beta}$ and $\hat{\alpha}$. FNALPH is a function which calculates the estimate of the scale parameter, $\hat{\alpha}$. GFUNCT is a function which evaluates Equation 8.3.4.2.1(d). Arguments to WBLEST are:

  X     = a vector of length NOBS containing the data (input),
  NOBS  = the number of data values, n (input),
  BETA  = estimate of the shape parameter (output),
  ALPHA = estimate of the scale parameter (output).

The algorithm by which the FORTRAN code computes the estimates is described in the following paragraph.
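Since Figure 8.3.4.2.1 is not reproduced here, a minimal Python sketch of the same computation may help orient the reader: it solves Equation 8.3.4.2.1(d) for $\hat{\beta}$ by root finding and back-substitutes into Equation 8.3.4.2.1(c). The routine name weibull_mle, the bracketing strategy, and the use of SciPy's brentq are assumptions; the handbook's own implementation is the FORTRAN of Figure 8.3.4.2.1.

    import numpy as np
    from scipy.optimize import brentq

    def weibull_mle(values):
        # Maximum-likelihood estimates (beta_hat, alpha_hat) for the two-parameter
        # Weibull distribution, via Equations 8.3.4.2.1(c) and (d).
        x = np.asarray(values, dtype=float)
        n = x.size
        scale = x.max()
        xs = x / scale              # Equation (d) is scale-invariant; rescaling
        lnxs = np.log(xs)           # guards against overflow in xs**beta

        def g(beta):                # left-hand side of Equation 8.3.4.2.1(d)
            xb = xs ** beta
            return n / beta + lnxs.sum() - n * (xb * lnxs).sum() / xb.sum()

        # g decreases in beta; bracket a sign change, then solve numerically.
        lo, hi = 0.01, 1.0          # assumes beta_hat > 0.01 and non-degenerate data
        for _ in range(60):
            if g(hi) <= 0.0:
                break
            hi *= 2.0
        beta_hat = brentq(g, lo, hi)
        # Back-substitute into Equation 8.3.4.2.1(c), undoing the rescaling.
        alpha_hat = scale * np.mean(xs ** beta_hat) ** (1.0 / beta_hat)
        return beta_hat, alpha_hat

For example, weibull_mle(strength_data) returns the pair ($\hat{\beta}$, $\hat{\alpha}$) used in the goodness-of-fit and basis-value calculations of Sections 8.3.4.2.2 and 8.3.4.2.3.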