2

On the Performance of Weibull Life Tests Based on Exponential Life Testing Designs

Francisco J. Samaniego and Yun Sam Chong, University of California, Davis

1. Exponential Life Testing

Applications abound in which investigators seek to make inferences about the lifetime characteristics of a "system" of interest from data on the failure times of prototypical systems placed on test. There are a good many different experimental designs that might be considered in planning a given life testing application; often, some form of data censoring (aimed at bounding the experiment's duration) or some sequential procedure (aimed at possibly resolving the test based on early failures) is part of the test plan. The analysis of life testing data is usually preceded by the setting of assumptions regarding the underlying probability distribution of system lifetimes. Among the most studied parametric life testing models are the exponential, gamma, Weibull, Pareto and lognormal families (see Lawless, 1982); nonparametric analyses under various assumptions on the distribution's hazard function or residual lifetime characteristics have also been developed (see Barlow and Proschan, 1975; Hollander and Proschan, 1984). By far the most comprehensive development of exact statistical procedures in life testing has occurred under the assumption of exponentiality. For virtually all other assumed models, the analysis of failure time data involves extensive use of numerical optimization methods and asymptotic approximations. The exact performance of tests and estimates developed under nonexponential assumptions has, for the most part, resisted analytical treatment, and has thus been studied mostly via simulation. The temptation to use exponential life testing methods is no doubt due, in part, to the marked lack of success in dealing with the theoretical properties of nonexponential life testing in a definitive way. The ease with which relevant distribution theory (especially that involving ordered failure times) can be produced, and the occasional "conservatism" of the exponential assumption, have also contributed to its popularity, in spite of its notorious nonrobustness. It is important to acknowledge that the exponential assumption is very special and highly restrictive, so that its use should be discouraged except in circumstances in which there is good physical, empirical and practical support for the model.

In due course, we will review the basics of exponential life testing, both to make the present paper self-contained and to set the stage for the various comparisons we wish to make with alternative analyses. First, however, we will describe the type of problem (a sort of statistical hybrid) on which the present investigation is focused. Suppose a statistician is faced with an application in which two hypotheses concerning the mean life $\mu$ of a new system are to be tested. He wishes to resolve the test of $H_0: \mu = \mu_0$ against the alternative $H_1: \mu = \mu_1$, where $\mu_1 < \mu_0$ are fixed and known, with predetermined probabilities $\alpha$ and $\beta$ of type I and type II error (also often called the producer's and consumer's risks). Having no pressing reason to doubt exponentiality in the application at hand, the statistician determines (using the Department of Defense's Handbook H108, for example; U.S. Department of Defense, 1960) that these goals can be accomplished with an experimental design calling for some specific number of observed failures (say $r$), rejecting $H_0$ in favor of $H_1$ if the total time on test $T$ at the time of the $r$th failure is less than the threshold $T_0$. Among the advantages afforded by an exponential life test plan is the fact that the resources required to perform the test (that is, the number of systems that must be placed on test and the maximum amount of testing time needed to resolve the test) may be calculated in advance.
The fact that the duration of the test, in real time, can be controlled and made suitably small by placing $n > r$ systems on test while still resolving the test upon the $r$th failure is also an important advantage.

Consider, now, the analysis stage of this life testing experiment. Suppose that when the data have been collected, their characteristics suggest that they are definitely not exponential. It then falls upon the statistician to analyze the available data under some alternative model or, perhaps, nonparametrically. Let us suppose, as will be tacitly assumed in the sequel, that the two-parameter Weibull distribution is taken as an appropriate underlying model for the experiment in question. It is then incumbent upon the statistician to test the means $\mu_0$ vs $\mu_1$ under the Weibull assumption. The goal of this paper is to examine the consequences of this paradigm shift. We will study the resultant error probabilities associated with the Weibull test, and will explore the potential that exists for resource savings (smaller sample sizes, less testing time) when the Weibull model is entertained during the design stage rather than only at the analysis stage of the experiment. Our study has enabled us to identify the circumstances under which rather substantial resource savings are possible. For a study which examines similar questions in the context of interval estimation, see Woods (1996).

We now turn to a brief description of the mechanics of exponential life testing. Let us first suppose that a sample $X_1, \ldots, X_r$ of system lifetimes is available for observation, and that these data are independent and identically distributed according to the exponential distribution $\mathrm{Exp}(\theta)$ with density function

$$f(x) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0. \qquad (1.1)$$

For short, we will write $X_1, \ldots, X_r \overset{\text{iid}}{\sim} \mathrm{Exp}(\theta)$. The statistic $T_{r:r} = \sum_{i=1}^{r} X_i$, which may be described as the "total time on test" for the $r$ systems taken together, is a sufficient statistic for $\theta$ and is distributed according to the gamma distribution $\Gamma(r, \theta)$ with density function

$$g(t) = \frac{t^{r-1}\, e^{-t/\theta}}{\Gamma(r)\, \theta^{r}}, \quad t > 0. \qquad (1.2)$$

We use the subscript $r:r$ to reflect the fact that the experimental design calls for sampling $r$ failure times out of a random sample of size $r$. The standard estimate of $\theta$ based on $T_{r:r}$ is the sample mean

$$\hat\theta = \frac{T_{r:r}}{r}, \qquad (1.3)$$

which is both the maximum likelihood estimate and the minimum variance unbiased estimate of $\theta$. For any fixed $\alpha \in (0,1)$, the best test of size $\alpha$ for testing $H_0: \theta = \theta_0$ vs $H_1: \theta = \theta_1 < \theta_0$ is the test which rejects $H_0$ if and only if $\hat\theta < c$, where $c$ is determined by the equation

$$P_{\theta_0}(\hat\theta < c) = \alpha. \qquad (1.4)$$
Since, given $\theta = \theta_0$, $2T_{r:r}/\theta_0$ is distributed as a $\chi^2_{2r}$ variable, it is clear that the threshold for rejection is given by

$$c = \frac{\theta_0\, \chi^2_{2r,\alpha}}{2r}, \qquad (1.5)$$

where $\chi^2_{2r,\alpha}$ is such that $P(V \le \chi^2_{2r,\alpha}) = \alpha$ when $V \sim \chi^2_{2r}$. The test which rejects $H_0$ when $\hat\theta < \theta_0 \chi^2_{2r,\alpha}/(2r)$ is, in fact, uniformly most powerful for testing $H_0: \theta = \theta_0$ against $H_1: \theta < \theta_0$, and, in particular, maximizes the power (or minimizes the "consumer's risk" $\beta$) at the alternative $\theta = \theta_1$. If we assume that the levels of $\alpha$ and $\beta$ are fixed and determined in advance, then it remains to find the sample size $r$ for which these levels obtain. Since $r$ must satisfy the equation

$$P_{\theta_1}(\hat\theta < c) = 1 - \beta, \qquad (1.6)$$

and since $2T_{r:r}/\theta_1 \sim \chi^2_{2r}$ when $\theta = \theta_1$, (1.5) and (1.6) together require $\theta_0\, \chi^2_{2r,\alpha}/\theta_1 = \chi^2_{2r,1-\beta}$. Because $r$ is an integer, this equality generally cannot be met exactly, and it follows that the required sample size is the smallest integer $r = r_0$ for which

$$\frac{\chi^2_{2r,\alpha}}{\chi^2_{2r,1-\beta}} \ge \frac{\theta_1}{\theta_0}. \qquad (1.7)$$

The fact that the required sample size $r_0$ is completely determined by the values of $\alpha$, $\beta$ and the "discrimination ratio" $\theta_1/\theta_0$ is a special feature of exponential life testing that facilitates the automated application of this methodology. Once a sample size $r = r_0$ is obtained through (1.7), the rejection threshold $c$ in (1.4) may be represented as

$$c = \frac{\theta_0\, \chi^2_{2r_0,\alpha}}{2r_0}. \qquad (1.8)$$

The constant $c/\theta_0$, which is independent of model parameters, will appear in several of our tabulations as the multiplier which, together with the $\theta_0$ of interest, determines the rejection threshold of the desired test.

Execution of the exponential life test above is perhaps most easily described in terms of the "total time on test" (TTT) function. If $X_{(1)} < \cdots < X_{(r)}$ are the ordered failure times in our sample of size $r$, then the TTT function may be written, for $X_{(j)} \le t < X_{(j+1)}$, as

$$T_{r:r}(t) = \sum_{i=1}^{j} X_{(i)} + (r - j)\, t,$$

where $X_{(0)} \equiv 0$ and $j = 0, 1, \ldots, r-1$. The TTT function keeps track of the total amount of test time logged by working systems up to a fixed time $t$. Clearly

$$T_{r:r}(X_{(r)}) = \sum_{i=1}^{r} X_{(i)} = T_{r:r}.$$

The TTT function is itself a useful tool in reliability modeling. Plots involving a rescaled version of this function will be discussed in the next section.

Returning to the problem of testing $H_0: \theta = \theta_0$ vs $H_1: \theta = \theta_1$, we note that the test may be resolved as follows: if the $r$th failure occurs before the total time on test exceeds the threshold $r_0 c$, that is, if $T_{r:r}(X_{(r)}) < r_0 c$, then $H_0$ is rejected in favor of $H_1$; otherwise, $H_0$ is accepted. In the latter case, the experiment is completed at time $t_0$, where

$$T_{r:r}(t_0) = r_0 c,$$

while in the former case, the experiment is terminated at time $t = X_{(r)} < t_0$. Thus the threshold $r_0 c$, with $c$ given in (1.8), represents the maximum total test time that could be required to resolve the test, that is, to be able to accept or reject $H_0$ on the basis of the data. Together, $r_0$ and $c$ describe the total resources that must be committed to guarantee successful completion of the life test.

Extension of the above discussion to type II censored data is immediate. If $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \mathrm{Exp}(\theta)$, and if the experiment is terminated upon the occurrence of the $r$th failure, then the statistic

$$T_{r:n} = \sum_{i=1}^{r} X_{(i)} + (n - r)\, X_{(r)}$$

is sufficient for $\theta$. Moreover, since $T_{r:n}$ has precisely the same distribution as $T_{r:r}$, that is, since $T_{r:n} \sim \Gamma(r, \theta)$, the best test of $H_0: \theta = \theta_0$ vs $H_1: \theta = \theta_1$ has the same form as before, that is, it rejects $H_0$ if $\hat\theta = T_{r:n}/r < c$, where $\hat\theta$ is the MLE (and UMVUE) of $\theta$.
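Because $2T/\theta$ is chi-square with an even number $2r$ of degrees of freedom, the recipe in (1.7) and (1.8), which applies equally to complete and type II censored samples, can be evaluated without printed tables. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the chi-square CDF relies on the Poisson-sum identity that holds for even degrees of freedom. For $\alpha = \beta = .1$ and $\theta_1/\theta_0 = .5$ it should reproduce the Handbook H108 values quoted later in this section ($r_0 = 15$, $c/\theta_0 \approx .687$).

```python
import math


def chi2_even_cdf(x: float, df: int) -> float:
    """P(V <= x) for V ~ chi-square with even df = 2r, using the Poisson identity
    P(V <= x) = 1 - sum_{k=0}^{r-1} exp(-x/2) (x/2)**k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even degrees of freedom")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)  # k = 0 term of the Poisson sum
    total = term
    for k in range(1, df // 2):
        term *= half / k    # builds exp(-x/2) (x/2)**k / k! recursively
        total += term
    return 1.0 - total


def chi2_even_quantile(p: float, df: int) -> float:
    """Lower p-quantile chi2_{df,p}, found by bisection on chi2_even_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_even_cdf(hi, df) < p:  # widen the bracket until it holds the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_even_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponential_test_plan(alpha: float, beta: float, ratio: float, r_max: int = 500):
    """Smallest r0 satisfying (1.7) and the multiplier c/theta0 from (1.8).

    `ratio` is the discrimination ratio theta1/theta0, assumed to lie in (0, 1).
    """
    for r in range(1, r_max + 1):
        q_alpha = chi2_even_quantile(alpha, 2 * r)       # chi2_{2r, alpha}
        q_power = chi2_even_quantile(1.0 - beta, 2 * r)  # chi2_{2r, 1-beta}
        if q_alpha / q_power >= ratio:
            return r, q_alpha / (2 * r)
    raise ValueError("no r <= r_max achieves the requested alpha and beta")


r0, c_mult = exponential_test_plan(alpha=0.1, beta=0.1, ratio=0.5)
print(r0, round(c_mult, 3))  # expected: 15 0.687
```

Multiplying $r_0$ and $c/\theta_0$ by $\theta_0 = 1{,}000$ gives the maximum total test time $15(.687)(1{,}000) = 10{,}305$ hrs of the Handbook example.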
Similarly, the number of failures required to resolve this test, given set values for $\alpha$, $\beta$ and $\theta_1/\theta_0$, is the $r_0$ derived via (1.7), and the maximum total testing time needed is again the constant $r_0 c$, where $c$ is given in (1.8). The number $n$ of systems on test influences test performance only with regard to the test's duration. Let us expand the definition of the total time on test function to accommodate the case of type II censoring; for $X_{(j)} \le t < X_{(j+1)}$, with $X_{(0)} \equiv 0$, define

$$T_{r:n}(t) = \sum_{i=1}^{j} X_{(i)} + (n - j)\, t, \quad j = 0, 1, \ldots, r-1.$$

Then, under type II censoring, the experiment is terminated at time $t = X_{(r_0)}$ if

$$T_{r_0:n}(X_{(r_0)}) < r_0 c,$$

or, otherwise, at time $t = t_0$, where $t_0$ satisfies the equation

$$T_{r_0:n}(t_0) = r_0 c.$$

It is easy to see that the random time $\min(X_{(r_0)}, t_0(X))$ at which the experiment is terminated is bounded above by $r_0 c/(n - r_0)$: at least $n - r_0 + 1$ systems remain on test until the $r_0$th failure, so $T_{r_0:n}(t) \ge (n - r_0 + 1)\, t$ up to that failure. Thus, the waiting time until the test is completed can be made suitably small for any fixed $r_0$ by choosing the sample size $n$ sufficiently large. This strategy, of course, is based on a tacit assumption of the correctness of the exponential model in the application of interest; when exponentiality fails, this practice can yield highly misleading results.

There are a host of other experimental designs for exponential life testing, including type I censoring (that is, censoring at a fixed time $t$), random record designs (that is, observing only record-breaking failure times) and sequential designs. The type of study pursued in this paper can be carried out analogously for other designs, but we have chosen to focus exclusively on complete and type II censored data. This choice is motivated by the fact that these two designs are frequently encountered in practice and also by our belief that the general lessons learned from analyzing these particular designs will hold more broadly. For example, the distribution theory developed in Samaniego and Whittaker (1986) shows that inverse sampling from an exponential distribution until the occurrence of the $r$th record value (that is, successive minimum) yields a test statistic (again, the total time on test) that has properties identical to those of the designs mentioned above. In particular, the resources required to resolve testing problems for predetermined values of $\alpha$, $\beta$, $\theta_0$ and $\theta_1$ are again given by the pair $(r_0, c)$ of (1.7) and (1.8).
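The sequential resolution of the type II censored test described above can be sketched directly from the TTT function: failures are processed in order, the logged test time grows by the number of working systems times the elapsed time, and the test stops either when the total time on test reaches $r_0 c$ (accept $H_0$) or at the $r_0$th failure (reject $H_0$). The Python sketch below is a minimal illustration, assuming the lifetimes of all $n$ systems are supplied (as they would be in a simulation); the function name is hypothetical, and setting $n = r_0$ gives the complete-sample case. Its last lines check empirically that the stopping time stays below $r_0 c/(n - r_0)$.

```python
import random


def resolve_type2_test(lifetimes, n, r0, c):
    """Resolve the exponential life test on n systems, stopping no later than
    the r0-th failure. `lifetimes` holds all n lifetimes (as in a simulation).
    Returns (decision, stopping_time)."""
    if len(lifetimes) != n or not 1 <= r0 <= n:
        raise ValueError("need n lifetimes and 1 <= r0 <= n")
    threshold = r0 * c   # maximum total time on test, r0 * c
    logged = 0.0         # total time on test accumulated so far
    prev = 0.0           # time of the most recent failure
    running = n          # systems still on test
    for x in sorted(lifetimes)[:r0]:
        increment = running * (x - prev)  # test time logged until this failure
        if logged + increment >= threshold:
            # TTT reaches r0*c before this failure: accept H0 at time t0
            return "accept H0", prev + (threshold - logged) / running
        logged += increment
        prev = x
        running -= 1
    # the r0-th failure arrived while TTT < r0*c: reject H0 at X_(r0)
    return "reject H0", prev


# Illustration with exponential data: the stopping time stays below r0*c/(n - r0).
rng = random.Random(0)
n, r0, c = 10, 3, 0.8
worst = max(
    resolve_type2_test([rng.expovariate(1.0) for _ in range(n)], n, r0, c)[1]
    for _ in range(1000)
)
print(worst, r0 * c / (n - r0))
```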
Instead of pursuing greater breadth in the designs considered, we will direct our efforts at examining two particular designs (complete samples and type II censoring) in depth. As a guide for military applications of exponential life testing, DoD Handbook H108 provides tabled values of the required sample size $r_0$ and the constant $c/\theta_0$, through which the total test time required by a particular application can be computed. An excerpt from Table 2B-5 of that Handbook, showing the five tabled values corresponding to error probabilities $\alpha = .1$ and $\beta = .1$, appears in Table 1. If, for instance, one wishes to test $H_0: \theta = 1{,}000$ hrs vs $H_1: \theta = 500$ hrs, and one sets $\alpha = .1 = \beta$, then Table 1 indicates that 15 or more systems should be put on test, and that the total test time required to ensure resolution of the test is $15(.687)(1{,}000) = 10{,}305$ hrs.

Before proceeding with our study of alternatives to exponential life tests, we briefly review what is known about their lack of robustness. Of special interest to us is the behavior of exponential life tests when the underlying distribution is a nonexponential Weibull, since it is then that the procedures we investigate in the sequel stand to provide improved performance. We thus restrict ourselves to this particular circumstance and describe the findings of Zelen and Dannemiller (1961), who studied the performance of exponential life tests for Weibull data in exhaustive detail. In that paper, four specific life testing designs were studied: complete samples, type II censored samples, truncated type II censored samples, and samples obtained sequentially. We quote from Zelen and Dannemiller's discussion section:

"None of the four life testing procedures studied in this paper is robust with respect to Weibull alternatives. In particular, the censored life test and the truncated nonreplacement test are strikingly non-robust. It is obvious from the graphs of the O.C. curves that lots having low mean failure time have a high probability of acceptance when the failure times follow a Weibull distribution with shape parameter p > 1. This tendency is increased as p increases.... We have tried to show that dogmatic use of life testing procedures without a careful verification of the assumption that failure times follow the exponential distribution may result in a high probability of accepting 'poor quality' equipment."
In the case of complete and type II censored samples, the operating characteristic (O.C.) curves plotted by Zelen and Dannemiller show how large the probability of accepting the hypothesized mean of 1,000 can be when the true mean is less than 1,000. The performance of the exponential test of $H_0: \theta = 1{,}000$ vs $H_1: \theta = 500$ at the nominal values $\alpha = .1 = \beta$ is shown there to deteriorate as the Weibull shape parameter increases from 1 to 3. It is interesting to note that at $\theta = 500$ and $\theta = 1{,}000$, the error probabilities $\alpha$ and $\beta$ actually decrease in the complete sample setting; this is a manifestation of the conservative nature of these tests. Since Weibull distributions with shape parameter greater than 1 are lighter tailed than the exponential, these distributions are more tightly concentrated about the mean, rendering it easier to distinguish between two candidate mean values on the basis of a Weibull sample. For complete samples, the nonrobustness of which Zelen and Dannemiller write becomes evident as the mean value at which the probability of accepting $H_0$ is computed moves from the alternative value of 500 toward the null value of 1,000. At $\theta = 750$, for example, the probability of accepting $H_0: \theta = 1{,}000$ goes from .615 under exponentiality to .837 under a Weibull distribution with shape parameter equal to 3. In spite of this type of inflation, it is clear that exponential life tests carried out with complete samples offer reasonable performance: even under rather severe departures of the Weibull type, they deliver error probabilities at the key parameter values $\theta_0$ and $\theta_1$ that are smaller than those set at the planning stage. The question that will interest us as we proceed is this: since the achieved $\alpha$ and $\beta$ levels are both lower than planned for or required, what savings might be possible with a test that is calibrated to achieve the nominal values of $\alpha$ and $\beta$ when the data are Weibull?
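Contrasts of this kind can be explored by simulation. The following is a minimal Monte Carlo sketch in Python (standard library only) of the exponential test applied to Weibull data. It assumes the Handbook H108 plan for complete samples ($r_0 = 15$, $c = .687\,\theta_0$, $\theta_0 = 1{,}000$) and, for a type II censored design in the spirit of the Zelen and Dannemiller example described in the next paragraph ($n = 28$, stop at the 14th failure), an assumed size-.1 threshold $c = \theta_0 \chi^2_{28,.1}/28 \approx 676$; that threshold is our assumption, not their published plan. Function names are hypothetical, and the simulated values are estimates that need not match the published figures exactly.

```python
import math
import random

THETA0 = 1000.0  # null mean, as in the text's example


def weibull_lifetimes(k, shape, mean, rng):
    # random.weibullvariate(alpha, beta) takes scale alpha and shape beta;
    # pick the scale so that scale * Gamma(1 + 1/shape) equals the target mean
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return [rng.weibullvariate(scale, shape) for _ in range(k)]


def acceptance_probability(n, r, c, shape, mean, reps=10000, seed=1):
    """Monte Carlo estimate of P(accept H0) for the exponential test that puts
    n systems on test, stops at the r-th failure and accepts H0 when
    theta_hat = T_{r:n} / r >= c."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(reps):
        x = sorted(weibull_lifetimes(n, shape, mean, rng))[:r]
        total_time = sum(x) + (n - r) * x[-1]  # T_{r:n}
        if total_time / r >= c:
            accepted += 1
    return accepted / reps


# Complete sample: r = n = 15, c = .687 * theta0 (H108 plan, alpha = beta = .1).
c_complete = 0.687 * THETA0
p_exp_750 = acceptance_probability(15, 15, c_complete, shape=1.0, mean=750.0)
p_wei_750 = acceptance_probability(15, 15, c_complete, shape=3.0, mean=750.0)

# Type II censoring: n = 28, r = 14, assumed size-.1 threshold chi2_{28,.10}/28.
c_censored = 18.939 / 28 * THETA0
p_exp_500 = acceptance_probability(28, 14, c_censored, shape=1.0, mean=500.0)
p_wei_500 = acceptance_probability(28, 14, c_censored, shape=3.0, mean=500.0)

print(f"complete, mean 750: exponential {p_exp_750:.3f}, Weibull(3) {p_wei_750:.3f}")
print(f"censored, mean 500: exponential {p_exp_500:.3f}, Weibull(3) {p_wei_500:.3f}")
```

Under these assumptions the Weibull(3) acceptance probability at mean 750 should exceed the exponential one by a wide margin, while in the censored case it should be close to 1 at mean 500, where the exponential model keeps it near .1.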
The case of censored samples is markedly different. In an example involving $n = 28$ systems on test with censoring at the 14th failure, Zelen and Dannemiller note that the probability of accepting $H_0: \theta = 1{,}000$ is exceedingly high for all potential mean values between the alternative 500 and the null 1,000. Remarkably, the probability of accepting the null hypothesis of mean 1,000 when the true mean is 500 is .985 when the sample is drawn from a Weibull distribution with shape parameter 3. Even when the shape parameter is 1.5, this probability is unduly high (.463).

The lessons to be learned from the phenomena documented above include (1) exponential life testing based on complete samples works fairly well in a Weibull environment, but there should be opportunities for saving resources when that environment is recognized in advance; and (2) exponential life testing based on censored samples works very poorly in a Weibull environment, and alternative procedures should be considered when the exponential assumption is suspect. The sequel is largely devoted to the study of ways of addressing these two issues.

Before proceeding, let us make special mention of the scope of this paper and its attendant limitations. We have begun by discussing exponential life testing based on complete or type II censored samples. In sections 3 and 4, we will develop a comparable analysis under the assumption that the underlying distribution of the observable failure time data is, instead, a nonexponential Weibull. Both analyses assume that the data come from a random sample of items placed on test simultaneously and independently. Because of the memoryless property of the exponential distribution, exponential life testing methods can be validly applied (assuming the model is appropriate) to data on times between failures of repairable systems by treating those times as independent exponential observations. Such an extension will not generally be valid under Weibull assumptions. In the latter case, the alternative analysis developed in this paper would be applicable only when each repair following an observed failure could reasonably be considered "perfect" in the sense of restoring the item to its condition when new. When such an assumption cannot be justified, the appropriate reanalysis of data should be based on a more elaborate modeling of the failure process, perhaps as a nonhomogeneous Poisson process.
Nonparametric alternatives in this setting have been developed by Nelson (1995) and by Lawless and Nadeau (1995) and have been shown to work very well in a variety of applications (without the restrictive NHPP and independence assumptions). Such analyses lie beyond the scope of the present paper. Other issues not covered in the present report include the treatment of systems with multiple failure modes and the treatment of accelerated life testing data. Parallel developments in those areas, where Weibull alternatives to exponentiality assumptions are developed, would certainly be worthwhile.

2. Weibull Considerations

The Weibull distribution is arguably the most popular parametric alternative to the exponential distribution in reliability applications. Like the gamma model, it contains the exponential distribution as a special case, so that the adoption of a Weibull assumption represents a broadening of the exponential model rather than a rejection of it. Often, statistical extreme value theory forms the basis for the applicability of the Weibull model; when system failure can be attributed to the failure of the weakest of its many components, the Weibull model will tend to describe failure data quite well. The parametrization we will employ for the Weibull is as follows: $X$ has a Weibull distribution with parameters $A > 0$, $B > 0$ (henceforth denoted as $X \sim W(A, B)$) if $X$ has distribution function

$$F(x) = 1 - e^{-x^{A}/B}, \quad x > 0, \qquad (2.1)$$

and density function

$$f(x) = \frac{A}{B}\, x^{A-1}\, e^{-x^{A}/B}, \quad x > 0, \qquad (2.2)$$

where $A$ is the "shape" parameter and $B^{1/A}$ the scale parameter of the distribution. The mean and variance of $X \sim W(A, B)$ can be written as

$$\mu = B^{1/A}\, \Gamma\!\left(1 + \frac{1}{A}\right)$$

and

$$\sigma^2 = B^{2/A} \left[ \Gamma\!\left(1 + \frac{2}{A}\right) - \Gamma^2\!\left(1 + \frac{1}{A}\right) \right].$$

The coefficient of variation $cv = \sigma/\mu$ is independent of the parameter $B$ and may be written as

$$cv = \frac{\sqrt{\Gamma(1 + 2/A) - \Gamma^2(1 + 1/A)}}{\Gamma(1 + 1/A)}.$$
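The moment formulas above, together with the fact (noted in the next paragraph) that $X^A \sim \mathrm{Exp}(B)$ when $X \sim W(A, B)$, can be checked numerically. The Python sketch below (standard library only; all function names are hypothetical) evaluates $\mu$, $\sigma^2$ and $cv$ and, assuming the shape $A$ is known, applies the exponential recipe (1.7) to $Y = X^A$. Since $\mu = B^{1/A}\Gamma(1 + 1/A)$, the hypotheses $\mu = \mu_0$ vs $\mu = \mu_1$ become $B = B_0$ vs $B = B_1$ with $B_1/B_0 = (\mu_1/\mu_0)^A$, a smaller discrimination ratio when $A > 1$. This known-shape case is only an idealization for illustration, not the procedure developed in the paper for unknown $A$.

```python
import math


def weibull_moments(A, B):
    """Mean, variance and cv of W(A, B), where F(x) = 1 - exp(-x**A / B)."""
    g1 = math.gamma(1.0 + 1.0 / A)
    g2 = math.gamma(1.0 + 2.0 / A)
    mean = B ** (1.0 / A) * g1
    var = B ** (2.0 / A) * (g2 - g1 ** 2)
    cv = math.sqrt(g2 - g1 ** 2) / g1  # does not involve B
    return mean, var, cv


def chi2_even_cdf(x, df):
    """P(V <= x), V ~ chi2_df with even df = 2r: 1 - sum_{k<r} e^{-x/2} (x/2)**k / k!."""
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - total


def chi2_even_quantile(p, df):
    """Lower p-quantile of chi2_df (even df), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_even_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_even_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def known_shape_failures(mu0, mu1, A, alpha, beta):
    """Hypothetical helper: r0 from (1.7) when the shape A is known and the
    exponential test is applied to Y = X**A ~ Exp(B)."""
    g = math.gamma(1.0 + 1.0 / A)
    B0, B1 = (mu0 / g) ** A, (mu1 / g) ** A  # invert mu = B**(1/A) * Gamma(1 + 1/A)
    ratio = B1 / B0                          # equals (mu1 / mu0) ** A
    r = 1
    while (chi2_even_quantile(alpha, 2 * r)
           / chi2_even_quantile(1.0 - beta, 2 * r)) < ratio:
        r += 1
    return r


for A in (1.0, 2.0, 3.0):
    cv = weibull_moments(A, 1.0)[2]
    r0 = known_shape_failures(1000.0, 500.0, A, alpha=0.1, beta=0.1)
    print(f"A = {A}: cv = {cv:.3f}, r0 = {r0}")
# A = 1.0: cv = 1.000, r0 = 15
# A = 2.0: cv = 0.523, r0 = 4
# A = 3.0: cv = 0.363, r0 = 2
```

With $\alpha = \beta = .1$ and $\mu_1/\mu_0 = .5$, the required number of failures in this idealized known-shape case drops from 15 at $A = 1$ to 4 at $A = 2$ and 2 at $A = 3$, which illustrates why recognizing a Weibull environment at the design stage can save resources.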
It is apparent from (2.2) that the $W(1, B)$ distribution is simply the exponential distribution $\mathrm{Exp}(B)$. A more interesting and valuable connection between the Weibull and exponential models is the fact that if $X \sim W(A, B)$, then $X^{A} \sim \mathrm{Exp}(B)$. There is a rather substantial literature on modeling and inference involving the Weibull distribution. A keyword search of the Current Index to Statistics, volumes 1-19 (American Statistical Association, 1975 to 1995), shows that there were 647 articles published in statistics journals between 1975 and 1993 on Weibull-related topics. Much of this literature deals with estimation issues, with goodness-of-fit questions, with separate families tests (for example, testing gamma vs Weibull) or with robustness issues. Good overviews of estimation and testing procedures may be found in the recent books by Lawless (1982), Sinha (1987) and Bain and Engelhardt (1991). Other references with extensive discussion of inference for the Weibull distribution include Mann, Schafer and Singpurwalla (1974), Sinha and Kale (1979) and Nelson (1982, 1990). Of particular interest to us are testing procedures which seek to distinguish between two mutually exclusive collections of Weibull models. In the sequel, we will examine and compare various approaches to testing competing hypotheses about a Weibull mean. The literature on this latter problem is rather sparse. When the shape parameter is assumed known, the test of interest can be executed easily after transforming the data into exponential variables. With the scale parameter known, Bain and Weeks (1965) developed tests and confidence intervals for the unknown shape parameter. For the general problem, when both $A$ and $B$ are unknown, there is rather limited guidance on how to proceed. Thoman, Bain and Antle (1969) have developed MLE-based confidence intervals for each parameter when the other parameter is unspecified.
However, it is known that large sample methods based on the asymptotic behavior of maximum likelihood estimates behave rather poorly for small and moderate samples (see Lawless, 1975). Likelihood ratio tests for
[Figure 5: Fifth simulation. Figure 6: Sixth simulation. Figures 7 to 14: First through eighth plots. Figure 15: Level curves for TTTR when α = β = .1.]
References

Abramowitz, M., and I.A. Stegun, eds. 1964 Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Wiley and Sons.
Alfers, D., and H. Dinges 1984 A normal approximation for beta and gamma tail probabilities. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 65:399-419.
American Statistical Association 1975-93 Current Index to Statistics: Volumes 1-19. Alexandria, Va.: American Statistical Association.
Anderson, T.P. 1994 Current Issues Concerning Reliability Estimation in Operations Test and Evaluation. Unpublished Master's thesis, Naval Postgraduate School, Monterey, Calif.
Bain, L.J., and M. Engelhardt 1991 Statistical Analysis of Reliability and Life Testing Models: Theory and Methods. Second edition. New York: Dekker.
Bain, L.J., and D.L. Weeks 1965 Tolerance limits for the generalized gamma distribution. Journal of the American Statistical Association 60:1142-1152.
Barlow, R.E. 1979 Geometry of the total time on test transformation. Naval Research Logistics Quarterly 26:393-402.
Barlow, R.E., D. Bartholomew, J. Bremner, and H. Brunk 1972 Statistical Inference Under Order Restrictions. New York: John Wiley and Sons.
Barlow, R.E., and R. Campo 1975 Total time on test processes and applications to failure data analysis. Pp. 451-481 in R.E. Barlow, R. Fussell, and N.D. Singpurwalla, eds., Reliability and Fault Tree Analysis. Philadelphia, Pa.: SIAM.
Barlow, R.E., and F. Proschan 1969 A note on tests for monotone failure rate based on incomplete data. Annals of Mathematical Statistics 40:595-600.
Barlow, R.E., and F. Proschan 1975 Statistical Theory of Reliability and Life Testing. New York: Holt, Rinehart and Winston.
Barlow, R.E., R.H. Toland, and T. Freeman 1988 A Bayesian analysis of the stress-rupture life of Kevlar/Epoxy spherical pressure vessels. In C. Clarotti and D. Lindley, eds., Accelerated Life Testing and Experts' Opinions in Reliability. Amsterdam: North-Holland.
Chandra, M., and N.D. Singpurwalla 1981 Relationships between some notions which are common to reliability theory and economics. Mathematics of Operations Research 6:113-121.
Chernoff, H., and G. Lieberman 1956 The use of generalized probability paper for continuous distributions. Annals of Mathematical Statistics 27:806-818.
Cramér, H. 1946 Mathematical Methods of Statistics. Princeton: Princeton University Press.
Gong, G., and F.J. Samaniego 1981 Pseudo maximum likelihood estimation: Theory and methods. Annals of Statistics 9:861-869.
Hardy, G., J. Littlewood, and G. Pólya 1929 Some simple inequalities satisfied by convex functions. Messenger of Mathematics 58:145-152.
Hollander, M., and F. Proschan 1984 Nonparametric concepts and methods in reliability. Pp. 613-655 in P.R. Krishnaiah and P.K. Sen, eds., Handbook of Statistics, Volume 4: Nonparametric Methods. Amsterdam: North-Holland.
Klefsjö, B. 1980 Some tests against aging based on the total time on test transform. Statistical Research Report No. 1979-4, University of Umeå, Umeå, Sweden.
Lawless, J.F. 1975 Construction of tolerance bounds for the extreme value and Weibull distributions. Technometrics 17:255-261.
Lawless, J.F. 1982 Statistical Models and Methods for Lifetime Data. New York: John Wiley and Sons.
Lawless, J.F., and C. Nadeau 1995 Some simple robust methods for the analysis of recurrent events. Technometrics 37:158-168.
Luenberger, D. 1989 Linear and Nonlinear Programming. Second edition. Reading, Mass.: Addison-Wesley.
Mann, N.R., R.E. Schafer, and N.D. Singpurwalla 1974 Methods for Statistical Analysis of Reliability and Lifetime Data. New York: John Wiley and Sons.
Marshall, A., and I. Olkin 1979 Inequalities: The Theory of Majorization and Its Applications. New York: Academic Press.
Nair, V. 1984 On the behavior of some estimators from probability plots. Journal of the American Statistical Association 79:823-831.
Neath, A., and F. Samaniego 1992 On the total time on test transforms of an IFRA distribution. Statistics and Probability Letters 14:289-291.
Nelson, W. 1982 Applied Life Data Analysis. New York: John Wiley and Sons.
Nelson, W. 1990 Accelerated Testing: Statistical Models, Test Plans and Data Analyses. New York: John Wiley and Sons.
Nelson, W. 1995 Confidence limits for recurrence data: Applied to cost or number of product repairs. Technometrics 37:147-157.
Neyman, J. 1959 Optimal asymptotic tests of composite statistical hypotheses. In U. Grenander, ed., Probability and Statistics, the Harald Cramér Volume. New York: John Wiley and Sons.
Press, W., S. Teukolsky, W. Vetterling, and B. Flannery 1992 Numerical Recipes in C: The Art of Scientific Computing. Second edition. Cambridge, U.K.: Cambridge University Press.
Rolph, J.E., and D.L. Steffey, eds. 1994 Statistical Issues in Defense Analysis and Testing: Summary of a Workshop. Committee on National Statistics and Committee on Applied and Theoretical Statistics, National Research Council. Washington, D.C.: National Academy Press.
Samaniego, F.J. 1993 On the Needs of the DoD Testing Community and the Expertise in the Statistical Research Community: A Look at the Interface. Technical Report #286, Division of Statistics, University of California, Davis, Calif.
Samaniego, F.J., and L.R. Whittaker 1986 On estimating population characteristics from record-breaking observations: I. Parametric results. Naval Research Logistics Quarterly 33:531-543.
Sinha, S.K. 1987 Reliability and Life Testing. New York: John Wiley and Sons.
Sinha, S.K., and B.K. Kale 1979 Life Testing and Reliability Estimation. New Delhi: Wiley Eastern Limited.
Thoman, D.R., L.J. Bain, and C.E. Antle 1969 Inferences on the parameters of the Weibull distribution. Technometrics 11:445-460.
U.S. Department of Defense 1960 Handbook H108: Sampling Procedures and Tables for Life and Reliability Testing (Based on the Exponential Distribution). Washington, D.C.: U.S. Department of Defense.
Woods, W.M. 1996 Using wearout information to reduce reliability demonstration test time. Proceedings of the First Annual U.S. Army Conference on Applied Statistics, Army Research Laboratory, Adelphi, Md., Publication ARL-SR-43.
Zelen, M., and M. Dannemiller 1961 The robustness of life testing procedures derived from the exponential distribution. Technometrics 3:29-49.