Reliability Issues for DoD Systems: Report of a Workshop

4 Further Discussion and Next Steps

This chapter begins by providing some general remarks concerning the collaboration of statisticians and DoD in carrying out research on reliability measurement. This is followed by discussion of the physics of failure, the need to develop and utilize reliability models that reflect the physical attributes of the materials used and the stresses to which these materials are subjected, and the need for procedures for identifying important modes and sources of failure in components and systems. The final section begins with a summary of some of the discussion at the workshop on appropriate ways to think about and quantify the uncertainties that accompany the statistical modeling of a complex system, and then presents highlights from the closing panel discussion and a summary of the general discussion that followed.

COLLABORATION BETWEEN STATISTICIANS AND THE DEFENSE COMMUNITY

The successful early collaboration between the statistics and defense communities was epitomized by the achievements of the Statistical Research Working Groups during World War II. Thereafter, academic research in experimental design, reliability estimation, and other areas at the interface of statistics and engineering was strongly supported by DoD. However, the level of collaboration has fallen off of late, possibly because the research was not fully targeted to DoD's most pressing needs.
Nevertheless, academic and industrial research on reliability methods has continued, and substantial progress has been made in the last 20-25 years. Areas of recent progress include: (1) methods for combining information across test environments, including methods for incorporating subjective assessments; (2) fatigue modeling; (3) statistical methods for software engineering; (4) nonparametric or distribution-free methods, developed specifically for reliability growth but also applicable more generally (an important example being variance estimation); (5) alternative methods for modeling reliability growth; (6) treatment of censored or missing data; (7) use of accelerated testing methods; and (8) greater use of physics-of-failure models and of procedures that help identify the primary sources of failure.

Chapter 3 presented an argument expressed by several workshop speakers: DoD needs not only to upgrade the "tried and true" reliability methods that could be disseminated in a handbook, but also to stay abreast of methods still under active research, or whose full applicability to defense systems is still unclear. Applying these methods may require greater resources, but many of them are likely to provide important, substantial advantages over current methods. The issue is how the test service agencies and other members of the test and evaluation community can gain easier access to contemporary reliability methods. As discussed at the workshop, one important way to address this issue would be to identify the properties of a reference work that could provide this linkage between the defense and statistical communities. The primary means suggested for accomplishing this was updating or redesigning the RAM Primer.
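As a small illustration of two of the areas listed above, nonparametric methods and the treatment of censored data, the following is a minimal, self-contained sketch of a product-limit (Kaplan-Meier) reliability estimate. The failure times and censoring pattern are invented for illustration; nothing here is drawn from workshop data.

```python
# Hand-rolled product-limit (Kaplan-Meier) reliability estimate.
# All failure/censoring times below are invented for illustration.

def kaplan_meier(times, observed):
    """Return [(t, R(t))] pairs for the product-limit reliability estimate.

    times    : time of failure or censoring for each unit on test
    observed : True if the unit failed; False if the test ended (censoring)
               before the unit failed
    """
    # Sort by time; at tied times, process failures before censorings
    # (the conventional tie-breaking rule for the product-limit estimate).
    data = sorted(zip(times, observed), key=lambda p: (p[0], not p[1]))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    for t, failed in data:
        if failed:
            survival *= (n_at_risk - 1) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= 1  # failures and censorings both leave the risk set
    return curve

# Ten units on test: 7 observed failures, 3 censored (still working at test end).
times    = [12, 20, 20, 31, 31, 45, 45, 60, 60, 60]
observed = [True, True, False, True, True, True, False, True, True, False]
for t, r in kaplan_meier(times, observed):
    print(f"R({t}) = {r:.3f}")
```

Because the estimate makes no distributional assumption, the censored units still contribute information through the risk set, which is exactly the benefit the workshop speakers attributed to these methods.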
PHYSICS-OF-FAILURE MODELS AND METHODS FOR SEPARATELY MODELING FAILURES FROM DIFFERENT SOURCES While no session was devoted specifically to either greater use of physics-of-failure models (i.e., models that directly represent the physical basis for failure) in modeling reliability for defense systems or methods for separately modeling failures due to distinct sources, these two related topics arose repeatedly during various workshop sessions. Several speakers supported the greater use of physics-of-failure models whenever possible to acquire a better understanding of the sources and effects of component and
system failure. Of course, models that make no direct use of an understanding of specific failure modes can still be useful in certain broad contexts, but their validity is generally more questionable. Progress in the development and implementation of physics-of-failure models will require interaction between statisticians and other scientists. Some systems or components will not benefit from this type of approach because the underlying physics, for one reason or another, is not mature enough. For many types of defense systems, however, even partial knowledge of the mechanism underlying various failure modes can be extremely helpful in the statistical estimation of reliability. Areas addressed by the workshop in which these points were made include (1) fatigue modeling, to help select the proper acceleration function; (2) early assessments of system reliability for PREDICT; (3) categorization of failure modes into types A and B in the Integrated Reliability Growth Strategy (IRGS); (4) improved understanding of whether combining information from developmental and operational tests is reasonable; and (5) Meeker's research linking developmental, operational, and field use through various acceleration models.

A related issue is separate modeling of the failure-time distributions for failures from different sources. This topic arose at the workshop in several somewhat unrelated contexts. First, IRGS separately considers fault modes characteristic of a mature component and those characteristic of a component still capable of further development, referred to as type A and type B failure modes, respectively. Second, Frank Camm mentioned that field failures are overrepresented by poorly produced components, referred to as "bad actors," possibly resulting from a poorly controlled manufacturing process.
Third, Bill Meeker noted that some failures are unpredictable (possibly because of changes in the manufacturing process) and therefore in need of separate modeling, which he referred to as “special-cause failures” as distinguished from “common-cause” failures. Two related notions—the separation of a system’s components into those that are mature and immature and the separation of the results of an industrial process into those systems that are indicative of the proper and improper functioning of the process—arose in these and other parts of the workshop. This separation of failures due to components or processes that are or are not functioning as intended is clearly worth greater investigation for its applicability to reliability analysis for defense systems. There are problems involved in making this idea operational, that is, in designating which failures are due to mature or immature components, and
which production processes can be considered stable, and which prototypes may have been poorly produced. However, an approach to estimating system or fleet reliability consistent with this line of reasoning would be to analyze the pattern of failures from each source alone and to model the impact on reliability separately. The expectation is that decisions at the boundary would not make much difference in such analyses. A number of advantages could potentially result from this general approach. For example, better decisions could be made in separating faulty designs from faulty processes. Also, some design failures might be attributed to a single component and easily remedied.

CONCLUDING PANEL SESSION AND GENERAL DISCUSSION

Models now exist or are being developed for representing how weapon systems work, such as missile intercept models and models of earth penetration. These models have input variables, for example, impact velocities. If one has a multivariate distribution for the inputs, one can run simulations and estimate a number of characteristics of model performance, such as system reliability. A partial means of understanding how useful these estimates are, and how they should be compared or combined with real data, is model validation. Consider a computer model intended to simulate a real-world phenomenon, such as the functioning of a defense system. Validation of such a model necessarily focuses on the differences between the model's predictions, y*, and the corresponding observations of the phenomenon, y, that is, the prediction errors. To learn efficiently about the prediction errors, one designs an experiment by carefully choosing a collection of inputs, x, and then running the model, observing the system, and computing the prediction errors at those inputs.
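The loop just described, choose inputs, run the model, observe the system, compute the errors, can be sketched as follows. The `simulator` and `field_trial` functions are invented stand-ins for a computer model and an (expensive) real-world observation; neither represents anything presented at the workshop.

```python
# Sketch of a model-validation experiment: evaluate the computer model y*
# and the real system y at a designed set of inputs, and record the
# prediction errors e_x = y - y*. All ingredients here are invented.
import random

random.seed(1)

def simulator(x):
    # toy computer model: its prediction y* at input x
    return 2.0 * x + 1.0

def field_trial(x):
    # toy "real world": the model is biased by 0.5 and observations are noisy
    return 2.0 * x + 1.5 + random.gauss(0.0, 0.2)

# carefully chosen collection of inputs (here simply an even grid)
design = [0.0, 0.5, 1.0, 1.5, 2.0]

# run the model, observe the system, and record the prediction errors
errors = [(x, field_trial(x) - simulator(x)) for x in design]
for x, e in errors:
    print(f"x = {x:3.1f}   prediction error = {e:+.3f}")
```

In practice the expensive step is `field_trial`, which is why the number of design points is so limited and why the choice of the design itself matters.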
The prediction errors are then analyzed, often through development of a model of those errors as a function of the inputs. A candidate starting point for a prediction error model is that the prediction errors are normally distributed with parameters that depend on x; that is, y = y* + e_x, where e_x ~ N(m_x, σ_x). The objective of the prediction error model is to characterize the e_x's, which can be difficult to do since (1) x is typically high-dimensional; (2) the model may be numerically challenging and hence may take a long time to converge; (3) observing what actually happens can be extremely expensive, and as a result the number of separate experimental runs may be highly limited; and (4)
some of the input variables may be very difficult to control. A great deal of progress has been made on the simple linear version of this problem, but not on highly complicated nonlinear versions. Further, if one is required to make a prediction in a region of the x space where no tests have been run, one needs to extrapolate from the prediction error model, which necessitates a rather complete understanding of the underlying physics. This response-surface modeling could be equivalent in difficulty to building the computer model in the first place. In the concluding panel session, Rob Easterling described some approaches that might address this problem: (1) leaving some x's out of the model, (2) using simplified variable and parameter spaces, (3) using simplified prediction error models, and (4) using fractional two-level factorial experiments.

Easterling said that it can be shown that

Var(y) = Var_x(y* + m_x) + E_x(σ_x²),

where the subscript x under the variance and expectation operators indicates averaging over x as it varies according to its multivariate distribution. Since m_x is likely to vary much less than y*, this equation can typically be simplified to

Var(y) ≈ Var_x(y*) + E_x(σ_x²).

There is a large body of research exploring how much variance there is in y* given the random variation in the inputs, using such methods as Latin hypercube sampling. However, the first term is a good estimate of the variance of y only if the second term, the variance of the prediction errors, is negligible. This is currently a relatively unexplored area of research.

A second panelist, Steve Pollock, stated that the DoD community needs to determine how best to apply the various methods described at the workshop. Doing so would entail directly implementing some methods when applicable and otherwise tailoring them, if necessary, to DoD's specific needs.
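Easterling's variance point can be checked numerically with a minimal Monte Carlo sketch: the variance of y* induced by the input distribution understates Var(y) by the average prediction-error variance. The quadratic toy model, the standard normal input distribution, and the error spread below are all invented for illustration.

```python
# Monte Carlo check (invented toy example): input-driven variation in the
# model output y* accounts for Var(y) only up to the prediction-error
# variance, which here is deliberately not negligible.
import random

random.seed(2)

N = 200_000
SIGMA_E = 0.8                       # prediction-error spread (assumed)

def model(x):                       # toy computer model y*(x)
    return x * x

ystar, y = [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)      # input drawn from its distribution
    e = random.gauss(0.0, SIGMA_E)  # prediction error y - y*
    ystar.append(model(x))
    y.append(model(x) + e)

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

print(f"Var_x(y*)            = {var(ystar):.3f}")  # near Var(x^2) = 2
print(f"Var(y)               = {var(y):.3f}")
print(f"Var_x(y*) + Var(e)   = {var(ystar) + SIGMA_E ** 2:.3f}")
```

The gap between the first two printed quantities is the second term of the decomposition, which is exactly what Latin hypercube studies of input-driven variation alone do not capture.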
Pollock added that additional workshops (structured similarly to this one) should be organized at regular intervals to help keep DoD abreast of recent advances in reliability methods.

A third panelist, Marion Williams, recommended that great care be exercised in using developmental test results in combination with operational test results. He suggested that the failure modes are so distinct that this linkage is unlikely to be useful for many systems. He added that he thought there was a place for Bayesian models in the evaluation of defense systems in development. A key issue for him is the justification of test sizes for operational testing. Williams' comments elicited a discussion of the place of Bayesian methods in operational evaluation. David Olwell said that the key issue is that priors need to be selected objectively. Francisco Samaniego added that a sensitivity analysis using an appropriately broad collection of possible priors would be especially important in DoD applications, since assessment of the influence of prior assumptions should be part of the subsequent decision making concerning a system's suitability.

A fourth panelist, Jack Ferguson (substituting for Hans Mark), returned to the theme of reliability management. He was convinced that testing and analysis must be moved upstream so that the system design is improved with respect to its operational performance as early as possible; systems developed this way generally work well in the field. Further, field data need to be used more often to update estimates of the costs of spares, maintenance, and so on.

Finally, we summarize discussions, held primarily in the concluding panel session, concerning the RAM Primer's current value in disseminating state-of-the-art reliability methods to the DoD test and evaluation community and the possible form of an updated version. The DoD test and evaluation community currently has limited access to expert statistical advice (see, e.g., National Research Council, 1998). It is typical, and has been for decades, in all four services for both operational test planning and operational test evaluation to be carried out by individuals with relatively limited statistical training.
For this reason, the RAM Primer served an important function for many years in communicating omnibus, easily applied techniques for test planning and test evaluation with respect to measurement of system reliability, availability, and maintainability. The chapters of the RAM Primer cover basic definitions, reliability measures, test planning, reliability models and estimation, hypothesis testing and confidence intervals for reliability performance, data analysis, and reliability growth estimation. Also included are tables and charts to assist in applying the methods described.

Steve Pollock pointed out a number of areas in which the RAM Primer is currently deficient. These include the lack of discussion of physics-of-failure models, nonparametric and robust methods, variance estimation (e.g., jackknife, bootstrap), stress testing, accelerated testing, decision-analytic issues, repair and replacement policies, methods for dependent components, current methods in experimental design, and Bayesian approaches. (Since the application of physics-of-failure and Bayesian models is highly specific to the system at hand, it is not clear that any omnibus approach to the use of these models could be represented in an updated RAM Primer. However, it might be helpful to point out the utility of these models and to provide a casebook of successful and unsuccessful applications for estimating or evaluating the reliability of defense systems.)

The view expressed was that the RAM Primer does not currently serve any set of potential users very well. Many workshop participants believe that, 20 years after its last revision, the RAM Primer is substantially out of date; Jim Streilein observed that in many respects it was already limited in its utility in 1982. One indication of its obsolescence is its large section of statistical tables and graphs providing critical values for tests, whereas today a modest amount of embedded software would provide better information. Other documents may also be obsolete; recall that Paul Ellner called for the updating of Military Handbook 189, on reliability growth modeling. Streilein was concerned more broadly about the training of tomorrow's reliability analysts. To address this problem, a number of speakers strongly argued that the RAM Primer should be fully updated, possibly in a substantially different format. The suggestion was that a small planning group be charged with deciding the goals and form of a new RAM Primer.
The possibilities include (1) a primer, (2) a self-contained introductory text, (3) a set of standards, (4) a handbook, (5) a set of casebooks, and (6) a state-of-the-art reference book that well-educated and well-trained professionals could use to remind themselves of various methods. The form selected depends to some extent on whether the users are likely to be inexperienced junior analysts or experienced analysts; perhaps several of these forms could be developed simultaneously. One interesting suggestion was for the RAM Primer to be a web-based document with embedded software, or linked to interactive software, to carry out the variety of calculations required by modern methods.1 Taking this approach would provide greatly expanded capabilities to the user with only modest demands for understanding the underlying theory.

1 An example of a handbook constructed in this manner, the National Institute of Standards and Technology/SEMATECH Engineering Statistics Handbook, can be found at http://www.itl.nist.gov/div898/handbook/toolaids/sw/index.htm.

Chapter 9 of the RAM Primer describes methods for modeling reliability growth. Current practice in DoD acquisition (i.e., absent the methods described in Chapter 2) makes it likely that a complex defense system will enter the latter stages of development with system reliability substantially lower than that expected upon maturation. The expectation is then that system reliability will be improved through a series of test, analyze, and fix steps. Typically, a limited amount of time is allocated for testing in the latter stages of development. To formally quantify system reliability at some point in time, one could rely solely on the results of the last testing carried out; however, the limited amount of testing implies less precise estimation than would be possible if more of the pattern of system reliability were somehow used. A key challenge here is that reliability under operational conditions differs from that under laboratory conditions, and appropriate linkage of the two is essential for the best estimation. Several participants argued for greater use of reliability growth models that are consistent with the maturation process of test, analyze, and fix. Since Chapter 9 of the RAM Primer focuses on applying the power law process to all reliability growth problems, one objective of a revised RAM Primer would be either to augment that presentation with a description of alternatives or to focus the presentation on these newer models. In this way, defense reliability growth modeling would more often take explicit account of the process of fixing the faults found in testing.

SUMMARY

Ernest Seglie, the fifth panelist, provided a thorough summary of the workshop. First, the RAM Primer is antiquated, and a more useful tool needs to be developed.
Second, the reason many systems are deficient with respect to reliability when tested or fielded lies in the management of the system development process. In particular, it must be understood that taking a system to a new environment, using a new manufacturing process, or employing new users will cause a break in the reliability growth curve; analysis should be able to illuminate how these various changes will affect system reliability. A related problem is that almost all of the operational testing for a system is clustered immediately before the decision point on proceeding to full-rate production, which concentrates a tremendous amount of risk in that decision. This is another aspect of the management problem. Third, the problem with reliability growth modeling is not only the weakness of specific approaches, but also the inability to assess the uncertainty of predictions, an area in which statisticians need to make progress. Fourth, with respect to life-cycle costs, the current inability to forecast a system's costs of ownership correctly is bankrupting the military; it is crucial that DoD improve its ability to forecast system life-cycle costs. Fifth, with respect to the development of models for combining information, linking developmental test performance to operational test performance is not the only requirement; it is also important to link test performance to performance in the field. Finally, Seglie agreed that it is important to get the science underlying reliability assessment right, and therefore to understand and use models of the physics of failure. A related need is understanding the impact of "bad actors" on models and estimates. To make progress in this regard, it is important for statisticians and the relevant scientists to work together closely.

The concern was expressed at the workshop that much of the methodological progress described by the participants is not represented in the reliability assessments of defense systems. This is unfortunate, since many of these methods offer substantial benefits relative to those used in the 1970s and 1980s. The newer methods are often more efficient, which is important as data collection becomes more expensive. They also make better use of datasets with subjective elements, censoring, or missing data, again providing more reliable estimates for the same amount of data. Finally, they offer greater flexibility in handling alternative distributional forms, so the estimates derived are often more trustworthy.
Reinstitution of more active collaboration between the statistical and defense acquisition communities, besides leading to better statistical methods, could increase the number of academics interested in the most pressing problems faced by the defense acquisition community. Such collaboration could also improve the chances of attracting highly trained statisticians to careers in defense testing and acquisition. Many workshop participants were strongly in favor of greater interaction between the two communities.