7 Assessing Operational Suitability
Pages 101-126


From page 101...
... For purposes of test and evaluation, operational suitability is defined officially in DoD Instruction 5000.2 as "the degree to which a system can be placed satisfactorily in field use with consideration given to availability, compatibility, transportability, interoperability, reliability, wartime usage rates, maintainability, safety, human factors, manpower supportability, logistics supportability, natural environmental effects and impacts, documentation, and training requirements."

[2] These estimates, in constant fiscal 1994 dollars, are provided in Annex D of the Longbow Apache Test and Evaluation Master Plan, which cites December 1993 estimates from the Longbow Program Office and the President's fiscal 1995 budget as the original sources.
From page 102...
... The challenges faced in designing and interpreting results of operational tests underscore the need for, and the potential value of, applying sound statistical methodology. For virtually every aspect of operational suitability, fundamental statistical questions arise concerning the duration and conditions of testing, the measurement and processing of suitability data, and the use of information from other sources, such as developmental tests and simulation models, in the design and subsequent analysis of operational tests.
From page 103...
... Such activities are currently undertaken in the defense testing community with varying degrees of frequency and effectiveness. Although the panel has gathered information on, and has had substantial exposure to, many different aspects of the military services' practices in assessing the suitability of a potential defense procurement, we have restricted our formal review to the elements of operational suitability in which the statistical issues appear most prominently: reliability, availability, and maintainability.
From page 104...
... the subjective scoring of "mission critical" failures in processing data for RAM evaluation; 5. current and potential uses of multiple sources of RAM information in the design of operational tests and in the analysis of test results, including the collection and use of field data for continuing RAM assessment after a system is acquired; and 6.
From page 105...
... In such cases, particularly in force-on-force tests, there is little hope of predicting accurately the amount of RAM data that will be obtained during the operational test event and, consequently, little opportunity to use statistical methods in planning the test to meet RAM information requirements. A common approach in operational testing seems to be to do the testing necessary to assess effectiveness and accumulate the requisite amount of RAM data in a resourceful manner.
From page 106...
... Formal statistical approaches in designing these events, which can be called "operational suitability tests," will ensure more rigorous assessment of key RAM issues. Furthermore, statistical test design concepts provide a conceptual structure within which costs and benefits of data collection can be weighed and test resources can be allocated optimally.
From page 107...
... Moreover, id is significantly larger than the corresponding MTTF of p~ hours for an existing component that would be replaced by the prospective component if it

[4] We are ignoring, for the purposes of this discussion, the difference between mean time between failure and mean time between operational mission failure. This difference relates to the issue of combining developmental and operational test data in that they are distinct concepts.
From page 108...
... We also consider the potential for making more reliable inferences when information from various sources is fully exploited to evaluate the aptness of proposed statistical models, and we suggest reasonable alternatives.

Exponential Life Testing: Current Uses and Limitations

One particular statistical model has been quite widely used for RAM testing, with applications far outnumbering those of competing approaches.
From page 109...
... And conclusions based on exponential assumptions may differ substantially from the conclusions that would be drawn using more plausible statistical assumptions (Zelen and Dannemiller, 1961). It can thus be very important for an analyst to determine when the exponential model is of dubious validity and to use an alternative analysis in such cases.[6]

Alternatives to Exponential Life Testing

A key implication of exponentially distributed times to failure is that the conditional probability of experiencing a failure in any small time interval of

[6] When assessing whether an exponential model might be appropriate, it is often important to distinguish between different failure modes.
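The constant-hazard implication of the exponential model can be illustrated with a minimal Python sketch. It treats the exponential as the Weibull special case with shape 1 and compares the conditional probability of failure in a short interval early and late in a unit's life. The function names, the 100-hour scale, and the 1-hour interval are illustrative assumptions and do not come from the chapter.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that a unit survives beyond time t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))

def conditional_failure_prob(t, dt, shape, scale):
    """P(failure in (t, t + dt] | survival to t) under a Weibull model."""
    return 1.0 - weibull_survival(t + dt, shape, scale) / weibull_survival(t, shape, scale)

# Hypothetical values: scale of 100 hours, interval of 1 hour.
# shape = 1.0 is the exponential model; shape = 2.0 has an increasing hazard.
for shape in (1.0, 2.0):
    early = conditional_failure_prob(10.0, 1.0, shape, 100.0)
    late = conditional_failure_prob(200.0, 1.0, shape, 100.0)
    print(f"shape={shape}: early={early:.5f}, late={late:.5f}")
```

Under shape 1 the two probabilities agree, which is the memoryless property. Under shape 2 the later interval carries the higher risk, so a model that ignores aging would misstate it.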
From page 110...
... A careful treatment of data from an operational test presumes that the models employed for the number and timing of observed failures are selected with attention to the special features of the application and to the quality of the fit of the model to the available data. Our general thesis is that it is important to move beyond exponential life testing.
From page 111...
... . further development of test plans under a Weibull model would be useful." As pointed out above, our examination of the Weibull model should not be taken as an endorsement of Weibull life testing.
From page 112...
... , it can be shown that the corresponding Weibull test would require only 4 observed failures and a maximum total test time of 2,980 hours, a reduction of more than 70 percent in required test resources. In general, the results in this table confirm that potential resource savings are available when one recognizes an increasing-hazard-rate Weibull environment and carries out a Weibull life test instead of an exponential one.
From page 113...
... Because of the variety and complexity of military systems, RAM analysts cannot rely exclusively on exponential life testing methods to adequately address all problems requiring statistical modeling. An insupportable use of the exponential model could have several negative consequences: the operational test design may be inefficient and may forgo potential savings in test resources, or the analysis of test results may yield misleading conclusions about system RAM performance.
From page 114...
... Discussions have recently been initiated within the military services regarding the need to move beyond exponential life testing as the standard or default method of analysis and to revise or supplement military handbooks and other reference materials that are based exclusively on exponential assumptions. We applaud these efforts, and encourage the development of effective networks that would enable an operational test team to seek outside statistical advice when faced with a complex modeling problem or limitations of staff resources.
From page 115...
... For example, a key reliability measure in operational testing of the Navy Tactical Command System-Afloat (NTCS-A) (a multifunction distributed system)
From page 116...
... Such instances underscore the need for objective failure definitions and scoring criteria, as well as rigorous documentation of the actual scoring of RAM data and subsequent evaluation. In many cases, precise measurements of failures and repair times observed during testing are "processed" into "rough" estimates of such characteristics as mean time to failure and mean time to repair, which are then combined with assumptions about system operating tempo and logistics support to produce estimates of operational availability.
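As a rough illustration of how such point estimates are combined, the following Python sketch computes a steady-state availability ratio from a mean time to failure, a mean time to repair, and an assumed mean logistics delay. The function name, the treatment of logistics delay as a simple additive downtime term, and the numbers are assumptions made for illustration; they are not the services' actual availability computation.

```python
def operational_availability(mttf_hours, mttr_hours, mean_logistics_delay_hours=0.0):
    """Uptime / (uptime + downtime) per failure cycle, using mean values only.

    Assumes each failure costs the repair time plus an additive logistics
    delay (a simplification chosen for this sketch).
    """
    downtime = mttr_hours + mean_logistics_delay_hours
    return mttf_hours / (mttf_hours + downtime)

# Hypothetical inputs: 90 hours between failures, 10 hours to repair.
print(operational_availability(90.0, 10.0))
# Same system with an assumed 20-hour wait for spare parts per failure.
print(operational_availability(90.0, 10.0, mean_logistics_delay_hours=20.0))
```

Because only means enter the calculation, any imprecision introduced when the raw failure and repair times were reduced to "rough" estimates passes straight through to the availability figure.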
From page 117...
... The observed hazard rates vary considerably from about 3 to almost 17 failures per 1,000 hours. The number of observed failures in the test varies from aircraft to aircraft for two reasons: intrinsic reliability differences between the aircraft and random error due to the fact that the number of failures would differ from test to test even if the intrinsic reliability remained constant.
From page 118...
... The crucial assumption used in shrinking the observed hazard rates toward the mean hazard rate is that these 13 aircraft are essentially indistinguishable from one another with regard to the process that produced their air conditioning systems. If individual aircraft had air conditioning systems of slightly different designs or if data had been gathered from aircraft under different operating conditions, then alternative assumptions and resulting estimators might be appropriate.
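One simple way to shrink observed hazard rates toward a common mean is sketched below in Python. It uses a gamma-Poisson form in which each unit's estimate is a weighted average of its own observed rate and the pooled rate. The function name, the `prior_hours` weight, and the three-unit data set are hypothetical, and this is not necessarily the estimator used by the panel.

```python
def shrink_hazard_rates(failures, hours, prior_hours):
    """Shrink per-unit failure rates toward the pooled rate.

    failures[i] and hours[i] are the failure count and operating hours
    for unit i. prior_hours (an assumed tuning value) acts like extra
    exposure observed at the pooled rate: larger values shrink more.
    """
    pooled_rate = sum(failures) / sum(hours)
    return [
        (pooled_rate * prior_hours + x) / (prior_hours + t)
        for x, t in zip(failures, hours)
    ]

# Hypothetical data: three units, 1,000 operating hours each.
failures = [3, 17, 10]
hours = [1000.0, 1000.0, 1000.0]
observed = [x / t for x, t in zip(failures, hours)]
shrunk = shrink_hazard_rates(failures, hours, prior_hours=1000.0)
for obs, est in zip(observed, shrunk):
    print(f"observed={obs:.4f}  shrunk={est:.4f}")
```

The pooling step is only defensible under the exchangeability assumption described above. If units differ in design or operating conditions, shrinking toward a single pooled rate would not be appropriate.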
From page 119...
... Recommendation 7.7: Methods of combining reliability, availability, and maintainability data from disparate sources should be carefully studied and selectively adopted in the testing processes associated with the Department of Defense acquisition programs. In particular, authorization should be given to operational testers to combine reliability, availability, and maintainability data from developmental and operational testing as appropriate, with the proviso that analyses in which this is done be carefully justified and defended in detail.
From page 120...
... RAM data on a prospective system taken from earlier stages of development, and from similar systems, can significantly improve the accuracy of conclusions drawn from operational testing and can reduce the amount of resources required for such testing. Therefore, efforts should be made to archive early RAM performance data for use in assessing operational suitability.
From page 121...
... Formal statistical models and associated methods of analysis have been developed to represent such growth and to estimate future reliability and related quantities of interest. A reliability growth analysis typically involves fitting, to observed failure data, an equation expressing the underlying hazard rate as a (usually decreasing)
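A minimal Python sketch of one widely used growth form, the power-law (Crow-AMSAA type) model with failure intensity lam * beta * t**(beta - 1), is given below. The choice of this particular model, the time-truncated maximum likelihood formulas, the function names, and the failure times are assumptions made for illustration; the chapter excerpt does not specify which equation is fitted.

```python
import math

def fit_power_law_growth(failure_times, total_time):
    """Maximum likelihood fit of the power-law model for a test stopped at total_time.

    failure_times are cumulative test times of each failure (all > 0 and
    < total_time). Returns (lam, beta); beta < 1 indicates reliability growth.
    """
    n = len(failure_times)
    beta = n / sum(math.log(total_time / t) for t in failure_times)
    lam = n / total_time ** beta
    return lam, beta

def failure_intensity(t, lam, beta):
    """Instantaneous failure intensity at cumulative test time t."""
    return lam * beta * t ** (beta - 1)

# Hypothetical failure times (hours) from a 1,000-hour development test.
times = [10.0, 40.0, 100.0, 250.0, 600.0]
lam, beta = fit_power_law_growth(times, 1000.0)
print(f"beta={beta:.3f}")
print(f"intensity at 100 h={failure_intensity(100.0, lam, beta):.5f}")
print(f"intensity at 1000 h={failure_intensity(1000.0, lam, beta):.5f}")
```

A fitted beta below 1 means failures are arriving more slowly as testing proceeds, and the reciprocal of the intensity at the end of the test gives an estimate of the currently achieved mean time between failures.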
From page 122...
... Some organizations involved in test and evaluation, particularly AMSAA, are regularly engaged in reliability growth modeling for purposes of planning and analysis. One recent example involved the family of medium tactical vehicles, which underwent an initial phase of operational testing from September through December 1993.
From page 123...
... One recent study cited several examples of such problems in new military systems, noting that in each case (Bridgman and Glass, 1992:6): "The OT [operational testing]
From page 124...
... Such models tend to fall into two general types: physical (based upon "first principles" of chemistry, physics, etc., and the various causative theories pertinent to the system) and empirical (those based upon "fits" to experimental data obtained independently of the particular tests being performed). Given a model, statistical approaches to extrapolating reliability statements from accelerated tests are straightforward (if somewhat complicated)
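As an example of a physical model of the first type, the Python sketch below computes an Arrhenius acceleration factor, which relates time under elevated temperature to equivalent time at use temperature. The Arrhenius model is chosen here only as a familiar illustration; the chapter excerpt does not name it, and the activation energy and temperatures are hypothetical.

```python
import math

# Boltzmann constant in electron-volts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5

def arrhenius_acceleration_factor(activation_energy_ev, use_temp_k, stress_temp_k):
    """Ratio of the failure-process rate at stress temperature to that at use temperature."""
    return math.exp(
        (activation_energy_ev / BOLTZMANN_EV_PER_K)
        * (1.0 / use_temp_k - 1.0 / stress_temp_k)
    )

# Hypothetical case: 0.7 eV activation energy, use at 25 C, stress at 85 C.
af = arrhenius_acceleration_factor(0.7, 298.15, 358.15)
print(f"acceleration factor={af:.1f}")
print(f"100 stress hours correspond to about {100.0 * af:.0f} use hours")
```

The extrapolation is only as good as the assumed model and activation energy, which is why the validity of the model matters more than the arithmetic.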
From page 125...
... decision making by pooling suitability data from various sources require documentation of the data sources and of the conditions under which the data were collected, as well as clear and consistent definitions of all terms used. Such efforts underscore the potential value of standardizing RAM testing and evaluation across the services and encouraging the use of best current practices.
From page 126...
... The manuals, handbooks and reference materials presently serving as the basis for military life-testing applications should be upgraded, the statistical level of the personnel who carry out the military's operational RAM testing analysis should be comparably enhanced, and the consideration of alternative models and methods for RAM testing should become routine in operational testing across the services. Recommendation 7.12: Military reliability, availability, and maintainability testing should be informed and guided by a new battery of military handbooks containing a modern treatment of all pertinent topics in the fields of reliability and life testing, including, but not limited to, the design and analysis of standard and accelerated tests, the handling of censored data, stress testing, and the modeling of and testing for reliability growth.

