

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Executive Summary

This report provides an assessment of the U.S. Army's planned initial operational test and evaluation (IOT&E) of the Stryker family of vehicles. Stryker is the intended platform for the Stryker Brigade Combat Team (SBCT). The Army Test and Evaluation Command (ATEC), charged with conducting operational testing and evaluation of Army systems in development, has been asked to take on the unusual responsibility of designing and conducting operational testing and evaluation of both the vehicle and the SBCT concept and has requested the assistance of the National Research Council (NRC) in this effort.

The Panel on Operational Test Design and Evaluation of the Interim Armored Vehicle (Stryker), building on the recommendations of an earlier National Research Council report (National Research Council, 1998), considers the Stryker IOT&E to be a case study of how ATEC (and the other service test agencies) can more effectively conduct operational test design and evaluation consistent with state-of-the-art statistical principles and practices.

The panel has been asked to address three aspects of the operational test design and evaluation of Stryker: (1) the selection of measures of performance and effectiveness to be used to compare the SBCT equipped with the Stryker against the baseline force, a light infantry brigade; (2) whether the current operational test design for Stryker is consistent with state-of-the-art methods for experimental design; and (3) the advantages for evaluating Stryker, and more generally any complex defense system, through the use of information from the initial operational test combined with that from developmental tests, modeling and simulation, test data and field use of comparable systems, and engineering judgment and experience. The first two topics, measures and test design, were addressed in the panel's first phase report, which is appended to this report. The third item, combining information, is addressed in this report. This executive summary pertains to both reports.

MEASURES OF EFFECTIVENESS

The panel was asked to consider what measures of effectiveness (MOEs) would be useful for comparing Stryker against a baseline system and focused on issues such as: the disadvantages of rolling up disparate MOEs in a single overall number, the advantages of various force ratio measures, and the calibration and scaling of subjective evaluations made by subject-matter experts (SMEs). We have also pointed out the need to develop scenario-specific MOEs for noncombat missions and suggested some possible candidates. The panel concluded that no single measure could be devised for the value of situation awareness, and so approaches were proposed for collective measurement. Further, modeling and simulation were suggested for use in augmenting test data to help assess situation awareness.

With respect to determining measures of reliability, availability, and maintainability (RAM), the initial operational test will provide a relatively small amount of vehicle operating data, compared with the information obtained in training exercises and developmental testing, and thus may not be sufficient to address all of the reliability and maintainability concerns of ATEC. This lack of useful RAM information will be exacerbated by the fact that the initial operational test is to be performed without using add-on armor.
For this reason, the panel stressed that RAM data collection should be an ongoing enterprise, with failure times, failure modes, and maintenance information tracked for the entire life of each vehicle (and its parts), including data from developmental testing and training, and recorded in appropriate databases. System performance should be assessed both separately, by specific failure mode, and across failure modes, rather than assigning a single failure rate for a vehicle based on a simple exponential model for all failures. Failure propensity should be related to environmental and operational causes and conditions, including maintenance.
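
As a rough illustration of assessing failures by mode rather than with one vehicle-wide rate, the following Python sketch tabulates failure counts and rates per failure mode from a failure log. The log entries, the fleet operating hours, and the function name are hypothetical, and a constant rate within each mode is assumed here only to keep the example short; a Weibull or other time-dependent model could replace it.

```python
from collections import defaultdict


def rates_by_failure_mode(failures, total_hours):
    """Return {mode: (count, failures per 1,000 operating hours)}.

    failures: iterable of (vehicle_id, failure_mode) records.
    total_hours: operating hours accumulated by the whole fleet.
    Assumes a constant rate within each mode (illustrative simplification).
    """
    counts = defaultdict(int)
    for _vehicle_id, mode in failures:
        counts[mode] += 1
    return {mode: (n, 1000.0 * n / total_hours) for mode, n in counts.items()}


# Hypothetical failure log and fleet hours, invented for illustration.
failure_log = [
    ("V1", "suspension"),
    ("V2", "suspension"),
    ("V1", "electrical"),
    ("V3", "suspension"),
    ("V2", "drivetrain"),
]
fleet_hours = 2000.0

by_mode = rates_by_failure_mode(failure_log, fleet_hours)
overall_rate = 1000.0 * len(failure_log) / fleet_hours

for mode, (count, rate) in sorted(by_mode.items()):
    print(f"{mode}: {count} failures, {rate:.2f} per 1,000 h")
print(f"all modes combined: {overall_rate:.2f} per 1,000 h")
```

The combined rate hides the fact that one mode accounts for most of the failures in this invented log, which is the kind of information a single vehicle-level failure rate cannot convey.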

TEST PLANNING AND EXPERIMENTAL DESIGN

The initial proposed experimental design for Stryker risked confounding observed differences between Stryker and the baseline system with important sources of uncontrolled variation. In particular, the initial test design called for the Stryker/SBCT trials to be run at a different time of year from the baseline trials, which may have confounded time of year with a difference in effectiveness between the baseline force and the Stryker/SBCT forces. We therefore recommended that these events be scheduled as closely together in time as possible and interspersed if feasible. We have been pleased to learn that the final design of the IOT for Stryker has these test events scheduled very closely together.

In addition, we recommended that other potential sources of confounding, such as player learning and nighttime versus daytime operations, should be addressed with alternative designs. One alternative suggested to avoid confounding due to player learning was to use four separate groups of players, one for each of the two opposing forces (OPFORs), one for the Stryker/SBCT, and one for the baseline system. Alternating teams from test replication to test replication between the two systems under test would also be a reasonable way to address differences in learning, training, fatigue, and competence. The panel is pleased to note that the design the Army now proposes has addressed player learning through the use of separate player teams for the Stryker and baseline systems.

We pointed out the difficulty of identifying a single test design to address two distinct goals: (1) determining how various environmental or use factors affect Stryker's system performance with respect to dozens of measures of performance and (2) confirming a level of performance for Stryker against either a baseline system or a set of requirements.
For example, the current test design, constructed primarily to compare Stryker/SBCT with the baseline, is balanced for a limited number of factors, allocating test samples to missions and environments similar to the proportion that would be expected in field use. The design precludes focusing test cases on environments in which Stryker is anticipated to have advantages over the baseline system, and it allocates a comparable number of test cases to environments for which Stryker is anticipated to provide little or no advantage. While the design may be effective in confirming that Stryker satisfies various criteria, it reduces the opportunity to understand the possible nature and magnitude of the benefit that Stryker provides in various crucial environments.

The panel therefore described some alternative approaches to operational test design, including a two-stage design (learning and confirming) and the use of small-scale pilot tests. The latter could be particularly useful in understanding the contribution of specific performance features, such as situation awareness for the Stryker system, for example, by running some test cases with the system's situation awareness capabilities intentionally degraded or turned off in order to determine their value in particular missions or scenarios.

In addition, the panel in its earlier report provided technical advice in areas such as statistical power calculations, identifying the appropriate test unit of analysis, issues involving use of SME ratings, aggregation of measures, and use of graphical methods in test evaluation.

With respect to the general system development process, the panel believes that, absent strategic considerations, a system should not be forwarded to operational testing until the system design is relatively mature. Forwarding an immature system to operational testing is an expensive way to discover errors that could have been detected in developmental testing, and it reduces the ability of the operational test to carry out its proper functions of assessing the capabilities and limitations of the mature system and confirming that it satisfies its requirements.

The panel suggested that, in the future, to assist in test design, ATEC should prepare a straw man test evaluation report (TER) well before the initial operational test is carried out.
This TER should be based on fictitious data filled out using expert judgment, as if the initial operational test had been completed, and it should include examples of how a representative data set would be analyzed, models to be used to carry out the analysis, anticipated standard deviations, confidence intervals, hypothesis tests, and other summaries. The fictitious data would be based on the experience and intuition of the analysts and what they think the results of the initial operational test might look like, including how effective the new system is likely to be in various test situations. Of course, initial operational tests collect data in great detail and, for this purpose, some of that detail could be omitted but not discarded; we discuss in Chapter 4 of this report the utility of archiving these and other data for future use.

SYSTEM EVALUATION BY COMBINING INFORMATION

This report focuses on techniques for combining information to enhance both operational test design and evaluation. The panel has concluded that, as currently planned, the number of test replications in the IOT for Stryker, a complex system of systems, will be inadequate to support hypothesis tests at the usual significance and power levels to guide the decision as to whether Stryker should be approved for full-rate production. This inadequacy is not specific to Stryker, as stated in the 1998 NRC report (National Research Council, 1998); rather, we suspect it to be true for the great majority of acquisition category (ACAT) I systems. Therefore, ATEC should seriously consider methods for augmenting information from operational testing in order to support better decision making, and also examine how information from earlier stages of system development and from analogous systems could be formally used to assist in operational test design.

Various sources and types of information could help augment the data currently collected in operational tests. These sources include developmental testing, training exercises, other less controlled uses of the system, and information obtained from both testing and field use of similar systems as well as systems with very similar components. While ATEC already makes use of some informal methods for combining such varied types of information, in particular the use of expert opinion for test design, this report focuses on the benefits of the use of more formal methods and suggests ways to implement these methods more broadly within ATEC. Of course, there are valid concerns about the comparability of data collected either in developmental testing or in uncontrolled use for prior versions of a system, and therefore the potential dangers of improper use of these methods are also discussed.
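
To make the concern about the number of test replications concrete, the following Python sketch approximates the power of a two-sided comparison of a mean measure of effectiveness between two forces. It is only an illustration under stated assumptions: a normal (z-test) approximation, a common standard deviation treated as known, and equal numbers of replications per force. The function names and all planning values (a true difference of one standard deviation, six replications per force) are hypothetical and are not taken from the Stryker test plan.

```python
import math
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    delta: true difference in the mean MOE between the two forces.
    sigma: common standard deviation across replications (assumed known).
    n_per_group: number of test replications for each force.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) / (sigma * math.sqrt(2.0 / n_per_group))
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def replications_needed(delta, sigma, target_power=0.80, alpha=0.05):
    """Smallest number of replications per force that reaches the target power."""
    if delta == 0:
        raise ValueError("delta must be nonzero; power never exceeds alpha")
    n = 2
    while two_sample_power(delta, sigma, n, alpha) < target_power:
        n += 1
    return n


# Hypothetical planning values, chosen only for illustration.
power_with_six = two_sample_power(delta=1.0, sigma=1.0, n_per_group=6)
needed = replications_needed(delta=1.0, sigma=1.0)
print(f"power with 6 replications per force: {power_with_six:.2f}")
print(f"replications per force for 80% power: {needed}")
```

Under these assumed values, a handful of replications leaves the test well short of conventional power targets even for a large true difference, which is why augmenting operational test data with other information is attractive.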
Formal methods for combining information include complete or partial pooling of data from two or more comparable sets of tests or other use, accommodating data from disparate sources using hierarchical or random effects models, and updating prior uncertainties about critical performance measures using Bayesian techniques. We stress that both formal and informal methods require the judicious selection and confirmation of underlying assumptions as well as a careful and open process by which various types of information, some of which involve subjective judgment, are gathered and combined.

To demonstrate their breadth and nature, this report presents specific examples of these methods and their applications, including their use in test design to reduce the number of test combinations needed to capture factor interactions; pooling techniques; use of existing knowledge about a Weibull parameter to enhance the precision of the assessment of a critical failure time performance measure; and their ability to incorporate uncertainty about the types and number of failure modes and associated failure rates. This report also presents some requirements for utilizing these methods, especially with respect to data archiving, enhancing statistical capabilities within ATEC, and the necessity of using a formal process for eliciting expert judgments on system performance.

TOWARD THE FUTURE

Stryker is intended to be an integral part of a transformation to the Future Combat System (FCS) and the Future Brigade Combat Team (FBCT), whose test design and evaluation are likely to be substantially more complicated than those of the Stryker/SBCT. First, the FCS/FBCT is intended for use in a much broader array of operational missions and environments than the Stryker/SBCT. Second, it is a more complex family of systems than the Stryker/SBCT, and effective concepts, tactics, techniques, and procedures must be developed in advance of the operational test, paying particular attention to the use of the command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR). Third, the FCS/FBCT networking capability must be tested. Fourth, test designs will have to be effectively tailored to the evolutionary development process for the FCS/FBCT. Finally, its enhanced reliability requirements will have to be rigorously tested.

To address these challenges, we suggest in the current report that ATEC develop a parametric space of test environments that can be strategically sampled for testing. ATEC should also develop a test and evaluation data archive to support evolutionary acquisition and a strategy for supporting test design within an evolutionary acquisition framework.

LIMITATIONS

We wish to include four points related both to the limited nature of our charge and to our advice regarding measures and experimental design.
First, we note that an alternative baseline system that could have taken advantage of the SBCT infrastructure could have been tested to help understand the value of Stryker without the SBCT system. Similarly, it does not seem necessary to require that only a system that could be transported as quickly as Stryker serve as a baseline for comparison.

Second, the current test compares the Stryker/SBCT system not only with a baseline system but also with the vehicles used in the baseline. For some purposes, isolating those comparisons could be important (for example, to determine Stryker's relative maneuverability in rural versus urban terrain and to examine the effects on its utility of its mobility in those environments).

The third point concerns the capacity of the current operational test design to provide adequate information on how to tactically employ the Stryker/SBCT system. For example, how should greater situation awareness be best utilized and how should it be balanced against greater vulnerability in various types of environments and against various threats? The answers to these questions do not rely on technical or statistical analyses but rather on the essential features of the test scenarios that we were not qualified to evaluate.

The fourth issue is whether the selected missions, types of terrain, and intensity of conflict are the correct choices for operational testing to support the decision on whether to pass Stryker to full-rate production. Other missions, types of terrain, intensities, and factors not included in the current test design might have an effect on the performance of Stryker, the baseline system, or both. These factors include, for example, temperature, precipitation, the density of buildings, building height, and characteristics of roads. Moreover, there are serious problems raised by the unavailability of add-on armor for the early stages of the operational test. The panel has been obligated to take the operational mode summary/missions profile (OMS/MP) as given, but it is not clear whether additional factors that might have an important effect on performance should be included as test factors.

For these reasons, our assessment of the Stryker/SBCT IOT as currently designed reflects only its statistical merits.
The IOT may be deficient in other respects that may be substantially more important than the statistical aspects of the test. Therefore, even if the statistical shortcomings discussed in this report were to be mitigated, we cannot determine whether the resulting operational test design would provide sufficient information about whether Stryker should be promoted to full-rate production.

CONCLUSIONS AND RECOMMENDATIONS

We offer here several conclusions and recommendations that we believe particularly deserve high priority (additional conclusions and recommendations are discussed in the phase I report). We begin with a review of four sets of recommendations, contained in our first report, on test measures, statistical design, data analysis, and assessment of the Stryker/SBCT operational test in a broad context. These are followed by conclusions and recommendations on combining information, derived from our current report.

Recommendations on Test Measures

1. ATEC should not roll different MOEs up into a single overall MOE that tries to capture effectiveness or suitability.

2. To help in their calibration, ATEC should ask each subject-matter expert to review his or her own assessment of the Stryker IOT missions, for each scenario, immediately before he or she assesses the baseline missions (or vice versa).

3. ATEC should review the opportunities and possibilities for subject-matter experts to contribute to the collection of objective data, such as times to complete certain subtasks and distances at critical times.

4. ATEC should use the force exchange ratio (and the loss exchange ratio when appropriate), and not the relative loss ratio, as the primary mission-level MOE for analyses of engagement results.

5. ATEC should use fratricide frequency and civilian casualty frequency to measure the amount of fratricide and collateral damage in a mission.

6. ATEC should add scenario-specific measures of performance for security operations in a stable environment (SOSE) missions.

7. ATEC should add situation awareness as an explicit test condition.

8. RAM data collection should be an ongoing enterprise. ATEC should track failure and maintenance information on a vehicle or part/system basis for the entire life of the vehicle or part/system. To do this, ATEC should set up an appropriate database. Since this was probably not done with those Stryker vehicles already in existence, it should be implemented for future maintenance actions on all Stryker vehicles.

9. ATEC should analyze failure modes separately rather than trying to develop failure rates for the entire vehicle using simple exponential models.
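
The force exchange ratio and loss exchange ratio named in recommendation 4 are not defined in this summary. The Python sketch below uses definitions that are common in combat modeling and are an assumption here: the loss exchange ratio (LER) as enemy losses divided by friendly losses, and the force exchange ratio (FER) as the proportion of the enemy force lost divided by the proportion of the friendly force lost. The function names and the engagement numbers are hypothetical.

```python
import math


def loss_exchange_ratio(enemy_losses, friendly_losses):
    """LER (assumed definition): enemy losses per friendly loss."""
    if friendly_losses == 0:
        # Undefined when no friendly losses occur; reported as inf (or NaN
        # when neither side has losses) purely as a convention of this sketch.
        return math.inf if enemy_losses > 0 else math.nan
    return enemy_losses / friendly_losses


def force_exchange_ratio(enemy_losses, enemy_start,
                         friendly_losses, friendly_start):
    """FER (assumed definition): fraction of the enemy force lost divided by
    the fraction of the friendly force lost."""
    if friendly_losses == 0:
        return math.inf if enemy_losses > 0 else math.nan
    return (enemy_losses / enemy_start) / (friendly_losses / friendly_start)


# Hypothetical engagement: the enemy loses 30 of 60, the friendly force 10 of 100.
ler = loss_exchange_ratio(30, 10)
fer = force_exchange_ratio(30, 60, 10, 100)
print(f"LER = {ler:.1f}, FER = {fer:.1f}")
```

Under these assumed definitions, the FER accounts for the initial size of each force, so the two measures can differ noticeably when the opposing forces start at different strengths.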

Recommendations on Statistical Design

10. Absent strategic considerations, ATEC should not commence operational testing until the system design is mature.

11. ATEC should consider, for future test designs, relaxing some of its current rules of test design, by (a) not allocating sample size to scenarios according to the OMS/MP but instead using principles from optimal experimental design theory, (b) testing under more extreme conditions than typically will be faced in the field, (c) using information from developmental testing to improve operational test design, and (d) separating the operational test into at least two stages, learning and confirming.

12. When specific performance or capability problems arise in the early part of operational testing, ATEC should consider the use of small-scale pilot tests focused on the analysis of these problems. For example, ATEC should consider test conditions that involve using Stryker with situation awareness degraded or turned off to determine its value in particular missions.

Recommendations on Data Analysis

13. The IOT provides sparse vehicle operating data and thus may not be sufficient to address all of ATEC's reliability and maintainability concerns. The panel therefore recommends improved data collection regarding vehicle usage. In particular, ATEC should collect and maintain data, separately for different failure modes, for each vehicle over the vehicle's entire life, including training, testing, and field use.

Recommendations on Assessing the Stryker/SBCT Operational Test in a Broad Context

14. The estimation of system suitability, in particular the estimation of mean fatigue life, repair and replacement times, and the identification of failure modes, should not be the primary responsibility of operational testing, since operational testing cannot be expected to run long enough to accurately estimate these quantities.
Therefore, developmental testing should give greater priority to measurement of system (operational) suitability and should be structured to provide its test events with greater operational realism.

Conclusions and Recommendations on How to Combine Information

15. ATEC should prepare a strategy for operational testing of the FCS/FBCT that will: recognize the sequential nature of the testing that will be required as part of the evolutionary acquisition process for FCS; recognize both the need to evaluate the family of systems and the potential need for diagnostic experimentation of operational concepts in multiple operational situations; delineate relevant questions to be addressed by testing and evaluation; identify the additional data (from subsequent tests) needed to address these questions; and include modeling and simulation activities as an integral part of the testing and evaluation process.

16. The Department of Defense should provide the funds to establish a test data archive that will be a prerequisite for combining information for the testing and evaluation of future systems.

17. ATEC should consider ways to increase its statistical capabilities to support future use of techniques for combining information. As a first step, ATEC should consider providing all sources and types of information to a selected group of qualified statisticians in industry and academia as a case study to determine the potential advantages of combining information for operational evaluation.
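
As one small illustration of the Bayesian updating mentioned in this summary, the following Python sketch combines developmental test results with operational test results for a success probability using a Beta prior. It is a sketch under stated assumptions and not the panel's method: each developmental trial is discounted to a fraction of an operational trial through a subjective `weight`, which stands in for the comparability concerns raised above. All function names and counts are hypothetical.

```python
import math


def discounted_prior(dt_successes, dt_trials, weight, base_a=1.0, base_b=1.0):
    """Beta prior built from developmental test (DT) data.

    weight (0 to 1) is a subjective discount: each DT trial counts as
    `weight` of an operational trial. Beta(1, 1) is the assumed base prior.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    a = base_a + weight * dt_successes
    b = base_b + weight * (dt_trials - dt_successes)
    return a, b


def beta_posterior(successes, trials, prior_a, prior_b):
    """Conjugate Beta update with binomial operational test (OT) data."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(variance)


# Hypothetical counts: 32 of 40 DT trials succeed, 6 of 8 OT trials succeed.
prior = discounted_prior(dt_successes=32, dt_trials=40, weight=0.25)
combined = beta_posterior(6, 8, *prior)
ot_only = beta_posterior(6, 8, 1.0, 1.0)

combined_mean, combined_sd = beta_mean_sd(*combined)
ot_mean, ot_sd = beta_mean_sd(*ot_only)
print(f"OT only:  mean {ot_mean:.3f}, sd {ot_sd:.3f}")
print(f"combined: mean {combined_mean:.3f}, sd {combined_sd:.3f}")
```

The combined posterior is tighter than the one based on operational data alone, but only to the extent that the chosen discount is defensible; setting `weight` to 0 recovers the operational-test-only analysis.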