Executive Summary
This report provides an assessment of the U.S. Army's planned ini-
tial operational test and evaluation (IOT&E) of the Stryker family
of vehicles. Stryker is the intended platform for the Stryker Bri-
gade Combat Team (SBCT). The Army Test and Evaluation Command
(ATEC), charged with conducting operational testing and evaluation of
Army systems in development, has been asked to take on the unusual re-
sponsibility of designing and conducting operational testing and evalua-
tion of both the vehicle and the SBCT concept and has requested the assis-
tance of the National Research Council (NRC) in this effort.
The Panel on Operational Test Design and Evaluation of the Interim
Armored Vehicle (Stryker), building on the recommendations of an earlier
National Research Council report (National Research Council, 1998),
considers the Stryker IOT&E to be a case study of how ATEC (and the other
service test agencies) can more effectively conduct operational test design
and evaluation consistent with state-of-the-art statistical principles and
practices.
The panel has been asked to address three aspects of the operational
test design and evaluation of Stryker: (1) the selection of measures of
performance and effectiveness to be used to compare the SBCT equipped with
the Stryker against the baseline force, a light infantry brigade; (2) whether
the current operational test design for Stryker is consistent with state-of-
the-art methods for experimental design; and (3) the advantages for evalu-
ating Stryker, and more generally any complex defense system, through the
IMPROVED OPERATIONAL TESTING AND EVALUATION
use of information from the initial operational test combined with that
from developmental tests, modeling and simulation, test data and field use
of comparable systems, and engineering judgment and experience. The first
two topics, measures and test design, were addressed in the panel's first
phase report, which is appended to this report. The third item, combining
information, is addressed in this report. This executive summary pertains
to both reports.
MEASURES OF EFFECTIVENESS
The panel was asked to consider what measures of effectiveness
(MOEs) would be useful for comparing Stryker against a baseline system
and focused on issues such as: the disadvantages of rolling up disparate
MOEs in a single overall number, the advantages of various force ratio
measures, and the calibration and scaling of subjective evaluations made by
subject-matter experts (SMEs). We have also pointed out the need to de-
velop scenario-specific MOEs for noncombat missions and suggested some
possible candidates. The panel concluded that no single measure could be
devised for the value of situation awareness, and so approaches were pro-
posed for collective measurement. Further, modeling and simulation were
suggested for use in augmenting test data to help assess situation awareness.
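
As a rough illustration of the force ratio measures mentioned above, the following sketch computes a loss exchange ratio and a force exchange ratio as those terms are commonly defined in combat modeling. The panel's exact definitions are given in its phase I report and are not reproduced in this summary, so the formulas, function names, and numbers here should be read as assumptions.

```python
def loss_exchange_ratio(enemy_losses, friendly_losses):
    """Assumed definition: enemy losses per friendly loss."""
    if friendly_losses == 0:
        raise ValueError("ratio is undefined when there are no friendly losses")
    return enemy_losses / friendly_losses


def force_exchange_ratio(enemy_losses, enemy_start, friendly_losses, friendly_start):
    """Assumed definition: fraction of the enemy force lost divided by
    the fraction of the friendly force lost."""
    if friendly_losses == 0:
        raise ValueError("ratio is undefined when there are no friendly losses")
    return (enemy_losses / enemy_start) / (friendly_losses / friendly_start)


# Hypothetical engagement outcome (illustrative numbers, not test data).
ler = loss_exchange_ratio(enemy_losses=30, friendly_losses=10)
fer = force_exchange_ratio(enemy_losses=30, enemy_start=120,
                           friendly_losses=10, friendly_start=80)
```

Under these assumed definitions, the force exchange ratio adjusts for the starting sizes of the two forces, which the loss exchange ratio does not.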
With respect to determining measures of reliability, availability, and
maintainability (RAM), the initial operational test will provide a relatively
small amount of vehicle operating data, compared with the information
obtained in training exercises and developmental testing, and thus may not
be sufficient to address all of the reliability and maintainability concerns of
ATEC. This lack of useful RAM information will be exacerbated by the
fact that the initial operational test is to be performed without using add-
on armor. For this reason, the panel stressed that RAM data collection
should be an ongoing enterprise, with failure times, failure modes, and
maintenance information tracked for the entire life of each vehicle (and its
parts) including data from developmental testing and training and re-
corded in appropriate databases. System performance should be assessed
both separately, by specific failure mode, and across failure modes, rather
than assigning a single failure rate for a vehicle based on a simple exponen-
tial model for all failures. Failure propensity should be related to environ-
mental and operational causes and conditions, including maintenance.
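
A minimal sketch of the bookkeeping this implies is shown below: failures are tallied by mode against the total exposure accumulated by all vehicles, and the per-mode rates are compared with the single pooled rate. The record layout, vehicle identifiers, mode names, and mileages are hypothetical. A constant rate within each mode is itself a simplifying assumption, made only to keep the example short.

```python
from collections import Counter

# Hypothetical failure log: (vehicle_id, miles_at_failure, failure_mode).
failures = [
    ("V1", 1200.0, "suspension"),
    ("V1", 3100.0, "electrical"),
    ("V2", 800.0, "suspension"),
    ("V3", 2500.0, "drivetrain"),
    ("V3", 4000.0, "suspension"),
]

# Hypothetical total miles per vehicle across testing, training, and field use.
miles_by_vehicle = {"V1": 5000.0, "V2": 3000.0, "V3": 4500.0, "V4": 2500.0}

total_miles = sum(miles_by_vehicle.values())

# Count failures separately for each failure mode.
counts_by_mode = Counter(mode for _, _, mode in failures)

# Failures per 1,000 miles, by mode and pooled over all modes.
rate_by_mode = {mode: 1000.0 * n / total_miles for mode, n in counts_by_mode.items()}
pooled_rate = 1000.0 * len(failures) / total_miles

for mode, rate in sorted(rate_by_mode.items()):
    print(f"{mode}: {rate:.3f} failures per 1,000 miles")
print(f"all modes pooled: {pooled_rate:.3f} failures per 1,000 miles")
```

The pooled figure hides the fact that one mode accounts for most of the failures in this made-up log, which is the kind of information a single vehicle-level rate would discard.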
TEST PLANNING AND EXPERIMENTAL DESIGN
The initial proposed experimental design for Stryker risked confound-
ing observed differences between Stryker and the baseline system with im-
portant sources of uncontrolled variation. In particular, the initial test de-
sign called for the Stryker/SBCT trials to be run at a different time of year
from the baseline trials, which may have confounded time of year with a
difference in effectiveness between the baseline force and the Stryker/SBCT
forces. We therefore recommended that these events be scheduled as closely
together in time as possible and interspersed if feasible. We have been
pleased to learn that the final design of the IOT for Stryker has these test
events scheduled very closely together.
In addition, we recommended that other potential sources of con-
founding, such as player learning and nighttime versus daytime operations,
should be addressed with alternative designs. One alternative suggested to
avoid confounding due to player learning was to use four separate groups of
players, one for each of the two opposing forces (OPFORs), one for the
Stryker/SBCT, and one for the baseline system. Alternating teams from test
replication to test replication between the two systems under test would
also be a reasonable way to address differences in learning, training, fatigue,
and competence. The panel is pleased to note that the design the Army
now proposes has addressed player learning through the use of separate
player teams for the Stryker and baseline systems.
We pointed out the difficulty of identifying a single test design to
address two distinct goals: (1) determining how various environmental or
use factors affect Stryker's system performance with respect to dozens of
measures of performance and (2) confirming a level of performance for
Stryker against either a baseline system or a set of requirements. For
example, the current test design, constructed primarily to compare
Stryker/SBCT with the baseline, is balanced for a limited number of factors,
allocating test samples to missions and environments in proportions similar
to those expected in field use. The design precludes focusing
test cases on environments in which Stryker is anticipated to have advan-
tages over the baseline system, and it allocates a comparable number of test
cases to environments for which Stryker is anticipated to provide little or
no advantage. While the design may be effective in confirming that Stryker
satisfies various criteria, it reduces the opportunity to understand the pos-
sible nature and magnitude of the benefit that Stryker provides in various
crucial environments.
The panel therefore described some alternative approaches to operational
test design, including a two-stage design (learning and confirming) and
the use of small-scale pilot tests. The latter could be particularly
useful in understanding the contribution of specific performance features,
such as situation awareness for the Stryker system, for example, by running
some test cases with the system's situation awareness capabilities intention-
ally degraded or turned off in order to determine their value in particular
missions or scenarios.
In addition, the panel in its earlier report provided technical advice in
areas such as statistical power calculations, identifying the appropriate test
unit of analysis, issues involving use of SME ratings, aggregation of mea-
sures, and use of graphical methods in test evaluation.
With respect to the general system development process, the panel
believes that, absent strategic considerations, a system should not be for-
warded to operational testing until the system design is relatively mature.
Forwarding an immature system to operational testing is an expensive way
to discover errors that could have been detected in developmental testing,
and it reduces the ability of the operational test to carry out its proper
functions of assessing the capabilities and limitations of the mature system
and confirming that it satisfies its requirements.
The panel suggested that, in the future, to assist in test design, ATEC
should prepare a straw man test evaluation report (TER) well before the
initial operational test is carried out. This TER should be based on ficti-
tious data filled out using expert judgment, as if the initial operational test
had been completed, and it should include examples of how a representa-
tive data set would be analyzed, models to be used to carry out the analysis,
anticipated standard deviations, confidence intervals, hypothesis tests, and
other summaries. The fictitious data would be based on the experience and
intuition of the analysts and what they think the results of the initial opera-
tional test might look like, including how effective the new system is likely
to be in various test situations. Of course, initial operational tests collect
data in great detail and, for this purpose, some of that detail could be
omitted but not discarded; we discuss in Chapter 4 of this report the
utility of archiving these and other data for future use.
SYSTEM EVALUATION BY COMBINING INFORMATION
This report focuses on techniques for combining information to en-
hance both operational test design and evaluation. The panel has concluded
that, as currently planned, the number of test replications in the IOT for
Stryker, a complex system of systems, will be inadequate to support hy-
pothesis tests at the usual significance and power levels to guide the deci-
sion as to whether Stryker should be approved for full-rate production.
This inadequacy is not specific to Stryker, as stated in the 1998 NRC
report (National Research Council, 1998); rather, we suspect it to be true for
the great majority of acquisition category (ACAT) I systems. Therefore,
ATEC should seriously consider methods for augmenting information from
operational testing in order to support better decision making, and also
examine how information from earlier stages of system development and
from analogous systems could be formally used to assist in operational test
design.
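
To give a sense of scale for the replication problem, the following is a minimal sketch of a standard normal-approximation sample-size calculation for comparing the means of two groups. It is not the panel's own calculation. The function name, the standardized effect sizes, and the default significance and power levels are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def replications_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate replications needed in each group for a two-sided,
    two-sample comparison of means.

    effect_size is the difference in means divided by the common
    standard deviation (an assumed, standardized quantity).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: half a standard deviation and one standard deviation.
n_moderate = replications_per_group(0.5)
n_large = replications_per_group(1.0)
```

Even for a difference as large as one standard deviation, this approximation calls for more than a dozen replications per group, which is consistent with the concern that a small number of operational test replications cannot by itself support conventional hypothesis tests.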
Various sources and types of information could help augment the data
currently collected in operational tests. These sources include developmen-
tal testing, training exercises, other less controlled uses of the system, and
information obtained from both testing and field use of similar systems as
well as systems with very similar components. While ATEC already makes
use of some informal methods for combining such varied types of informa-
tion, in particular the use of expert opinion for test design, this report
focuses on the benefits of the use of more formal methods and suggests
ways to implement these methods more broadly within ATEC. Of course,
there are valid concerns about the comparability of data collected either in
developmental testing or in uncontrolled use for prior versions of a system,
and therefore the potential dangers of improper use of these methods are also
discussed.
Formal methods for combining information include complete or par-
tial pooling of data from two or more comparable sets of tests or other use,
accommodating data from disparate sources using hierarchical or random
effects models, and updating prior uncertainties about critical performance
measures using Bayesian techniques. We stress that both formal and infor-
mal methods require the judicious selection and confirmation of underly-
ing assumptions as well as a careful and open process by which various
types of information, some of which involve subjective judgment, are gath-
ered and combined.
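
As one simple illustration of Bayesian updating with partially comparable data, the sketch below builds a beta prior for a success probability from developmental results, down-weights those results to reflect doubts about comparability, and then updates the prior with operational test outcomes. The counts, the weight, and the function names are hypothetical. The choice of weight is exactly the kind of subjective judgment that would need an open and careful elicitation process.

```python
def discounted_prior(dev_successes, dev_failures, weight):
    """Beta prior parameters from developmental results.

    Starts from a uniform Beta(1, 1) prior and counts each developmental
    trial as `weight` of an operational trial, with weight in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    return 1.0 + weight * dev_successes, 1.0 + weight * dev_failures


def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 18 of 20 successes in developmental testing,
# 5 of 6 successes in operational testing.
prior = discounted_prior(dev_successes=18, dev_failures=2, weight=0.5)
posterior = update_beta(*prior, successes=5, failures=1)
posterior_mean = beta_mean(*posterior)

# With weight 0 the developmental data are ignored entirely.
ot_only = update_beta(*discounted_prior(18, 2, weight=0.0), successes=5, failures=1)
```

Setting the weight to 1 corresponds to complete pooling of the two data sources, and setting it to 0 corresponds to using the operational test alone, so the weight spans the range between the two extremes described above.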
To demonstrate their breadth and nature, this report presents specific
examples of these methods and their applications, including their use in
test design to reduce the number of test combinations needed to capture
factor interactions; pooling techniques; use of existing knowledge about a
Weibull parameter to enhance the precision of the assessment of a critical
failure time performance measure; and their ability to incorporate uncer-
tainty about the types and number of failure modes and associated failure
rates.
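
One simple reading of the Weibull example is that the shape parameter is treated as known from earlier testing, so that only the scale parameter has to be estimated from the few failures observed in the operational test. The sketch below shows the maximum likelihood estimate of the scale in that case, allowing for units that had not failed when observation stopped. Treating the shape as a fixed known value, rather than giving it a prior distribution, is an assumption of this sketch, and the times and shape value are hypothetical.

```python
def weibull_scale_mle(times, failed, shape):
    """Maximum likelihood estimate of the Weibull scale for a known shape.

    times  : operating time for each unit (failure time or censoring time)
    failed : True if the unit failed at that time, False if right-censored
    shape  : Weibull shape parameter, assumed known from prior information
    """
    n_failures = sum(1 for f in failed if f)
    if n_failures == 0:
        raise ValueError("at least one observed failure is required")
    # Every unit contributes time**shape, whether or not it failed.
    total = sum(t ** shape for t in times)
    return (total / n_failures) ** (1.0 / shape)


# Hypothetical operating hours; the last unit was still working at 300 hours.
hours = [100.0, 200.0, 300.0]
observed_failure = [True, True, False]
scale_estimate = weibull_scale_mle(hours, observed_failure, shape=1.5)
```

With the shape fixed, all of the information in a small sample goes into a single parameter, which is the sense in which prior knowledge can sharpen the estimate of a failure time measure.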
This report also presents some requirements for utilizing these meth-
ods, especially with respect to data archiving, enhancing statistical capabili-
ties within ATEC, and the necessity of using a formal process for eliciting
expert judgments on system performance.
TOWARD THE FUTURE
Stryker is intended to be an integral part of a transformation to the
Future Combat System (FCS) and the Future Brigade Combat Team
(FBCT), whose test design and evaluation are likely to be substantially
more complicated than those of the Stryker/SBCT. First, the FCS/FBCT is
intended for use in a much broader array of operational missions and envi-
ronments than the Stryker/SBCT. Second, it is a more complex family of
systems than the Stryker/SBCT, and effective concepts, tactics, techniques,
and procedures must be developed in advance of the operational test, pay-
ing particular attention to the use of the command, control, communica-
tions, computers, intelligence, surveillance, and reconnaissance (C4ISR).
Third, the FCS/FBCT networking capability must be tested. Fourth, test
designs will have to be effectively tailored to the evolutionary development
process for the FCS/FBCT. Finally, its enhanced reliability requirements
will have to be rigorously tested.
To address these challenges, we suggest in the current report that ATEC
develop a parametric space of test environments that can be strategically
sampled for testing. ATEC should also develop a test and evaluation data
archive to support evolutionary acquisition and a strategy for supporting
test design within an evolutionary acquisition framework.
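
The report does not prescribe a particular sampling scheme in this summary. As one possible illustration, the sketch below draws a Latin hypercube sample from a space of test environments, so that each factor's range is covered evenly with a small number of test points. The factor names, ranges, and number of points are hypothetical.

```python
import random


def latin_hypercube(n_samples, factor_ranges, seed=0):
    """Draw n_samples points so that each factor's range is split into
    n_samples equal-width strata and every stratum is used exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in factor_ranges.items():
        width = (high - low) / n_samples
        # One random point inside each stratum of this factor.
        points = [low + width * (i + rng.random()) for i in range(n_samples)]
        # Shuffle so that strata are paired at random across factors.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in factor_ranges}
            for i in range(n_samples)]


# Hypothetical environmental factors and ranges (illustrative only).
factor_ranges = {
    "temperature_c": (-10.0, 40.0),
    "precipitation_mm_per_hr": (0.0, 20.0),
    "building_density": (0.0, 1.0),
}
test_points = latin_hypercube(8, factor_ranges)
```

A scheme of this kind spreads test points across the whole space. A strategic sample could instead concentrate points where performance is expected to change most, which would require prior information of the sort discussed earlier in this summary.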
LIMITATIONS
We wish to include four points related both to the limited nature of
our charge and to our advice regarding measures and experimental design.
First, we note that an alternative baseline system that could have taken
advantage of the SBCT infrastructure could have been tested to help
understand the value of Stryker without the SBCT system. Similarly, it does
not seem necessary to require that only a system that could be transported
as quickly as Stryker serve as a baseline for comparison.
Second, the current test compares the Stryker/SBCT system not only
with a baseline system but also with the vehicles used in the baseline. For
some purposes, isolating those comparisons could be important (for ex-
ample, to determine Stryker's relative maneuverability in rural versus urban
terrain and to examine the effects on its utility of its mobility in those
environments).
The third point concerns the capacity of the current operational test
design to provide adequate information on how to tactically employ the
Stryker/SBCT system. For example, how should greater situation aware-
ness be best utilized and how should it be balanced against greater vulner-
ability in various types of environments and against various threats? The
answers to these questions do not rely on technical or statistical analyses
but rather on the essential features of the test scenarios that we were not
qualified to evaluate.
The fourth issue is whether the selected missions, types of terrain, and
intensity of conflict are the correct choices for operational testing to sup-
port the decision on whether to pass Stryker to full-rate production. Other
missions, types of terrain, intensities, and factors not included in the cur-
rent test design might have an effect on the performance of Stryker, the
baseline system, or both. These factors include, for example, temperature,
precipitation, the density of buildings, building height, and characteristics
of roads. Moreover, there are serious problems raised by the unavailability
of add-on armor for the early stages of the operational test. The panel has
been obligated to take the operational mode summary/missions profile
(OMS/MP) as given, but it is not clear whether additional factors that
might have an important effect on performance should be included as test
factors.
For these reasons, our assessment of the Stryker/SBCT IOT as cur-
rently designed reflects only its statistical merits. The IOT may be deficient
in other respects that may be substantially more important than the statisti-
cal aspects of the test. Therefore, even if the statistical shortcomings dis-
cussed in this report were to be mitigated, we cannot determine whether
the resulting operational test design would provide sufficient information
about whether Stryker should be promoted to full-rate production.
CONCLUSIONS AND RECOMMENDATIONS
We offer here several conclusions and recommendations that we be-
lieve particularly deserve high priority (additional conclusions and recom-
mendations are discussed in the phase I report). We begin with a review of
four sets of recommendations on test measures, statistical design, data
analysis, and assessment of the Stryker/SBCT operational test in a broad
context contained in our first report. After that are presented conclusions
and recommendations on combining information, derived from our cur-
rent report.
Recommendations on Test Measures
1. ATEC should not roll different MOEs up into a single overall
MOE that tries to capture effectiveness or suitability.
2. To help in their calibration, ATEC should ask each subject-matter
expert to review his or her own assessment of the Stryker IOT missions, for
each scenario, immediately before he or she assesses the baseline missions
(or vice versa).
3. ATEC should review the opportunities and possibilities for sub-
ject-matter experts to contribute to the collection of objective data, such as
times to complete certain subtasks and distances at critical times.
4. ATEC should use the force exchange ratio (and the loss exchange
ratio when appropriate), and not the relative loss ratio, as the primary mis-
sion-level MOE for analyses of engagement results.
5. ATEC should use fratricide frequency and civilian casualty
frequency to measure the amount of fratricide and collateral damage in a
mission.
6. ATEC should add scenario-specific measures of performance for
security operations in a stable environment (SOSE) missions.
7. ATEC should add situation awareness as an explicit test
condition.
8. RAM data collection should be an ongoing enterprise. ATEC
should track failure and maintenance information on a vehicle or part/
system basis for the entire life of the vehicle or part/system. To do this,
ATEC should set up an appropriate database. Since this was probably not
done with those Stryker vehicles already in existence, it should be imple-
mented for future maintenance actions on all Stryker vehicles.
9. ATEC should analyze failure modes separately rather than trying
to develop failure rates for the entire vehicle using simple exponential
models.
Recommendations on Statistical Design
10. Absent strategic considerations, ATEC should not commence op-
erational testing until the system design is mature.
11. ATEC should consider, for future test designs, relaxing some of its
current rules of test design, by (a) not allocating sample size to scenarios
according to the OMS/MP but instead using principles from optimal ex-
perimental design theory, (b) testing under more extreme conditions than
typically will be faced in the field, (c) using information from
developmental testing to improve operational test design, and (d) separating
the operational test into at least two stages, learning and confirming.
12. When specific performance or capability problems arise in the
early part of operational testing, ATEC should consider the use of small-
scale pilot tests focused on the analysis of these problems. For example,
ATEC should consider test conditions that involve using Stryker with situ-
ation awareness degraded or turned off to determine its value in particular
missions.
Recommendations on Data Analysis
13. The IOT provides sparse vehicle operating data and thus may not
be sufficient to address all of ATEC's reliability and maintainability con-
cerns. The panel therefore recommends improved data collection regarding
vehicle usage. In particular, ATEC should collect, separately for different
failure modes, and maintain data for each vehicle over the vehicle's entire
life, including training, testing, and field use.
Recommendations on Assessing the Stryker/SBCT Operational Test
in a Broad Context
14. The estimation of system suitability, in particular the estimation
of mean fatigue life, repair and replacement times, and the identification of
failure modes, should not be the primary responsibility of operational test-
ing, since operational testing cannot be expected to run long enough to
accurately estimate these quantities. Therefore, developmental testing
should give greater priority to measurement of system (operational) suit-
ability and should be structured to provide its test events with greater op-
erational realism.
Conclusions and Recommendations on
How to Combine Information
15. ATEC should prepare a strategy for operational testing of the FCS/
FBCT that will:
· recognize the sequential nature of the testing that will be re-
quired as part of the evolutionary acquisition process for FCS,
· recognize both the need to evaluate the family of systems and
the potential need for diagnostic experimentation of operational
concepts in multiple operational situations,
· delineate relevant questions to be addressed by testing and
evaluation,
· identify the additional data (from subsequent tests) needed to
address these questions, and
· include modeling and simulation activities as an integral part of
the testing and evaluation process.
16. The Department of Defense should provide the funds to establish
a test data archive that will be a prerequisite for combining information for
the testing and evaluation of future systems.
17. ATEC should consider ways to increase its statistical capabilities
to support future use of techniques for combining information. As a first
step, ATEC should consider providing all sources and types of information
to a selected group of qualified statisticians in industry and academia as a
case study to determine the potential advantages of combining information
for operational evaluation.