Testing of Defense Systems in an Evolutionary Acquisition Environment

3
Combining Information in Staged Development

There are many opportunities for combining information to improve test design and analysis even in traditional single-stage acquisition (see, for example, National Research Council, 1998). These include data from developmental tests, operational tests, and even training exercises, as well as test results of earlier systems composed of similar components. Such opportunities are considerably greater in an evolutionary setting, in which there is substantial information about the operational performance of the system from earlier, fielded versions.

The similarity of the systems at different stages of development can be exploited so that information from past stages can be used to support efficient test design for later stages. In particular, statistical models can be used to link the performance of a system in developmental tests with that of previous versions of the system in developmental tests, to link performance between different stages of the system in operational tests, to link developmental and operational test results for various versions of the same system, and to link test data with field performance data. There are both informal and formal methods for combining information from various sources (see, for example, National Research Council, 1992, 2004). See also Berry (2005) for related results in the area of drug development.

TEST DESIGN

Among the various techniques in the experimental design literature, sequential design and analysis are the most relevant to evolutionary acquisition. However, there are also important differences from the traditional use of sequential designs. Evolutionary acquisition represents a “block” sequential case, in which experiments corresponding to a particular stage are carried out in blocks. Furthermore, the system under study changes from stage to stage; unlike many other sequential studies, the decision on what to do in the next stage does not necessarily depend on results from the previous stage. Rather, this decision is based on strategic considerations about the new capabilities that are needed for the system in the field. In each block there are several types of experimental strategies that could be considered, such as screening experiments for identifying important factors, response-surface designs for determining optimal factor combinations, and so on.1 Their usefulness for developmental and operational testing is discussed in Statistics, Testing and Defense Acquisition: New Approaches and Methodological Improvements (National Research Council, 1998; see also National Research Council, 2004; Box, Hunter, and Hunter, 1978; Box and Draper, 1987; Myers and Montgomery, 2001; Wu and Hamada, 2000). This section provides a qualitative description of how one could use information from past stages to improve test design in the current stage; a more technical treatment of some of these ideas is given in Appendix B.

One can use test results and field performance of the system from the previous stage in both qualitative and quantitative ways in test design.
These involve obtaining information on (a) factors that were found to be unimportant in previous tests; (b) how the performance (response surface) varied as a function of changes to key inputs or test scenarios; (c) hazard rate behavior of failure data (e.g., increasing hazard rate, infant mortality) for reliability test design; (d) estimates of variability that are needed for allocating test resources to different test scenarios in the current stage; and so on. Results from the earlier stages can also suggest whether to oversample or undersample certain test scenarios in the current stage of development or to push the testing envelope in certain directions. The system in a subsequent stage of development could be tested in problematic circumstances to see whether reliability growth or the addition of new components is having a beneficial impact on performance. Poor performance in subsequent stages of system development can be analyzed to understand why certain prototypes were poorly manufactured, whether variables can be found that discriminate between good and poor performers, and so on.

Clearly, the extent to which information from previous stages can be used effectively depends on the similarities and differences between the stages. This is explored in the context of a regression model in Appendix B for several cases. In situations in which there is good prior information, one can use a Bayesian approach effectively to develop good test designs (Chaloner and Verdinelli, 1995). This approach is also related to the work done by Los Alamos and Procter & Gamble reported on at the workshop. One can also develop and use a more formal decision-theoretic approach for combining information from developmental, operational, and field tests (as well as combining information from tests at different stages of development); Appendix B includes a discussion of some preliminary ideas. A key problem with this approach is that it requires inputs that cannot be realistically quantified: for example, the price of late fielding of a system, the price of having a fielded system perform poorly in operations, the benefit of deploying a good system earlier, the benefit of winning a battle faster as a result of a fielded new system, the deterrence value of fielding a new system (even if it is not effective), and so on.

1 The naïve use of both screening experiments and response-surface models will almost always be inferior to the use of techniques that are developed in collaboration with system experts, who can help guide the choice of variables and model forms.

ANALYSIS

Many techniques exist for combining data from various sources to improve the quality (efficiency) of the analysis and conclusions (see, for example, National Research Council, 1992, 2004). These are particularly relevant in evolutionary acquisition, in which data from developmental tests, operational tests, and field performance from previous stages of the system are available.
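As a minimal sketch of how such a formal combination might work, consider pooling pass/fail results from a previous stage's field use with current-stage operational test results in a Beta-Binomial model. All counts, and the down-weighting factor, are hypothetical and chosen only for illustration:

```python
# Combining prior-stage field data with current-stage operational test
# results via a Beta-Binomial model. All numbers are invented.

def beta_binomial_update(prior_successes, prior_failures,
                         new_successes, new_failures):
    """Return posterior Beta parameters and the posterior mean."""
    a = prior_successes + new_successes
    b = prior_failures + new_failures
    return a, b, a / (a + b)

# Suppose the fielded previous-stage version succeeded in 90 of 100
# missions, and the upgraded version succeeds in 18 of 20 operational
# test trials. The weight w < 1 discounts the prior data because the
# stages are similar but not identical systems.
w = 0.5
a, b, post_mean = beta_binomial_update(w * 90, w * 10, 18, 2)
print(round(post_mean, 3))  # → 0.9
```

How heavily to discount the earlier-stage data (the choice of w here) is exactly the kind of judgment that requires subject-matter expertise about the similarity of the stages.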
We do not provide a detailed discussion of these methods, as the issues are very similar to those in National Research Council (2004) and the references therein. Combining information from various sources, which relies on relevant linkages between previous and current sources of data, requires considerable care and subject-matter expertise. When done correctly, however, it has a tremendous payoff. For example, ACAT I defense systems typically have dozens of primary measures of performance and measures of effectiveness that are functions of system requirements. These measures must be evaluated in a wide variety of operational scenarios and environments. It is true that measures of performance and effectiveness are often interpreted as average performance across these scenarios, but one cannot focus entirely on these averages, since it is also important to identify any extreme heterogeneity of system performance across scenarios. This is an area in which considerable expertise in modeling and statistics is needed.

In addition, it should be noted that, while Bayesian methods can be used to formally incorporate information from previous stages, they also provide considerable opportunities for abuse in the use of prior information. Thus, in addition to the right expertise, there is also a need for oversight and proper documentation of the analysis and conclusions. These issues are discussed in the next chapter.

Current practice in DoD does in fact employ many of the above suggestions, albeit informally and not always consistently, and it is subject to the infrastructure limitations described elsewhere in this report. For example, deficiencies discovered in the field are often the focus of subsequent testing, and new capabilities are often the drivers of operational test design. However, this use of prior information is not routine and is not nearly as effective as it could be. The techniques discussed in the appendixes, and similar ones, need to be formalized, more broadly adopted, and institutionalized in the documentation and casebooks of best practices.

Recommendation 5: The Service test agencies should undertake a pilot study, involving a few selected systems developed under the evolutionary acquisition paradigm, in order to identify specific opportunities for incorporating information from previous stages to improve system design and analysis. These case studies will be beneficial in demonstrating both the application of the various techniques and the benefits to be gained from combining data in staged development.
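The caution above about relying on scenario averages can be made concrete with a small numerical sketch; the systems, scenarios, and hit rates below are entirely invented:

```python
# Two hypothetical systems with identical average hit rates but very
# different behavior across operational scenarios. Averages alone would
# hide system B's severe night-scenario shortfall.
from statistics import mean, pstdev

# Hit rates by scenario: desert, urban, night, maritime (invented data).
system_a = [0.80, 0.78, 0.82, 0.80]
system_b = [0.99, 0.97, 0.40, 0.84]

for name, rates in [("A", system_a), ("B", system_b)]:
    print(name, round(mean(rates), 2), round(pstdev(rates), 2))
# Both systems average 0.80, but B's spread across scenarios is an
# order of magnitude larger than A's.
```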
MODELING AND SIMULATION

Evolutionary acquisition will bring about changes in the role and value of modeling and simulation for operational evaluation and operational test design. The most substantial change is that staged development provides the opportunity to refine models and simulations through the validation provided by the collection of field performance data on earlier-stage versions of the system. This, in turn, supports feedback loops for model (and system) improvement. Through this process, the ability of a model or simulation to mimic the crucial determinants of effectiveness and suitability is greatly enhanced. In other words, the model-test-model process becomes much more effective through the “testing” that field use (or training exercises) provides. The recalibrated models and simulations can then be used to assist in next-stage test design and operational evaluation.

In the mid- to late 1990s, the Air Force Operational Test Center had a pilot project in which it attempted to employ modeling and simulation to track development of the B-1B Defensive System Upgrade. The idea was to incorporate the information gained in developmental testing, operational testing, live-fire testing, training exercises, and so on into one software representation. The model would then serve as a repository of information on current system performance that could be queried in various ways. For example, if successfully formulated, this software representation of the system could be used to track reliability growth, to identify which components needed either additional work or further testing, to identify problematic scenarios of use, and so on. Because of the complexity and risk associated with electronic warfare systems, modeling and simulation were to be major tools for evaluating jamming effectiveness against radio frequency threats. Although this effort was the first of its kind (as far as we know) and the program was ultimately discontinued, evolutionary acquisition programs, particularly their intrinsic feedback loop mechanisms, naturally support the construction of similar virtual system representations.

Scarcity of time and resources is the usual reason offered for expanding the role of engineering-level modeling and simulation within operational test and evaluation.
These science-based models, which directly incorporate detailed information on the mechanisms of system functionality and operation, have been proposed as a means of expanding the scope of evaluations beyond the sets of experimental circumstances actually tested. This is a plausible approach only when the relevant technical components of system performance can be modeled sufficiently well for the problem at hand, via physical and/or chemical modeling, and when the testing yields direct validation of the modeling and simulation constructs and assumptions. Moreover, extrapolations to nontested regions may be supportable only to the degree that their salient characteristics are well understood and encompassed by validated modeling and simulation depictions. Typically (e.g., in the ballistic missile defense arena), such extremely detailed modeling and simulation representations entail considerable complexity and comprehensiveness, and their development often requires extensive dedicated resources and a substantial time commitment.
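The limits of extrapolation without a physical basis can be illustrated with a deliberately simple surrogate. In this sketch, the data, the quadratic form, and the detection-range setting are all invented for illustration:

```python
# A low-fidelity quadratic surrogate fit to hypothetical test data:
# detection range (km) versus target speed (m/s). The fit interpolates
# sensibly inside the tested envelope but, lacking any physical basis,
# produces a meaningless (negative) range when extrapolated far outside it.
import numpy as np

speed = np.array([100.0, 150.0, 200.0, 250.0, 300.0])  # tested speeds
rng_km = np.array([52.0, 49.5, 45.0, 38.5, 30.0])      # observed ranges

surrogate = np.poly1d(np.polyfit(speed, rng_km, deg=2))

inside = surrogate(225.0)   # within the tested envelope
outside = surrogate(600.0)  # far outside it
print(round(float(inside), 1), round(float(outside), 1))  # → 42.0 -63.0
```

With a validated physics-based model, a prediction well outside the tested envelope might be defensible; with a purely empirical surrogate of this kind it is not.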
In addition to their potential utility for expanding the scope of operational performance evaluations, modeling and simulation can also serve as valuable test planning tools: exploring test scenario options, addressing test sizing issues, identifying critical performance issues, and so on. Depending on the application, this can be accomplished by detailed physics-based models or by lower fidelity models that capture primary system performance without attempting to replicate the underlying physical processes. One advantage of simpler models is increased responsiveness and flexibility, both in terms of initial availability to support operational test and evaluation planning and in terms of turnaround time to complete detailed studies (e.g., comprehensive sensitivity analyses). However, without a physical justification, extrapolation away from the tested conditions and circumstances is generally not warranted.

As Art Koehler described in a presentation at the workshop, a particular application of modeling and simulation that is directly relevant to evolutionary acquisition is being developed by Procter & Gamble and Los Alamos National Laboratory. They have designed a simulation and analysis system that can examine the impact on system performance and reliability of changing a major component of an existing complex system. This simulation and analysis system therefore provides a key component of what one would need in an evolutionary context in order to plan for and accommodate changes resulting from component upgrades, possibly through redesign of other parts of the existing system.

SOFTWARE DEVELOPMENT IN AN EVOLUTIONARY CONTEXT

Software development is one area in which evolutionary acquisition is expected to play a major role.
Given the increasing role of software in complex defense systems, staged software development should proceed in a manner consistent with current best practices. The general approach described in Appendix C, sometimes referred to as the cleanroom process, is one of several similar approaches representative of current best practices. The software engineering process described in Appendix C, however, is designed to be carried out by contractors, and the extent to which it could (or should) be mandated in government contracts is unclear. Nevertheless, there is a need to explore how this might be done.
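One characteristic element of the cleanroom approach is statistical usage testing, in which test cases are drawn at random from a model of how the software will actually be used. The sketch below is a purely hypothetical illustration: the states, transition probabilities, and helper functions are invented for this example, not taken from Appendix C.

```python
# Statistical usage testing in the spirit of the cleanroom process:
# a Markov chain usage model generates operationally weighted test
# sequences. States and transition probabilities are invented.
import random

usage_model = {
    "Start":  [("Login", 1.0)],
    "Login":  [("Query", 0.7), ("Update", 0.2), ("Logout", 0.1)],
    "Query":  [("Query", 0.4), ("Update", 0.3), ("Logout", 0.3)],
    "Update": [("Query", 0.5), ("Logout", 0.5)],
    "Logout": [],  # terminal state: the session ends
}

def next_state(options, rng):
    """Sample the next state from (state, probability) pairs."""
    r, cum = rng.random(), 0.0
    for state, p in options:
        cum += p
        if r < cum:
            return state
    return options[-1][0]  # guard against floating-point round-off

def generate_test_case(model, rng, start="Start"):
    """Random-walk the usage model to produce one test sequence."""
    path = [start]
    while model[path[-1]]:
        path.append(next_state(model[path[-1]], rng))
    return path

rng = random.Random(0)  # fixed seed so the generated suite is reproducible
for _ in range(3):
    print(" -> ".join(generate_test_case(usage_model, rng)))
```

Because test cases are sampled in proportion to expected operational use, failure data from such a suite supports statistical statements about reliability in the field, which is precisely the kind of feedback that staged development can exploit.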