The Defense Acquisition Process
The DoD acquisition process comprises a series of steps that begin with a recognized need for a new capability, advance through rough conceptual planning to the development of system prototypes, and lead, ultimately, to a new system in full production that meets the stated need. This is a difficult process to manage well. Many of the technologies involved in a new system have never been used for that particular application (or may not even exist), the system can be extremely expensive, and the finished product can have significant implications for national security. Simply put, the stakes are high, and it is very difficult to prevent the occurrence of unanticipated problems.
Appendix A describes the military acquisition process as currently executed for many military systems, although this process is changing rapidly. We summarize the process here as context for the panel's general approach to its mission. We also define some terminology used in the remainder of the report.
The procurement of a major weapon system is divided into milestones. When a new capability is needed, the relevant service prepares a Mission Needs Statement that describes the threat this new capability is addressing. If it is determined that the need can be met only with new materiel, and a fairly specific approach is agreed upon, a new acquisition program is initiated. The most expensive systems are given the acquisition category (ACAT) I designation. After some additional review, the program passes milestone 0.
Between milestone 0 and milestone I the program plans become more specific. At milestone I the budget process begins, and a program office and the first of (potentially) many program managers are assigned to the program. The program manager is given the job of ensuring that the program passes the next milestone in the process. In addition, between milestones 0 and I, various planning documents are prepared. In particular, the Operational Requirements Document details the link between the mission need and specific performance parameters, and the Test and Evaluation Master Plan provides the structure of the test and evaluation program, including schedule and resource implications. Other documents also provide program specifications; because these documents have different purposes and are produced by different groups, the specifications may not agree with those in the Operational Requirements Document. Furthermore, different documents prepared at different stages of the process may contain different program specifications.
Between milestones I and II the program undergoes further refinement, including some developmental test and evaluation. Developmental testing is carried out on prototypes by the developing agency to help in designing the system to technical specifications.1 In this demonstration and validation phase, the capabilities of the system become better understood: that is, it becomes clearer whether the specifications can be achieved. Milestone II is Development Approval, at which point it is decided whether the system is mature enough to enter low-rate initial production. This decision is based on an assessment of the system's affordability and the likelihood that the parameters specified in the Operational Requirements Document can be achieved. Furthermore, the resource requirements for operational testing are specified by the testing community of the relevant service and also by the Director of Operational Test and Evaluation.
The phase between milestones II and III is called engineering and manufacturing development. The objective is to develop a stable, producible, and cost-effective system design. Operational testing is a major activity that takes place at this time; some developmental testing continues, too. Operational testing is field testing carried out on production units under realistic operating conditions by typical users to verify that systems are operationally effective and suitable for their intended use and to provide essential information for assessment of acquisition risk. At milestone III, the decision is made whether to go into full production; this decision is based heavily on the results of the operational testing. Figure 1-1 is a diagram of the DoD acquisition process.
Operational Testing as Part of the Acquisition Process
Because operational testing is conducted as part of the acquisition process, making the best use of statistical practice in operational testing sometimes requires considering aspects of the larger process. For example, starting operational testing earlier in the acquisition process, an idea that has won support in the DoD community and among members of our panel, has implications for how statistical methods would be applied. Similarly, operational test design or the evaluation of system performance might make use of optimal experimental design methods that depend on parameters that must be estimated, or of statistical techniques that “borrow strength” from data gathered earlier in the process. These approaches might use information from developmental testing, but concern about preserving the independence of operational and developmental testing could make such ideas controversial. Organizational constraints and competing incentives complicate the application of sequential testing methods, as well as of some statistical ideas about how to allocate operational testing resources as a function of the total system budget. Furthermore, ideas of quality management that have gained wide acceptance in industry seem relevant to the task of developing military systems, despite the obvious contextual differences, and implementing such ideas would require a thorough understanding of the DoD acquisition process.
Footnote 1: The terms “development test and evaluation” and “developmental testing” are used synonymously in this report. Similarly, the terms “operational test and evaluation” and “operational testing” are used synonymously.
The above context motivates the following blueprint for the panel's activities. The panel is working in both reactive and proactive modes. In the former mode, we are investigating current operational testing practice in DoD and examining how the use of statistical techniques can improve this practice. We will suggest improvements in four areas that overlap substantially: (1) assessment of reliability, availability, and maintainability; (2) use of modeling and simulation; (3) methods for software testing; and (4) use of experimental design techniques. We expect that suggested improvements in these areas can be implemented almost immediately because they would require no adjustment to the general acquisition process as currently structured. We also hope to develop a taxonomic structure for characterizing systems that require different types of operational test procedures.
In its more proactive mode, the panel anticipates taking a longer and broader view of how operational testing fits into the acquisition process. The prospectus for the panel's study anticipated the need for breadth in the scope of the study: “In addition to making recommendations on how to improve operational testing under current requirements and criteria, the panel would also take a longer term perspective and consider whether and to what extent technical, organizational, and legal requirements and criteria constrain optimal decision making.” Furthermore, the prospectus mentions a major point raised at the workshop that gave rise to the panel study: “An idea that was expressed often [at the workshop] is to encourage moving away from the present advocacy environment surrounding quality assurance, in which one party or the other is characterized as being at fault. Several speakers urged moving toward a more neutral and cooperative environment in managing quality in weapon systems,” with an emphasis on achieving final quality rather than clearing interim hurdles at program milestones (see Rolph and Steffey, 1994, for a final report of the workshop). After its initial stage of activity, the panel is inclined to echo this sentiment.
In future work, we hope to articulate general principles and a philosophy of managing information and quality, drawn from broad experience in industry and government. From this perspective, we may suggest directions for change in the general acquisition process that would make operational testing more informative. The panel understands that the acquisition process as a whole satisfies many needs and goals and that numerous interdependencies have arisen in support of this process. Thus, changes to the acquisition process would have wide-ranging effects that would be difficult to foresee. The panel is also aware that it is not constituted in a way that would permit recommendations concerning a major restructuring of the acquisition process. However, certain changes in the process could expand the usefulness of operational testing, and we believe we have relevant experience in how information on product development should be managed and analyzed. The next section presents some preliminary thoughts on testing in product development and how statistics can be used to improve it.
STATISTICS AND INFORMATION MANAGEMENT IN DEFENSE TESTING
In a wide variety of applications, statistics has provided plans to meet sequences of information needs during product development cycles, methods and strategies for controlling and assuring the quality of products or systems, and designs of experimental studies to demonstrate performance outcomes. Thus, statistical science can make broad contributions in the development, testing, and evaluation of such complex entities as defense systems.
Operational Testing of Complex Systems
Modern methods of manufacturing and product development recognize that operational test and evaluation is a necessary part of placing any product or system into widespread public use. Evaluation of the performance of products and systems in operational use against prespecified performance standards, such as effectiveness, suitability, efficacy, or safety criteria, is usually the last experimental testing stage in an evolutionary sequence of product development.
At least four aspects of operational testing contribute to its difficulty and complexity:
The operational testing paradigm often does not lead to a pass/fail decision. Instead, testing can involve redesign, iteration on concepts, or changes in subcomponents. This aspect especially characterizes the operational testing of complex systems for which no competing capability exists. The statistical methodology appropriate for one-at-a-time pass/fail decisions is inappropriate for sequential problems; thus there is a need for appropriate sequential methods that increase the information derived from tests of this type.
Operational testing involves realistic engagements in which circumstances can be controlled only in the broadest sense. Human intervention, training, and operator skill level often defy control and can play as important a role in the performance outcome as the system hardware and software.
Operational tests are often expensive. With increasingly constrained budgets, there is enormous pressure to limit the amount of operational testing solely because of cost considerations. Experiments that yield sparse data cannot support conclusions at the levels of statistical uncertainty and risk traditionally used in decision making.
When attempted, incorporating additional sources of relevant data (before, during, and after operational testing) into the evaluation of complex systems poses methodological and organizational challenges. Methodological challenges arise from the difficulty of combining information from disparate sources using standard evaluation techniques. Such sources include training data on operators involved in field tests and observational data from in-use situations as they arise. Organizational challenges can arise when there is disagreement about the validity of certain types of information or when information must be gathered in settings (e.g., combat) whose primary objective is not data collection.
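The first aspect above contrasts one-at-a-time pass/fail decisions with sequential problems. To make the distinction concrete, the following is a minimal sketch of one classical sequential method, Wald's sequential probability ratio test, applied to a stream of pass/fail trial outcomes. It is our choice of illustration, not a procedure prescribed by DoD or endorsed by the panel. After each trial it decides whether to accept, reject, or continue testing. The sketch assumes independent trials with a constant success probability, which the realistic engagements described above may not provide. The function name `sprt_binomial` and the values of `p_good`, `p_bad`, `alpha`, and `beta` are hypothetical.

```python
import math
from typing import Iterable


def sprt_binomial(outcomes: Iterable[bool],
                  p_good: float = 0.9, p_bad: float = 0.7,
                  alpha: float = 0.1, beta: float = 0.1) -> tuple[str, int]:
    """Wald's SPRT on a stream of pass/fail trials (True = success).

    p_good, p_bad: success probabilities of an acceptable and an unacceptable system.
    alpha: approximate chance of accepting a system whose success probability is p_bad.
    beta:  approximate chance of rejecting a system whose success probability is p_good.
    Returns (decision, number of trials used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it favors p_good: accept
    lower = math.log(beta / (1 - alpha))   # crossing it favors p_bad: reject
    step_success = math.log(p_good / p_bad)
    step_failure = math.log((1 - p_good) / (1 - p_bad))
    llr = 0.0  # running log-likelihood ratio, p_good versus p_bad
    n = 0
    for n, success in enumerate(outcomes, start=1):
        llr += step_success if success else step_failure
        if llr >= upper:
            return "accept", n
        if llr <= lower:
            return "reject", n
    # Evidence not yet decisive in either direction.
    return "continue testing", n


print(sprt_binomial([True] * 9))  # ('accept', 9)
```

With these settings, nine consecutive successes are enough to accept, while three straight failures at the start lead to rejection by the third trial. The number of trials is not fixed in advance: testing stops as soon as the evidence is decisive, which is the kind of economy a sequential design can offer when each trial is expensive.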
A Continuum of Information Gathering
The development, testing, and evaluation of modern complex systems rarely fall into easily segmented phases that can be assessed separately. Therefore, prospective planning is frequently important in guiding the collection and use of information during the learning and confirmation phases of system development.
The iterative nature of the testing process is especially important for new, one-of-a-kind, state-of-the-art technologies that are often key components of prospective systems, because the specific capabilities of the system cannot be determined completely in advance. Information from early stages of development can provide feedback for recalibrating operational criteria. Without such recalibration, operational testing standards may be set in an unrealistic or unachievable manner. Interestingly, an Office of the Secretary of Defense memorandum requires a link between the measures of effectiveness used in cost and operational effectiveness analysis and in operational testing (Yockey et al., 1992). However, notwithstanding this required linkage, DoD and Congress have placed constraints on the sharing of experimental data between developmental and operational testing.2 (These constraints were imposed to ensure objectivity—and the appearance of objectivity—in operational testing.)
Furthermore, it is important to collect in-use data (e.g., from training exercises or actual combat) on the effectiveness and suitability of a system after it has been deployed. Comparisons of operational test and in-use data can be very instructive; discrepancies might reveal flaws in the operational test design, execution, or analysis. Alternatively, differences may reflect deployment of the system in unanticipated operating environments or with significantly different functional requirements. Reconciliation of these two sources of data can yield valuable information about operational test procedures as well as the system that is now in the field.
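One simple way to ask whether a discrepancy between operational test and in-use results exceeds what chance variation would explain is a two-proportion z-test on success rates. The sketch below is illustrative only. The counts are hypothetical, and the normal approximation is rough at these sample sizes, so an exact test would be preferable in practice. A significant difference would show only that the two settings differ, not why they differ.

```python
import math


def two_proportion_z(successes_a: int, trials_a: int,
                     successes_b: int, trials_b: int) -> tuple[float, float]:
    """Two-sided z-test for equal success rates, using the pooled estimate.

    Returns (z statistic, two-sided p-value) under the normal approximation.
    """
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_a - rate_b) / se
    # Two-sided standard normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 27 of 30 successes in operational test, 60 of 80 in field use.
z, p = two_proportion_z(27, 30, 60, 80)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For these made-up counts, a 15-percentage-point gap gives z of about 1.7, short of the conventional 1.96 cutoff. This shows how little small samples can resolve, even when the apparent difference is operationally meaningful.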
Complex Testing Conditions
Complex systems have multiple measures of performance, can operate in many possible scenarios, and require a high degree of interaction between the system and the operator. These characteristics often complicate the application of classic experimental designs in operational testing. Also, uncontrolled and imperfectly controlled scenarios and conditions are part of the testing environment. In most cases, for example, operational testing involves human interaction as part of product or system usage, and the isolation and control of human factors usually cannot be achieved to the same extent as is the case, say, with certain environmental or physical factors.
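The strain that many scenarios place on classical designs shows up in the run count of a two-level factorial experiment, which doubles with every factor added. The sketch below builds a full 2^3 design and a half fraction defined by I = ABC, using three hypothetical test conditions coded -1/+1. The half fraction needs only four runs but confounds each main effect with a two-factor interaction (for example, A with BC). That trade-off is acceptable only if such interactions are believed to be small. The factor names are illustrative assumptions, not conditions from any actual test.

```python
import math
from itertools import product

# Hypothetical two-level test conditions, coded -1 (low) and +1 (high).
FACTORS = ("terrain", "time_of_day", "operator_experience")


def full_factorial(k: int) -> list[list[int]]:
    """Every combination of k two-level factors: 2**k runs."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k: int) -> list[list[int]]:
    """2**(k-1) runs: a full factorial in the first k-1 factors, with the last
    factor set to the product of the others (defining relation I = AB...K)."""
    return [list(base) + [math.prod(base)]
            for base in product((-1, 1), repeat=k - 1)]


for name, design in (("full", full_factorial(3)), ("half", half_fraction(3))):
    print(f"{name} factorial: {len(design)} runs")
    for run in design:
        print(dict(zip(FACTORS, run)))
```

With six such factors, the full design already needs 64 runs and the half fraction 32. This suggests why operational tests spanning many scenarios may have to rely on smaller fractions, or on judgment about which conditions matter most.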
Footnote 2: Public Law 99-661 states, “In the case of a major defense acquisition program, no person employed by the contractor for the system being tested may be involved in the conduct of the operational test and evaluation required under subsection (a).” This has been interpreted to mean that the processing and evaluation of test data must be carried out so that there is no possibility or even appearance of involvement on the part of the system contractor in the operational test in any way other than as it would be involved with the system in combat.
Effects of Constrained Test Resources
Statistical designs typically control a limited number of factors and allow conclusions to be drawn, with a stated degree of statistical confidence, about whether a product or system meets predetermined standards. To achieve a given degree of statistical confidence, such designs may require sample sizes that exceed the number of products or systems manufactured, or that are infeasible because of cost and budget constraints.
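A standard binomial demonstration calculation makes the sample-size problem concrete. The sketch assumes independent trials under a single test condition, each succeeding with the same probability. It finds the smallest number of trials n such that a system whose true success probability is only R would show no more than the allowed number of failures with probability at most 1 - C. Passing such a test therefore demonstrates a success probability of at least R with confidence C. The function name and the target values are hypothetical.

```python
import math


def demo_sample_size(reliability: float, confidence: float,
                     allowed_failures: int = 0, n_max: int = 10_000) -> int:
    """Smallest number of trials n such that observing at most `allowed_failures`
    failures demonstrates success probability >= reliability at the given
    confidence (classical binomial, or "success-run", calculation)."""
    for n in range(allowed_failures + 1, n_max + 1):
        # Chance that a system with success probability exactly `reliability`
        # shows allowed_failures or fewer failures in n trials.
        p_pass = sum(math.comb(n, i) * (1 - reliability) ** i * reliability ** (n - i)
                     for i in range(allowed_failures + 1))
        if p_pass <= 1 - confidence:
            return n
    raise ValueError("n_max too small for the requested targets")


print(demo_sample_size(0.90, 0.80))     # failure-free trials needed
print(demo_sample_size(0.90, 0.80, 1))  # trials needed if one failure is tolerated
print(demo_sample_size(0.95, 0.90))
```

For example, demonstrating a 0.90 success probability at 80 percent confidence takes 16 trials with no failures, or 29 trials if one failure is tolerated. Raising the targets to 0.95 and 90 percent takes 45 failure-free trials. These counts apply to one test condition; demonstrating the same level separately in each of several scenarios multiplies the total.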
The statistical problems faced in operational testing almost always derive from the need to make acquisition decisions with sparse amounts of data. These decisions involve a higher degree of uncertainty (and, therefore, risk) than is typically desirable, but there is no easy solution to this predicament. Furthermore, acquisition decisions must, by their nature, depend on a variety of subjective inputs in addition to operational test data.
Because it is very costly to conduct enough operational tests, across the wide range of plausible scenarios, to provide the level of confidence desired by the public, government agencies, or Congress, it is especially important that the continuum of information derived from data collection, testing, and evaluation be used effectively. The complexities and costs associated with operational testing underscore the need to take full advantage of supplementary sources of information. Such sources include results from developmental testing, operational testing of similar systems or subsystems, and post-acquisition data from training exercises and actual combat. The effective use of supplementary data requires prospective planning during the learning and confirmation phases of system development.
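One way to use such supplementary sources is a beta-binomial model in which developmental test results enter as prior information, discounted by a weight between 0 and 1. This is a power-prior style approach, sketched here only as an illustration; the weight, the uniform starting prior, and all counts are assumptions. The weight states explicitly how far the developmental data are trusted to represent operational conditions. That is exactly where the concern about preserving the independence of developmental and operational testing comes into play.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    a: float  # pseudo-count of successes
    b: float  # pseudo-count of failures

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def sd(self) -> float:
        n = self.a + self.b
        return math.sqrt(self.a * self.b / (n * n * (n + 1)))


def combine(ot_successes: int, ot_trials: int,
            dt_successes: int = 0, dt_trials: int = 0,
            weight: float = 0.5, a0: float = 1.0, b0: float = 1.0) -> BetaPosterior:
    """Update a Beta(a0, b0) prior with developmental (DT) counts scaled by
    `weight` (0 ignores DT entirely, 1 treats it as equivalent to operational
    data), then with operational (OT) counts at full weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    a = a0 + weight * dt_successes + ot_successes
    b = b0 + weight * (dt_trials - dt_successes) + (ot_trials - ot_successes)
    return BetaPosterior(a, b)


# Hypothetical counts: 8 of 10 successes in OT, 45 of 50 in DT.
ot_only = combine(8, 10)
pooled = combine(8, 10, dt_successes=45, dt_trials=50, weight=0.5)
print(f"OT only: mean {ot_only.mean:.3f}, sd {ot_only.sd:.3f}")
print(f"OT + discounted DT: mean {pooled.mean:.3f}, sd {pooled.sd:.3f}")
```

In this made-up case, the discounted developmental data roughly halve the posterior standard deviation and pull the estimate toward the developmental success rate. Whether that shift is legitimate depends on how representative the developmental conditions were, which the formula itself cannot settle.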
Testing and Evaluation in Nonmilitary Applications
Constructive analogies to defense testing can be drawn from other application areas, such as the development, testing, and approval of pharmaceutical products and medical devices. Use of pilot studies is quite common in these applications. In manufacturing industries, the focus of quality improvement efforts has shifted upstream to the product design and development phases. Frequently heard expressions such as “quality by design” and “do it right the first time” express the new philosophy that quality should be built into the product at the design stage. Statistical methods such as experimental design and reliability engineering are now used up front to compare designs and vendors, as well as to optimize product and process designs. There is also less reliance on end-product testing and inspection to assure quality, especially in settings in which operational testing of the manufactured product cannot feasibly be carried out under all likely scenarios of use.
The panel has not concluded its examination of the parallels between the activities of product design, end-product testing, and information and quality management as practiced by DoD and by private industry or other federal agencies. However, there are certainly other paradigms that are regularly used and have real advantages as compared with the current DoD acquisition process, and thus deserve further examination for their relevance to DoD acquisition.
In this section we have introduced several of the broader principles we will consider in what we have referred to as our proactive mode. In our deliberations, we will apply these principles in attempting to formulate recommendations for improving the testing and evaluation of defense systems. Because some of these principles imply production processes organized very differently from those used today, we anticipate that some of our recommendations will be in the form of long-term goals rather than changes that can be implemented in the existing acquisition process.
THIS REPORT AND FUTURE WORK
The remainder of this report presents results of the panel's work to date in five areas being addressed by our working groups:
Use of experimental design in operational testing (Chapter 2)
Testing of software-intensive systems (Chapter 3)
System reliability, availability, and maintainability (Chapter 4)
Use of modeling and simulation in operational testing (Chapter 5)
Efforts toward a taxonomic structure for DoD systems for operational test (Chapter 6)
In addition, five appendices are provided: Appendix A describes in detail the organizational structure of defense acquisition; Appendix B presents a short history of experimental design, with commentary for operational testing; Appendix C addresses the optimal selection of a small number of operational test environments; Appendix D lists the individuals consulted by the panel; and Appendix E provides charts showing the organization of test and evaluation within DoD overall and within the Army.
In view of the study's objectives, there are at least two distinct audiences for this report: the defense testing community and the statistical community. Therefore, we may sometimes present material that is obscure to one audience yet obvious to another. Appendix A and Appendix B are intended to provide relevant background information for the statistical and testing communities, respectively. Of the material in this report, Appendix C is written at the highest mathematical level. Despite the assumptions made about some of our readers, we hope this report will nevertheless advance the general goal of increasing the level of interaction among testers, other defense analysts, and statisticians.
As noted above, further work is required before the panel will be able to offer recommendations. Chapters 2 through 6, respectively, describe our future planned work in experimental design; software-intensive systems; reliability, availability, and maintainability; modeling and simulation; and development of a taxonomic structure. In addition, the panel expects to undertake several other tasks before issuing its final report.
First, we plan to perform comparisons between the current acquisition-operational testing structure in DoD and its counterparts in (1) other defense communities, such as those in Great Britain, Australia, Israel, France, Japan, and Russia; (2) other nonmilitary federal agencies, such as the National Aeronautics and Space Administration and the Food and Drug Administration; and (3) private industry, such as the automobile, semiconductor, and telephone industries. These three areas are extremely broad, and we aim simply to understand their major components. While the current DoD acquisition world is relatively singular, we hope that by investigating how others have dealt with the difficult problem of developing unique, high-cost, technologically advanced products, we will discover interesting ideas that can be modified for application to the DoD context.
Also, for purposes of background, and to provide a context for readers new to this area, we will prepare a history of operational testing in DoD, focusing on the years 1970 to 1995. It will detail the births (and deaths) of agencies responsible for testing, both developmental and operational; the roles of various advisory and oversight groups; the interaction with Congress; and in general the place and necessity of operational testing in DoD acquisition.
In the area of organizational context we will examine case studies of how systems have progressed from the Mission Needs Statement through final production or termination of the system to see how the acquisition system works and what opportunities exist for change, for example, to incorporate current state-of-the-art industrial practices. To this end, we will study the role of the program manager. We will also look at the current role in the acquisition process of various oversight and consulting groups, such as the Institute for Defense Analyses and the Office of the Director of Operational Test and Evaluation. Finally, we will address issues involving the extent to which additional statisticians, statistical training, or access to expert statisticians would improve operational testing.
Our efforts in this area will include studying recent General Accounting Office reports examining the manner in which systems have progressed through the stages of development (milestones). In addition, we will interview principals involved in the systems identified for case study and general experts in the acquisition process for their perspectives on how the process works and what changes might produce improvements. We will also seek some indication of the degree of statistical training of the various members of the operational testing community.