Suggested Citation:"INTRODUCTION AND OVERVIEW." National Research Council. 1994. Statistical Issues in Defense Analysis and Testing: Summary of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/9686.

Introduction and Overview

BACKGROUND

The Committee on National Statistics has long sought to develop and encourage the use of state-of-the-art statistical methods across the federal government and, in particular, to expand their application within the Department of Defense (DoD). This interest, shared by the Committee on Applied and Theoretical Statistics, converged with some challenging problems in which DoD officials wanted to involve the statistical community. The result was a two-day workshop on statistical issues in defense analysis and testing, held in September 1992 at the National Academy of Sciences.

The workshop covered the evaluation and testing of weapon systems, which are done as part of the acquisition process for all the major systems that DoD uses. The broad range of statistical issues that arise in the process of testing and evaluating major weapon systems prompted the Office of the Director for Operational Testing and Evaluation and the Office of the Assistant Secretary (now Director) for Program Analysis and Evaluation in the department to sponsor the workshop with the goal of improving the statistical methods used in DoD.

The workshop was structured to bring statisticians together with defense analysts in a setting that allowed the statisticians to understand the real-world issues and problems sufficiently to suggest how statistical thinking could usefully contribute and then to articulate the relevant methods and principles. For each workshop session, one or more defense analysts prepared papers that were circulated in advance. These papers were a primer on the topic of the session and its associated statistical problems. Appendix A lists the sessions and their participants. For each of the six sessions, two statisticians were asked to prepare written comments in advance on that session's papers. Appendix B lists the briefing papers prepared for the workshop; these papers are occasionally referred to explicitly throughout this report. All 12 invited statistical commentators were also encouraged to read all the papers and the written comments in advance. After a brief period of learning one another's terminology, analysts from both the defense and the statistical communities succeeded to a remarkable degree in communicating effectively with each other about a variety of statistical issues in weapon system evaluation and testing.

THE WEAPON SYSTEM EVALUATION AND TESTING PROCESS

Key steps in the evaluation and testing process are (1) cost and operational effectiveness analysis, known as COEA; (2) developmental testing; and (3) operational test and evaluation, called OT&E. These occur as a weapon system is being developed, but before it goes into full-scale production. Regular reviews by decision makers at designated steps, called milestones, assess the performance potential and costs of the system.

COEAs are submitted or updated by the military services at the milestones in the acquisition process to evaluate the costs and benefits of new systems and their alternatives in terms of operational effectiveness in meeting defense needs. COEAs frequently include simulation experiments, analysis of which requires the use of appropriate statistical methods to make comparisons between alternatives.

Early testing during a weapon system development program (developmental testing) covers a wide range that includes component testing, modeling and simulation of anticipated performance, and engineering systems testing. Developmental testing presents the first opportunity to measure the performance and effectiveness of the system against the criteria developed during the COEA. This stage may include a period of combined operational and developmental testing (OT/DT) that is conducted by the interested branch of service, thus providing input from the prospective user during system development.

OT&E is the process that the Department of Defense uses to assess whether the weapon system actually meets its planned capability before deciding whether to begin full-scale production. By law, the director of OT&E must report to Congress on these tests and, in particular, state whether the testing and evaluation performed were adequate and whether the results confirm that the items or components actually tested are effective and suitable for combat. Besides addressing the oft-asked question “How much testing is enough?” workshop participants covered a gamut of statistical issues related to testing.

CRITICAL PROBLEMS IN THE TESTING AND EVALUATION PROCESS

The participants from the defense community described a wide range of issues and problems for which they thought statisticians could contribute ideas and possible solutions. Reviewing in depth the entire set of issues and problems is beyond the scope of this report, but the list below suggests the critical “big ten” (U.S. Department of Defense, 1992) that OT&E staff are encouraged to keep in mind; they were referred to in Robert Duncan's opening remarks:

Evaluations of test results are too optimistic.
Reports to Congress are incomplete and inaccurate.
Problems and limitations in operational tests are not reported.
Operational testers have higher skill levels than regular users.
Testing is not realistic and objective.
Not enough stress is imposed on equipment and personnel in tests.
On-site observation by OT&E staff is infrequent.
OT&E reports are based primarily on the military service's test reports (i.e., are not independent of the services).
The test resources provided are inadequate.
Differences between operational testing and developmental testing are blurred.

Whether these criticisms are wholly valid or not, they convey a sense of the context that workshop participants had in mind when proposing new methods and approaches.

CASE STUDIES

During the workshop discussion, a number of statistical themes emerged as particularly relevant to current issues in defense analysis and testing. These themes are outlined in the next section and form the structure for the remainder of the report. Before outlining these themes, we briefly describe two examples discussed in the workshop papers. It should become clear that general modeling issues, as well as purely statistical issues, figure prominently in this work. We refer to these and other case studies for illustration at appropriate places in the report.

Case Study #1

Consideration of the requirement for a new medium-weight antitank system led to development of the Infantry Anti-Armor Weapon System-Medium (AAWS-M, now called the Javelin system). As described by Cyrus Staniec (Appendix B), assessment of the system's cost-effectiveness required evaluation over more than 200 distinct scenarios that reflected the problem's many dimensions. The COEA had to address issues related to the mix and application of other systems on the battlefield. These issues included trade-offs against alternative systems, such as heavy antitank weapons and certain air defense weapons that might also serve an antiarmor role, and the effects of battlefield countermeasures on the systems under consideration. In addition, the AAWS-M had to be studied in both mechanized force and light infantry contexts, in both offensive and defensive combat, against various types of threats, and in different combat environments (e.g., Europe, Southeast Asia).

The number and nature of cases to be studied required the use of several different combat simulation models with different attributes (e.g., level of resolution, stochastic versus deterministic). Effectiveness was measured in terms of exchange ratios—i.e., enemy versus friendly losses. Point estimates of costs were combined with the effectiveness analysis to draw conclusions.
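As a rough illustration of how an exchange ratio from a stochastic simulation might be reported together with its uncertainty, the sketch below pools enemy and friendly losses over replicated runs and attaches a percentile bootstrap interval. The loss counts are synthetic, and the function names, the number of replications, and the 90 percent level are assumptions made for illustration; none of them comes from the AAWS-M study.

```python
import random


def exchange_ratio(runs):
    """Enemy losses divided by friendly losses, pooled over simulation runs.

    Each run is an (enemy_losses, friendly_losses) pair.
    """
    enemy = sum(e for e, _ in runs)
    friendly = sum(f for _, f in runs)
    if friendly == 0:
        raise ValueError("exchange ratio undefined with zero friendly losses")
    return enemy / friendly


def bootstrap_interval(runs, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for the pooled exchange ratio."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample whole runs with replacement, keeping each pair intact.
        resample = [rng.choice(runs) for _ in runs]
        if sum(f for _, f in resample) == 0:
            continue  # skip degenerate resamples
        stats.append(exchange_ratio(resample))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * (len(stats) - 1))]
    upper = stats[int((1.0 - tail) * (len(stats) - 1))]
    return lower, upper


# Synthetic replications for one hypothetical scenario (not real data).
data_rng = random.Random(42)
runs = [(data_rng.randint(8, 20), data_rng.randint(3, 9)) for _ in range(30)]

point = exchange_ratio(runs)
low, high = bootstrap_interval(runs)
print(f"exchange ratio {point:.2f}, 90% bootstrap interval ({low:.2f}, {high:.2f})")
```

Reporting the interval alongside the point estimate makes it easier to judge whether a difference between two alternatives exceeds the run-to-run variation of the simulation. It says nothing about whether the simulation model itself is valid.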

A number of questions raised by this study are essentially statistical: How to compare results across different scenarios? How to assess the validity of the combat simulation models? How to identify and characterize sensitivity to model assumptions? How to quantify and incorporate uncertainties in cost estimates? How to determine which scenarios are critical in designing operational tests?

Case Study #2

In the context of developmental and operational testing, Charles Horton (Appendix B) described the following case study involving the Sense and Destroy Armor (SADARM) Artillery Munitions:

SADARM is in the class of smart conventional artillery munitions currently being developed by the Army. The primary target of these munitions is a self-propelled howitzer (SPH) (or, possibly, a battery of SPH). The threat SPH is detected when it fires and [is] located by an artillery locating radar, FIREFINDER. There are two packaging variants: a 155mm projectile with two submunitions per projectile, and a Multiple Launch Rocket System (MLRS) with six submunitions per rocket. The submunitions are delivered by their respective carriers, are expelled over an aim point, and, as they fall, use a combination of millimeter wave radar and infrared sensors to detect and classify an appropriate target. An explosively formed projectile is fired by the submunitions and penetrates the top of its intended target.

For evaluating the effectiveness of the SADARM munitions, the following criteria are extracted from the Required Operational Capability document (Horton, Appendix B):

The 155mm SADARM will be required to achieve an s expected number of howitzer kills with t desired against a Soviet 152mm self-propelled howitzer (SPH) battery with four projectiles. The MLRS SADARM will be required to achieve a u expected number of howitzer kills with v desired against a Soviet 152mm SPH battery. The requirements apply to a 152mm SPH battery (8 guns) in the open in a European environment on a summer day. Aim point is a flank platoon center, and howitzers are assumed to have been exercised and fired with[in] the last hour.

The document also provides for a maximum degradation of w percent because of countermeasures (s, t, u, v, and w are classified values).

In designing the operational test, one consideration was the required sample size. The original decision was to test 72 SADARM 155mm projectiles. This value was determined from operating characteristic curves. Such curves are often used during acceptance sampling of continuously manufactured items to monitor product quality. One consequence of this sample size was that, within the structure of a statistical hypothesis test, the chances of decision errors were relatively high. In particular, the conditional probabilities of either rejecting a “good” system or accepting a “bad” system were both .50. Subsequently, the sample size was increased to 216 projectiles in order to reduce these error probabilities to .20.
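The trade-off between sample size and the two error probabilities can be sketched with a simple binomial acceptance rule. The example below is a simplification under stated assumptions: each projectile is treated as an independent success-or-failure trial, and the “good” and “bad” success probabilities (0.5 and 0.4) are hypothetical, since the actual requirement is stated in expected kills and its values are classified. The sketch therefore does not reproduce the .50 and .20 figures quoted above; it only shows how the errors shrink as the sample grows from 72 to 216.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    upper = min(k, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


def error_probabilities(n, c, p_good, p_bad):
    """Error rates for the rule 'accept the system if at least c of n succeed'."""
    reject_good = binom_cdf(c - 1, n, p_good)      # P(X < c | system is good)
    accept_bad = 1.0 - binom_cdf(c - 1, n, p_bad)  # P(X >= c | system is bad)
    return reject_good, accept_bad


def balanced_threshold(n, p_good, p_bad):
    """Threshold c that minimizes the larger of the two error probabilities."""
    return min(
        range(n + 2),
        key=lambda c: max(error_probabilities(n, c, p_good, p_bad)),
    )


# Hypothetical per-trial success probabilities (assumptions, not program values).
P_GOOD, P_BAD = 0.5, 0.4

for n in (72, 216):
    c = balanced_threshold(n, P_GOOD, P_BAD)
    reject_good, accept_bad = error_probabilities(n, c, P_GOOD, P_BAD)
    print(
        f"n={n}: accept if successes >= {c}; "
        f"P(reject good)={reject_good:.3f}, P(accept bad)={accept_bad:.3f}"
    )
```

Plotting the acceptance probability against the true success probability for a fixed n and c gives the operating characteristic curve for that rule.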

The unit costs for MLRS warheads and 155mm projectiles were estimated (in 1990) to be approximately $80,000 and $20,000, respectively, based on full production of the system. (Projectiles for the operational test were estimated to cost between $60,000 and $80,000 per unit.) The total costs for research, development, and procurement of the SADARM system were estimated at $4.7 billion. Hence, the cost of supplying units for operational testing represents less than 1 percent of the total program cost.

This case study provokes questions about whether the analytic framework being used is appropriate for designing this operational test. Should one begin with a presumption about whether the system is good or bad? What are the consequences of making incorrect decisions based on the operational test? Would conducting a series of smaller tests, instead of one larger test, be more cost-effective? Statistical science may help in finding answers to these questions and many others that arise in defense analysis and testing.

STATISTICAL ISSUES ADDRESSED IN THE WORKSHOP

We summarize below the major statistical issues that were covered in the workshop. The overarching theme was that using more appropriate statistical approaches could improve the evaluation of weapon systems in the DoD acquisition process. See Samaniego (1993) for one participant's summary of some of the important issues.

Experimental Design. Because testing is expensive and potentially dangerous, it is important that tests be designed to permit the efficient collection and analysis of test data. Statistical principles of experimental design allow more informed choices in scenarios for tests and, more generally, clarify trade-offs among the various ways in which limited testing resources can be allocated.
Sources of Variability. Informed decision making requires understanding all sources of variability in an analysis. More formal attention to analysis of sensitivity to model assumptions, validation of models, sampling and nonsampling sources of error, and selection biases would contribute to this goal.
Communicating Uncertainty. The analyst's responsibility in presenting results is to ensure that the uncertainties from an analysis are reported to decision makers. Several suggestions were made to encourage explicit ways of presenting uncertainty, including graphical methods. Such information should be presented in a way that avoids technical jargon and makes policy implications clear.
Linking and Effectively Using Information. Operational tests are currently viewed as starting with a blank slate. Statistical methods of combining information and borrowing strength across experiments (e.g., hierarchical Bayes and empirical Bayes models) could be employed to use information from earlier stages in designing and analyzing operational tests. This would result in more informative operational tests.
Pitfalls of the Hypothesis Testing Paradigm. The classical statistical hypothesis testing framework is commonly adopted to evaluate a system against performance criteria using results from its operational test. This is problematic, because the asymmetry of significance tests leads to unproductive arguments about what the null hypothesis should be and, hence, where the burden of proof lies in the testing process. More neutral approaches from statistical decision theory would be more appropriate.
Cooperation Versus Advocacy. An idea that was expressed often is to encourage moving away from the present advocacy environment surrounding quality assurance, in which one party or another is characterized as being at fault. Several speakers urged moving toward a more neutral and cooperative environment in managing quality in weapon systems with an emphasis on achieving consistent improvement rather than clearing interim hurdles at program milestones.

Data Storage and Use. Data are a precious resource in the Defense Department and could be used more effectively. Creating new capability in data archiving and management—e.g., developing a relational data base—coupled with a statistical and data analysis unit could improve the DoD's use of data.
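The idea of borrowing strength from earlier test stages, raised above under Linking and Effectively Using Information, can be illustrated with a conjugate Beta-Binomial update. This is a minimal sketch rather than a full hierarchical or empirical Bayes analysis: the trial counts are hypothetical, and the discount `weight` applied to developmental results is an assumption fixed by the analyst, whereas a hierarchical model would estimate the degree of pooling from the data.

```python
def beta_posterior(prior_successes, prior_failures, successes, failures, weight=1.0):
    """Beta posterior parameters after down-weighting earlier evidence.

    Starts from a uniform Beta(1, 1) prior. Earlier-stage counts are
    multiplied by `weight` (0 ignores them, 1 pools them fully).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    alpha = 1.0 + weight * prior_successes + successes
    beta = 1.0 + weight * prior_failures + failures
    return alpha, beta


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total**2 * (total + 1.0))
    return mean, variance**0.5


# Hypothetical counts (successes, failures); not data from any real program.
developmental = (30, 10)
operational = (12, 8)

for w in (0.0, 0.5, 1.0):
    a, b = beta_posterior(*developmental, *operational, weight=w)
    mean, sd = beta_mean_sd(a, b)
    print(f"weight={w:.1f}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

With a weight of zero the operational test is analyzed from a blank slate. Larger weights tighten the posterior but also pull it toward the developmental results, which is appropriate only to the extent that developmental conditions resemble operational ones.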