Increasing Access to Statistical Expertise for Operational Testing
As we argue throughout this report, operational testing and, more broadly, the process of system development are activities that can greatly benefit from the (further) application of statistical methods and statistical principles. The panel repeatedly learned of operational tests for which additional statistical knowledge and expertise could have been used to improve test design and evaluation. Examples of missing statistical knowledge include:
applying principles of optimal experimental design to non-standard situations;
methods for combining information from developmental test and tests of other systems with the results of operational test;
state-of-the-art validation of simulation models for augmenting operational testing;
use of Markov chain methods for software testing; and
use of such techniques as non-homogeneous Poisson processes to model time between failures for a defense system.
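To make the last item concrete, the sketch below is a hypothetical illustration in Python rather than anything drawn from DoD practice. It uses a power-law non-homogeneous Poisson process (the Crow-AMSAA form, with mean function m(t) = a·t^b) to model times between failures; a shape parameter b < 1 corresponds to reliability growth, with failures arriving less frequently as testing proceeds. All parameter values are invented for illustration.

```python
import random

def expected_failures(a, b, t):
    """Mean cumulative failures by time t under a power-law NHPP: m(t) = a * t**b."""
    return a * t ** b

def intensity(a, b, t):
    """Instantaneous failure intensity: m'(t) = a * b * t**(b - 1)."""
    return a * b * t ** (b - 1)

def simulate_failure_times(a, b, horizon, rng):
    """Simulate one realization on (0, horizon] via the time transformation:
    unit-rate homogeneous Poisson arrivals s map to t = (s / a)**(1 / b)."""
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)     # homogeneous Poisson(1) interarrival
        t = (s / a) ** (1.0 / b)      # invert the mean function m(t) = a * t**b
        if t > horizon:
            return times
        times.append(t)
```

With the illustrative values a = 2 and b = 0.5, about 20 failures would be expected in the first 100 hours of testing, and the declining intensity captures the reliability growth often seen across successive test phases.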
As detailed in Chapter 3, the acquisition process itself would benefit from the changes that statistically based principles have brought about in many industrial applications of system development.
The panel is aware of some efforts in the DoD test community to make greater use of the statistical knowledge and tools available to it and to increase the statistical resources that are at hand. Courses in statistics are made available to staff at the service test agencies, particularly the Air Force Operational Test and
Evaluation Center, in relevant areas such as experimental design and reliability theory. The service test agencies, particularly the Army Operational Test and Evaluation Command, make use of statistical consultants. Also, DOT&E has access to expert statistical assistance at the Institute for Defense Analyses. All the service test agencies have military staff who operate with both dedication and professionalism, but they operate in the context of statistical training that prepares them to apply standard methodology, rather than to produce customized solutions as needed. Finally, the service test agencies make use of and occasionally develop statistical software to help in test design and evaluation. RAPTOR, developed at the Air Force Operational Test and Evaluation Center and used to evaluate the reliability of component systems, is a particularly relevant and impressive example of this.
However, the DoD test community generally has limited access to, and makes little use of, individuals who have highly advanced training in statistics, specifically, the level of training that is typical of a doctorate from a graduate program in statistics. The panel knows of only one Ph.D.-level statistician who is a full-time employee in any of the three largest service test agencies at this time. The service test agencies also do not make enough use of the statistical expertise at the Naval Postgraduate School, the Center for Naval Analyses, the Institute for Defense Analyses (through DOT&E), Aerospace, RAND, and other similar institutions. It appears that the Army, Navy, and Air Force test agencies rarely consult with academic statisticians, even for test design and evaluation issues concerning multibillion dollar systems.
CONSTRAINTS ON ACCESS TO STATISTICAL EXPERTISE
Some of the limited interaction with statistical experts is understandable. First, as stated above, the problems in design and evaluation are heavily constrained by various factors, including budgets that force small sample sizes, test facility scheduling conflicts, and test facility limitations. Navy tests seem particularly constrained. Evaluations are often limited by time, and they focus on the calculation of means and percentages and, sometimes, significance tests for identified measures used in the decision on whether to enter full-rate production; this focus reduces incentives for more thorough and sophisticated analyses. These constraints can at times limit the utility gained through interaction with statistical experts. (We note, however, that many of the constraints would disappear with adoption of the test and acquisition strategy recommended in Chapter 3.) Yet these constraints can also increase the value of such interactions: constraints present non-standard design problems, budgetary limitations make efficient test design even more important, and evaluations can be expedited by effective diagnostics that quickly identify data features worth investigating.
Second, military staff rotate from one assignment to another as they are promoted, and it is not reasonable to expect them to have comprehensive or advanced training in statistics, since acquiring it would necessarily limit their education in other areas important to their success in later assignments. Therefore, we believe the military staff of service test agencies cannot have appreciably greater statistical training than they currently do; the civilian work force, however, has no such limitation.
Third, to be of assistance in any applied setting and to work with system specialists, particularly in defense testing, a consulting statistician would have to be knowledgeable about DoD systems, how these systems are used, and DoD procedures. A key example of when the interaction between statistician and subject-matter specialist would be most useful is in identifying design flaws in a system under development. Statistics can be extremely useful toward this goal, though significant progress would depend on close interaction among a statistician, system specialists, and operational testers. Since knowledge of defense systems and procedures could take considerable time to acquire, a long-term commitment would be needed on the part of both a statistical consultant and, say, a service test agency. Given the uneven demand for statistical expertise in any given program, such a commitment would need to be carefully considered.
Fourth, there is a widespread belief in the DoD test community that statisticians will only recommend unaffordably large operational tests, based on arguments about sufficient statistical power, regardless of the costs of test units; that they will ask for too much time for evaluation, regardless of the need for a timely decision; and, more broadly, that the discipline of statistics is useful only for large-sample questions. The last point, in particular, does not reflect awareness of the most up-to-date statistical advances. One of the themes of this report is that, through careful modeling, efficient experimental design, or both, it is possible to achieve substantial resource savings and, in particular, a reduction in the sample size required for a fixed level of precision.
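The sample-size point can be illustrated with a small simulation; this is a hypothetical sketch with invented effect sizes, not a DoD example. With the same four test runs, a 2x2 factorial design estimates a factor's effect with half the variance of a one-factor-at-a-time plan, i.e., the same precision from half the testing resources.

```python
import random
import statistics

def compare_designs(reps=20000, sigma=1.0, seed=42):
    """Monte Carlo comparison of two four-run test plans for estimating the
    effect of factor A (coded -1/+1) in the presence of a second factor B."""
    rng = random.Random(seed)
    a_eff, b_eff = 1.0, -0.5                      # hypothetical true effects

    def y(xa, xb):
        """One noisy test outcome under an additive two-factor model."""
        return a_eff * xa + b_eff * xb + rng.gauss(0.0, sigma)

    ofat, factorial = [], []
    for _ in range(reps):
        # One-factor-at-a-time: four runs, but only two inform the A effect.
        ofat.append(y(1, -1) - y(-1, -1))
        # 2x2 factorial: the same four runs all inform the A effect.
        y_mm, y_pm, y_mp, y_pp = y(-1, -1), y(1, -1), y(-1, 1), y(1, 1)
        factorial.append(((y_pm + y_pp) - (y_mm + y_mp)) / 2.0)
    return statistics.variance(ofat), statistics.variance(factorial)
```

Theory gives variances of 2*sigma^2 and sigma^2 for the two estimators, so the factorial plan delivers the same effect estimate with half the variance at no additional cost in test runs.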
While the panel accepts the first three arguments above as possible constraints on the availability and utility of statistical expertise, it remains that test design, test evaluation, and system development are all activities that are informed through the application of statistical techniques and principles. We therefore argue that the DoD test and acquisition community needs to develop increased interaction with statistical expertise at the doctorate level. Although some of the technical issues discussed in this report are fairly routine, many require methods that are either not presented until late in graduate work or are current research problems. Therefore, the need for interaction with expert statisticians is clear and currently unmet. This lack of high-level statistical advice and interaction in the DoD test community also reduces the chances that when this type of expertise is needed in new application areas, it will be recognized.
Conclusion 10.1: The level of statistical expertise in the service test agencies and at the Director, Operational Test and Evaluation is currently inadequate to effectively carry out their missions.
There are a variety of ways to enhance access to statistical expertise in the defense testing community, considering the constraints discussed above. More statistical expertise can come in at least four forms: (1) increased training of current staff; (2) increased hiring of expert (master's- and doctorate-level) statisticians; (3) increased use of statistical consultants, including those available through federally financed research and development centers, government laboratories, and supporting institutions; and (4) increased hiring of temporary staff through interagency professional agreements, use of temporary openings for academic statisticians on sabbaticals, and formation of fellowships in conjunction with groups such as the American Statistical Association.
Increased access to statistical expertise and more staff expertise in statistics can be instituted at several places: DOT&E, the service test agencies, and institutions such as the Institute for Defense Analyses, the Naval Postgraduate School, Aerospace, the Center for Naval Analyses, Lincoln Laboratory, and universities. Since there are four methods for enhancing statistical expertise and three types of places where it can occur, there are a number of possibilities for new approaches. As discussed above, when considering anything other than training existing staff, a key problem is that to be useful individuals must know more than statistics: they must also have some knowledge of the defense acquisition system and the system under test, be somewhat familiar with the physics of combat systems and the realities of military operations, and, ideally, have some training in physics and engineering. (Some subject-matter expertise is needed in most applied settings of statistics; biostatisticians, for example, usually need to be familiar with medicine. It is extremely important in this situation.) We are confident that creative solutions to this problem can be developed by the defense testing and academic communities.
The development of statistical expertise for the defense testing agencies need not be focused exclusively on test and evaluation. The interaction of the defense and statistical communities, in general, is less than it could be; one approach could be the establishment of a defense statistical consulting unit within the Department of Defense, with test design and evaluation as one of its primary responsibilities. Another approach may come from the academic community: for example, the Georgia Institute of Technology has recently established the Test and Evaluation Research and Education Center, a program that will be conducive to various kinds of interactions with the defense testing community, such as sabbaticals for visiting faculty at federally financed
research and development centers and sabbaticals for members of the defense testing community at the university. In addition, graduates of this program could be provided with substantial statistical training and other expertise to prepare them for employment at service test agencies. One possible use of academic statisticians would be to assist in developing and advancing educational programs on test design and evaluation for professional staffs of service test agencies and monitoring their implementation.
Finally, other things being equal, the closer the statistical expertise is to those involved with test and evaluation, the more likely it is to be used.
Recommendation 10.1: The service test agencies should place greater emphasis on the statistical training of their non-military staff, especially in the areas of experimental design, reliability theory, data analysis, use of statistical software, and total quality management.
In addition to increasing the availability of statistical expertise in the defense testing community, it is also desirable to increase the familiarity of defense decision makers, particularly those in the acquisition process, with statistical concepts and with the benefits that advanced statistical applications can provide in improving the process of defense acquisition.
There is some evidence that improving decision makers' understanding of the application of statistical principles could reduce the likelihood that they would unintentionally approve goals that almost guarantee poor operational test and evaluation results. One straightforward example was the approval of an original objective for a jamming system that would improve aircraft survivability by X percent over the performance without the jamming system: the objective was impossible to meet because applying the criterion would have required achieving a survivability greater than 100 percent. A second example was the setting of a requirement that a replacement item be "at least as reliable as the item it was to replace and that this requirement be met with a confidence factor of Y percent." The unintended result was that the developers of the replacement system had to strive for an extremely high design reliability, potentially at an unreasonably high cost, in an attempt to achieve the specified combination of reliability and confidence. If the approval authorities for the replacement item's requirements had better understood the statistical implications of what they were asking the designers to achieve, in all likelihood they would have made a different decision. In the second example the performance standard was inadvertently set too high; in the first it was impossible to meet. A better appreciation of statistical principles could help DoD acquisition decision makers avoid bad decisions and unachievable goals, and it could improve the quality of the test plans and evaluations provided to them.
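The second example can be made concrete with a hedged sketch. The numbers here are invented, and the calculation assumes a simple zero-failure binomial demonstration plan rather than whatever plan was actually used: demonstrating reliability R with confidence C requires roughly n = ln(1-C)/ln(R) failure-free trials, and to pass such a test with high probability the design reliability must be pushed far above the stated requirement.

```python
import math

def zero_failure_sample_size(r_req, confidence):
    """Smallest n such that n failure-free trials demonstrate reliability
    >= r_req at the stated confidence (zero-failure binomial plan)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(r_req))

def pass_probability(true_reliability, n):
    """Chance that a system with the given true reliability survives all n trials."""
    return true_reliability ** n

def design_reliability_needed(r_req, confidence, pass_prob):
    """Design reliability required to pass the zero-failure test with
    probability pass_prob."""
    n = zero_failure_sample_size(r_req, confidence)
    return pass_prob ** (1.0 / n)
```

With an illustrative requirement of 0.90 reliability at 80 percent confidence, 16 failure-free trials are needed; a system whose true reliability is exactly 0.90 passes that test less than one time in five, so a developer seeking a 90 percent chance of passing must design to a reliability above 0.99, a hidden cost the approval authority never intended.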
Over two decades ago, Deputy Secretary of Defense David Packard expressed major concern about the growth of cost estimates during the system development process and about the poor quality of initial estimates. As a result, he required the use of parametric cost estimates to test the reasonableness of estimates provided when the Department was making major commitments of funds for development and initial production (Packard, 1971). This led Secretary of Defense Melvin Laird to establish the Cost Analysis Improvement Group (CAIG) as part of his office (Laird, 1972); it continues to function well today. The initial CAIG objectives were to review the cost estimates provided to the Defense Systems Acquisition Review Council (the predecessor of the current Defense Acquisition Board) and to develop uniform criteria to be used by all DoD units making such cost estimates. By the time DoD Directive 5000.4 provided a permanent charter for the CAIG, the membership had been expanded beyond the five members from the Office of the Secretary to include service members appointed by the secretaries of the individual military departments (U.S. Department of Defense, 1973). We mention this story because we believe that some of the concepts applied to improving cost analysis can also be applied to improving statistical analysis for acquisition programs.
Specifically, we believe it would be desirable for the Department of Defense to form a small group of the best statistically trained individuals available in the Office of the Secretary and in the individual services who are involved in the acquisition process, to:
review, comment on, and provide advice about key system-specific acquisition proposals and documents that can affect the outcome of tests and evaluations for major defense acquisition programs; and
develop criteria and provide advice on statistical criteria to be used by all DoD components responsible for setting goals and testing systems against such goals in the DoD acquisition process.
It is suggested that such a group be given a permanent charter and that it be responsible primarily to the Undersecretary of Defense for Acquisition and Technology, who can direct the group's support to any part of the acquisition community. Such a group needs to be large enough and stable enough to maintain continuity over time in methods and advice. The specifics of the charter for such a Statistical Analysis Improvement Group should be developed by key personnel in the Office of the Secretary of Defense.
Recommendation 10.2: DoD should create a Statistical Analysis Improvement Group in the Department of Defense (using existing personnel on a part-time basis) to support the Under Secretary of Defense (Acquisition and Technology) in applying the best statistical principles in the acquisition of defense systems.
A FINAL ISSUE: SPONSORED RESEARCH
There are a number of military organizations with responsibility for sponsoring research on technical areas with potential military applications. Among the most active and visible of these are the Army Research Office (ARO), the Office of Naval Research (ONR), and the Air Force Office of Scientific Research (AFOSR). Over many years, these agencies have played important leadership roles in the initiation of relevant research in the mathematical and physical sciences and in several key areas in the engineering sciences. Of particular interest to the panel are the policies, practices, and priorities of these agencies in sponsoring statistical research.
These agencies have, over the past decade, given little emphasis to the development of appropriate statistical methods relevant to the developmental and operational testing associated with the Department of Defense's acquisitions programs. In fact, there seems to have been a substantial change in their priorities in this regard. The Army Design of Experiments Conference, instigated by Samuel Wilks in the early 1950s and sustained by ARO for some 40 years, has been discontinued. Similarly, AFOSR, which sponsored a number of major research initiatives in the fields of reliability and stochastic processes in the 1970s and 1980s, now makes fairly modest investments in research in these areas. ONR announced in the mid-1980s that it would no longer designate the area of reliability as an area of emphasis; this was soon followed by a sharp reduction in ONR funding for research in reliability. The other agencies have since followed suit. There is a clear need to modernize statistical practices in the OT&E community, and the research that might facilitate the needed advances is today happening by accident rather than by design.
As is evident to readers of this report, there are a number of statistical subfields that are central to modern test and evaluation activity in military acquisitions. They include:
the design of experiments;
reliability theory, and more generally, the "quality sciences," including standard suitability issues such as availability and maintainability and areas such as statistical process control;
validation of modeling and simulation;
Bayesian methods; and
statistical methods for software testing.
The panel has noted that there are deficiencies, some quite serious, in current knowledge and practice in these subject areas vis-à-vis military testing and evaluation. These are areas that could be emphasized by agencies such as ARO, ONR, and AFOSR.
It is not appropriate to call for specific work in areas like experimental design, reliability, or software testing without having a clear notion about the potential value of different contributions in these areas. An effective program of sponsored research would require prior input from both the potential users of research findings and the research community. It would require specifying the innovations that practitioners need and the theoretical developments required to address the applications of interest.
There is a fundamental need for a collaborative effort between the military operational test and evaluation community and the statistical research community directed at defining new research initiatives. Developing such a research program, including identifying preferred providers in government, industry, and academia, would be a good continuing task for the Statistical Analysis Improvement Group that we recommend be established. One of its missions should be to advise the Undersecretary of Defense for Acquisition and Technology on the priorities of research on technical issues raised by the developmental and operational testing of defense systems and on the potential sources for such research. At the same time, we suggest that ARO, ONR, and AFOSR consider increasing the priority of research on these issues. These applications are extremely important and might greatly benefit from new advances in statistical and related methodology. Sponsored research can be an extremely useful tool for generating progress on many of the important issues described here, and it has not been used particularly effectively in recent years. Support of both external sponsored research and internal statistical methods research is critical if the services are to embrace improved statistical methods in designing and interpreting test and evaluation activities.