A Potential Taxonomy for Operational Tests
Over the course of this study, the panel has heard presentations on the following types of defense systems: airport runway repair systems, software systems for personnel administration, radar-assisted combat helicopters, shoulder-fired anti-tank missiles, radar jammers, systems for prioritizing strategic threats, and operating systems designed to run other software. Systems can make use of entirely new technology, or well-tested technology that has worked well on similar systems. They can be slight upgrades or embody entirely new capabilities. They can be 100 percent software or 100 percent hardware (though all recent ACAT I systems contain software components). There are systems whose failure could result in loss of life. One would expect that operational tests for systems this varied should take on very different forms. It is necessary to consider a new system's characteristics to select an appropriate operational test design.
Certainly, when a member of the relevant service test agency is called upon to design a test for a system, it would make sense to borrow features from the test of a similar system that was considered well tested, especially if that system was tested relatively recently. A taxonomy of systems would support this kind of borrowing. Unfortunately, a taxonomy that could encompass such diversity would necessarily have a large number of cells, and populating it would require a large number of well-designed, fully analyzed tests covering the range of DoD operational test experience. On the other hand, taxonomies that tried to avoid this difficulty by collapsing cells into aggregate cells would run the risk of recommending inappropriate tests, since there would be too much heterogeneity within cells. These two requirements, few cells and homogeneous cells, run against each other.
The panel presents an attempt at a practical taxonomy, but it is only a preliminary attempt. It is important to keep in mind that the development of military systems is an extremely dynamic process: the systems under development today are tremendously different from those developed in the 1950s. Therefore, any taxonomy would have to be reevaluated on a regular basis, and flexibility in its implementation would be very important. Further, there is no reason why a taxonomy is necessarily the preferable construct. Another possibility is a type of checklist, in which answers to a succession of questions about system characteristics add different types of features to the recommended operational test.
Following is a discussion of characteristics that may affect the nature of the operational test of a military system. Boxes C-1 and C-2 list some of the many characteristics considered for inclusion in our taxonomy. These potential taxonomy dimensions have been divided into two broad categories: (1) those that have a broader application but probably do not have a direct bearing on the operational test, and (2) those that may have a direct bearing on how an operational test is designed and how preparations are made for it.
The panel selected three characteristics that seem important for test design. The taxonomy is designed to serve the following purposes:
Reflect the prevalence of various types of systems;
Highlight attributes that might call for different statistical approaches, affect decision trade-offs, or involve qualitatively different consequences; and
Provide a framework for developing the test scenarios.
A USEFUL TAXONOMY
The panel has developed a taxonomy for defense systems for the purpose of classifying systems by their operational test design needs rather than the uses to which the system will be put. For example, if the test issues and the type of data collected to address those issues are similar for a missile and a combat aircraft, then the taxonomy should put these two systems together in the same category. On the other hand, if say, a telephone for administrative use and a similar telephone for transmissions of intelligence information have quite different test issues, then the taxonomy should put them in different categories.
The panel's taxonomy has a broader conceptual base and utility than test design, however. A proper taxonomy entails notions of the loss function underlying the decision of whether to enter into full-rate production (since that is the purpose of operational test), and, as discussed above, this decision directly involves the issue of test benefit versus test cost. To understand the benefit gained from the use of various test scenarios, one must consider the likely variability of the system across environments, threats, tactics, and doctrine; across prototypes; across levels of training of the users involved; and so on. This forces the developer of the taxonomy to think statistically, that is, to ask where the variability of system performance is likely to lie. The goal of a taxonomy is thus to formalize a decomposition of the variance of system performance as a function of various characteristics of the situations in which the system might be used.
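One way to make the idea of a decomposition of variance concrete is a one-way analysis of variance, which splits the total sum of squares into a part associated with a single scenario factor and a residual part. The sketch below is illustrative only and is not the panel's method: the factor (environment), the scores, and the function name `decompose_variance` are all hypothetical.

```python
from statistics import mean


def decompose_variance(groups):
    """Split the total sum of squares into between-group and within-group parts.

    `groups` maps a factor level (e.g., an environment) to a list of scores.
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean, weighted by group size.
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Variation of individual scores around their own group mean.
    within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    total = sum((x - grand_mean) ** 2 for x in all_scores)
    return between, within, total


# Hypothetical performance scores under two environments (made-up numbers).
scores = {"desert": [8.0, 9.0, 10.0], "arctic": [4.0, 5.0, 6.0]}
between, within, total = decompose_variance(scores)
share = between / total  # fraction of variability associated with environment
```

A large share for a factor would suggest that the factor deserves a dimension in the taxonomy; a small share would suggest that it can be ignored for most systems.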
Characteristics that are considered important in some systems are represented by dimensions of the taxonomy. For example, there are systems for which the environment presents real stress, and systems for which it does not. Therefore, sensitivity to environment is a potential dimension of the taxonomy. Likewise, characteristics that are considered less important for most systems are ignored in the taxonomy. There is finally the fact that the population size of each cell in the taxonomy will differ considerably. So, to take an extreme example, if all systems but one are insensitive to direct sunlight, it is unreasonable to expect the taxonomy to have a sunlight dimension to accommodate a single system. Any taxonomy has to be used with a view towards its limitations. These notions of what factors are related to system response, and how many systems are affected by a given factor, governed the panel in its attempt to create the taxonomy, and assisted in the goal of creating a taxonomy with few cells, but enough to create relatively homogeneous cells.
The panel used the following criteria to develop its taxonomy. These three criteria should be used in planning, conducting, and evaluating a test.
Skill level required of the people who are part of the system. Skill levels may be:
Highly skilled with extensive, multifaceted training. Such systems include aircraft systems, naval combat vessels, and armored ground combat systems.
Highly skilled in a single or limited number of actions. Such systems include individual weapons, ground and sea transportation systems, communication systems, and radar systems.
Little skill required. In this case, training is not a significant issue. Such systems may be clothing, rations, or temporary shelter systems.
Nature of opposition. Often in the use of a military system there is an opposition force whose mission is to degrade or prevent the effective performance of the system under test. We will divide the nature of this opposition into three categories: active, passive, and none.
Active opposition. This is used in an operational test of a system designed for combat operations. The nature of the test is usually but not always a force-on-force combat test in which there is a manned active opposing force.
Passive opposition. This category is for scripted unmanned opposition which is under total control of the test designer. Included among this type of opposition is scripted jamming of a communication network and the emplacement of an enemy minefield.
No opposition. This category covers items that are tested with no opposition, such as clothing, rations, and most information management and transportation systems.
Effect of a system failure. The seriousness of a system failure depends upon several factors. If the failure occurred during a mission, was it a (1) total system failure (loss of system forever), (2) critical component failure (causes immediate withdrawal), (3) component failure with ability to continue mission in a degraded mode, or (4) component failure which will not degrade mission capability? If the failure occurred while not on a mission, can the system embark upon the next mission? If so, will it be in a degraded mode or can it continue to function without degradation?
Also included in this criterion is the effect of a failure on the mission. For example, if a system failed during a mission, the total loss of that system for that mission may have a significant effect while the ability of the system to continue after a component failure would probably have little or no effect. Combining these ideas, we can divide system failures into the following categories:
Catastrophic. A failure would probably cause a total failure of the mission; or would represent the loss of that system forever.
Serious. A failure would seriously degrade the chance of mission success or would represent the loss of that system for the duration of that mission.
Significant. A failure would probably degrade the mission, or would cause a delay in the accomplishment of the mission.
Minor. A failure would have little or no effect on mission success, though it may cause inconvenience or create additional cost.
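The three criteria above can be sketched as a small data structure in which each cell of the taxonomy is one combination of skill level, nature of opposition, and effect of failure. This is a minimal sketch, assuming that every level of each criterion is crossed with every level of the others (the panel's figure may group cells differently); the class and member names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class SkillLevel(Enum):
    MULTIFACETED = "highly skilled with extensive, multifaceted training"
    LIMITED_ACTIONS = "highly skilled in a single or limited number of actions"
    LITTLE = "little skill required"


class Opposition(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    NONE = "none"


class FailureEffect(Enum):
    CATASTROPHIC = "catastrophic"
    SERIOUS = "serious"
    SIGNIFICANT = "significant"
    MINOR = "minor"


@dataclass(frozen=True)
class TaxonomyCell:
    skill: SkillLevel
    opposition: Opposition
    failure_effect: FailureEffect


# Full crossing of the three criteria (an assumption, see the text above).
ALL_CELLS = [
    TaxonomyCell(s, o, f)
    for s, o, f in product(SkillLevel, Opposition, FailureEffect)
]

# Example cell for an item such as rations; the text places such items under
# "little skill" and "no opposition", while the failure category chosen here
# is an assumption made for illustration.
example = TaxonomyCell(SkillLevel.LITTLE, Opposition.NONE, FailureEffect.MINOR)
```

Under the full-crossing assumption the structure has 3 x 3 x 4 = 36 cells, which illustrates the tension noted earlier between keeping the number of cells small and keeping each cell homogeneous.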
The Effect on Test Design
The Effect of Training on an Operational Test
One of the major factors to be considered in the design of a test is the control and evaluation of the variability of the dependent variables, i.e., the measures which indicate how a system performed. In addition to the hardware and software, a "system" also includes the people and how the system is employed. A great source of variation in the performance of most systems is the variability in human behavior. The degree to which human variability can be understood and minimized will have a large effect on the reliability of the overall test data.
At the very outset of test planning, provision should be made to minimize the effect of human variability. This can best be done by planning for appropriate training of the people operating the systems. The higher the skill level required, the greater the importance of adequate training. For those systems requiring highly skilled operators, testing should be conducted to assess whether the training is adequate. Force development tests and experiments are excellent tools for the assessment of the adequacy of training. There are times when even a "golden crew," an extremely competent crew, is useful to examine the upper limit of the performance of a system. The data generated by such crews should be used carefully and not be advertised as the "expected" performance of a system.
Finally, when there is an active opposing force, its crews should also be well trained, and they should present opposition of as nearly equal quality as possible in each trial. Variability in opposing force performance causes unwanted variability in the measured performance of the system under test. This leads us to the next criterion.
The Effect of Opposing Force on an Operational Test
The sources of variability in the outcome of a force-on-force battle are legion, so much so that in most operational tests, trial-based data (such as exchange ratios) seldom behave in a manner that supports reliable statistical measures. Some sources of variation can be controlled (time of day, mission, terrain, etc.), others cannot be controlled but can be measured (temperature, accidents, failures, etc.), and still others can be neither controlled nor measured (interpersonal relations, inattention, etc.). All of these affect the active opposition in much the same way that they affect the system under test. In such tests, the designer should concentrate on three things: controlling unwanted variability to the extent possible, designing for event-based data rather than trial-based data,1 and designing for evaluation of the system using qualitative measures.
In cases of passive opposition, the nature of the opposition should be designed into the test and scripted exactly to assure no unwanted variability is produced by the opposition. Often in such tests, sufficient data can be generated to do an adequate statistical analysis.
When there is no opposition, the variability of the results comes solely from the variability of the system under test and its environment. Most measures from such tests lend themselves readily to statistical analysis.
The Effect of Failures in an Operational Test
Simply put, a failure is not necessarily a failure. The effect of each failure should be considered in the scoring conference, that is, in the evaluation of reliability, availability, and maintainability (RAM) data. In an aircraft system, should a failure of the heating system, a failure of the radar warning device, a failure of a missile to launch when it should have, and an engine failure (in a single-engine aircraft!) all be treated alike? Provision should be made in the test plan to evaluate RAM failures according to some taxonomy similar to that given above.
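As a minimal sketch of what scoring failures by category rather than pooling them might look like, the example below tallies failures under the four categories given earlier (catastrophic, serious, significant, minor). The function name `tally_by_severity` is hypothetical, and the severity attached to each failure is an assumption made purely for illustration, not a judgment by the panel.

```python
from collections import Counter

SEVERITY_ORDER = ["catastrophic", "serious", "significant", "minor"]


def tally_by_severity(scored_failures):
    """Count failures per severity category instead of pooling them."""
    counts = Counter(severity for _, severity in scored_failures)
    unknown = set(counts) - set(SEVERITY_ORDER)
    if unknown:
        raise ValueError(f"unrecognized severity: {sorted(unknown)}")
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}


# Hypothetical scoring-conference output; each severity label is an
# assumption for illustration only.
scored = [
    ("heating system", "minor"),
    ("radar warning device", "serious"),
    ("missile failed to launch", "serious"),
    ("engine (single-engine aircraft)", "catastrophic"),
]
tally = tally_by_severity(scored)
```

Reporting the tally by category keeps a single raw failure count from treating an inoperative heater and a lost engine as equivalent events.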
Figure C-1 is a graphic representation of a proposed taxonomy for operational test; the panel believes it represents a good first step toward a taxonomy that categorizes defense systems by the type of operational test that is needed. If such a taxonomy is found useful, operationalizing it would require identifying an example system in each cell that had gone through operational test recently and drawing up a well-designed test for it through a collaboration of test experts in the relevant service test agency, DOT&E, IDA, and possibly statistical experts from academia.2 Then, new entrants to each cell could use the operational test design of the leading case in the cell, with minor adjustments. A feedback system could also modify the test designs of the leading cases as deficiencies in tests of other cell members were discovered.
To repeat some cautionary statements from above, this attempt at a taxonomy is only a preliminary step. Again, the development of military systems is an extremely dynamic process, and therefore any taxonomy would have to be reevaluated on a regular basis. Finally, there is no reason why a taxonomy must be the preferable construct: the formation of checklists for this purpose should also be investigated.
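The checklist alternative could be sketched as a function in which each answer about a system characteristic adds features to the recommended test. This is a rough illustration, not a proposed instrument: the function name, its parameters, and the wording of each feature are hypothetical, although the features themselves paraphrase recommendations made in the sections above.

```python
def recommend_test_features(high_skill, opposition, failures_vary_in_severity):
    """Return test-design features; each answer adds features to the list."""
    features = []
    if high_skill:
        features.append("assess adequacy of operator training before testing")
    if opposition == "active":
        features.append("train the opposing force to a consistent standard")
        features.append("design for event-based rather than trial-based data")
    elif opposition == "passive":
        features.append("script the opposition exactly")
    elif opposition != "none":
        raise ValueError("opposition must be 'active', 'passive', or 'none'")
    if failures_vary_in_severity:
        features.append("score RAM failures by severity category")
    return features


# Hypothetical system: highly skilled operators, active opposition, and
# failures whose consequences differ widely.
plan = recommend_test_features(True, "active", True)
```

Unlike a taxonomy, a checklist of this kind does not require a populated cell for every combination of characteristics, which may make it easier to revise as the kinds of systems under development change.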
Figure C-2 places a few defense systems in some of the categories. The intent is to demonstrate the kinds of systems that would fit into some cells. We are not proposing that they be seen as the systems on which one would base other operational tests.