Statistical Methods for Testing and Evaluating Defense Systems: Interim Report

6 Efforts Toward a Taxonomic Structure of DoD Systems for Operational Testing

The testing problems raised by operational testing of military systems may be quite different depending on the nature of the system under test. For example, systems that have a major software component create different testing problems than systems that are chiefly hardware. Systems whose failure could result in loss of life must be tested differently from those whose failure could not. Systems that are modest improvements over existing systems raise issues different from those that embody entirely new technologies. A number of attributes of military systems necessitate differing approaches to operational testing and create distinctions that should be kept in mind when applying statistical techniques. Because of the many different factors that need to be considered, the panel decided it would be worthwhile to consider the development of a scheme for classifying weapon systems and weapon system testing issues.

The utility of examining aspects of operational tests that are linked to features of systems under test is clear; however, attempts to progress have raised fundamental issues regarding its scope, depth, and structure. This chapter is intended to raise some of these issues and to promote discussion. It begins by presenting the results of our preliminary work toward developing a taxonomic structure and then briefly describes our planned future activities in this area.

PRELIMINARY WORK TOWARD A TAXONOMIC STRUCTURE

The objective of the panel's efforts in this area is to develop a taxonomic structure that can support, and help structure, analyses of the use of statistical techniques for the efficient testing and evaluation, especially operational testing, of military systems. The term “taxonomic structure” is used to emphasize that the exact nature of any proposed scheme is still under consideration, and will evolve as the work proceeds. The structure, when developed, should serve the following purposes:
  Reflect the prevalence of various types of systems.
  Highlight attributes that might call for different statistical approaches, affect decision tradeoffs, or involve qualitatively different consequences.
  Facilitate the integration of commercial approaches by helping to align military and commercial contexts.

With these general purposes in mind, one is led to think of taxonomy dimensions such as the following:

Cost of system and of testing
  What is the cost of a test item?
  What is the number of items to be procured?
  Is the testing destructive?

Role of software
  Is the system a software product?
  Does the system have significant software content?
  Does the system use a dedicated computer or require the development of new computer hardware?

Environment of use
  How stressful is the environment within which computer hardware, sensors, motors, electronics, etc. must operate?

Environment of test and evaluation
  How close are test environments to actual-use (combat) environments?
  What is the relevance of simulation?
  To what extent are performance evaluations dependent upon indirect measurements and inference?
  To what extent is relevant prior knowledge available and able to be used (1) in the design of evaluation studies or (2) in drawing conclusions from test and evaluation?

New versus evolutionary system
  Is the system a de novo development?
  Is it an upgrade?
  Is it a modification?
  Is it a derived design?
  Is it a replacement for another system?

Testing consequences
  What are the consequences of not achieving a successful replacement?
  What are the consequences of achieving a replacement at a much higher cost than anticipated?
  What are the consequences of receiving it at a much later date than planned?
  What are the consequences of receiving it at a much lower level of performance than promised?

A useful taxonomic structure might be developed simply by expanding on this list, adding, deleting, or elaborating as deemed useful. But addressing questions of what to put in and what to leave out raises other questions about the various uses and purposes of the taxonomic structure. Does one wish to recognize all distinctions that may be significant for characterizing:
  The nature of the weapon system?
  The intended combat environment(s)?
  Other (possible) combat environments?
  The intended role of the weapon system and other (possible) roles of the system?
  The range of possible decisions that might be appropriate, given the outcome of this operational test?
  The cost of repairing and supporting the system?
  The logistics costs of fielding the system?

It quickly becomes clear that the taxonomic structure could be developed with more or less ambitious purposes in mind. The choice of purposes might well affect the number of dimensions and the necessary levels of disaggregation within each dimension.

It might appear that some of these dimensions go well beyond the objective of characterizing the weapon system. But the goal of operational testing goes beyond testing, per se, to evaluation. Decisions whether to proceed to produce and field a weapon system often hinge not simply on whether the system can perform a physical function, but also on whether it can be employed to perform that function so as to provide a decisive advantage in combat and whether the range of contexts in which it could do so justifies its cost. Given issues of assessing the value of a weapon system, the taxonomic structure might include dimensions such as the following:

Scenario dependence
  To what extent is the value of the weapon system, or its operational performance, affected by the testing scenario? For example, does the scenario correspond to operations on the first day of the war or after air superiority has been achieved? Does it correspond to a scenario in which we have ample warning or are caught by surprise? Is it assumed that air bases are available nearby, or that operations must be adapted for primitive air strips?

Roles and missions
  Could this weapon system perform in roles and missions different from those which are tested?
  Could this system provide a backup for other systems, in case they perform badly or are seriously attrited?
  Does the operational testing provide information (direct or indirect) regarding possible alternative uses of the system?

Force flexibility
  Would this weapon system significantly improve the flexibility inherent in our fielded portfolio of weapon systems?
  Would it allow us to perform new missions, or to perform existing missions in more than one way?
  Would this system free up other systems for more valuable uses?

It might be said that these questions go beyond the narrower issues that are normally addressed in operational testing. But to the extent that operational tests can be designed to shed light on such questions, they will provide valuable information that bears directly on the decisions operational testing is meant to inform.

Discussions of force flexibility and roles and missions raise another set of considerations, relating to whether the weapon system in question provides radical new capabilities, or, alternatively, simply can do the same job a bit better than the existing fielded system. If a weapon system represents a radical advance, it is important to recognize that its value may well not be entirely appreciated at the time a decision is made. Thus it might be useful to address in the taxonomic structure the following questions related to tactics and doctrine:

  To what extent do the capabilities inherent in the system raise questions about the nature of tactical operations or even about existing doctrine?
  Have the potential users and testers had adequate time to develop tactics that will utilize this weapons system most effectively?
  Is it plausible that the system opens up opportunities for radical approaches that are not yet well understood?

These questions relate to what is, or is not, revealed about the potential uses of the weapon system through operational testing. But it is also important to recognize that capabilities that are not explicit, not revealed, or not even tested could be more significant than those that are. A complete evaluation must recognize yet another dimension that relates to the characterization of weapons systems: deterrence. The taxonomic structure might address the following questions related to this dimension:

  To what extent could this weapon system create fear among potential adversaries about just what capabilities might be demonstrated in the midst of a conflict?
  To what extent does the system affect the perceptions of adversaries (and allies), as well as actual capabilities?

Considerations such as these all relate to the question of whether the nature of this weapons system (and its potential uses) is now understood and the extent to which it will be better understood after the operational tests are concluded. That question leads in turn to another candidate dimension for the taxonomic structure: human factors. Questions related to this dimension include the following:

  To what extent does the performance of the weapon system depend on the training of those who will operate it? Of those who will operate any “enemy systems” during the testing?
  To what extent may the assessed performance of the weapons system be affected by the training of those who will collect, reduce, and interpret the data collected during the operational test?

Note that while the competence of the operators is always recognized as a factor that may be critical to the assessed performance of a weapon system, it is also important to recognize the extent to which test results may be influenced by those who are collecting and interpreting the data. That point leads to another significant dimension that relates simultaneously to the weapon system and the test range: instrumentation. Questions here include the following:

  To what extent is test range instrumentation adequate for assessing the system performance during the operational tests?
  To what extent might the act of instrumenting the test articles interfere with their performance?

It is clear that assessing performance is more difficult with some weapon systems than with others. It is also clear that difficulties may relate to both the nature of the weapon system under test and the capabilities of the test range. Thus one is led to want to characterize not just the weapon system, but the weapon system/test range as a combined entity.
At the same time, thinking about the weapon system and the test range as two components of one unitary problem is itself a faulty premise for a taxonomic structure of testing issues and contexts because very few weapons systems are themselves unitary. They often combine subsystems and components, with the performance of the overall system depending on the overall operation of and interaction among those elements. Thus it seems quite important to recognize dimensions such as the following:

Segmentation
  To what extent is it possible to segment the weapon system into subsystems and components, particularly ones that can be tested independently?
  To what extent do systems integration problems and subsystem interactions interfere with the validity of “segmented tests”?

Architecture
  To what extent does the design or nature of this system allow for improvements on a subsystem-by-subsystem basis?
  To what extent is system performance affected by the current state of development of the individual subsystems?

Recognition of these dimensions raises another set of relevant considerations: while the milestone paradigm is based on notions of phases called development, production, and operations and support, it is increasingly true that development continues to proceed over the lifetime of many modern weapon systems. Yet it is also often the case that one may not understand how the current version of a weapon system “works,” in particular, what the fault modes are of complex electronic subsystems, even after we have begun to field it. Thus the taxonomic structure might include the following dimensions:

Maturation
  To what extent is this system matured?
  To what extent is its performance likely to improve markedly as it is better understood? After it has been fielded? After test and operational data have accumulated?

Sources of information
  To what extent will there be continued reliance on data collected through operational tests or ongoing use (with proper documentation) in understanding the performance, reliability, and nature of this weapon system?

Process perfection
  To what extent will the performance of this weapon system gradually improve as production, testing, repair, and support processes are perfected?

Heterogeneity
  To what extent will differences among produced items be testable, or recognizable, before the items are used?
  To what extent will military commanders be able to either manipulate or hedge against apparent heterogeneity among fielded units of this weapon system?

The 19 dimensions noted above are not meant to be exhaustive or definitive, only to illustrate some of the directions in which the taxonomic scheme could be extended. One important conclusion is that a set of dimensions should be developed only after there is agreement on what purposes the taxonomic structure should serve, taking into account its various potential uses.

It is also apparent that one objective of a conceptual taxonomy could be to list exhaustively dominant sources of variability that are relevant to the testing, evaluation, and decision-making contexts for different types of weapons systems. Clearly that represents a very ambitious goal, but not an unthinkable one. The panel could define a family of taxonomies, attempt to implement only one or two relatively simple versions, but also present and define more complex versions.

For some purposes, a taxonomic structure is helpful only if it entails considerable aggregation. There is always a tradeoff between the homogeneity of the cells of a taxonomy at the end of the process and the parsimony of the cell definitions. Obviously, the number of cells grows very quickly with tree depth. The question arises of how to arrive at the point of greatest utility. One might either add branches to a simple structure or prune from a complex structure.

We have not yet decided how to proceed, or even whether to proceed toward developing a taxonomy. Some panel members believe a tree structure could never work, given the necessity that many of the branch definitions are cross-referencing while others are not. Instead, some suggest a list of features, that is, a checklist, consisting of features that are either present in one or another form, or absent, with no overriding structure. This would work if many of the levels of branches did not depend on the presence or absence of other characteristics. Then instead of every member of a cell receiving a different test methodology, as would happen with a usual taxonomy, one would have a collection of test features that relate to individual properties of the checklist.

Clearly, the panel is in the preliminary stages of its work on this topic. We have determined that such a taxonomic structure would be difficult to produce and that its appropriate scope, depth, and nature depend on the uses one has in mind for it. The panel would find any information about previous efforts in this area of great interest.
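The contrast drawn above between a tree-structured taxonomy, whose cell count grows quickly with depth, and a flat checklist of features can be made concrete with a small sketch. Everything named here is an assumption introduced for illustration only: the four dimensions, the number of levels assigned to each, the `SystemProfile` class, and the placeholder guidance strings are hypothetical and do not come from the panel's work.

```python
from dataclasses import dataclass, field
from math import prod

# Hypothetical dimensions and numbers of levels, chosen only for illustration;
# the report does not specify how many levels any dimension would have.
LEVELS_PER_DIMENSION = {
    "role_of_software": 3,
    "new_vs_evolutionary": 5,
    "testing_destructive": 2,
    "segmentation": 3,
}


def tree_cell_count(levels: dict[str, int]) -> int:
    """Leaf cells in a tree that fully crosses every dimension (a product)."""
    return prod(levels.values())


def checklist_item_count(levels: dict[str, int]) -> int:
    """Entries in a flat checklist: one per level of each dimension (a sum)."""
    return sum(levels.values())


def cells_by_depth(branching: int, depth: int) -> list[int]:
    """Cell counts at depths 1..depth for a tree with uniform branching."""
    return [branching ** d for d in range(1, depth + 1)]


@dataclass
class SystemProfile:
    """A system described as a checklist: dimension name -> observed level."""
    name: str
    features: dict[str, str] = field(default_factory=dict)


def applicable_test_features(
    profile: SystemProfile, guidance: dict[tuple[str, str], str]
) -> list[str]:
    """Collect guidance attached to individual checklist properties.

    Each (dimension, level) pair is looked up on its own, so no cell of a
    crossed taxonomy has to be defined in advance.
    """
    return [
        guidance[(dim, level)]
        for dim, level in profile.features.items()
        if (dim, level) in guidance
    ]


if __name__ == "__main__":
    print("tree cells:", tree_cell_count(LEVELS_PER_DIMENSION))
    print("checklist entries:", checklist_item_count(LEVELS_PER_DIMENSION))
    print("cells by depth (branching 3):", cells_by_depth(3, 5))
```

With these assumed level counts, the fully crossed tree has 90 cells while the checklist has 13 entries. The checklist form only works as intended when the guidance for one property does not depend on the levels of other properties, which is the condition stated in the text.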
FUTURE WORK

Building on the preliminary work described above, the panel will develop a taxonomic structure that provides categories of defense systems that require qualitatively different test strategies. To this end we will examine various databases for their utility in classifying recent and current systems and in helping us determine the relative sizes of various cells of the taxonomic structure.