4 Prerequisites for Combining Information

The development and implementation of techniques for combining information, whether in design or in evaluation, are often sophisticated activities. What may conceptually seem to be relatively straightforward applications often require original thought, nontrivial modification of existing techniques, and software development. But the use of methods for combining information can be made easier if the appropriate methodological and logistic frameworks are in place. This chapter discusses several key steps that should be taken to establish these frameworks: broader definitions of data so that nontest data (e.g., expert judgment and computer models) can be formally and correctly included in analyses; development of test data archives so that what is learned about a system continues to be of use to future evaluators; use of graphical representations of complex systems to aid in the understanding of overall reliability and performance; and use of formal statistical methods for information combination.

This chapter also identifies the statistical capabilities required to implement such strategies. There is no clear evidence that the service test agencies have these capabilities in place today, and so if they find the advantages presented here compelling, it will be necessary for them, with help from higher-level officials within the services, to acquire the capabilities described in this chapter.

NEED FOR A BROADER DEFINITION OF DATA

When performing an assessment of a complex system, the most commonly used data are test data, whether from operational, developmental, or contractor tests. Other sources of information about the system include training exercises, field use, computer models and simulation, and military and engineering judgment. Figure 4-1 is a schematic diagram of a system and its available data sources.

If resources were available, it would be desirable to collect test data on every part of the system and to perform system tests under a variety of conditions. For large and complex systems, however, that is seldom possible, and so the assessment often resembles that shown in Figure 4-1, where some parts are not tested at all, some have computer modeling and simulation data, some have historical data, others have test data, and some have multiple sources of data.

The challenges in methods for combining information are to (1) represent the system under test in a way that all of the stakeholders can understand (in Figure 4-1, a fault tree is used, one of many useful representational schemes); (2) collect data (broadly defined) to assess the system and map them onto the representation; and (3) perform appropriate statistical analyses to combine the available information into estimates of the metrics of interest. All of these steps are performed in some way by ATEC's current operational evaluation; this chapter provides suggestions for additional capabilities. For example, the graphical representation of Figure 4-1 could be used to facilitate understanding of the system evaluation plan data source matrix, to suggest areas where data are (or will be) missing and where data combination is possible, and to provide a structure for test planning.

It is important to acknowledge and account for possible weaknesses in different kinds of data. The use of nontest data for evaluation can be contentious, although it is done routinely. Military, engineering, and statistical judgments are required to design test plans and interpret data; and computer modeling and simulation are applied to test data collected under certain scenarios to extrapolate the scope of their validity to other scenarios or to larger fighting units. Methodological contention arises when attempts are made to use military judgment or computer modeling and simulation results formally as data, instead of using them only to inform design, modeling, or interpretation.

The use of expert judgments, in particular, is especially vulnerable to inappropriate application due to procedural or cognitive biases.

[Figure 4-1: Schematic representation of a system, shown as a fault tree, with the data sources (e.g., test data, modeling and simulation, historical data, expert judgment) available for its various parts.]

For example, suppose that test data are not available for a particular component but that engineering judgment considers the system design "unreliable" (Figure 4-1). Methods have been proposed (Meyer and Booker, 2001) to formally elicit and quantify engineering judgment for inclusion in statistical calculations, and there is a growing body of literature by statisticians, decision analysts, social scientists, and cognitive psychologists, developed over the past two decades, describing methods for eliciting and using expert judgment. Using information based on expert judgment requires considerable care, explicit documentation, and careful sensitivity analysis. With the recognition that all statistical analyses depend, to some degree, on subjective judgment (Berger and Berry, 1988) comes the obligation to ensure that such judgments are made in a rational and defensible manner.

It is well established that the major barrier to successful elicitation is the presence of biases inherent in the process used to evoke expert responses (see, for example, the pioneering work of Tversky, Slovic, and Kahneman, 1985). These biases are often characterized as cognitive or motivational and attributed to a variety of sources, including intrinsic cognitive failures, the instrument used to elicit responses, the social or institutional setting within which the expert operates, and the response mode.

Cognitive biases are evident in effects such as anchoring, the tendency not to adjust from a first response even after receiving information contrary to the position; availability, the elicitation of event probabilities or other values based on what readily comes to mind; conservatism, a reluctance or inability to draw inferences agreeing with those that would be obtained using Bayes' rule; and underestimation, an understatement of the uncertainty of an assessment. Motivational biases include groupthink, whereby experts tend to slant their assessments toward what they perceive to be a consensus; and misinterpretation, in which the method or instrument of elicitation affects the expert's responses (as when, for example, the framing of a question cues the expert to provide a preferred response).

The test and evaluation environment contains strong institutional incentives and is therefore possibly subject to equally pervasive motivational biases on the part of experts asked to provide their judgment. These experts can be specifically trained in methods to avoid or mitigate an array of cognitive biases, and the elicitations themselves can be structured to minimize the effects of bias. A growing literature of methods addresses these issues. In particular, Meyer and Booker (2001) and Booker and McNamara (2003) provide exemplary guides to such ameliorative methods as indirect probability assessment, use of documented processes for elicitation, expert identification, motivation and training, modes of communication, and appropriate framing.

If expert judgment is used in methods for combining information, it is extremely important that these or similar techniques be used, especially when arriving at prior distributions for critical parameters, such as failure rates.

Some industrial organizations have become comfortable using such techniques, while being aware of and adjusting for potential biases, in high-profile, politically sensitive analyses. For example, General Motors Corporation reports on its ability to assess technical success probabilities in Bordley (1998) and has used panels of over 40 experts to develop cumulative prior probability distributions for the improvement of fuel economy by using a novel powertrain concept.

Computer modeling and simulation, which can be thought of as combining the original data with the knowledge incorporated in the model, can also provide a cost-effective way of expanding the use of the data. The appropriate use of computer modeling and simulation methods depends crucially on the trustworthiness of the models in transporting data to other scenarios. Although simulation can generate a large amount of new data, it is a serious mistake to combine these generated data directly with the original data to increase the sample size. Instead, more sophisticated statistical methods (e.g., as described by Reese et al., 2000) should be employed.
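To make the contrast with naive pooling concrete, the following sketch shows one simple way that information from a nontest source, such as an elicited engineering judgment or a summary of simulation runs, might be combined formally with operational test outcomes through a conjugate Beta-Binomial update, with the nontest source deliberately downweighted. The numbers, the prior_weight parameter, and the function name are hypothetical illustrations, not the specific methodology of Reese et al. (2000).

```python
# A minimal, hypothetical sketch of formal information combination: an
# expert- or simulation-informed Beta prior on a mission success
# probability, downweighted and then updated with operational test results.

def beta_posterior(prior_mean, prior_weight, successes, failures):
    """Return posterior (alpha, beta) for a Beta-Binomial update.

    prior_mean   -- best guess of the success probability from the nontest source
    prior_weight -- effective sample size granted to the nontest source
    successes, failures -- observed operational test outcomes
    """
    alpha0 = prior_mean * prior_weight
    beta0 = (1.0 - prior_mean) * prior_weight
    return alpha0 + successes, beta0 + failures

# Hypothetical inputs: the nontest source suggests a success probability of
# about 0.80 but is trusted only as much as 10 trials; the operational test
# then yields 17 successes in 20 missions.
alpha, beta = beta_posterior(prior_mean=0.80, prior_weight=10, successes=17, failures=3)

combined_estimate = alpha / (alpha + beta)  # shrinks the test result toward the prior
test_only_estimate = 17 / 20                # ignores the nontest information

print(f"combined estimate  = {combined_estimate:.3f}")   # about 0.833
print(f"test-only estimate = {test_only_estimate:.3f}")  # 0.850
```

Varying prior_weight is one simple form of the sensitivity analysis that this chapter recommends whenever expert judgment or simulation output enters an evaluation formally.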

NEED FOR A TEST DATA ARCHIVE

Given the wide variety of data sources available when performing a system assessment, a mechanism should be developed to archive the data and make them available for current and future assessments. At present, such data are not saved in a readily accessible database along with contextual information. This is true even for previous development stages of a system. Once a system has been fielded, the absence of rigorous information on system performance greatly limits the effectiveness of feedback loops relating performance in the field to performance during testing, feedback that could be very useful for improving system designs, the system development process, and operational and developmental test design.

A data archive of military system performance could be put to several uses that would assist in test design and system evaluation. In support of test design, data archiving can be used to:

· help set the requirements for the test design and develop the operational mission summary/mission profile (OMS/MP);
· determine the set of conditions and miniscenarios to be included in the developmental and operational tests;
· identify scenarios in which the new system is expected to perform better than previous systems (e.g., by providing information on how other systems performed in similar scenarios);
· similarly, identify scenarios in which the new system may perform poorly;
· identify factors that have an important impact on system performance;
· understand the factor levels that stress the system weakly, moderately, and severely; and
· determine adequate sample sizes through power calculations (see the sketch following this list).
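As a concrete illustration of the last item, the sketch below computes an approximate number of missions needed per system to detect a difference in mission success rates between a new system and a baseline, using the standard normal-approximation formula for comparing two proportions. The planning values (70 percent for the baseline, 85 percent for the new system) and the function name n_per_group are hypothetical and are not drawn from any actual test plan; in practice, archived data on the baseline system would supply defensible planning values.

```python
# A minimal sketch of a sample size (power) calculation for comparing two
# mission success proportions, using the usual normal approximation.
# All inputs are hypothetical.
from statistics import NormalDist

def n_per_group(p_baseline, p_new, alpha=0.05, power=0.80):
    """Approximate number of missions required per system (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_new * (1 - p_new)
    return (z_alpha + z_beta) ** 2 * variance / (p_baseline - p_new) ** 2

# Hypothetical planning values: archived data suggest the baseline succeeds
# on about 70 percent of missions; the new system is required to reach 85 percent.
missions = n_per_group(p_baseline=0.70, p_new=0.85)
print(f"approximately {missions:.0f} missions per system")  # about 118
```

Such a calculation would of course be refined to reflect the blocking structure of the test design and the multiple measures of effectiveness involved; the point here is only that archived performance data on the baseline system are what make the planning values defensible.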

In support of system evaluation, data archiving can be used to provide information to support analysis of the validity of computer models and simulations used in test evaluation; support identification of appropriate statistical models for use in system evaluation; and support pooling and other forms of information combining. With the increasing development of statistical methods and models for information combining, this last reason has become increasingly compelling.

Data archiving can also contribute to improvement of defense system assessment by providing a means to better understand the differences between failure modes and failure frequencies in moving from developmental to operational testing and from operational testing to field use; understand the sources of system deficiencies identified in the field, which can then be used to guide design improvements; improve both developmental and operational testing and evaluation (e.g., by understanding how deficiencies identified in the field escaped detection in the developmental and operational tests); and estimate system and component residual lifetimes and life cycle costs.

The current lack of priority for data archiving, given the above advantages, suggests that the primary purpose of test data is to evaluate a system for promotion to the next stage of the milestone process of defense system development. Processes and techniques for combining data across acquisition stages, either for a given system or across systems, are not currently envisioned or well supported. However, such data, often acquired at enormous cost (e.g., operational tests can cost many millions of dollars), could and should be stored in an accessible form that would facilitate the above uses. Averaged over all defense systems in development, the cost of such an archive would be extremely small, but its value, as has been discovered in many industrial settings, could be substantial.

A test data archive would need to contain a rich set of variables to adequately represent the test environment, the system under test, and the performance of the system. Failure to initially include such a comprehensive set of variables should not be used as an argument for not getting started, since many of the potential benefits from such an archive could be derived from a subset of what is described here, with increasing detail added over time.

In order to accurately represent system performance, including the appearance of various failure modes and their associated failure frequencies, the circumstances of the test must be understood well enough that the test, training exercise, or field use can be effectively replicated, including the environment of use (e.g., weather, terrain, foliage, and time of day) and type of use (e.g., mission, intensity, and threat). This information is not easy to collect in controlled settings such as operational testing, and it is considerably more difficult to collect in less controlled types of use, such as training exercises or field use. However, much in this direction can be accomplished. In addition, contextual information that might be relevant for an operational test might have little relevance in the developmental test, because often only particular components are under test.

While a system is under development, the system design is often under constant modification. Given the need, stated above, to be able to replicate a test event in the database, it is crucial to represent with fidelity the system that was in operation during the event so that proper inference is possible. Since modifications can and do occur during late-stage operational testing and after fielding, this is not a concern only for the developmental test. Even for systems produced at the same stage of development, knowledge of the order and location of manufacture can be useful to understanding why some prototype systems perform differently from others.

In addition to storing the length of time between system failures, it is also important to identify which hardware or software component malfunctioned; the maintenance (including repair) record of the system; the time of previous failures; the number of cycles of use between failures; the degree of failure; and any other variables that indicate the stresses and strains to which the system was subjected, such as speed and payload.

It is also useful to include the environments and stresses to which individual system prototypes have been exposed historically (e.g., in transport, storage, and repeated on/off cycling), in order to support comprehensive failure mode analysis, especially if an apparent declining trend in system reliability appears. This sort of information is difficult to collect in less controlled settings; however, in many industries sensors have been attached to systems to collect much of the information automatically.

The information stored should be both quantitative and qualitative. The latter is important to include because the contextual information needed to help recreate the environment of use often includes qualitative information. To facilitate use across services, such an archive should make use of terminology common across services and, in its design and accessibility, should address classification issues.

With respect to the structure and function of the database, it should be able to track failures over time and identify systems that, while considerably different, have similar components. These needs argue for a database in which such linkages are facilitated. An analysis of similar data archives in industry would enable the DoD to build on existing processes and techniques.

The panel is pleased to note that there are defense databases that satisfy some of the above needs; the ATEC Distributed Data Archive and Retrieval System and several servicewide reliability or failure reporting databases are leading examples. However, those that the panel has seen support only a few of the potential benefits listed above, rather than the breadth, structure, and accessibility that we envision.

The marginal costs of data collection, input, and maintenance could be easily met through routine allocation of a small percentage of the development funds from every ACAT I program. The initial fixed costs for the Army might be funded by the Army Materiel Command and other related groups.

Finally, as mentioned in Chapter 5 for the Future Combat System, systems developed using evolutionary acquisition provide an additional argument for the establishment and use of a test (and field) data archive, since it is vital to link the performance of the system as it proceeds through the various stages of development. This test and field data archive could (1) assist in operational test design for the various stages of system development, (2) help in diagnosing sources of failure modes, and (3) assist in operational evaluation.
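To suggest how the variables discussed above might be organized, the sketch below gives a minimal record structure for a single test event. Every field name is illustrative rather than drawn from any existing ATEC or service database, and a real archive would require many more variables, controlled vocabularies shared across the services, and explicit links among events, system configurations, and failure reports.

```python
# A minimal, hypothetical sketch of an archive record for one test event.
# Field names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FailureRecord:
    component_id: str              # hardware or software component that malfunctioned
    operating_hours_at_failure: float
    cycles_since_last_failure: int
    degree_of_failure: str         # e.g., "degraded" versus "mission abort"
    maintenance_action: str        # repair or replacement performed

@dataclass
class TestEventRecord:
    event_id: str
    test_phase: str                # developmental test, operational test, training, or field use
    system_configuration: str      # design/modification baseline in effect during the event
    environment: dict              # e.g., {"weather": ..., "terrain": ..., "time_of_day": ...}
    mission_type: str
    threat_level: str
    stresses: dict                 # e.g., {"speed_kph": ..., "payload_kg": ...}
    operating_hours: float
    failures: List[FailureRecord] = field(default_factory=list)
    qualitative_notes: Optional[str] = None  # contextual narrative from testers and observers

# Records like these support the linkages discussed above: events can be
# grouped by shared components, compared across test phases, and replayed
# with enough context to understand why a failure mode appeared.
```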

Recommendation: The Department of Defense should provide the funds to establish a test data archive that will be a prerequisite for combining information for test and evaluation of future systems.

REPRESENTATIONS

The fault tree represented in Figure 4-1 captures logically how the parts of the system under study interact. The same can be conveyed in reliability block diagrams. These and other classes of representations can be quite useful when assessing systems as large and complex as those evaluated by ATEC. For large, complex systems with heterogeneous data sources, representations of a system have several advantages: they set out a common language that all communities can use to discuss the problem; heterogeneous data sources can be explicitly located in the representation; and the representation provides an explicit mapping from the problem to the data to the metrics of interest.

If the system and its assessment are to be put in a decision context (for example, an overall assessment of system effectiveness and suitability supporting an acquisition decision), the fault trees and block diagrams may need to be embedded in a representation that supports these broader goals and connects the disparate and heterogeneous sources of data. Within the data archive one can use representations of the test environments to understand and compare variables such as the environment of use (weather, terrain, foliage, and time of day) and type of use (e.g., mission, intensity, and threat) across multiple tests.

It is important to develop a set of higher-level representations of the system under evaluation for use both within the data archive and more broadly in the system assessment. These representations, of necessity, change over time as the system and the context of the evaluation change. Standard reliability assessment methods focus on individual parts or simple groups of parts within a system. Assessing the overall reliability and performance of a complex system, however, involves understanding and integrating the reliabilities associated with the subsystems and parts, and this understanding and integrating are not always straightforward. Multiple and heterogeneous data types may exist, and the wider community that owns the system may not understand all the features and relationships that can affect system reliability. One way to illustrate all of the factors that characterize and impinge upon system reliability is by building qualitative graphical systems representations that can be migrated to graphical statistical models to assess reliability.
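For the simplest representations, the mapping from parts to a system-level metric can be written down directly. The sketch below rolls up subsystem reliabilities through a hypothetical series-parallel block diagram; the structure and the numbers are invented for illustration and do not describe any actual Stryker subsystem.

```python
# A minimal sketch of rolling up component reliabilities through a
# hypothetical series-parallel reliability block diagram.

def series(*reliabilities):
    """All blocks must work: reliabilities multiply."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(*reliabilities):
    """The path works if any one of the redundant blocks works."""
    failure = 1.0
    for r in reliabilities:
        failure *= (1.0 - r)
    return 1.0 - failure

# Hypothetical structure: a redundant pair of power units in series with
# mobility and communications subsystems.
power = parallel(0.95, 0.95)
system = series(power, 0.98, 0.97)
print(f"system reliability = {system:.4f}")  # about 0.948
```

Such a roll-up is straightforward only when the structure is simple and agreed upon and when every input reliability is backed by data; the heterogeneous, partly qualitative situation described next is what motivates richer graphical representations.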

For this reason, it is important that the information stored in a data archive be both quantitative and qualitative, as noted above.

Most groups developing complex systems do develop compartmentalized graphical representations of reliability. These representations may include reliability block diagrams; timelines, process diagrams, or Gantt/PERT charts dealing with mission schedule and risk; and engineering schematics of physical systems and subsystems. But none of these disparate representations captures all aspects or concerns of the integrated complex system. Moreover, since the system is likely under development with users, procurers, planners, managers, designers, manufacturers, testers, and evaluators spanning multiple organizations, geographical locations, and fields of expertise, these numerous, specialized, and compartmentalized representations foil attempts by the multiple groups to meaningfully discuss (or even understand) total system reliability or performance.

There are sets of methods and graphical representations that capture the full range of features and relationships that affect system reliability. For example, Leishman and McNamara (2002) employ ethnographic methods to elicit a model structure from the pertinent communities of experts involved in developing the system. The information on system reliability can initially be captured using "scratch nets" (Meyer and Paton, 2002), that is, simple diagrams that sketch out the important features of the system and its decision frame and allow a preliminary mapping of the key relationships between features. These scratch nets form the basis for more formalized representations called conceptual graphs (Sowa, 1984), a formal graphical language for representing logical relationships; they are used extensively in the artificial intelligence, information technology, and computer modeling communities. Similar to the less formal scratch nets, conceptual graphs use labeled nodes (which represent any entity, attribute, action, state, or event that can be described in natural language) and arcs (relationships) to map out logical relationships in a domain of knowledge. The example in Figure 4-2 is a typical use of a conceptual graph to convey the meaning of natural language propositions within the context of a complex system.

[Figure 4-2: Example of a conceptual graph, with labeled nodes and arcs conveying natural language propositions about a complex system.]

Generally, representations of complex systems are used to capture higher-level concepts, but the grounding of conceptual graphs in both natural language and formal logic also allows them to be used for expert judgment elicitation (and even potentially for text mining) to build formal logic models that can then become formal mathematical and statistical models.

From the initial scratch nets, conceptual graphs are used to create an ontology (i.e., a representation of high-level concepts and main ideas relating to a particular problem domain) for the system. This ontology represents the major areas of the system and its decision frame, such that any pertinent detail that needs to be added to the representation can be added hierarchically under one of the existing nodes. The ontology is also used as a boundary object (i.e., an information object that facilitates discussion and interaction between divergent communities that share common interests but have different perspectives on those interests) so that the diverse stakeholders involved with the project can understand and agree on the features that must be taken into account when assessing system reliability and performance.

Building on the ontology, important features and relationships from the various existing representations (e.g., engineering diagrams, timeline and process diagrams) are integrated in a conceptual graph (or series of graphs). One of the strengths of conceptual graphs is that they are an effective common format for capturing diverse concepts and relationships and thus provide an effective structure for combining information. If the concept or relationship can be described in natural language, it can be represented logically and eventually mathematically (and the process works backward as well). The graphical statistical models developed in this process can be easily explained to stakeholder communities because they are representations of natural language in which relationships can be understood without having to explain the underlying mathematical and statistical notation.

Unlike reliability block diagrams and fault trees, conceptual graphs do not correspond directly to a particular statistical model. There must be a translation from the qualitative conceptual graph model to a quantitative model. Bayesian networks, in particular, are a flexible class of statistical graphical models that capture causal relationships (Jensen, 1996) in a way that meshes well with conceptual graphs; they are considered flexible because standard reliability diagrams (like block diagrams and fault trees) can easily be represented as Bayesian networks (Almond, 1995). The Bayesian network can also be used to model the conditional dependence and independence relationships important for specifying the more complex Bayesian, hierarchical, and random effects models mentioned earlier in this report. (For an example of the development of representations and the subsequent use of a Bayesian network for analysis, see Appendix C.)
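The translation from a qualitative graph to a quantitative model can be illustrated with a deliberately small example. The sketch below encodes a two-component system as a tiny Bayesian network and answers both a predictive query and a diagnostic one by exact enumeration. The component names, prior failure probabilities, and conditional probability table are hypothetical, and a real application would use special-purpose graphical modeling software rather than hand enumeration.

```python
# A minimal, hypothetical Bayesian network for a two-component system.
# The conditional probability table (CPT) for the system node encodes a
# "noisy" version of simple fault-tree logic. All numbers are illustrative.
from itertools import product

# Prior probabilities that each component fails during a mission.
p_fail = {"A": 0.05, "B": 0.10}

# CPT: probability the system fails given the failure states of (A, B).
p_system_fail = {
    (False, False): 0.01,
    (True, False): 0.90,
    (False, True): 0.70,
    (True, True): 0.99,
}

def joint(a_fail, b_fail, sys_fail):
    """Joint probability of one complete configuration of the network."""
    p = p_fail["A"] if a_fail else 1 - p_fail["A"]
    p *= p_fail["B"] if b_fail else 1 - p_fail["B"]
    p_s = p_system_fail[(a_fail, b_fail)]
    return p * (p_s if sys_fail else 1 - p_s)

# Predictive query: marginal probability of system failure, by enumeration.
p_sys = sum(joint(a, b, True) for a, b in product([False, True], repeat=2))

# Diagnostic query: given that the system failed, how likely is it that
# component A had failed? A fault tree alone does not answer this directly.
p_a_given_sys = sum(joint(True, b, True) for b in (False, True)) / p_sys

print(f"P(system fails)             = {p_sys:.3f}")
print(f"P(A failed | system failed) = {p_a_given_sys:.3f}")
```

Because the conditional probability tables are explicit, evidence from any data source (test results, expert judgment, or simulation) can be entered at the corresponding node and propagated to the system-level metrics of interest.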

Not every performance and reliability assessment requires the careful development of an integrated series of representations. However, these kinds of representations do help accomplish the goals of a system evaluation plan by making explicit the relationships among parts of the system and the analysis and by providing an explicit mapping of the evaluation to the data sources and the metrics of interest. The conceptual graph representations are flexible enough to achieve these goals within a framework that can change dynamically as the system and evaluation goals develop.

COMBINING INFORMATION FOR COMPLEX SYSTEMS

One of the steps required in an industrial reliability assurance program is the use of failure modes and effects analysis (FMEA) and reliability block diagrams to quantify the relationships between a system's subsystems, components, interfaces, and potential environmental effects. (As noted in the previous section, other representational methods can also be employed to create a unified picture of the system and decision space under consideration.) These representations result in a reliability model.

For large, complex, changing systems, however, developing and quantifying a reliability or performance model can be an extremely challenging problem. Complex systems tend to have complex problems, which usually exhibit one or more of the following characteristics (Booker and McNamara, 2003):

· a poorly defined or understood system or process, such as high cycle fatigue effects on a turbine engine;
· a process characterized by multiple exogenous factors whose impacts are not fully understood, such as the effects on a new system of changing combat missions;
· an engineered system in the very early stages of design, such as a new concept design for a fuel cell;
· a system, process, or problem that involves experts from different disciplinary backgrounds, who work in different geographical locations, and/or whose problem-solving tools vary widely (as is the case in the work involved to ensure the reliability of a manned mission to Mars); and
· a problem that brings together new groups of experts in novel configurations for its solution.

Any time these sorts of complexities are involved, stakeholders may have difficulty coming to a common understanding of the problem to be addressed. As discussed previously, experts are always involved in the development and justification of assumptions used for modeling and analyses. In complex systems, one approach to dealing with the difficulties of formulating the model is to formalize the involvement of the experts.

Briefly, the stages of expert involvement include the following:

1. Identifying the problem: What is the system under consideration? What are the primary metrics that must be evaluated? Who are the relevant stakeholders and what are their needs and expectations?
2. Operationalizing the problem: What are the operational definitions of the metrics? How will the metrics be evaluated? What are the constraints on the evaluation?
3. Developing the model: What are the core concepts that structure this problem? How are they related to one another? What classes of qualitative and quantitative models best fit the problem as currently structured?
4. Integrating and analyzing information: What data sources can be used to characterize this problem? Who owns those data sources? Can the postulated model answer the questions identified previously?
5. Statistical analysis: What are the appropriate techniques to combine the information from the available data sources? What are the appropriate graphical displays of information? What predictions or inferences must be made to support decisions?

NEED FOR ADDITIONAL STATISTICAL CAPABILITIES

The procedures, both informal and formal, for combining information comprise a broad set of techniques, ranging from those that are extremely easy to apply (being robust to the precise circumstances of the application and with solutions in a simple, closed form) to very sophisticated models that are targeted to a specific application, require great imagination and technical expertise to identify and construct, and often require additional technical expertise to implement, possibly involving software development.[1] The more sophisticated category contains the rich collection of hierarchical and random effects models that have enjoyed recent, very rapid development and that have been successfully applied to a large number of new situations.

[1] Flexible, public-use software currently exists for a rich set of models, greatly simplifying software development. For example, R (http://cran.r-project.org/), BUGS (Bayesian Inference Using Gibbs Sampling), and WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) are widely used in a variety of fields of application and are available at no cost on the Internet.
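To indicate what the simplest member of this more sophisticated category looks like in practice, the sketch below shrinks per-variant reliability estimates toward a common level by treating the variants of a vehicle family as exchangeable in a Beta-Binomial model. The counts are hypothetical, and the crude method-of-moments fit stands in for the full hierarchical Bayesian analysis that the software cited in the footnote would provide.

```python
# A minimal sketch of hierarchical (random effects) shrinkage using a
# Beta-Binomial model fit by a crude method-of-moments empirical Bayes step.
# Variant names and counts are hypothetical.

data = {  # variant: (mission successes, missions attempted)
    "variant_1": (18, 20),
    "variant_2": (14, 20),
    "variant_3": (9, 10),
    "variant_4": (26, 30),
}

rates = [s / n for s, n in data.values()]
m = sum(rates) / len(rates)                              # overall mean success rate
v = sum((r - m) ** 2 for r in rates) / (len(rates) - 1)  # between-variant variance

# Method-of-moments estimate of the Beta hyperparameters, with a floor to
# guard against an unusably small variance estimate.
strength = max(m * (1 - m) / v - 1, 1.0)
alpha0, beta0 = m * strength, (1 - m) * strength

for name, (s, n) in data.items():
    raw = s / n
    shrunk = (alpha0 + s) / (alpha0 + beta0 + n)         # posterior mean for this variant
    print(f"{name}: raw = {raw:.2f}, shrunken = {shrunk:.2f}")
```

The individual estimates borrow strength from the family as a whole, moving toward the overall mean by an amount that reflects how much each variant was tested; the same idea, extended to covariates and additional levels, underlies the hierarchical and random effects models referred to above.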

This report has described the potential applicability of various information-combining techniques to operational test design and evaluation for defense systems. Given the great complexity of ACAT I defense systems, and the many important facets of their development and evaluation, it is likely that many of these systems will require technically complicated methods to support their evaluation. It is also likely that use of these complicated methods will provide tangible benefits in operational evaluation. With respect to both hierarchical and random effects modeling, while there are some standard models that have been repeatedly applied and that may be useful for some defense applications, it is very likely that procedures at the leading edge of research will often be needed for high-quality operational test designs and evaluations.

Operational test evaluation is carried out under fairly substantial time pressures, in circumstances where errors can have extremely serious consequences. Experts with demonstrated proficiency in the use of combining-information methods are required to ensure their fast, correct application. Furthermore, so that the ultimate decision makers can fully understand the findings based on these techniques, careful articulation of the methods and findings, including the important contribution of sensitivity analyses of divergence from assumptions, is very important; this, too, argues for the involvement of individuals with a complete understanding of the methods and their strengths and weaknesses.

Though there are exceptions, these sophisticated techniques are ordinarily not fully understood and correctly applied by those with a master's degree in statistics. A doctorate in statistics or a closely related discipline is generally required. This raises the question of how ATEC can gain access to such expertise. The 1998 NRC report (National Research Council, 1998) mentioned available expertise at the Naval Postgraduate School, the Institute for Defense Analyses, RAND, and other federally funded research and development centers, as well as academia. This panel generally supports the recommendations contained in that report.

One complication with the use of statisticians, either on staff or, even more crucially, as consultants, is that more than statistical expertise will be required. Statisticians working on the methodology for system evaluation will need to work in close collaboration with experts in defense acquisition, military operations, and the system under test. Knowledge of physics and engineering would also be extremely useful.

Given the need for collaboration and acquired expertise, having appropriately qualified on-site staff is clearly the best option.

Another option that should be considered, especially in the short term, is to offer sabbaticals and other temporary arrangements to experts in industry and academia. The panel suggests that one approach, which would institutionalize the use of sabbatical arrangements but would require cooperation of the services, would be for each service to create an Interagency Personnel Act (IPA) position for a statistical expert for test and evaluation, reporting to the head of each service's operational test agency. DOT&E could also create a similar position. These statistical experts would work both as individual resident experts in each of the service test agencies and would be available to work jointly on the most challenging test and evaluation issues in the DoD. These temporary positions would rotate every three years and would have sufficient salary and prestige to attract leading statisticians from academia and industry.

In addition, ATEC should consider making available all sources and types of information for a candidate defense system to a selected group of qualified statisticians in industry and academia as a case study to understand the potential advantages of combining information for operational evaluation.

Recommendation: ATEC should examine how to increase statistical capabilities to support future use of techniques for combining information. As a first step, ATEC should consider providing all sources of information for a candidate defense system to a group of qualified statisticians in industry and academia as a case study to understand the potential advantages of combining information for operational evaluation.
