6 Analyzing and Reporting Test Results
Pages 87-100



From page 87...
... In its review of reports documenting the analysis of operational test data, the panel found individual examples of sophisticated analyses. However, the vast majority focused on calculating means and percentages of measures of effectiveness and performance, and developing significance tests to compare aggregate means and percentages with target values derived from operational requirements or the performance of baseline systems.
From page 88...
... The ability to combine information is hampered by institutional problems, by the lack of a process for archiving test data, and by the lack of standardized reporting procedures. This chapter discusses problems with current procedures for analyzing and reporting operational test results, and recommends alternative approaches that can improve the efficiency with which decision-relevant information can be extracted from operational tests.
From page 89...
... Consider the problem of testing whether a missile system meets a specified requirement for accuracy. The hypothesis selected as the "null hypothesis" might be that the system fails to meet a minimum acceptable level of performance that would justify acquisition. What is reported to the decision maker is whether the system passed or failed the significance test (passing the test would mean the null hypothesis was rejected)
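To make the structure of such a test concrete, the following sketch works an example of a one-sided significance test for a hit-rate requirement. All numbers (20 shots, 18 hits, a minimum acceptable hit rate of 0.80) are hypothetical, and the exact binomial calculation is an illustrative choice, not the report's prescribed method.

```python
from math import comb

def binom_pvalue_onesided(k_hits: int, n_shots: int, p_min: float) -> float:
    """Exact p-value for H0: true hit rate <= p_min.

    Computes P(X >= k_hits | p = p_min) for X ~ Binomial(n_shots, p_min):
    the chance of doing at least this well if the system were only
    marginally (un)acceptable.
    """
    return sum(comb(n_shots, j) * p_min**j * (1 - p_min)**(n_shots - j)
               for j in range(k_hits, n_shots + 1))

# Hypothetical test: 18 hits in 20 shots, minimum acceptable hit rate 0.80.
p_value = binom_pvalue_onesided(18, 20, 0.80)
print(f"p-value = {p_value:.3f}")
```

Here the p-value is about 0.21, so even a 90% observed hit rate does not let the test reject the null hypothesis at conventional levels with only 20 shots — an illustration of how much the pass/fail verdict depends on test size, not just observed performance.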
From page 90...
... The correct interpretation of a passing result, when the null hypothesis is that the performance is less than minimally acceptable, is that the results that were observed are inconsistent with the assumption that the system does not meet the minimal acceptable level. While it is clear that the question answered by a significance test is related to a problem decision makers care about (whether the system meets its requirement)
From page 91...
... It is extremely important to understand the trade-off between reductions in both error probabilities and increases in test costs, and to recognize when this trade-off supports further testing and when it does not.

INSUFFICIENT FORMAL ANALYSIS OF INDIVIDUAL SCENARIOS AND VARIABILITY OF ESTIMATES

The focus on reporting significance tests as summary statistics for operational test evaluation, and their prominence in the decision process, deemphasizes important information about the variability of system performance across scenarios and prototypes.
From page 92...
... It is our impression that scenario-to-scenario variability will dominate prototype-to-prototype variability for the vast majority of systems, but it would be useful to investigate this impression to identify the kinds of systems for which it does not hold.

FAILURE TO USE ALL RELEVANT INFORMATION

Chapter 4 discussed the need, especially given the cost and therefore limited size of much operational testing, for making use of data from alternative sources (tests and field performance of related systems and developmental tests of the given system)
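The comparison of scenario-to-scenario and prototype-to-prototype variability can be sketched with a simple one-way random-effects (ANOVA) variance decomposition. The miss-distance data below are entirely hypothetical, and the estimator shown is one standard textbook choice, not the panel's method.

```python
from statistics import mean, variance

# Hypothetical miss distances (meters): rows = scenarios, cols = prototypes.
scores = {
    "desert_day":   [4.1, 3.8, 4.4],
    "desert_night": [6.9, 7.3, 6.5],
    "forest_day":   [5.2, 4.7, 5.0],
    "arctic_night": [9.8, 9.1, 10.2],
}

r = len(next(iter(scores.values())))                # prototypes per scenario
scenario_means = [mean(v) for v in scores.values()]

ms_within = mean(variance(v) for v in scores.values())   # prototype-to-prototype
ms_between = r * variance(scenario_means)                # scenario effect + noise
var_scenario = max(0.0, (ms_between - ms_within) / r)    # one-way ANOVA estimator

print(f"scenario-to-scenario variance:   {var_scenario:.2f}")
print(f"prototype-to-prototype variance: {ms_within:.2f}")
```

With these made-up numbers the scenario component dwarfs the prototype component, matching the panel's impression; reporting the two components separately, rather than a single aggregate mean, is exactly the kind of variability information the text argues is being lost.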
From page 93...
... legal obstacles to using some of these sources of information for operational evaluation. Finally, even if these other data sources were available and accessible, the test community has a scarcity of the statistical modeling skills needed to make full use of these data; this issue is discussed in more detail in Chapter 10.

LIMITED ANALYSES AND STATISTICAL SKILLS

There are also some limitations concerning the use of sophisticated statistical techniques to fully analyze the information provided by operational tests and alternative data sources.
From page 94...
... RECOMMENDATIONS FOR IMPROVING ANALYSIS AND REPORTING

Perform More Comprehensive Analysis

Test evaluation should provide several types of decision-relevant information in addition to point estimates for major measures of performance and effectiveness and their associated significance tests. First, a table of potential test sizes and the associated error levels for various hypothesized levels of performance for major measures should be provided to help guide decisions about the advantages of various test sizes.
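One way such a table could be built is sketched below for a binomial hit-rate measure: for each candidate test size, find the smallest passing threshold that keeps the risk of passing a minimally unacceptable system below a chosen level, then report the companion risk of failing a genuinely good system. The requirement (0.80), design goal (0.95), risk level (0.10), and test sizes are all hypothetical.

```python
from math import comb

def tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def error_table(p_min, p_good, alpha, sizes):
    """For each test size n, choose the smallest passing threshold k* with
    P(pass | p = p_min) <= alpha, then report both error probabilities."""
    rows = []
    for n in sizes:
        # range(n + 2) so k* = n + 1 ("cannot pass") is allowed if n is too small
        k_star = next(k for k in range(n + 2) if tail_ge(k, n, p_min) <= alpha)
        consumer_risk = tail_ge(k_star, n, p_min)      # pass a bad system
        producer_risk = 1 - tail_ge(k_star, n, p_good) # fail a good system
        rows.append((n, k_star, consumer_risk, producer_risk))
    return rows

# Hypothetical: minimum acceptable hit rate 0.80, design goal 0.95, alpha = 0.10.
for n, k, a, b in error_table(0.80, 0.95, 0.10, [20, 40, 80]):
    print(f"n={n:3d}  pass if hits >= {k:3d}  "
          f"consumer risk={a:.3f}  producer risk={b:.3f}")
```

A decision maker reading such a table sees directly how much each additional increment of test size buys in reduced error probabilities, which is the trade-off against test cost discussed earlier in the chapter.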
From page 96...
... The panel acknowledges that limits on sample sizes often constrain the ability to analyze individual scenarios. However, there exist sophisticated statistical methods that often can be used to extract useful information from limited sample sizes.
From page 97...
... Adopting this view means moving away from rote application of standard significance tests and toward the use of statistics to estimate and report both what is known about a system's performance and the amount of variability or uncertainty that remains. Rather than treating a significance test as a comprehensive evaluation of a system's performance on a measure of interest, significance testing should instead be regarded as a method for test design: one that is very effective in producing operational tests that yield a great deal of relevant information and for which the costs and benefits of decision making can be compared.
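Reporting an estimate together with its remaining uncertainty, rather than a bare pass/fail verdict, can be sketched with a percentile bootstrap interval; this is one illustrative technique that works even for the small samples typical of operational tests, not a method the report itself prescribes. The 12-trial data set is hypothetical.

```python
import random

def bootstrap_ci(data, stat, n_boot=10_000, level=0.90, seed=1):
    """Percentile bootstrap interval for any statistic of a small sample."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(data) for _ in data]) for _ in range(n_boot))
    lo = reps[int(n_boot * (1 - level) / 2)]
    hi = reps[int(n_boot * (1 + level) / 2) - 1]
    return lo, hi

# Hypothetical scenario: 12 trials, 1 = hit, 0 = miss.
trials = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0]
point = sum(trials) / len(trials)
lo, hi = bootstrap_ci(trials, lambda s: sum(s) / len(s))
print(f"estimated hit rate {point:.2f}, 90% bootstrap interval [{lo:.2f}, {hi:.2f}]")
```

The width of the interval, not just the point estimate, is what tells a decision maker how much uncertainty remains after a 12-trial scenario — precisely the information a lone significance-test verdict suppresses.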
From page 98...
... To pursue this greater use of relevant information, expert assessments of the utility of the system's developmental tests and information from related systems should be included in the operational evaluation report to justify a decision either to use or not use such information to augment operational evaluation. If the decision is made to use the additional information, the validation of assumptions of any statistical models used to combine information from operational tests with that from alternative sources should be carried out and communicated.
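One simple model for combining operational data with discounted developmental evidence is a conjugate beta-binomial update, where the expert-judged relevance of the developmental tests enters as a discount factor on that evidence. This is a minimal sketch under assumed hypothetical counts and an assumed discounting scheme, not the report's recommended model, and its assumptions are exactly the kind that the text says must be validated and communicated.

```python
# Hypothetical developmental-test results and expert-judged relevance (0..1).
dev_hits, dev_trials = 45, 50
discount = 0.2

# Hypothetical operational-test results.
op_hits, op_trials = 8, 10

# Prior Beta(a, b): a flat Beta(1, 1) plus discounted developmental evidence,
# so 50 developmental trials count as only discount * 50 = 10 "effective" trials.
a = 1 + discount * dev_hits
b = 1 + discount * (dev_trials - dev_hits)

# Conjugate update with the operational data.
post_a = a + op_hits
post_b = b + (op_trials - op_hits)
post_mean = post_a / (post_a + post_b)

print(f"posterior mean hit rate: {post_mean:.3f}")
print(f"operational data alone:  {op_hits / op_trials:.3f}")
```

The discount factor makes the expert assessment explicit and auditable: setting it to 0 reduces the analysis to the operational data alone, while setting it to 1 treats developmental trials as fully equivalent to operational ones.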
From page 99...
... Archive Data from Developmental and Operational Tests

In order to facilitate the combination of information, the results of developmental and operational test and evaluation need to be archived in a useful and accessible form. This archive, described in Chapter 3, should contain a complete description of the test scenarios and methods used, which prototypes were used in each scenario, the training of the users, etc.
From page 100...
... By addressing each of these deficiencies, the test evaluation reports will be much more useful to acquisition decision makers. The panel advocates that the analysis of operational test data move beyond simple summary statistics and significance tests, to provide decision makers with estimates of variability, formal analyses of results by individual scenarios, and explicit consideration of the costs, benefits, and risks of various decision alternatives.

