6

Statistical Considerations in Body Armor Testing

During Phases I and II of the study, the committee was requested to consider the use of statistics to permit a more scientific determination of sample sizes to be used in body armor testing. Specifically, the committee was requested to review a statistically based protocol that had been developed by the Office of the Director of Operational Test and Evaluation (DOT&E) with assistance from Army statisticians and testers. The Phase II report provided initial insights on statistics-related issues. This chapter consolidates those insights and provides more detail.

In this chapter, the committee presents its findings on statistical aspects of body armor testing, with a focus on body armor plate testing, beginning with general discussions of (1) what it means to conduct statistically principled testing, (2) how uncertainty and variation can influence overdesign and overmanufacture, and (3) important considerations in test protocol design.37 The chapter then describes the historical test protocols of the Army and the U.S. Special Operations Command (USSOCOM) and discusses the new first article testing (FAT) protocol and the proposed lot acceptance testing (LAT) of the DOT&E, including a discussion of the assumptions underlying the statistical methods and protocol design trade-offs.

INTRODUCTION

This introduction discusses the concepts of statistically principled testing, how uncertainty and variation drive overdesign, and key test protocol design requirements and considerations.

Statistically Principled Testing

The testing of body armor and helmets is destructive, meaning that the tested items are damaged as a result of the test and thus are no longer fit for use upon completion of the test. For this and other reasons, only a fraction of the items produced are tested. All such testing is inherently statistical since we use the information (i.e., the data and the resulting statistics) from a sample of tested items to learn something about the quality, acceptability, and/or fitness for use of the larger (untested) population of items. In statistical terminology this is referred to as

________________________

37 Statistical considerations for helmet test protocols are not discussed.




PREPUBLICATION DRAFT—SUBJECT TO EDITORIAL CORRECTION

“inference,” where the goal is to infer something about the unobserved population based on the data obtained from the observed sample. Thus it is correct to say that all such tests, including the Army's original body armor test protocol, are statistically based.

However, it is critical to note that not all statistically based tests are statistically principled. A “statistically principled” test uses appropriate statistical methods to properly make formal inferences about the population from the sample. Formal inference means that the desired characteristic or characteristics in the population are estimated from the sample data in such a way that the uncertainty inherent in the inference from sample to population is appropriately and explicitly accounted for by the statistical methods. In the case of testing, this generally means that a particular sample size is specified (as well as other sampling and estimation details) to reduce the uncertainty to some acceptable level. Thus, the use of statistically principled test procedures and test methods allows decision makers, test organizations, and manufacturers to all have confidence that the test performance of the sample appropriately characterizes the performance of the population.

Uncertainty and Variation Drive Overdesign

Larger and/or thicker body armor insert plates provide additional survivability but at the cost of more weight. Heavier body armor can contribute to fatigue, may inhibit mobility and effectiveness, and, at its worst, may result in personnel choosing not to wear the body armor, completely defeating its purpose (OTA, 1992).

Body armor is designed to protect against a particular level of threat. To the extent that the armor exceeds this level, it can be thought of as overdesigned or overmanufactured, in the sense that lighter plates could have been produced to achieve the desired level of protection.

Uncertainty and variation in the manufacture, testing, and employment of body armor, as well as the natural concern for protecting personnel, tend to result in conservative decision making, which in turn can result in body armor overdesign and/or overmanufacture. For example,

• Variation in body armor manufacturing processes can drive suppliers to produce plates that are generally heavier than required to lower the risk of producing nonconforming plates.
• Variation in FAT and LAT can further drive suppliers to produce heavier-than-necessary body armor to ensure their product successfully meets the FAT and LAT test standards.
• Uncertainty about the particular threat that personnel may face can result in tighter specifications and/or testing to a higher possible threat, and sometimes to threats beyond what personnel would actually experience, in order to ensure that the threats are clearly met.

Furthermore, with statistically principled test protocols, variation in both the manufacturing and testing processes requires testing greater quantities of body armor to achieve a given level of certainty about performance. To the extent that variation in the manufacturing and testing processes is reduced, higher certainty about body armor performance can be achieved within a given testing protocol or, alternatively, fewer tests can be conducted, with attendant savings in cost and effort, to achieve an equivalent level of certainty.

In sum, uncertainty and variation at each step of design, manufacture, and test are frequently accounted for with safety margins, the cumulative effect of which can be overdesign. To the extent that uncertainty and variation in manufacturing and testing are minimized, body armor with the desired level of performance could be achieved with greater certainty and perhaps lighter weight.

Key Test Protocol Design Requirements

The most fundamental requirements for the new protocols are that they are (1) statistically principled and (2) implemented across the Department of Defense (DoD). As previously described, having a statistically principled test protocol ensures that acceptance decisions are based on defensible, scientifically sound principles and methodology. DoD-wide implementation of the protocols ensures that all body armor in DoD meets a common, minimum standard of performance. Both of these requirements are reflected in the DoD Inspector General (IG) report (DoD, 2009) and in a DOT&E memorandum (DOT&E, 2010a).

A corollary is that the standards in the proposed protocol, and any subsequent modifications to them, should be based on empirical evidence. There are two aspects to this:

• Body armor procured under the Army's original (statistically based but not statistically principled) test protocols has performed well in theater. As discussed in Chapter 2, there are no known deaths attributed to failure of existing body armor against the threats for which it is designed.38 Thus, the new protocol standards should not eliminate currently acceptable body armor designs from continued production, nor should they negatively impact the design or inappropriately incentivize changes to the design solely because of the new standards.
• On the other hand, changes to the proposed new protocol, and any changes to future protocol requirements, should be based on empirical evidence and solid statistical analyses. This is not meant to suggest that expert judgment should be ignored; such judgment is often crucial for insight and understanding. However, given that the test protocol design is intended to be a scientifically defensible, statistically principled protocol, changes to the protocol should be based on the same criteria. Under these conditions, proposed changes must be based on empirical evidence, not anecdote and opinion.

Finally, a design consideration is that any protocol should (1) be flexible enough to accommodate mission-specific needs (or lack thereof) and (2) as necessary, allow the standards to vary by threat. As for flexibility, and as previously described, it is critical that the protocol specifies requirements that ensure a scientifically sound, statistically principled test that achieves a minimum standard of body armor performance DoD-wide. However, there are likely benefits to a protocol that is not overly specific. As for the latter threat consideration, since the performance of the body armor varies by threat, it may be useful to have threat-specific standards. In particular, and perhaps more to the point, having a single common set of protocol standards for all threats could result in body armor that is overdesigned for the actual or most likely threat.

________________________

38 “There have been no known soldier deaths due to small arms that were attributable to a failure of the issued ceramic body armor.” (PEO-S, 2010). Likewise, as discussed in personal communication between James Zheng, Chief Scientist, Office of the Program Executive Officer, Soldier, and Larry Lehowicz, Chair, December 29, 2009, in no case has it been determined that an issued enhanced small arms protective insert (ESAPI) or enhanced side ballistic insert (ESBI) armor plate failed to prevent an armor piercing by small arms projectiles of 7.62 mm × 63 mm or less. The committee notes that the statement in PEO-S (2010) is carefully qualified. It is possible that soldiers wearing body armor may suffer casualties when their ceramic armor is defeated by rounds of caliber larger than 7.62 mm × 63 mm, when projectiles or shrapnel strike a portion of the body not protected by body armor, when the blast comes from improvised explosive devices (IEDs) or other explosives, and so forth. In addition, it is also possible that casualties have occurred but were not attributed to failure in the body armor, for example, when a casualty becomes separated from issued body armor and it is not possible to track the armor back to the original casualty. According to Lt. Col. Edward L. Mazuchowski, Deputy Medical Examiner, Armed Forces Institute of Pathology, in a presentation to the committee entitled “Body Armor and Blunt Force Injury: A Medical Examiner’s Perspective,” August 11, 2010, there has been no evidence of a failure of body armor against the threats for which it is designed based on forensic analysis of the casualties. This report must be qualified by the fact that not all casualties are returned with their body armor. However, it is not unreasonable to conclude that body armor failures (against threats for which the armor is designed) must at most be quite rare since, if such failures were more common or frequent, it is likely that at least one failure would have been observed over the years that body armor has been deployed in combat. Finally, Lt. Col. Peter Greany, USSOCOM, stated in discussion with the committee statistics working group on October 12, 2010, that there had been “zero USSOCOM fatalities due to body armor failure.”

BACKGROUND

This section discusses the historical FAT protocols, the DOT&E protocol for body armor FAT, test protocol assumptions, and LAT.

Historical First Article Testing Protocols

FAT is used to ensure that body armor (and helmets) conform to all contract requirements for acceptance, including specific inspections and tests as well as drawings or other specifications. As described in the DoD IG report DoD Testing Requirements for Body Armor, the U.S. Army and USSOCOM originally conducted FATs using the same measures (probability of penetration and backface deformation [BFD]) but to separate standards (DoD, 2009).

Under their original protocols, both the Army and USSOCOM assess ballistic performance using penetration and BFD under various threats and environmental conditions. They both assess V50, the highest velocity of a threat at which the probability of complete penetration is 50 percent, by measuring plate penetration over a range of velocities. In addition, USSOCOM tests plate shatter gap, which occurs when a bullet penetrates body armor at a lower velocity than the body armor was designed to defeat (DoD, 2009).

As described in the IG’s report, the original Army protocol for body armor testing is statistically based but not statistically principled. It is statistically based because a sample of plates is tested with the intention of inferring the properties of a larger but unobserved population of plates. However, it is not statistically principled, because small sample sizes and an ad hoc scoring rule do not support formal statistical inference about the population. In particular, for enhanced small arms protective inserts, the Army requires testing of the following:

• Three plates, each against a defined matching threat in ambient conditions,
• Three plates against a defined overmatching threat in ambient conditions, and
• One plate for each of nine environmental conditions.

In addition, the original Army protocol uses 12 plates for V50 testing, so that in total 27 plates are tested (Dunn, 2010). FAT failure occurs with (1) one or more catastrophic penetrations or BFD failures on V0 tests39; (2) accumulation of a limited number of failure points; or (3) failure to meet minimum V50.40

When testing a plate, the first shot must be within ¾ in. to 1¼ in. or 1 in. to 1½ in. (depending on the threat) of an edge. The second shot (assuming the plate passes the first shot) is targeted either 3 in. to 6 in. or 4 in. to 5 in. (again, depending on the threat) away from the impact location of the first shot, at least 1.5 in. from any edge, and at the ballistically weakest point of the plate (RDECOM, 2009).

The original USSOCOM protocol is statistically principled with sample sizes that can vary from a minimum of 146 plates tested to a maximum of 480 plates tested. At the minimum, USSOCOM requires the following:

• Sixteen plates each against four defined threats, including one overmatching threat under ambient conditions, and
• Six plates for each of eight environmental conditions.

When testing a plate, the first shot must be within 0.75 in. to 1.25 in. of an edge and then the second shot (assuming the plate passes the first shot) is targeted 4 in. toward the center of the plate from the impact of the first shot. As shown in Figure 6-1, subsequent plates are tested by proceeding clockwise. In addition, the original USSOCOM protocol uses 6 plates for V50 testing and another 28 for shatter gap testing. Should a plate fail in any category, the USSOCOM protocol requires additional testing in that category.

Successful completion of the USSOCOM FAT is based on achieving the following:

• A 90 percent probability the plate will stop the first shot and not exceed BFD requirements (44 mm), with 80 percent confidence for all four defined threats,
• A 90 percent probability the plate will stop the second shot with 80 percent confidence for the three matching threats, and
• A 60 percent probability the plate will stop the second shot for the overmatching threat with 80 percent confidence (USSOCOM, 2010).
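
Criteria of the form "a 90 percent probability with 80 percent confidence" can be read as a one-sided lower confidence bound on a binomial success probability. The following is a minimal sketch of that reading, not the USSOCOM calculation itself: it assumes independent shots with a common success probability, and the function names are hypothetical. For the special case in which every shot succeeds, the exact bound has a closed form.

```python
def zero_failure_lower_bound(n_shots: int, confidence: float) -> float:
    """One-sided lower confidence bound on the success probability when
    all n_shots trials succeed.

    Assumption (not from the protocol text): shots are independent
    Bernoulli trials with a common success probability p, so the chance
    that all n succeed is p**n.
    """
    # The bound is the p at which n straight successes would occur with
    # probability exactly 1 - confidence.
    return (1.0 - confidence) ** (1.0 / n_shots)


def smallest_zero_failure_sample(requirement: float, confidence: float) -> int:
    """Smallest n for which n successes in n shots demonstrate the requirement."""
    n = 1
    while zero_failure_lower_bound(n, confidence) < requirement:
        n += 1
    return n


# Illustrative inputs: a 0.90 requirement at 80 percent confidence.
print(zero_failure_lower_bound(16, 0.80))
print(smallest_zero_failure_sample(0.90, 0.80))
```

Under these assumptions, 16 successes in 16 shots give a lower bound of about 0.904, and 16 is the smallest zero-failure sample that reaches 0.9 at 80 percent confidence. Whether this reasoning motivated the 16-plate groups in the USSOCOM protocol is not stated in the text.
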
________________________

39 V0 testing is “resistance to penetration” testing and occurs at velocities where there should be no plate penetrations.

40 LTC Jon Rickey, Product Manager, Soldier Protective Equipment, “Historical XSAPI Results APR 09 - JUN 10,” presentation to the committee’s statistics working group, October 12, 2010.

FIGURE 6-1 USSOCOM FAT shot pattern. SOURCE: P. Greany, discussion with the committee’s statistics working group, October 12, 2010.

In its report, the DoD IG analyzed Army and USSOCOM protocols and, based on a first shot comparison for the overmatching threat under ambient conditions, found that “. . . the USSOCOM sampling plan provided a 27 percent better chance that defective plates are detected during first article testing. . . .” (DoD, 2009, pp. 30-31). The DoD IG attributed the 27 percent improvement in defective plate detection “primarily to the total number of plates tested” by USSOCOM (DoD, 2010).

Finding. Because of their differences, and as demonstrated in the Department of Defense (DoD) Inspector General calculations, neither the historical Army protocols nor the United States Special Operations Command protocols met the key protocol design requirement as a common standard DoD-wide. In addition, the historical Army protocol did not meet the key design requirement as a statistically principled test.

DOT&E Protocol for Body Armor FAT

In DoD Testing Requirements for Body Armor, the IG recommended that “the Director, Operational Test and Evaluation (DOT&E) develop a test operations procedure for body armor ballistic inserts and involve the Services and USSOCOM to verify the procedure is implemented DoD-wide” (DoD, 2009, p. i). It also stated that “standardization of body armor testing and acceptance will ensure that Service members receive body armor that has been rigorously tested and will provide uniform protection in the battlefield” and proposed that “the test procedure should include, at a minimum, requirements for sample size, shot pattern, types of testing, and acceptance criteria to verify the rigor of testing.” (Ibid., p. 32) On the same page, the report went on to say that “. . . body armor testing should provide a certain level of confidence that the manufacturing process is capable of producing an armor product that will meet the established requirements.”

In response, the DOT&E issued a new protocol, “Standardization of Hard Body Armor Testing” (DOT&E, 2010a). Assessment standards for both old and new DOT&E protocols are based on two measures: the probability of no penetration [Pr(nP)] and the depth of BFD. The new standard establishes a statistically principled protocol that sets minimum requirements for first article tests, including “standard testing references, protocols, procedures, and analytical processes for hard body armor testing.”

A key component of the protocol is a 60-plate design matrix that specifies the number and sizes of plates to be tested in each of nine environments and under ambient conditions and by shot order (Table 6-1). The 60-plate design matrix is replicated for each threat. The proposed standard does not specify the threats for testing.

An important consideration when evaluating this design matrix is to recall that it is designed for acceptance (i.e., contractual) testing as opposed to operational testing. An acceptance test is intended to evaluate the hard body armor against contractual requirements, in this case against a requirement for the probability of penetration and BFD under a variety of environmental conditions. In contrast, an operational test assesses performance under realistic operational conditions and, as such, might lead to different choices about the allocation of tests. For example, an operational test might allocate additional plates to ambient conditions if that was determined to be the most likely environment in which the plates would actually be used.

The committee notes that the design is reasonably balanced, with every size plate appearing in each environment and an equal number of tests for the two shot orders. Based on analyses of past test data conducted by Army statisticians, the inclusion of shot order (first shot edge/second shot crown vs. first shot crown/second shot edge) in the 60-plate design matrix is relevant since plate performance varies by shot order. The committee also notes that the current design matrix requires USSOCOM to test under one ambient condition not currently tested and to test extra small plates, which USSOCOM does not use.41

Finding. Because the protocol requires the same 60-plate protocol for all tests, some user communities are required to test for environmental conditions and plate sizes that are not necessarily relevant to those communities.

________________________

41 Lt. Col. Peter Greany, USSOCOM, discussion with the committee’s statistics working group, October 12, 2010.

TABLE 6-1 60-Plate Protocol

Environment               First Shot Edge/                        First Shot Crown/
                          Second Shot Crown                       Second Shot Edge
Ambient (unconditioned)   1 extra small, 1 large, 1 extra large   1 small, 1 medium, 1 extra large
Temperature cycling       1 medium, 1 large, 1 extra large        1 extra small, 1 small, 1 medium
JP-8 soak                 1 extra small, 1 small, 1 medium        1 medium, 1 large, 1 extra large
Oil soak                  1 small, 1 medium, 1 large              1 extra small, 1 small, 1 extra large
Salt water                1 extra small, 1 medium, 1 extra large  1 extra small, 1 small, 1 large
Weathered                 1 small, 1 medium, 1 extra large        1 extra small, 1 large, 1 extra large
High temperature          1 small, 1 large, 1 extra large         1 extra small, 1 medium, 1 large
Low temperature           1 extra small, 1 small, 1 extra large   1 small, 1 medium, 1 large
Altitude                  1 extra small, 1 medium, 1 large        1 small, 1 large, 1 extra large
Total                     27                                      27
Impacted (a)              2 extra small plates, 1 small plate, 1 medium plate, 1 large plate, 1 extra large plate
Total plates tested       60

(a) Shot order is not relevant for impacted plates since the first shot is taken at the most severely damaged part of the plate as identified by X-ray. In the absence of a visible crack, the first shot is taken at the crown and the second shot 5-6 in. away from the first shot impact location and not closer than 1.5 in. to an edge.

SOURCE: DOT&E, 2010a.
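
The balance properties of the design matrix can be checked mechanically. The sketch below is illustrative only: the `MATRIX` and `IMPACTED` structures are a hand transcription of Table 6-1 (so any transcription error would carry over), and the variable names are hypothetical.

```python
from collections import Counter

SIZES = ("XS", "S", "M", "L", "XL")

# Hand transcription of Table 6-1 (an assumption of this sketch).
# Each environment maps to a pair of size tuples:
# (first shot edge / second shot crown, first shot crown / second shot edge).
MATRIX = {
    "Ambient":             (("XS", "L", "XL"), ("S", "M", "XL")),
    "Temperature cycling": (("M", "L", "XL"),  ("XS", "S", "M")),
    "JP-8 soak":           (("XS", "S", "M"),  ("M", "L", "XL")),
    "Oil soak":            (("S", "M", "L"),   ("XS", "S", "XL")),
    "Salt water":          (("XS", "M", "XL"), ("XS", "S", "L")),
    "Weathered":           (("S", "M", "XL"),  ("XS", "L", "XL")),
    "High temperature":    (("S", "L", "XL"),  ("XS", "M", "L")),
    "Low temperature":     (("XS", "S", "XL"), ("S", "M", "L")),
    "Altitude":            (("XS", "M", "L"),  ("S", "L", "XL")),
}
IMPACTED = ("XS", "XS", "S", "M", "L", "XL")

# Plates per shot order and in total.
edge_first = sum(len(edge) for edge, _ in MATRIX.values())
crown_first = sum(len(crown) for _, crown in MATRIX.values())
total_plates = edge_first + crown_first + len(IMPACTED)

# Does every plate size appear in every environment?
every_size_in_every_env = all(
    set(edge) | set(crown) == set(SIZES) for edge, crown in MATRIX.values()
)

# How many plates of each size are tested overall?
size_counts = Counter(IMPACTED)
for edge, crown in MATRIX.values():
    size_counts.update(edge + crown)

print(edge_first, crown_first, total_plates)
print(every_size_in_every_env)
print(dict(size_counts))
```

With this transcription, each shot order receives 27 plates, every size appears in every environment, and each of the five sizes is tested 12 times in total once the six impacted plates are included.
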

In addition, the committee notes that 54 out of the 60 plates are tested under other than ambient conditions. To the extent that these conditions are not often experienced during operational use, the testing may be skewed toward assessing plate performance under less common conditions. On the other hand, as previously discussed, this is an acceptance test and not an operational test, so if performance in these environments and under these conditions is contractually required, then testing six plates per environment is certainly appropriate. However, it is worth noting that the resulting statistical inference is to a population of plates that experiences environmental conditions in proportion to the fraction of plates tested in each condition in the design matrix. To the extent that some of the environmental conditions are stressing, this could result in a conservatively biased test, in that the resulting estimates for probability of penetration and/or BFD may be greater than that experienced under actual operational conditions.

The committee understood that the choice of a 60-plate sample size resulted from the necessity to balance statistical precision against the real-world constraints of test range capacity as well as the cost and length of the tests, as is the case with all statistical test designs. Specifically, in the absence of constraints, more testing will provide better estimates of plate performance. However, test range capacity is not unconstrained, nor are budgets, and the time it takes to conduct a FAT must be reasonable so that production is not unduly delayed. In the case of body armor, it was determined that testing 60 plates per threat is executable and provides sufficient statistical precision to assess the aggregate performance of a manufacturer’s plates for that threat. A consequence of that choice (and the design of the test matrix) is that the effects of plate size, shot location, and environment can all be estimated, as can the size by location and the location by environment two-way interactions; size and environment, however, are confounded.

Finding. The 60-plate protocol makes a reasonable (and necessary) trade-off between the precision of the statistical tests and real-world constraints, such as test range capacity and test costs.

The assessment standards for the DOT&E protocol are based on two measures, the probability of no penetration and BFD. Table 6-2 provides an overview of the statistical basis for the proposed FAT protocol. For any threat, the following is required to successfully pass the FAT:

• For the first shot, the 90 percent lower confidence bound for the probability of no complete penetration must be greater than or equal to .9.
• For the second shot, the 90 percent lower confidence bound for the probability of no complete penetration must be greater than or equal to .7.
• For the first shot, with 90 percent confidence the 90 percent upper tolerance limit for the BFD must be less than 44.0 mm.
• For the second shot, with 90 percent confidence the 80 percent upper tolerance limit for the BFD must be less than 44.0 mm.

TABLE 6-2 Proposed FAT Standards

Measure                       First Shot                                 Second Shot
Probability of no             90 percent lower confidence bound          90 percent lower confidence bound
penetration [Pr(nP)]          for Pr(nP) > .9                            for Pr(nP) > .7
BFD                           With 90 percent confidence, 90 percent     With 90 percent confidence, 80 percent
                              upper tolerance limit for BFD < 44 mm      upper tolerance limit for BFD < 44 mm

SOURCE: Chris Moosman, DOT&E, “ATEC Review of FAT and LAT Procedures in Army PDs and the DOT&E Protocol for NAS Statisticians,” presentation to the committee’s statistics working group, October 12, 2010.

A confidence interval is an interval that covers a population parameter, in this case the probability of no complete system penetration for the population of plates, with a stated level of confidence. As discussed in the chapter Introduction, this (or any other) population parameter cannot be observed without testing (and thus destroying) all the plates. The higher the level of confidence, the more likely the interval includes the unobserved population parameter. In particular, for the DOT&E protocol, achieving a 90 percent lower confidence bound that is greater than .9 means that the probability of no penetration for the entire population of plates is very likely to be greater than .9. Furthermore, as described earlier, manufacturers will need to achieve probabilities of no penetration significantly higher than .9 to have a reasonable chance of successfully passing this protocol standard.
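
To make the lower confidence bound concrete, the sketch below computes an exact (Clopper-Pearson) one-sided lower bound for Pr(nP) and the largest number of penetrations that still satisfies a requirement. It is a sketch under stated assumptions rather than the DOT&E calculation: the exact binomial method is assumed here (it is consistent with the allowable-failure counts in Table 6-4), shots are treated as independent with a common Pr(nP), and all function names are hypothetical.

```python
from math import comb


def prob_at_most(failures: int, n: int, p_fail: float) -> float:
    """P(X <= failures) for X ~ Binomial(n, p_fail)."""
    return sum(
        comb(n, k) * p_fail**k * (1.0 - p_fail) ** (n - k)
        for k in range(failures + 1)
    )


def lower_bound_pr_np(n: int, failures: int, confidence: float = 0.90) -> float:
    """Exact one-sided lower confidence bound on Pr(nP), found by bisection.

    Assumption: independent shots with a common penetration probability.
    """
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0  # bracket for the upper bound on the failure probability
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # P(X <= failures) shrinks as the failure probability grows.
        if prob_at_most(failures, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 1.0 - hi


def max_allowable_failures(n: int, requirement: float, confidence: float = 0.90) -> int:
    """Largest failure count whose lower bound still meets the requirement.

    Returns -1 when even zero failures cannot demonstrate the requirement.
    """
    allowed = -1
    for failures in range(n + 1):
        if lower_bound_pr_np(n, failures, confidence) >= requirement:
            allowed = failures
        else:
            break
    return allowed


print(max_allowable_failures(60, 0.90))  # first-shot requirement, 60 plates
print(max_allowable_failures(15, 0.90))  # a sample too small for the requirement
```

Under these assumptions, a 60-plate test passes the first-shot requirement with at most two penetrations, while a 15-plate test cannot demonstrate a .9 requirement at 90 percent confidence even with no penetrations.
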
In contrast to a confidence interval, a tolerance interval is an interval that contains a fixed proportion of the population with a stated confidence. In this case, the protocol specifies a tolerance interval standard for backface deformation. For the DOT&E protocol, achieving a 90 percent upper tolerance bound that the BFD is less than 44 mm at 90

FIGURE 6-3 Probability a lot passes LAT first shot requirements for Pr(nP) for the S-4 and S-3 inspection levels for various lot sizes and an AQL of 4 percent.

FIGURE 6-4 Probability a lot passes LAT second shot requirements for Pr(nP) for the S-4 and S-3 inspection levels for various lot sizes and an AQL of 4 percent.

Passing the LAT requires meeting the AQL standards as well as the BFD standards. Given that the sample sizes that will be used in the BFD lower tolerance limit follow from the AQL sample sizes derived from ANSI/ASQ Z1.4-2008, the proposed protocol has necessarily lowered the LAT lower tolerance limits from 0.9 in the FAT to 0.8, and from 0.8 in the FAT to 0.7, for first and second shots, respectively (American Society for Quality, 2008).

Finding. For most lot sizes, and over the higher levels of Pr(nP), the S-4 inspection level results in a greater probability that lots will pass lot acceptance testing.

OCR for page 107
PROTOCOL DESIGN TRADE-OFFS AND COMPARISONS

Just as body armor design requires making an explicit trade-off between weight and protection, test protocol design requires making trade-offs between the precision of the estimates and the number of items tested. At issue is that not every plate produced can be tested, particularly in destructive testing, where each item tested is destroyed or damaged in the testing process. Thus, the goal is to estimate the quality of the production process as accurately as possible based on a limited sample. Yet, because only a sample of plates can be tested, the resulting test conclusion is subject to error and unavoidable risk both for the DoD and the manufacturer. This section illustrates how to assess the trade-offs, and Appendix I describes methods for explicitly comparing the performance of various test protocols.

The committee illustrates here how the risks of the proposed test protocol can be understood and where the testing uncertainties that arise from using clay as a backing material impact the 60-shot protocol. Consider the first-shot complete penetration requirement. Table 6-4 shows how the risks (government and manufacturer) vary for various sample sizes, true probabilities of no penetration, and requirements. The “true probability of no penetration,” True Pr(nP), is the probability that a particular design is not penetrated by a particular threat; this is the unknown characteristic of the hard body armor that DoD and the Army are trying to learn from the experimentation. “Government risk” is a risk the DoD assumes; it is the probability of allowing a set of armor plates that just meets the “no penetration” requirement to pass the test. “Manufacturer risk” is the probability that a set of armor plates that meets or exceeds the no-penetration requirement will fail. These risks are a function of the sample size required in the sampling plan and the manufacturer's quality.
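If each shot is treated as an independent trial with a common no-penetration probability (a simplifying assumption made here for illustration), both risks reduce to binomial sums. The symbols n (sample size), c (allowable failures), p_0 (the requirement), and p (the true Pr(nP)) are notation introduced for this sketch and are not the protocol's own.

```latex
\text{Government risk} = \sum_{k=0}^{c} \binom{n}{k} (1-p_0)^{k}\, p_0^{\,n-k},
\qquad
\text{Manufacturer risk} = 1 - \sum_{k=0}^{c} \binom{n}{k} (1-p)^{k}\, p^{\,n-k}.
```

For example, with n = 22, c = 0, p_0 = .90, and p = .98, these expressions give .90^22 ≈ .098 and 1 - .98^22 ≈ .359, which agree with the second row of Table 6-4.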

TABLE 6-4 Risk Comparisons for Probability of Complete Penetration

Sample Size   Allowable Failures   True Pr(nP)   Requirement   Government Risk   Manufacturer Risk
15            0                    .98           .86           .104              .261
22            0                    .98           .90           .098              .359
40            1                    .98           .90           .080              .190
60            2                    .98           .90           .053              .119
60            2                    .99           .90           .053              .022
60            2                    .92           .90           .053              .868
300           9                    .98           .95           .000              .082
6,000         134                  .98           .975          .000              .092

For example, the fourth line in Table 6-4 is interpreted as follows: A test requirement that the 90 percent lower confidence limit must exceed 90 percent means that a successful test of 60 plates can have no more than two failures. Under these conditions, a manufacturer's plates, each of which has a probability of passing the test (i.e., of no penetration) of .98, stand an 11.9 percent chance that 3 or more of the 60 plates will fail, so that the manufacturer will fail the test. Conversely, under these test conditions, the government stands a 5.3 percent chance that a manufacturer's marginally performing plates that have a no-penetration probability of .90 will pass the test.

The first three lines of Table 6-4 demonstrate that reducing the sample size from 60 shifts the risk to the manufacturer. For a sample size of 15 it is not possible to pass the test because the sample size is too small to demonstrate a 90 percent requirement with high (90 percent) confidence; the first line of the table accordingly lists a requirement of .86. The last two lines of Table 6-4 show the sharp increases in required sample size when the requirement is increased beyond .9 and the risks are held roughly constant. Table 6-4 also shows that, for a sample size of 60, a manufacturer must produce hard body armor that has a true probability of no penetration substantially higher than .9 to have a reasonable chance of passing the test.

Figure 6-5 plots the manufacturer’s risk for various Pr(nP), where it is clear that to have a reasonably high probability of passing the protocol’s first-shot, no-penetration standard, the manufacturer’s plates must have a Pr(nP) substantially higher than .9. From a soldier safety perspective, this is appropriate.
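The risk figures discussed for Table 6-4 can be reproduced with a short calculation. The following is a minimal sketch, not the protocol's official computation: it assumes independent shots with a common penetration probability, and the function names are hypothetical. It also assumes that "the 90 percent lower confidence limit must exceed the requirement" is applied as an exact binomial criterion, which is equivalent to capping the government risk at 10 percent.

```python
from math import comb


def binom_cdf(c, n, q):
    """P(X <= c) for X ~ Binomial(n, q); q is the per-plate penetration probability."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(c + 1))


def government_risk(n, c, requirement):
    """Chance that plates exactly at the requirement pass (at most c penetrations)."""
    return binom_cdf(c, n, 1 - requirement)


def manufacturer_risk(n, c, true_pnp):
    """Chance that plates with the given true Pr(nP) fail (more than c penetrations)."""
    return 1 - binom_cdf(c, n, 1 - true_pnp)


def max_allowable_failures(n, requirement, confidence=0.90):
    """Largest c whose government risk stays within 1 - confidence.

    Returns None when even c = 0 exceeds that level, i.e. the sample is too
    small to demonstrate the requirement at the stated confidence.
    """
    alpha = 1 - confidence
    best = None
    for c in range(n + 1):
        if government_risk(n, c, requirement) <= alpha:
            best = c
        else:
            break  # the binomial CDF only grows with c, so no larger c can qualify
    return best


if __name__ == "__main__":
    # Sample size, allowable failures, true Pr(nP), requirement (rows of Table 6-4)
    for n, c, true_pnp, req in [(22, 0, 0.98, 0.90), (40, 1, 0.98, 0.90), (60, 2, 0.98, 0.90)]:
        print(n, c, round(government_risk(n, c, req), 3), round(manufacturer_risk(n, c, true_pnp), 3))
```

Evaluating `manufacturer_risk(60, 2, p)` over a grid of `p` values traces the kind of curve described for Figure 6-5, and `max_allowable_failures(15, 0.90)` returning `None` corresponds to the observation that 15 plates cannot demonstrate a 90 percent requirement with 90 percent confidence.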

FIGURE 6-5 Plot of the manufacturer’s risk for various Pr(nP) under the DOT&E protocol. To have a reasonably high probability of passing the protocol’s first-shot no-penetration standard (out of 60 plates tested, no more than two penetrations are allowed), the manufacturer’s plates must have a high Pr(nP).

Because of the issues discussed in earlier sections of this report, it is difficult to tell if the observed variation in BFD for hard body armor is attributable mainly to the variation in plates, to the variation in the test process, or to both. As a result, all observed variation is being attributed to the plates. While this is clearly incorrect, without a better understanding of the specific sources of variation, it is impossible to do otherwise. This probably results in overdesign and/or overmanufacture of the plate to ensure a high probability of passing FAT and LAT.

Figure 6-6 illustrates the potential impact on manufacturers by simulating the effects of the BFD test on the probability of a manufacturer failing FAT under various conditions. In Figure 6-6(a), the assumption is that the plates resulting from a manufacturer's process have a mean BFD of 38 mm. The solid line (100 percent variance) shows the results when all observed variation is attributed to the plates. The amount of variation is shown on the x-axis in terms of standard deviations, and the probability of failing to meet the BFD criterion is shown on the y-axis. The plot shows that the probability of failure ranges from zero for standard deviations just above 3 to nearly one for standard deviations just less than 5. The dashed curves show the impact of attributing less of the observed variation to the plates. Notice that the probability of passing the test increases as the percentage of variation attributed to the plates decreases. Figure 6-6(b) shows a similar result for a mean BFD of 40 mm.

FIGURE 6-6 Risk comparisons for BFD, assuming that the manufacturer’s true mean BFD is 38 mm in the left plot and 40 mm in the right plot; the associated fraction of variation is shown on the x-axis.

The plots show that decreasing variability in BFD by means of more consistent manufacturing processes or more repeatable testing measures lowers the manufacturer’s chance of failing testing (given that the manufacturer’s plates do meet standards and that all other factors are constant). At issue is the current impossibility of estimating what fraction of the variation in BFD is attributable to variation in the plates and what fraction is attributable to the testing methods. The experiments recommended in Chapter 4 should provide a much better estimate of the test process variation. As discussed in earlier sections, there are known but not well-quantified issues that relate to variations in the thermal and stress properties of the clay medium itself, variations caused by different individuals handworking the clay as it is prepared for testing, variations in calibration, and other factors. Information on how the existing process performs will facilitate improving the process (minimizing excess variation, should it exist).

Finding. Using a statistically principled protocol enables decision makers to explicitly address the necessary and inherently unavoidable risk trade-offs that must be faced in testing. Furthermore, while additional research and coordination may be necessary to finalize the protocol design, and continuing review will likely be required as manufacturing conditions and plate designs change over time, a statistically principled protocol ensures that decision makers have sound information about body armor performance in order to ensure the quality of a critical soldier safety item.

RECOMMENDATIONS

The committee unequivocally supports the implementation of a statistically principled test protocol that explicitly and scientifically acknowledges and addresses the testing risks described in this report. A statistically principled test protocol is critical because it is the only way to rigorously characterize body armor performance under a variety of threat conditions and operating environments to better inform DoD decisions. Because there is variation in manufactured body armor, testing alone cannot ensure that body armor is 100 percent effective. One can, however, develop higher confidence in the effectiveness of the body armor by using a statistically principled and rigorous assessment with sufficient sample size. The committee commends DOT&E for its leadership in establishing such statistically principled protocols for body armor first article testing and lot acceptance testing.
Any test protocol involves some risk that bad body armor will pass the test and good body armor will fail. In setting the standards within the protocols, the DoD has a responsibility to be explicit about these risks and to design a test protocol that balances cost, performance, ability to execute, fairness to the manufacturer, and risk to the soldier. Trade-offs can be made to result in statistically principled protocols that are both scientifically rigorous and practical in application. This conceptual approach is supported by the current DOT&E protocol.

Due diligence and deliberate caution are warranted during the change from the old test protocols to the new protocols. In particular, because manufacturers have strong incentives to build armor that has a high chance of passing FAT and LAT, there is some chance that the change in test protocol could have unintended impacts on body armor design and/or performance. Given the success of the current body armor in the field, changes in testing protocols should be made with deliberate caution to ensure that plate performance is maintained (or improved) while also ensuring that the best science is brought to bear on testing body armor.

The committee commends DOT&E for its ongoing discussions with the Army, USSOCOM, and other stakeholders and its willingness to reconsider and revise the confidence bounds and tolerance interval levels of the proposed protocols as appropriate and necessary. Within these discussions the committee recommends that the following three considerations continue to be explicitly addressed.

• First, it is important to reach consensus on what constitutes a BFD failure and how such failure relates to soldier injury or death. Accordingly, Chapter 8 highlights the need for research to quantify the medical results of blunt force trauma on tissue and to use those results to underpin a scientifically based BFD standard.

• Second, the current clay-based test methodology is probably introducing extra variation into the test results. In particular, as described in the Chapter 4 section “Roadmap for Improving the Testing Process,” replacing Roma Plastilina #1 with a backing material that can be calibrated at room temperature has the potential to eliminate substantial variation. Thus, Recommendation 4-1, to expedite development of a standard replacement that can be used at room temperature, is critical for improving both the testing process and the statistical assessment of body armor performance.

• Third, it is important that the proposed statistically principled protocol be seen not just as another in a long line of standards but as an improvement that incorporates input from all of the stakeholders and that embodies the best science.
In so doing, it is particularly important to develop broad-based support for the statistically principled protocol and to ensure that its adoption will neither undo many years of successful body armor engineering nor result in other undesirable outcomes.

In terms of the DOT&E FAT and LAT protocols for body armor, the committee has four specific recommendations.

Recommendation 6-1: The Office of the Director, Operational Test and Evaluation (DOT&E) should continue to conduct due diligence to carefully and completely assess the effects, large and small, of its statistical protocol as it is adopted across the body armor testing community. In particular, DOT&E should continue to

• Collaborate with the Army and the United States Special Operations Command (USSOCOM) to revise the test protocol as necessary, based on Army and USSOCOM “for government reference” first article testing results and other empirical evidence, to ensure that currently acceptable plate designs are not eliminated under the new protocol; and

• Regularly assess the impacts of the new protocol on plate design, particularly plate weight, to ensure the test protocol results in body armor that achieves the requisite soldier safety while not negatively, inappropriately, or inadvertently affecting plate design.

Recommendation 6-2: The Office of the Director, Operational Test and Evaluation, should consider modifying the first article testing protocol to

• Generalize the description of the backface deformation (BFD) upper tolerance interval calculation to allow for nonnormal BFD distributions;

• Specify a confidence interval calculation methodology that has better coverage properties, such as the Agresti-Coull interval recommended by Brown et al. (2001) and described in detail in Agresti and Coull (1998); and

• Specify guidelines that will accommodate deviations in environmental conditions and/or plate size from the current 60-plate design matrix. For example, DOT&E could revise the current protocol to specify that if a procurement contract does not require testing under one or more of the environmental conditions listed in the design matrix, the plates listed under that condition would then be tested under ambient conditions.

Recommendation 6-3: The Office of the Director, Operational Test and Evaluation, and the Army should continue to consult and engage statisticians throughout the process of assessing and revising protocols, comparing the performance of the new and old protocols, assessing the effects of the new protocols, and considering possible changes. Testers and statisticians should continue to work together as a team (1) to quantify in a statistically rigorous manner the portion of variation in BFD attributable to the testing process and that attributable to the plates and (2) to ensure these results are appropriately reflected in an updated protocol. In particular, the statisticians involved with developing and implementing the statistically principled protocol should be involved with the experimentation recommended in Chapter 4.

Over the course of the committee’s research and deliberations, the DOT&E, Army, and USSOCOM have endeavored to establish statistically principled test standards that are realistically achievable with the current body armor designs.
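As an illustration of the Agresti-Coull interval named in Recommendation 6-2, the sketch below computes a two-sided interval for a binomial proportion such as Pr(nP). It is a generic textbook form (add z²/2 successes and z²/2 failures, then apply the usual normal-approximation formula), not the calculation specified in the DOT&E protocol; the function name, the default confidence level, and the clipping to [0, 1] are choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull(successes, n, confidence=0.95):
    """Two-sided Agresti-Coull interval for a binomial proportion.

    successes: number of shots with no penetration (illustrative usage)
    n:         number of shots tested
    Returns (lower, upper), clipped to the range [0, 1].
    """
    # Standard normal quantile for the requested two-sided confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z**2                          # adjusted sample size
    p_adj = (successes + z**2 / 2) / n_adj    # adjusted point estimate
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


if __name__ == "__main__":
    # 58 of 60 plates not penetrated, and the boundary case of 60 of 60
    print(agresti_coull(58, 60))
    print(agresti_coull(60, 60))
```

One reason such an interval is attractive is visible in the boundary case: when all 60 plates pass, the simple normal-approximation interval collapses to a single point at 1, whereas the adjusted interval still has a lower limit below 1.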

Recommendation 6-4: The Office of the Director, Operational Test and Evaluation, the Army, and the United States Special Operations Command should work together to arrive at an acceptable set of test standards for lot acceptance testing that is both statistically principled and realistically achievable with current body armor designs.

REFERENCES

Agresti, A., and B. Coull. 1998. Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician 52(2):119-126.

American Society for Quality. 2008. American National Standard Sampling Procedures and Tables for Inspection by Attributes. ANSI/ASQ Z1.4-2008. Milwaukee, Wis.: American Society for Quality.

Brown, L., T. Cai, and A. DasGupta. 2001. Interval estimation for binomial proportions. Statistical Science 16(2):101-117.

DoD (U.S. Department of Defense). 1996. DoD Preferred Methods for Acceptance of Product. MIL-STD-1916. Arlington, Va.: DoD.

DoD. 2009. DoD Testing Requirements for Body Armor. Report Number D-2009-047. Arlington, Va.: DoD Inspector General.

DoD. 2010. Memorandum for National Research Committee. Arlington, Va.: Department of Defense Inspector General.

DOT&E (Director of Operational Test and Evaluation). 2010a. Standardization of hard body armor testing. Memorandum dated April 27, 2010.

DOT&E. 2010b. Memorandum: Standard for lot acceptance ballistic testing of hard body armor. July 2, 2010.

Dunn, N. 2010. ATEC proposed Army protocol and background for National Academy of Science Statisticians. Aberdeen Proving Ground, Md.: Army Testing and Evaluation Center.

NIST (National Institute of Standards and Technology). 2010. Engineering Statistics Handbook. Available online at http://www.itl.nist.gov/div898/handbook/. Last accessed March 15, 2011.

NRC (National Research Council). 2010. Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army. Washington, D.C.: National Academies Press.

OTA (Office of Technology Assessment). 1992. Police Body Armor Standards and Testing, Volume II: Appendices. OTA-ISC-535. Washington, D.C.: Office of Technology Assessment.

PEO-S (Program Executive Officer, Soldier). 2010. Interceptor body armor (IBA). Available online at Accessed January 11, 2011.

RDECOM (U.S. Army Research, Development and Engineering Command). 2009. Amendment of solicitation/modification of contract W91CRB-09-D-0001/P00004. Aberdeen Proving Ground, Md.: RDECOM.

USSOCOM (U.S. Special Operations Command). 2010. USSOCOM first article testing protocol. Aberdeen Proving Ground, Md.: U.S. Special Operations Command.