
Suggested Citation:"6 Statistical Considerations in Body Armor Testing." National Research Council. 2012. Testing of Body Armor Materials: Phase III. Washington, DC: The National Academies Press. doi: 10.17226/13390.


6

Statistical Considerations in Body Armor Testing

During Phases I and II of the study, the committee was requested to consider the use of statistics to permit a more scientific determination of sample sizes to be used in body armor testing. Specifically, the committee was requested to review a statistically based protocol that had been developed by the Office of the Director of Operational Test and Evaluation (DOT&E) with assistance from Army statisticians and testers. The Phase II report provided initial insights on statistics-related issues. This chapter consolidates those insights and provides more detail.

In this chapter, the committee presents its findings on statistical aspects of body armor testing, with a focus on body armor plate testing. It begins with general discussions of (1) what it means to conduct statistically principled testing, (2) how uncertainty and variation can influence overdesign and overmanufacture, and (3) important considerations in test protocol design.³⁷ The chapter then describes the historical Army and U.S. Special Operations Command (USSOCOM) test protocols and discusses DOT&E's new first article testing (FAT) protocol and its proposed lot acceptance testing (LAT), including the assumptions underlying the statistical methods and protocol design trade-offs.

INTRODUCTION

This introduction discusses the concepts of statistically principled testing, how uncertainty and variation drive overdesign, and key test protocol design requirements and considerations.

Statistically Principled Testing

The testing of body armor and helmets is destructive, meaning that the tested items are damaged as a result of the test and thus are no longer fit for use upon completion of the test. For this (and other) reasons, only a fraction of the items produced are tested. All such testing is inherently statistical since we use the information (i.e., the data and the resulting statistics) from a sample of tested items to learn something about the quality, acceptability, and/or fitness for use of the larger (untested) population of items. In statistical terminology this is referred to as “inference,” where the goal is to infer something about the unobserved population based on the data obtained from the observed sample. Thus it is correct to say that all such tests, including the Army’s original body armor test protocol, are statistically based. However, it is critical to note that not all statistically based tests are statistically principled.

________________________

³⁷Statistical considerations for helmet test protocols are not discussed.

A “statistically principled” test uses appropriate statistical methods to make formal inferences about the population from the sample. Formal inference means that the desired characteristic or characteristics of the population are estimated from the sample data in such a way that the uncertainty inherent in inferring from sample to population is appropriately and explicitly accounted for by the statistical methods. In the case of testing, this generally means that a particular sample size is specified (along with other sampling and estimation details) to reduce the uncertainty to an acceptable level. Thus, the use of statistically principled test procedures and methods allows decision makers, test organizations, and manufacturers all to have confidence that the test performance of the sample appropriately characterizes the performance of the population.

Uncertainty and Variation Drive Overdesign

Larger and/or thicker body armor insert plates provide additional survivability but at the cost of more weight. Heavier body armor can contribute to fatigue, may inhibit mobility and effectiveness, and, at its worst, may result in personnel choosing not to wear the body armor, completely defeating its purpose (OTA, 1992).

Body armor is designed to protect against a particular level of threat. To the extent that the armor exceeds this level, it can be thought of as overdesigned or overmanufactured, in the sense that lighter plates could have been produced to achieve the desired level of protection.

Uncertainty and variation in the manufacture, testing, and employment of body armor, as well as the natural concern for protecting personnel, tend to result in conservative decision making, which in turn can result in body armor overdesign and/or overmanufacture. For example,

• Variation in body armor manufacturing processes can drive suppliers to produce plates that are generally heavier than required to lower the risk of producing nonconforming plates.

• Variation in FAT and LAT can further drive suppliers to produce heavier-than-necessary body armor to ensure their product successfully meets the FAT and LAT test standards.

• Uncertainty about the particular threat that personnel may face can result in tighter specifications and/or testing to a higher possible threat and sometimes to threats beyond what personnel would actually experience in order to ensure that the threats are clearly met.


Furthermore, with statistically principled test protocols, variation in both the manufacturing and testing processes requires testing greater quantities of body armor to achieve a given level of certainty about performance. To the extent that variation in the manufacturing and testing processes is reduced, higher certainty about body armor performance can be achieved within a given testing protocol or, alternatively, fewer tests can be conducted, with attendant savings in cost and effort, to achieve an equivalent level of certainty.

In sum, uncertainty and variation at each step of design, manufacture, and test are frequently accounted for with safety margins, the cumulative effect of which can be overdesign. To the extent that uncertainty and variation in manufacturing and testing are minimized, body armor with the desired level of performance could be achieved with greater certainty and perhaps lighter weight.
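To make the link between variation and overdesign concrete, the sketch below assumes, purely for illustration, that backface deformation (BFD) for a given plate design is normally distributed with mean μ and standard deviation σ. Under that assumption, keeping the fraction of plates whose BFD exceeds 44 mm at or below 10 percent requires μ ≤ 44 − z₀.₉·σ, so a wider process spread must be offset by a lower mean BFD, which, as the chapter notes, suppliers tend to achieve with heavier plates. The normal model, the function names, and the σ values are hypothetical, not data from the chapter.

```python
from statistics import NormalDist

BFD_LIMIT_MM = 44.0  # BFD limit used throughout the chapter


def max_mean_bfd(sigma_mm, max_exceed_frac=0.10, limit_mm=BFD_LIMIT_MM):
    """Largest mean BFD that keeps P(BFD > limit) <= max_exceed_frac,
    assuming (hypothetically) that BFD ~ Normal(mean, sigma)."""
    z = NormalDist().inv_cdf(1.0 - max_exceed_frac)
    return limit_mm - z * sigma_mm


def exceed_fraction(mean_mm, sigma_mm, limit_mm=BFD_LIMIT_MM):
    """Expected fraction of plates exceeding the BFD limit under the normal model."""
    return 1.0 - NormalDist(mean_mm, sigma_mm).cdf(limit_mm)


if __name__ == "__main__":
    # Larger process spread forces a lower target mean BFD (more margin).
    for sigma in (2.0, 4.0, 6.0):
        print(f"sigma = {sigma:.1f} mm -> mean BFD must be <= {max_mean_bfd(sigma):.1f} mm")
```

Each additional millimeter of σ lowers the allowable mean by about 1.28 mm, which is one way the safety margins described above accumulate.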

Key Test Protocol Design Requirements

The most fundamental requirements for the new protocols are that they are (1) statistically principled and (2) implemented across the Department of Defense (DoD). As previously described, having a statistically principled test protocol ensures that acceptance decisions are based on defensible, scientifically sound principles and methodology. DoD-wide implementation of the protocols ensures that all body armor in DoD meets a common, minimum standard of performance. Both of these requirements are reflected in the DoD Inspector General (IG) report (DoD, 2009) and in a DOT&E memorandum (DOT&E, 2010a).

A corollary is that the standards in the proposed protocol, and any subsequent modifications to them, should be based on empirical evidence. There are two aspects to this:

Body armor procured under the Army’s original (statistically based but not statistically principled) test protocols has performed well in theater. As discussed in Chapter 2, there are no known deaths attributed to failure of existing body armor against the threats for which it is designed.³⁸ Thus, the new protocol standards should not eliminate currently acceptable body armor designs from continued production, nor should they negatively affect the design or inappropriately incentivize design changes made solely because of the new standards.

On the other hand, changes to the proposed new protocol, and any changes to future protocol requirements, should be based on empirical evidence and solid statistical analyses. This is not meant to suggest that expert judgment should be ignored; such judgment is often crucial for insight and understanding. However, given that the test protocol is intended to be scientifically defensible and statistically principled, changes to the protocol should meet the same standard: they must be based on empirical evidence, not anecdote and opinion.

Finally, a design consideration is that any protocol should (1) be flexible enough to accommodate mission-specific needs (or lack thereof) and (2) as necessary, allow the standards to vary by threat. As for flexibility, and as previously described, it is critical that the protocol specify requirements that ensure a scientifically sound, statistically principled test that achieves a minimum standard of body armor performance DoD-wide. However, there are likely benefits to a protocol that is not overly specific. As for threat-specific standards, since the performance of body armor varies by threat, it may be useful to have standards that differ by threat. In particular, and perhaps more to the point, a single common set of protocol standards for all threats could result in body armor that is overdesigned for the actual or most likely threat.

________________________

³⁸“There have been no known soldier deaths due to small arms that were attributable to a failure of the issued ceramic body armor.” (PEO-S, 2010). Likewise, as discussed in personal communication between James Zheng, Chief Scientist, Office of the Program Executive Officer, Soldier, and Larry Lehowicz, Chair, December 29, 2009, in no case has it been determined that an issued enhanced small arms protective insert (ESAPI) or enhanced side ballistic insert (ESBI) armor plate failed to prevent an armor piercing by small arms projectiles of 7.62 mm × 63 mm or less.

The committee notes that the statement in PEO-S (2010) is carefully qualified. It is possible that soldiers wearing body armor may suffer casualties when their ceramic armor is defeated by rounds of caliber larger than 7.62 mm × 63 mm, when projectiles or shrapnel strike a portion of the body not protected by body armor, when the blast comes from improvised explosive devices (IEDs) or other explosives, and so forth. It is also possible that casualties have occurred but were not attributed to failure of the body armor, for example, when a casualty becomes separated from the issued body armor and it is not possible to track the armor back to the original casualty.

According to Lt. Col. Edward L. Mazuchowski, Deputy Medical Examiner, Armed Forces Institute of Pathology, in a presentation to the committee entitled “Body Armor and Blunt Force Injury: A Medical Examiner’s Perspective,” August 11, 2010, forensic analysis of the casualties has shown no evidence of a failure of body armor against the threats for which it is designed. This report must be qualified by the fact that not all casualties are returned with their body armor. However, it is not unreasonable to conclude that body armor failures (against threats for which the armor is designed) must be, at most, quite rare since, if such failures were more frequent, it is likely that at least one failure would have been observed over the years that body armor has been deployed in combat.

Finally, Lt. Col. Peter Greany, USSOCOM, stated in discussion with the committee statistics working group on October 12, 2010, that there had been “zero USSOCOM fatalities due to body armor failure.”


BACKGROUND

This section discusses the historical FAT protocols, the DOT&E protocol for body armor FAT, test protocol assumptions, and LAT.

Historical First Article Testing Protocols

FAT is used to ensure that body armor (and helmets) conform to all contract requirements for acceptance, including specific inspections and tests as well as drawings or other specifications. As described in the DoD IG report DoD Testing Requirements for Body Armor, the U.S. Army and USSOCOM originally conducted FATs using the same measures (probability of penetration and backface deformation [BFD]) but to separate standards (DoD, 2009).

Under their original protocols, both the Army and USSOCOM assess ballistic performance using penetration and BFD under various threats and environmental conditions. Both also assess V₅₀, the velocity at which a threat has a 50 percent probability of complete penetration, by measuring plate penetration over a range of velocities. In addition, USSOCOM tests for plate shatter gap, which occurs when a bullet penetrates body armor at a lower velocity than the body armor was designed to defeat (DoD, 2009).
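The chapter does not say how V₅₀ is computed from the measured velocities. One conventional estimator, the averaging rule described in MIL-STD-662F, averages an equal number of the highest partial-penetration velocities and the lowest complete-penetration velocities, optionally requiring that those velocities fall within a specified spread. The sketch below assumes that rule; the function name, the choice k = 3, and the shot data (in ft/s) are illustrative assumptions, not the Army or USSOCOM procedure.

```python
def v50_average(shots, k=3, max_spread=None):
    """Estimate V50 by averaging the k highest partial-penetration velocities
    and the k lowest complete-penetration velocities.

    shots: iterable of (velocity, complete_penetration) pairs.
    max_spread: optional limit on (max - min) of the velocities used.
    """
    shots = list(shots)  # allow generators; we iterate twice
    partials = sorted(v for v, complete in shots if not complete)
    completes = sorted(v for v, complete in shots if complete)
    if len(partials) < k or len(completes) < k:
        raise ValueError("need at least k partial and k complete penetrations")
    used = partials[-k:] + completes[:k]
    if max_spread is not None and max(used) - min(used) > max_spread:
        raise ValueError("velocities used exceed the allowed spread")
    return sum(used) / len(used)


if __name__ == "__main__":
    # Hypothetical shot record: (velocity in ft/s, complete penetration?)
    record = [(2600, False), (2700, False), (2750, False), (2800, False),
              (2820, True), (2850, True), (2900, True), (3000, True)]
    print(f"V50 estimate: {v50_average(record):.1f} ft/s")
```

Other estimators (for example, fitting a probit or logistic curve to all shots) are also in use; this sketch shows only the averaging idea.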

As described in the IG’s report, the original Army protocol for body armor testing is statistically based but not statistically principled. It is statistically based because a sample of plates is tested with the intention of inferring the properties of a larger but unobserved population of plates. However, it is not statistically principled, because small sample sizes and an ad hoc scoring rule do not support formal statistical inference about the population. In particular, for enhanced small arms protective inserts, the Army requires testing of the following:

• Three plates, each against a defined matching threat in ambient conditions,

• Three plates against a defined overmatching threat in ambient conditions, and

• One plate for each of nine environmental conditions.

In addition, the original Army protocol uses 12 plates for V₅₀ testing, so that in total 27 plates are tested (Dunn, 2010). FAT failure occurs with (1) one or more catastrophic penetrations or BFD failures on V₀ tests³⁹; (2) accumulation of a limited number of failure points; or (3) failure to meet minimum V₅₀.⁴⁰

When testing a plate, the first shot must be within ¾ in. to 1¼ in. or 1 in. to 1½ in. (depending on the threat) of an edge. The second shot (assuming the plate passes the first shot) is targeted either 3 in. to 6 in. or 4 in. to 5 in. (again, depending on the threat) away from the impact location of the first shot, at least 1.5 in. from any edge, and at the ballistically weakest point of the plate (RDECOM, 2009).
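The shot-placement windows above can be expressed as a simple check on measured distances. The chapter does not say which threats use which windows, or how the edge and spacing windows pair up, so this sketch takes both windows as parameters; the function name and example values are hypothetical, and the requirement to aim at the ballistically weakest point is a judgment the check cannot capture.

```python
def placement_ok(first_edge_in, spacing_in, second_edge_in,
                 first_edge_range, spacing_range, min_second_edge_in=1.5):
    """Check measured shot locations (inches) against threat-dependent windows.

    first_edge_range: (lo, hi) allowed distance of shot 1 from the edge,
        e.g. (0.75, 1.25) or (1.0, 1.5) depending on the threat.
    spacing_range: (lo, hi) allowed distance between the two impacts,
        e.g. (3.0, 6.0) or (4.0, 5.0) depending on the threat.
    min_second_edge_in: shot 2 must be at least this far from any edge.
    """
    lo1, hi1 = first_edge_range
    lo2, hi2 = spacing_range
    return (lo1 <= first_edge_in <= hi1
            and lo2 <= spacing_in <= hi2
            and second_edge_in >= min_second_edge_in)


if __name__ == "__main__":
    # Hypothetical measurements checked against one of the stated windows.
    print(placement_ok(1.0, 4.5, 2.0, (0.75, 1.25), (3.0, 6.0)))
```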

The original USSOCOM protocol is statistically principled, with sample sizes that can range from a minimum of 146 plates tested to a maximum of 480 plates tested. At the minimum, USSOCOM requires the following:

• Sixteen plates each against four defined threats, including one overmatching threat under ambient conditions, and

• Six plates for each of eight environmental conditions.

When testing a plate, the first shot must be within 0.75 in. to 1.25 in. of an edge and then the second shot (assuming the plate passes the first shot) is targeted 4 in. toward the center of the plate from the impact of the first shot. As shown in Figure 6-1, subsequent plates are tested by proceeding clockwise.

In addition, the original USSOCOM protocol uses 6 plates for V₅₀ testing and another 28 for shatter gap testing. Should a plate fail in any category, the USSOCOM protocol requires additional testing in that category. Successful completion of the USSOCOM FAT is based on achieving the following:

• A 90 percent probability the plate will stop the first shot and not exceed BFD requirements (44 mm), with 80 percent confidence, for all four defined threats,

• A 90 percent probability the plate will stop the second shot, with 80 percent confidence, for the three matching threats, and

• A 60 percent probability the plate will stop the second shot for the overmatching threat, with 80 percent confidence (USSOCOM, 2010).
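The minimum plate counts given in the text for the original Army and USSOCOM protocols can be tallied directly. Reading “sixteen plates each against four defined threats” as 16 plates per threat, the sketch reproduces the stated totals of 27 (Army) and 146 (USSOCOM minimum); the dictionary labels are our own shorthand, and the USSOCOM figure is only a floor because failures trigger additional testing.

```python
# Minimum first-article sample sizes described in the text.
army = {
    "matching threat, ambient": 3,
    "overmatching threat, ambient": 3,
    "environmental conditions (9 x 1)": 9 * 1,
    "V50": 12,
}

ussocom_minimum = {
    "four defined threats (4 x 16)": 4 * 16,
    "environmental conditions (8 x 6)": 8 * 6,
    "V50": 6,
    "shatter gap": 28,
}


def total(plan):
    """Total number of plates in a test plan."""
    return sum(plan.values())


if __name__ == "__main__":
    print("Army:", total(army))                        # 27
    print("USSOCOM minimum:", total(ussocom_minimum))  # 146
```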

________________________

³⁹V₀ testing is “resistance to penetration” testing and occurs at velocities where there should be no plate penetrations.

⁴⁰LTC Jon Rickey, Product Manager, Soldier Protective Equipment, “Historical XSAPI Results APR 09 - JUN 10,” presentation to the committee’s statistics working group, October 12, 2010.


FIGURE 6-1 USSOCOM FAT shot pattern. SOURCE: P. Greany, discussion with the committee’s statistics working group, October 12, 2010.

In its report, the DoD IG analyzed Army and USSOCOM protocols and, based on a first shot comparison for the overmatching threat under ambient conditions, found that “… the USSOCOM sampling plan provided a 27 percent better chance that defective plates are detected during first article testing….” (DoD, 2009, pp. 30-31). The DoD IG attributed the 27 percent improvement in defective plate detection “primarily to the total number of plates tested” by USSOCOM (DoD, 2010).
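The IG's point that detection depends mainly on how many plates are tested can be illustrated with a deliberately simple model; this is not a reconstruction of the IG's 27 percent calculation. If each plate is independently penetrated on the first shot with probability q, the chance that a test of n plates shows at least one penetration is 1 − (1 − q)ⁿ. The plate counts (3 for the Army's ambient overmatching test, 16 per threat for USSOCOM) come from the text; the value of q is hypothetical.

```python
def detection_probability(q, n):
    """P(at least one first-shot penetration among n plates), assuming each
    plate is independently penetrated with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    return 1.0 - (1.0 - q) ** n


if __name__ == "__main__":
    q = 0.10  # hypothetical per-plate first-shot penetration probability
    for label, n in (("Army overmatching, ambient", 3), ("USSOCOM per threat", 16)):
        print(f"{label}: n = {n}, P(detect) = {detection_probability(q, n):.3f}")
```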

Finding. Because of their differences, and as demonstrated in the Department of Defense (DoD) Inspector General calculations, neither the historical Army protocols nor the United States Special Operations Command protocols met the key protocol design requirement of a common DoD-wide standard. In addition, the historical Army protocol did not meet the key design requirement of being a statistically principled test.

DOT&E Protocol for Body Armor FAT

In DoD Testing Requirements for Body Armor, the IG recommended that “the Director, Operational Test and Evaluation (DOT&E) develop a test operations procedure for body armor ballistic inserts and involve the Services and USSOCOM to verify the procedure is implemented DoD-wide” (DoD, 2009, p. i). It also stated that “standardization of body armor testing and acceptance will ensure that Service members receive body armor that has been rigorously tested and will provide uniform protection in the battlefield” and proposed that “the test procedure should include, at a minimum, requirements for sample size, shot pattern, types of testing, and acceptance criteria to verify the rigor of testing.” (Ibid., p. 32)

On the same page, the report went on to say that “… body armor testing should provide a certain level of confidence that the manufacturing process is capable of producing an armor product that will meet the established requirements.” In response, the DOT&E issued a new protocol, “Standardization of Hard Body Armor Testing” (DOT&E, 2010a). Assessment standards for both old and new DOT&E protocols are based on two measures: the probability of no penetration [Pr(nP)] and the depth of BFD.

The new standard establishes a statistically principled protocol that sets minimum requirements for first article tests, including “standard testing references, protocols, procedures, and analytical processes for hard body armor testing.” A key component of the protocol is a 60-plate design matrix that specifies the number and sizes of plates to be tested in each of nine environments and under ambient conditions and by shot order (Table 6-1). The 60-plate design matrix is replicated for each threat. The proposed standard does not specify the threats for testing.

An important consideration when evaluating this design matrix is to recall that it is designed for acceptance (i.e., contractual) testing as opposed to operational testing. An acceptance test is intended to evaluate the hard body armor against contractual requirements—in this case, against a requirement for the probability of penetration and BFD under a variety of environmental conditions. In contrast, an operational test assesses performance under realistic operational conditions and, as such, might lead to different choices about the allocation of tests. For example, an operational test might allocate additional plates to ambient conditions if that was determined to be the most likely environment in which the plates would actually be used.

The committee notes that the design is reasonably balanced, with every plate size appearing in each environment and an equal number of tests for the two shot orders. Based on analyses of past test data conducted by Army statisticians, the inclusion of shot order (first shot edge/second shot crown vs. first shot crown/second shot edge) in the 60-plate design matrix is relevant, since plate performance varies by shot order. The committee also notes that the current design matrix requires USSOCOM to test under one ambient condition not currently tested and to test extra small plates, which USSOCOM does not use.⁴¹

Finding. Because the protocol requires the same 60-plate protocol for all tests, some user communities are required to test for environmental conditions and plate sizes that are not necessarily relevant to those communities.

________________________

⁴¹Lt. Col. Peter Greany, USSOCOM, discussion with the committee’s statistics working group, October 12, 2010.


TABLE 6-1 60-Plate Protocol


Environment	First Shot Edge/Second Shot Crown	First Shot Crown/Second Shot Edge

Ambient (unconditioned)	1 extra small plate, 1 large plate, 1 extra large plate	1 small plate, 1 medium plate, 1 extra large plate

Temperature cycling	1 medium plate, 1 large plate, 1 extra large plate	1 extra small plate, 1 small plate, 1 medium plate

JP-8 soak	1 extra small plate, 1 small plate, 1 medium plate	1 medium plate, 1 large plate, 1 extra large plate

Oil soak	1 small plate, 1 medium plate, 1 large plate	1 extra small plate, 1 small plate, 1 extra large plate

Salt water	1 extra small plate, 1 medium plate, 1 extra large plate	1 extra small plate, 1 small plate, 1 large plate

Weathered	1 small plate, 1 medium plate, 1 extra large plate	1 extra small plate, 1 large plate, 1 extra large plate

High temperature	1 small plate, 1 large plate, 1 extra large plate	1 extra small plate, 1 medium plate, 1 large plate

Low temperature	1 extra small plate, 1 small plate, 1 extra large plate	1 small plate, 1 medium plate, 1 large plate

Altitude	1 extra small plate, 1 medium plate, 1 large plate	1 small plate, 1 large plate, 1 extra large plate

Total	27	27

Impacted^a	2 extra small plates, 1 small plate, 1 medium plate, 1 large plate, 1 extra large plate

Total plates tested	60

^a Shot order is not relevant for impacted plates since the first shot is taken at the most severely damaged part of the plate as identified by X-ray. In the absence of a visible crack, the first shot is taken at the crown and the second shot 5-6 in. away from the first shot impact location and not closer than 1.5 in. to an edge.

SOURCE: DOT&E, 2010a.
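Encoding Table 6-1 as a small data structure makes the balance properties cited in the text easy to verify: 27 plates for each shot order plus 6 impacted plates gives 60, every plate size appears in every environment, and 54 plates are tested under non-ambient conditions. The same check also shows, although the chapter does not mention it, that each of the five sizes is tested exactly 12 times. The size abbreviations and the `summarize` helper are our own shorthand.

```python
from collections import Counter

SIZES = ("XS", "S", "M", "L", "XL")  # extra small, small, medium, large, extra large

# Table 6-1: environment -> (first shot edge/second shot crown,
#                            first shot crown/second shot edge)
DESIGN = {
    "Ambient":             (("XS", "L", "XL"), ("S", "M", "XL")),
    "Temperature cycling": (("M", "L", "XL"), ("XS", "S", "M")),
    "JP-8 soak":           (("XS", "S", "M"), ("M", "L", "XL")),
    "Oil soak":            (("S", "M", "L"), ("XS", "S", "XL")),
    "Salt water":          (("XS", "M", "XL"), ("XS", "S", "L")),
    "Weathered":           (("S", "M", "XL"), ("XS", "L", "XL")),
    "High temperature":    (("S", "L", "XL"), ("XS", "M", "L")),
    "Low temperature":     (("XS", "S", "XL"), ("S", "M", "L")),
    "Altitude":            (("XS", "M", "L"), ("S", "L", "XL")),
}
IMPACTED = ("XS", "XS", "S", "M", "L", "XL")  # shot order not applicable


def summarize(design=DESIGN, impacted=IMPACTED):
    """Count plates by shot order and size and check the balance properties."""
    edge_first = sum(len(a) for a, _ in design.values())
    crown_first = sum(len(b) for _, b in design.values())
    every_size = all(set(a) | set(b) == set(SIZES) for a, b in design.values())
    per_size = Counter(impacted)
    for a, b in design.values():
        per_size.update(a)
        per_size.update(b)
    total = edge_first + crown_first + len(impacted)
    ambient = sum(len(cell) for cell in design["Ambient"])
    return {
        "edge_first": edge_first,
        "crown_first": crown_first,
        "total": total,
        "every_size_in_every_environment": every_size,
        "non_ambient": total - ambient,
        "plates_per_size": dict(per_size),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```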


In addition, the committee notes that 54 of the 60 plates are tested under other than ambient conditions. To the extent that these conditions are not often experienced during operational use, the testing may be skewed toward assessing plate performance under less common conditions. On the other hand, as previously discussed, this is an acceptance test and not an operational test, so if performance in these environments and under these conditions is contractually required, then testing six plates per environment is certainly appropriate. However, it is worth noting that the resulting statistical inference is to a population of plates that experiences environmental conditions in proportion to the fraction of plates tested in each condition in the design matrix. To the extent that some of the environmental conditions are stressing, this could result in a conservatively biased test, in that the resulting estimates of the probability of penetration and/or BFD may be greater than those experienced under actual operational conditions.

The committee understood that the choice of a 60-plate sample size resulted from the necessity to balance statistical precision against the real-world constraints of test range capacity as well as the cost and length of the tests, as is the case with all statistical test designs. Specifically, in the absence of constraints, more testing will provide better estimates of plate performance. However, test range capacity is not unconstrained, nor are budgets, and the time it takes to conduct a FAT must be reasonable so that production is not unduly delayed. In the case of body armor, it was determined that testing 60 plates per threat is executable and provides sufficient statistical precision to assess the aggregate performance of a manufacturer’s plates for that threat. A consequence of that choice (and of the design of the test matrix) is that the effects of plate size, shot location, and environment can all be estimated, as can the size-by-location and location-by-environment two-way interactions; the size-by-environment interaction, however, is confounded and cannot be separately estimated.

Finding. The 60-plate protocol makes a reasonable (and necessary) trade-off between the precision of the statistical tests and real-world constraints, such as test range capacity and test costs.

The assessment standards for the DOT&E protocol are based on two measures, the probability of no penetration and BFD. Table 6-2 provides an overview of the statistical basis for the proposed FAT protocol. For any threat, the following is required to successfully pass the FAT:

• For the first shot, the 90 percent lower confidence bound for the probability of no complete penetration must be greater than or equal to 0.9.

• For the second shot, the 90 percent lower confidence bound for the probability of no complete penetration must be greater than or equal to 0.7.

• For the first shot, with 90 percent confidence the 90 percent upper tolerance limit for the BFD must be less than 44.0 mm.

• For the second shot, with 90 percent confidence the 80 percent upper tolerance limit for the BFD must be less than 44.0 mm.

TABLE 6-2 Proposed FAT Standards


Measure	First Shot	Second Shot

Probability of no penetration [Pr(nP)]	90 percent lower confidence bound for Pr(nP) ≥ 0.9	90 percent lower confidence bound for Pr(nP) ≥ 0.7

BFD	With 90 percent confidence, 90 percent upper tolerance limit for BFD < 44 mm	With 90 percent confidence, 80 percent upper tolerance limit for BFD < 44 mm

SOURCE: Chris Moosman, DOT&E, “ATEC Review of FAT and LAT Procedures in Army PDs and the DOT&E Protocol for NAS Statisticians,” presentation to the committee’s statistics working group, October 12, 2010.

A confidence interval is an interval that covers a population parameter (in this case, the probability of no complete system penetration for the population of plates) with a stated level of confidence. As discussed in the chapter Introduction, this (or any other) population parameter cannot be observed without testing (and thus destroying) all the plates. The higher the level of confidence, the more likely it is that the interval includes the unobserved population parameter. In particular, for the DOT&E protocol, achieving a 90 percent lower confidence bound greater than 0.9 means that the probability of no penetration for the entire population of plates is very likely to be greater than 0.9. Furthermore, as described earlier, manufacturers will need to achieve probabilities of no penetration significantly higher than 0.9 to have a reasonable chance of passing this protocol standard.
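The chapter does not name the interval method behind the Pr(nP) standard. Assuming an exact one-sided binomial (Clopper-Pearson) bound, the sketch below computes the 90 percent lower confidence bound by bisection, finds the largest number of first-shot penetrations out of 60 that still leaves the bound at or above 0.9, and computes the chance that a manufacturer with a given true Pr(nP) passes. Under these assumptions it reproduces the chapter's statement that up to two of 60 first-shot penetrations can be tolerated. The function names are hypothetical.

```python
from math import comb


def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def lower_conf_bound(successes, n, confidence=0.90):
    """One-sided exact (Clopper-Pearson) lower confidence bound for p."""
    if successes == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: P(X >= successes) increases with p
        mid = (lo + hi) / 2
        if prob_at_least(successes, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def max_failures_allowed(n, threshold=0.9, confidence=0.90):
    """Largest number of penetrations for which the bound is still >= threshold
    (returns -1 if even zero penetrations cannot meet the threshold)."""
    allowed = -1
    for failures in range(n + 1):
        if lower_conf_bound(n - failures, n, confidence) >= threshold:
            allowed = failures
        else:
            break
    return allowed


def pass_probability(true_pnp, n=60, threshold=0.9, confidence=0.90):
    """Chance of passing, given the manufacturer's true probability of no penetration."""
    f = max_failures_allowed(n, threshold, confidence)
    return prob_at_least(n - f, n, true_pnp)


if __name__ == "__main__":
    print("first-shot penetrations allowed out of 60:", max_failures_allowed(60))
    for pnp in (0.90, 0.95, 0.97, 0.99):
        print(f"true Pr(nP) = {pnp:.2f}: P(pass) = {pass_probability(pnp):.3f}")
```

A manufacturer whose true Pr(nP) is exactly 0.9 passes only about 5 percent of the time under this model, which is why the achieved Pr(nP) must be well above 0.9.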

In contrast to a confidence interval, a tolerance interval is an interval that contains at least a stated proportion of the population with a stated confidence. In this case, the protocol specifies a tolerance interval standard for BFD. For the DOT&E protocol, achieving, with 90 percent confidence, a 90 percent upper tolerance limit for BFD of less than 44 mm means that at least 90 percent of the entire population of plates is very likely to have BFDs of less than 44 mm. Note that this does not mean that BFDs greater than 44 mm cannot occur; in fact, it is possible to observe BFDs greater than 44 mm in some of the tested plates and still pass the FAT. (For a more detailed discussion of the assumptions underlying the standard confidence and tolerance interval calculations, see the next section.)
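A common way to compute such a limit assumes that BFDs are normally distributed and uses x̄ + k·s, where k is a one-sided normal tolerance factor. The chapter defers its discussion of assumptions, so normality here is our assumption; k is computed with Natrella's approximation as given in the NIST/SEMATECH e-Handbook rather than the exact noncentral-t value. The sample below is hypothetical and shows that one BFD above 44 mm can coexist with a passing tolerance limit. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tolerance_factor(n, coverage=0.90, confidence=0.90):
    """Approximate one-sided normal tolerance factor k (Natrella's approximation).
    Exact values use the noncentral t distribution."""
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1.0 - z_g**2 / (2.0 * (n - 1))
    if a <= 0:
        raise ValueError("sample too small for this approximation")
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a


def bfd_upper_tolerance_limit(bfds_mm, coverage=0.90, confidence=0.90):
    """Upper tolerance limit xbar + k*s, assuming BFDs are normally distributed."""
    return mean(bfds_mm) + tolerance_factor(len(bfds_mm), coverage, confidence) * stdev(bfds_mm)


def bfd_standard_met(bfds_mm, coverage=0.90, confidence=0.90, limit_mm=44.0):
    """True if the upper tolerance limit is below the BFD limit."""
    return bfd_upper_tolerance_limit(bfds_mm, coverage, confidence) < limit_mm


if __name__ == "__main__":
    sample = [30.0] * 29 + [45.0]  # hypothetical first-shot BFDs, one above 44 mm
    print(f"k = {tolerance_factor(len(sample)):.3f}")
    print(f"UTL = {bfd_upper_tolerance_limit(sample):.1f} mm, met: {bfd_standard_met(sample)}")
```

For the second-shot standard, pass `coverage=0.80`.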

The committee notes that the original draft DOT&E protocol specified, for the second shot, a 90 percent lower confidence bound for the probability of no complete system penetration greater than 0.8, and this standard was reflected in the committee’s Phase II letter report (NRC, 2010). However, as a result of DOT&E discussions with the Army and USSOCOM, the protocol promulgated by DOT&E relaxed this standard to a 90 percent lower confidence bound of at least 0.7 for the probability of no complete system penetration.⁴²

Finding. During the course of the committee’s research and deliberations, the Office of the Director, Operational Test and Evaluation, the Army, and the United States Special Operations Command have endeavored to establish statistically principled test standards that are realistically achievable with the current body armor designs. The committee found these collaborative efforts to be commendable.

Testing 60 plates per threat, combined with the requirement that the 90 percent lower confidence bound for the probability of no penetration on the first shot be greater than or equal to 0.9, means that up to two of the 60 plates tested can fail (i.e., have a complete penetration) and the manufacturer will still pass the FAT. While some have stated that no existing body armor protocol allows a penetration on the first shot, in fact USSOCOM’s historical protocol allows one or more plates to be penetrated on the first shot, depending on the number of plates tested.⁴³ Furthermore, while previous Army test protocols with smaller sample sizes permitted no first shot penetrations, it does not follow that there will be no first shot penetrations in the entire population of plates eventually procured. The only way to ensure that the population of plates has no penetrations is to test every plate, which is impossible with destructive testing.

That said, the committee recognizes that there may be a perception issue with a test protocol for which one or two penetrations can still result in passing the FAT. There are zero-failure protocol alternatives: for example, with a total sample size of 22 plates, a standard of zero failures (penetrations) results in a 90 percent lower confidence bound greater than 0.9. However, limiting the total sample size to 22 plates results in much more limited information about plate performance, particularly in terms of environmental testing, which would be reduced to testing only two plates per condition. Furthermore, a 22-plate zero-failure protocol would increase the risk that a manufacturer might fail the FAT even though it manufactures plates that meet the performance standards. On the other hand, maintaining the 60-plate protocol but not allowing any first shot penetrations would substantially increase the risk that manufacturers could fail the FAT even with acceptable plates that have a very high probability of no penetration. (See the section on protocol design trade-offs for a discussion of how protocol design affects the risks the government and the manufacturer face during testing.)

________________________

⁴²DOT&E (Director, Operational Test and Evaluation). 2010a. Standardization of Hard Body Armor Testing. Memorandum dated April 27, 2010.

⁴³Lt. Col. Peter Greany, discussion with the committee’s statistics working group, October 12, 2010.

One solution to this dilemma discussed during committee briefings is to maintain the 60-plate protocol but not allow any penetrations under ambient conditions. Instead, the two allowable penetrations could occur only under the environmental conditions, some of which, like the impacted condition, are inherently stressing on the plates, and if two penetrations occur, they must occur under different environmental conditions. That is, the FAT is failed if, on the first shot, (1) one or more penetrations occur under ambient conditions, (2) two or more penetrations occur under the same environmental condition, or (3) three or more penetrations occur across all 60 plates tested.
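The three failure conditions can be written as a small decision function. This is a hypothetical sketch of the alternative discussed in briefings, not of the promulgated DOT&E rule; the condition labels (“ambient”, “impacted”, “hot”, and so on) are placeholders.

```python
from collections import Counter

def alt_fat_first_shot_passes(penetrations):
    """Sketch of the alternative first-shot rule described in the text.

    penetrations: one condition label per first-shot complete penetration,
    e.g. ["impacted", "hot"]. Any label other than "ambient" is treated as
    an environmental condition (label names are hypothetical).
    """
    counts = Counter(penetrations)
    if counts["ambient"] >= 1:
        return False  # rule (1): no ambient penetrations allowed
    if any(n >= 2 for cond, n in counts.items() if cond != "ambient"):
        return False  # rule (2): at most one per environmental condition
    if len(penetrations) >= 3:
        return False  # rule (3): at most two penetrations in total
    return True

print(alt_fat_first_shot_passes(["impacted", "hot"]))  # True
print(alt_fat_first_shot_passes(["hot", "hot"]))       # False
```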

The standard also establishes fair-hit/no-test criteria: data from any shot with a velocity that is too high are excluded from analysis regardless of outcome, while data from shots with velocities that are too low are included only if they completely penetrate (both plate and system) or produce a BFD greater than 44.0 mm. This biases the test results toward soldier safety, as would be expected, but it may also bias toward overdesign of the hard armor, a trade-off that should be explicitly recognized. On the other hand, the DOT&E protocol specifies a shot pattern similar to the Army’s historical test protocol in that the first and second shots on a plate must be 5 to 6 in. apart. To the extent that shots closer together are more stressing on the plates, this protocol may be less stressing than USSOCOM’s historical protocol, which required the second shot to be 4 in. from the first shot.
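A sketch of the inclusion logic follows. The parameters `v_min` and `v_max` are placeholders for the protocol’s velocity window, whose actual values are not given in this section, and the velocities in the usage line are made up.

```python
def include_shot(velocity, v_min, v_max, complete_penetration, bfd_mm,
                 bfd_limit_mm=44.0):
    """Return True if a shot's data enter the analysis (sketch of the
    fair-hit/no-test rule; v_min and v_max are hypothetical placeholders)."""
    if velocity > v_max:
        # Too fast: excluded regardless of outcome.
        return False
    if velocity < v_min:
        # Too slow: kept only if the plate failed anyway.
        return complete_penetration or bfd_mm > bfd_limit_mm
    return True  # within the velocity window: a fair hit

# A fast shot is discarded even though the plate failed.
print(include_shot(2790, 2700, 2750, complete_penetration=True, bfd_mm=50.0))  # False
```

The asymmetry is the point: a low-velocity shot can only count against the plate, never for it, which is what tilts the results toward soldier safety.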

Finding. The new Office of the Director, Operational Test and Evaluation protocol meets both key protocol design requirements; it is statistically principled and it provides a minimum Department of Defense-wide body armor test standard.

Test Protocol Assumptions

The DOT&E protocol states that “the DoD BFD requirement is a BFD (based on the calculated upper tolerance limit for the data set) that does not exceed 44.0 mm” (DOT&E, 2010a). In general, a “tolerance interval” is a statistical interval, calculated from the data, in which a particular proportion of the population will be contained with a specified level of confidence. The end points of a tolerance interval are called “tolerance limits.”

As described in the NIST Engineering Statistics Handbook (2010), three types of questions can be addressed by tolerance intervals:

1. What interval will contain p percent of the population measurements?
2. What interval guarantees that p percent of population measurements will not fall below a lower limit?
3. What interval guarantees that p percent of population measurements will not exceed an upper limit? (NIST, 2010, Section 7.2.6.3)

Question 1 leads to a two-sided interval; questions 2 and 3 lead to one-sided intervals, called “tolerance bounds.” For body armor testing, the relevant question is 3, which requires the calculation of an upper tolerance bound.

The correct calculation of any tolerance interval, bound, or limit depends on the underlying distribution of measurements in the population. For this reason, no single formula can be applied in all situations. The most common formula assumes the underlying population is normal; formulas have been derived, however, for many other underlying distributions (see Appendix H).
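For the normal case, the one-sided upper tolerance bound is the sample mean plus k times the sample standard deviation. The sketch below (standard library only; names are illustrative) computes k with the closed-form approximation presented in the NIST handbook section cited above rather than the exact noncentral-t factor, which should be preferred for small samples. The BFD values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def k_upper(n, proportion=0.90, confidence=0.90):
    """Approximate one-sided normal tolerance factor (closed-form
    approximation, not the exact noncentral-t value)."""
    z_p = NormalDist().inv_cdf(proportion)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1 - z_g**2 / (2 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a

def upper_tolerance_bound(data, proportion=0.90, confidence=0.90):
    """mean + k * s; meaningful only if the population is normal."""
    return fmean(data) + k_upper(len(data), proportion, confidence) * stdev(data)

# Made-up BFD values (mm), for illustration only.
bfd = [37.1, 39.4, 36.8, 40.2, 38.5, 41.0, 35.9, 38.8, 39.9, 37.6]
utl = upper_tolerance_bound(bfd)
print(f"90/90 upper tolerance bound = {utl:.1f} mm; meets 44.0 mm: {utl <= 44.0}")
# k grows as n shrinks and falls as the covered proportion is lowered.
print(k_upper(60), k_upper(13), k_upper(13, proportion=0.80))
```

The last line relates to the smaller LAT samples discussed later: with fewer plates, k is larger, so the same data yield a higher bound unless the covered proportion is reduced.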

The DOT&E protocol states that an upper tolerance interval will be calculated for BFD “as a continuous normal random variable” (DOT&E, 2010a, p. 8). In so doing, the protocol explicitly assumes that BFD is normally distributed. However, if the BFD distribution is not normal, then tolerance intervals based on this assumption will not, in general, contain the intended p percent of the population. While many of the empirical BFD distributions observed by the committee were bell shaped, that does not necessarily mean that the BFD distribution is normal. Further, some of the empirical BFD distributions for certain combinations of threat and vendor or threat and design appeared skewed or had one or more truncated tails, suggesting that the BFD distribution is in some cases not normal.⁴⁴ How a violation of the normality assumption affects the correct or incorrect acceptance or rejection of body armor depends on the underlying population distribution. Appendix H provides additional information on tolerance-bound calculations.
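A small Monte Carlo sketch shows what is at stake. It applies the normal-theory 90/90 bound (again using a closed-form approximation to k as a stand-in for the exact factor) to simulated 60-plate tests and counts how often the bound actually lies at or above the true 90th percentile. Both populations are assumptions chosen for illustration, not fitted to any BFD data: a normal one and a deliberately right-skewed one (a shifted exponential).

```python
import random
from math import log, sqrt
from statistics import NormalDist

def k_upper(n, proportion=0.90, confidence=0.90):
    """Approximate one-sided normal tolerance factor."""
    z_p = NormalDist().inv_cdf(proportion)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1 - z_g**2 / (2 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a

def coverage(draw, true_p90, n=60, trials=2000, seed=1):
    """Fraction of simulated n-plate tests whose normal-theory 90/90 upper
    tolerance bound is at or above the population's true 90th percentile."""
    rng = random.Random(seed)
    k = k_upper(n)
    hits = 0
    for _ in range(trials):
        x = [draw(rng) for _ in range(n)]
        m = sum(x) / n
        s = sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        if m + k * s >= true_p90:
            hits += 1
    return hits / trials

# Normal population: mean 38 mm, sd 3 mm (illustrative values).
normal_cov = coverage(lambda r: r.gauss(38.0, 3.0),
                      NormalDist(38.0, 3.0).inv_cdf(0.90))
# Right-skewed population: 30 mm plus an exponential with mean 8 mm,
# whose true 90th percentile is 30 + 8 ln 10.
skewed_cov = coverage(lambda r: 30.0 + r.expovariate(1 / 8.0),
                      30.0 + 8.0 * log(10))
print(f"coverage, normal: {normal_cov:.3f}; skewed: {skewed_cov:.3f}; nominal: 0.900")
```

With the normal population the coverage lands near the nominal 0.90; with the skewed population it falls well short, so the bound understates the upper tail. Other shapes, such as truncated tails, can push the error in the opposite direction, which is why the effect depends on the underlying distribution.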

Finding. Backface deformation has not been shown to be normally distributed for all combinations of vendor, threat, and design; therefore, the tolerance-bound calculation specified by the protocol may not be appropriate in all cases.

________________________

⁴⁴Chris Moosman, DOT&E, “ATEC Review of FAT and LAT Procedures in Army PDs and the DOT&E Protocol for NAS Statisticians,” presentation to the committee’s statistics working group, October 12, 2010.

The protocol specifies the requirement for probability of penetration in terms of a lower confidence limit for the probability of no penetration. A confidence interval contains a population parameter (here, the proportion of plates not penetrated) with the stated confidence level. This means that if repeated samples of the same size are taken, the confidence interval would contain the parameter the specified proportion of the time. For example, a 50 percent confidence interval contains the population parameter, on average, for 50 percent of the samples taken. Since any test considers only a sample of the plates in the population, even requiring zero failures during the test cannot guarantee (and does not mean) that there will be no failures in the larger population of plates.

The DOT&E protocol specifies that the lower confidence limit “is calculated using the Clopper-Pearson method,” which is based on the cumulative probabilities of the binomial distribution (DOT&E, 2010a). Because of the discreteness of the binomial distribution, the Clopper-Pearson method results in a conservative estimate in the sense that it is guaranteed to achieve at least the specified confidence level and may exceed it. As Agresti and Coull (1998, p. 119) state, “For any fixed parameter value, the actual coverage probability can be much larger than the nominal confidence level unless n is quite large.” (See Brown et al. (2001) and Agresti and Coull (1998) for additional discussion.)

Figure 6-2 shows the coverage behavior of the Clopper-Pearson lower confidence bound for n = 60 and 0.8 ≤ Pr(nP) ≤ 0.999. In particular, it shows that the actual coverage, while bounded below by the specified level 1 - α = .90, oscillates dramatically and can often be substantially greater than .90.

Because the specified confidence level for the Clopper-Pearson confidence interval is conservative, in the sense that the resulting interval achieves a level of confidence of at least .90, the DOT&E protocol can achieve confidence levels well above the specified 90 percent. However, since the actual confidence level changes quite dramatically for small changes in Pr(nP)—at a level of precision that is inestimable with the test sample sizes—it will be impossible to determine the achieved confidence level for any particular test. For example, at Pr(nP) = .913, the actual confidence level is 0.903, while at Pr(nP) = .914, the actual confidence level is .97. As Figure 6-2 shows (and as do similar figures in Brown et al., 2001, and Agresti and Coull, 1998), this type of dramatic change occurs frequently.
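The two coverage figures quoted above can be reproduced exactly, without simulation. In the sketch below (standard library only; the function name is illustrative), the lower bound computed from x unpenetrated plates is at or below the true Pr(nP) exactly when P(X ≥ x) ≥ α at that true value, so the actual coverage is the sum of the binomial probabilities of those outcomes.

```python
from math import comb

def coverage_cp_lower(p, n=60, conf=0.90):
    """Exact probability that the one-sided Clopper-Pearson lower bound,
    computed from X ~ Binomial(n, p), is at or below the true p.

    The bound from x successes is <= p exactly when P(X >= x | p) >= alpha,
    so we sum the binomial pmf over those x."""
    alpha = 1.0 - conf
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    total = 0.0
    tail = 1.0  # P(X >= x), starting at x = 0
    for x in range(n + 1):
        if tail >= alpha:
            total += pmf[x]
        tail -= pmf[x]  # now P(X >= x + 1)
    return total

for p in (0.913, 0.914):
    print(f"Pr(nP) = {p}: actual coverage = {coverage_cp_lower(p):.3f}")
```

Evaluating the function on a fine grid of Pr(nP) values traces out the pattern of Figure 6-2: coverage never drops below .90 but jumps each time the cut-off outcome changes.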

FIGURE 6-2 Plot of the actual coverage level achieved by a lower confidence bound calculated according to the Clopper-Pearson method for n = 60 and various Pr(nP). The horizontal line at 0.90 is the confidence level (1 - α) specified in the DOT&E FAT protocol.

Finding. Use of the Clopper-Pearson method for calculating the lower confidence limit is conservative, resulting in actual confidence levels that are at least as great as, and often greater than, the confidence level specified in the standard. The actual confidence level is a function of the Pr(nP) of the plates, it varies substantially depending on the particular Pr(nP), and it can be quite different for small changes in Pr(nP).

The protocol was designed (and analyses are performed) assuming that the data are independent and identically distributed. Given the current test procedures, this assumption is likely not met. In particular, biases may be introduced by the test procedures for the clay box. For example, the committee strongly suspects that if it tested a group of plates all on boxes that were 10 minutes out of the oven and could then repeat the experiment with exactly the same plates and clay boxes except that the boxes were 40 minutes out of the oven, it would see systematically different BFD results. At present, the committee does not know how much this affects the analyses, but it is another reason to develop a clay that can be used at ambient temperature in the test.

From a statistical perspective, many things can be traded off in a protocol, including sample size, confidence level, and requirements. Essentially, the trade-off is between risk to the DoD (of purchasing a plate design that does not meet requirements) and risk to the manufacturer (of having a plate design that does meet requirements fail the FAT). A larger sample size provides more information to characterize a population and will generally lead to narrower confidence and tolerance intervals. More samples also allow for testing more combinations of factors and conditions. However, larger sample sizes usually come with higher costs. Lower confidence levels also generally lead to narrower confidence and tolerance intervals, but at the cost of less confidence that the interval contains the quantity of interest. To have a statistically principled protocol, it is critical that a high confidence level be maintained. Of course, changing the requirements directly affects both DoD and manufacturer risk.

Lot Acceptance Testing

Once a manufacturer has passed FAT and begins production, LAT is used to ensure that body armor continues to conform to contract requirements. Because safety is critical for body armor, continued LAT is both desirable and necessary, but the committee also recognizes that modern quality control calls for manufacturing processes to be improved to eliminate as much variation as possible. As described in MIL-STD-1916, “sampling inspection alone does not control or improve quality” (DoD, 1996, p. 8). Elimination of variation can provide a number of benefits, including a more consistently performing product as well as a reduction of risks to both the manufacturer and the DoD. In addition, to the extent that such reductions in variation lead to more predictability in plate performance and testing outcomes, these reductions might lead to innovations in plate design that allow reductions in plate weight while maintaining ballistic protection.

Table 6-3 provides an overview of the statistical basis for a proposed LAT protocol. Currently, results from the protocol are being used only for government reference. The protocol is being used to provide data for evaluation of the protocol’s effectiveness as a standard for lot acceptance, and, based on this evaluation, DOT&E will revise the protocol as necessary before promulgating a final, mandatory standard for use in future contracts (DOT&E, 2010b).

TABLE 6-3 Proposed LAT Standards


Measure	First Shot	Second Shot

Pr(nP)	4 percent AQL	15 percent AQL
BFD	With 90 percent confidence, 80 percent upper tolerance limit for BFD ≤ 44 mm	With 90 percent confidence, 70 percent upper tolerance limit for BFD ≤ 44 mm

SOURCE: DOT&E, 2010b.

The proposed LAT protocol is similar in many respects to the FAT protocol, including range setup, the use of clay as a backing material and its calibration, fair-hit/no-test criteria, and the definitions of complete and partial penetrations. However, there are some important differences. Most notably, LAT sample sizes are smaller than FAT sample sizes, and they vary by lot size. For a normal inspection schedule, the protocol requires at one extreme a sample size of 8 plates for a lot of 91 to 150 plates and at the other extreme a sample size of 32 plates for a lot of 1,201 to 3,200 plates.⁴⁵ Product managers have the option to implement switching procedures, and the requisite sample sizes are listed in Tables 4 through 6 of the proposed protocol. Other differences include these:

Because all plates are tested under ambient conditions, neither Table 6-1 nor any other such design matrix applies to the LAT.
As shown in Table 6-3, while the 44 mm BFD limit is the same in LAT as in FAT, the FAT metric based on a lower confidence bound for the probability of no penetration has been replaced in LAT with an “acceptable quality level” metric (sometimes abbreviated AQL).⁴⁶

________________________

⁴⁵Sample sizes are based on special inspection level S-4 of ANSI/ASQ Z1.4-2008 (American Society for Quality, 2008). Tables 3-6 in the proposed protocol are derived directly from Tables I, II-A, II-B, and II-C of ANSI/ASQ Z1.4-2008.

⁴⁶ANSI/ASQ Z1.4-2008 defines AQL as the “Acceptance Quality Limit.” It explicitly states that “the use of the abbreviation AQL to mean Acceptable Quality Level is no longer recommended” (American Society for Quality, 2008, p. 8).

According to ANSI/ASQ Z1.4-2008, “The AQL is the quality level that is the worst tolerable process average when a continuing series of lots is submitted for acceptance sampling” (American Society for Quality, 2008, p. 2). It goes on to say, “The purpose of this standard is, through the economic and psychological pressure of lot non-acceptance, to induce a supplier to maintain a process average at least as good as the specified AQL while at the same time providing an upper limit on the consideration of the consumer’s risk of accepting occasional poor lots. The standard is not intended as a procedure for estimating lot quality or for segregating lots” (American Society for Quality, 2008, p. 3).

In the course of the committee’s deliberations, some have suggested that special inspection level S-3 would be preferable to inspection level S-4. However, the committee notes that this could lead to an undesirable lot rejection rate in some situations. As the ANSI/ASQ standard states, “The sampling plans in this standard are so arranged that the probability of lot acceptance at the designated AQL depends upon sample size, being generally higher for large samples than for small samples for a given AQL” (American Society for Quality, 2008, p. 2).

Figure 6-3 plots the probability that a lot of body armor passes LAT first shot requirements for the S-3 and S-4 inspection levels for various lot sizes and an AQL of 4 percent. Figure 6-3 shows that the S-4 inspection scheme does in general result in a higher probability that a lot passes LAT first shot requirements for .9 ≤ Pr(nP) ≤ 1.0. The exception is lot sizes between 151 and 500. However, it also shows that under the S-4 inspection scheme, lots with Pr(nP) ≈ .98 have a very high chance of passing LAT regardless of lot size: the probability is at or above 99 percent, except for lot sizes between 151 and 500, for which it is 97.3 percent. In comparison, under the S-3 inspection scheme with Pr(nP) = .98, the probability of passing LAT can be as low as 90 percent (for lot sizes between 91 and 150). For lot sizes between 151 and 500, the probability of passing is highest, at 99 percent, and for the other two lot-size ranges it is 97.3 percent.
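Each of these percentages is a point on an operating characteristic curve. A minimal sketch follows, assuming single sampling with sample size n and acceptance number c (the lot passes if at most c plates are penetrated), independent shots, and a binomial model that ignores the finite lot size, the BFD requirement, and switching rules. The sample size of 8 for lots of 91 to 150 under S-4 comes from the text; the other sample sizes and all acceptance numbers are assumptions chosen to be consistent with the quoted percentages and should be checked against Tables I and II-A of ANSI/ASQ Z1.4-2008.

```python
from math import comb

def prob_accept(n, c, p_no_pen):
    """Probability a lot is accepted under a single sampling plan:
    sample n plates, accept if at most c are penetrated."""
    q = 1.0 - p_no_pen  # per-plate probability of penetration
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(c + 1))

# Assumed plans (hypothetical acceptance numbers; verify against Z1.4).
plans = {"S-3, lots 91-150 (n=5, c=0)": (5, 0),
         "S-4, lots 91-150 (n=8, c=1)": (8, 1),
         "S-4, lots 151-500 (n=13, c=1)": (13, 1)}
for label, (n, c) in plans.items():
    print(f"{label}: P(pass) at Pr(nP)=.98 = {prob_accept(n, c, 0.98):.3f}")
```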

Figure 6-4 shows that inspection level S-4 gives a higher probability that a lot passes LAT on the second shot for all lot sizes when Pr(nP) ≥ 0.827. In addition, Figures 6-3 and 6-4 show that, in general, larger lots tend to have higher probabilities of passing for equivalent Pr(nP) values.

FIGURE 6-3 Probability a lot passes LAT first shot requirements for Pr(nP) for the S-4 and S-3 inspection levels for various lot sizes and an AQL of 4 percent.

FIGURE 6-4 Probability a lot passes LAT second shot requirements for Pr(nP) for the S-4 and S-3 inspection levels for various lot sizes and an AQL of 4 percent.

Passing the LAT requires meeting the AQL standards as well as the BFD standards. Because the sample sizes used to compute the BFD upper tolerance limit follow from the AQL sample sizes derived from ANSI/ASQ Z1.4-2008 (American Society for Quality, 2008), and because smaller samples produce larger tolerance factors, the proposed protocol has necessarily lowered the proportion of the population covered by the LAT upper tolerance limits from 0.9 in the FAT to 0.8, and from 0.8 in the FAT to 0.7, for first and second shots, respectively.

Finding. For most lot sizes, and over the higher levels of Pr(nP), the S-4 inspection level results in a greater probability that lots will pass lot acceptance testing.

PROTOCOL DESIGN TRADE-OFFS AND COMPARISONS

Just as body armor design requires making an explicit trade-off between weight and protection, test protocol design requires making trade-offs between the precision of the estimates and the number of items tested. At issue is that not every plate produced can be tested, particularly in destructive testing, where each item tested is destroyed or damaged in the testing process. Thus, the goal is to estimate the quality of the production process as accurately as possible based on a limited sample. Yet, because only a sample of plates can be tested, the resulting test conclusion is subject to error and unavoidable risk both for the DoD and the manufacturer. This section illustrates how to assess the trade-offs and Appendix I describes methods for explicitly comparing the performance of various test protocols.

The committee would like to illustrate how the risks of the proposed test protocol can be understood and where the testing uncertainties that arise from using clay as a backing material impact the 60-shot protocol. Let us consider the first-shot complete penetration requirement.

Table 6-4 shows how the risks (government and manufacturer) vary for various sample sizes, true probabilities of no penetration, and requirements. The “true probability of no penetration,” True Pr(nP), is the probability that a particular design is not penetrated by a particular threat; this is the unknown characteristic of the hard body armor that DoD and the Army are trying to learn from the experimentation. “Government risk” is a risk the DoD assumes; it is the probability of allowing a set of armor plates that just meets the “no penetration” requirement to pass the test. “Manufacturer risk” is the probability that a set of armor plates that meets or exceeds the no-penetration requirement will fail. These risks are a function of the sample size and allowable number of failures in the sampling plan and of the manufacturer’s quality.

TABLE 6-4 Risk Comparisons for Probability of Complete Penetration


Sample Size	Allowable Failures	True Pr(nP)	Requirement	Government Risk	Manufacturer Risk

15	0	.98	.86	.104	.261
22	0	.98	.90	.098	.359
40	1	.98	.90	.080	.190
60	2	.98	.90	.053	.119
60	2	.99	.90	.053	.022
60	2	.92	.90	.053	.868
300	9	.98	.95	.000	.082
6,000	134	.98	.975	.000	.092

For example, the fourth line in Table 6-4 is interpreted as follows: a test requirement that the 90 percent lower confidence limit must exceed 90 percent means that a successful test of 60 plates can have no more than two failures. Under these conditions, a manufacturer whose plates each have a .98 probability of passing the test (i.e., of no penetration) stands an 11.9 percent chance that three or more of the 60 plates will fail, in which case the manufacturer fails the test. Conversely, under these test conditions, the government stands a 5.3 percent chance that a manufacturer’s marginally performing plates, which have a no-penetration probability of .90, will pass the test.
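Both risks are binomial probabilities. The sketch below (standard library only; function names are illustrative) reproduces the government and manufacturer risks in the first six rows of Table 6-4, treating each plate as an independent trial with the stated Pr(nP).

```python
from math import comb

def prob_pass(n, allowed_failures, p_no_pen):
    """P(at most `allowed_failures` of n plates are penetrated)."""
    q = 1.0 - p_no_pen
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(allowed_failures + 1))

def risks(n, allowed_failures, true_p, requirement):
    """Government risk: plates exactly at the requirement still pass.
    Manufacturer risk: plates with Pr(nP) = true_p fail."""
    gov = prob_pass(n, allowed_failures, requirement)
    mfr = 1.0 - prob_pass(n, allowed_failures, true_p)
    return gov, mfr

# (sample size, allowable failures, true Pr(nP), requirement)
for row in [(15, 0, .98, .86), (22, 0, .98, .90), (40, 1, .98, .90),
            (60, 2, .98, .90), (60, 2, .99, .90), (60, 2, .92, .90)]:
    gov, mfr = risks(*row)
    print(row, f"government={gov:.3f} manufacturer={mfr:.3f}")
```

Comparing the fourth and fifth rows shows how quickly the manufacturer’s risk falls as plate quality improves from .98 to .99.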

The first three lines of Table 6-4 demonstrate that reducing the sample size below 60 shifts risk to the manufacturer. With a sample size of 15, it is not possible to demonstrate a .90 requirement with high (90 percent) confidence even if no plates fail, which is why the requirement in the first line is only .86. The last two lines of Table 6-4 show the sharp increase in required sample size when the requirement is raised above .90 while the risks are held roughly constant.

Table 6-4 also shows that, for a sample size of 60, a manufacturer must produce hard body armor that has a true probability of no penetration substantially higher than .9 to have a reasonable chance of passing the test. Figure 6-5 plots the manufacturer’s risk for various Pr(nP), where it is clear that to have a reasonably high probability of passing the protocol’s first-shot, no-penetration standard, the manufacturer’s plates must have a Pr(nP) substantially higher than .9. From a soldier safety perspective, this is appropriate.

FIGURE 6-5 Plot of the manufacturer’s risk for various Pr(nP) under the DOT&E protocol. To have a reasonably high probability of passing the protocol’s first-shot no-penetration standard (out of 60 plates tested, no more than two penetrations are allowed), the manufacturer’s plates must have a high Pr(nP).

Because of the issues discussed in earlier sections of this report, it is difficult to tell if the observed variation in BFD for hard body armor is attributable mainly to the variation in plates, to the variation in the test process, or to both. As a result, all observed variation is being attributed to the plates. While this is clearly incorrect, without a better understanding of the specific sources of variation, it is impossible to do otherwise. This probably results in overdesign and/or overmanufacture of the plate to ensure a high probability of passing FAT and LAT.

Figure 6-6 illustrates the potential impact on manufacturers by simulating the effects of the BFD test on the probability of a manufacturer failing FAT under various conditions. In Figure 6-6 (a), the assumption is that the plates resulting from a manufacturer’s process have a mean BFD of 38 mm. The solid line (100 percent variance) shows the results when all observed variation is attributed to the plates. The amount of variation is shown on the x-axis in terms of standard deviations, and the probability of failing to meet the BFD criterion is shown on the y-axis. The plot shows that the probability of failure ranges from zero for standard deviations just above 3 to nearly one for standard deviations just less than 5. The dashed curves show the impact of attributing less of the observed variation to the plates: as the percentage of variation attributed to the plates decreases, the probability of passing the test increases. Figure 6-6 (b) shows a similar result for a mean BFD of 40 mm.

FIGURE 6-6 Risk comparisons for BFD assuming in the left plot that the manufacturer’s true mean BFD is 38 mm and in the right plot is 40 mm; the associated fraction of variation is shown on the x-axis. The plots show that decreasing variability in BFD, either via more consistent manufacturing processes or as a result of more repeatable testing measures, lowers the manufacturer’s chances of failing testing (given that the manufacturer’s plates do meet standards and holding everything else constant).
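A Monte Carlo sketch in the spirit of Figure 6-6 is shown below. It is not the committee’s simulation. It assumes normally distributed BFD, 60 first-shot BFDs per FAT, the 90/90 upper tolerance bound compared with 44.0 mm (using a closed-form approximation to k), and a hypothetical `plate_fraction` parameter that scales the observed variance down to the share attributed to the plates, treating the remainder as test-process noise that has been removed.

```python
import random
from math import sqrt
from statistics import NormalDist

def k_upper(n, proportion=0.90, confidence=0.90):
    """Approximate one-sided normal tolerance factor."""
    z_p = NormalDist().inv_cdf(proportion)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1 - z_g**2 / (2 * (n - 1))
    b = z_p**2 - z_g**2 / n
    return (z_p + sqrt(z_p**2 - a * b)) / a

def prob_fail_bfd(mean_mm, sd_mm, plate_fraction=1.0, n=60,
                  limit_mm=44.0, trials=1000, seed=7):
    """Estimated probability that the 90/90 upper tolerance bound on n
    normal BFDs exceeds limit_mm. plate_fraction (hypothetical) is the
    share of the observed variance attributed to the plates."""
    rng = random.Random(seed)
    sd = sqrt(plate_fraction) * sd_mm
    k = k_upper(n)
    fails = 0
    for _ in range(trials):
        x = [rng.gauss(mean_mm, sd) for _ in range(n)]
        m = sum(x) / n
        s = sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        if m + k * s > limit_mm:
            fails += 1
    return fails / trials

for sd in (3.0, 4.0, 5.0):
    print(sd, prob_fail_bfd(38.0, sd), prob_fail_bfd(38.0, sd, plate_fraction=0.5))
```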

The plots show that decreasing variability in BFD by means of more consistent manufacturing processes or more repeatable testing measures lowers the manufacturer’s chance of failing testing (given that the manufacturer’s plates do meet standards and that all other factors are constant). At issue is that it is currently impossible to estimate what fraction of the variation in BFD is attributable to variation in the plates and what fraction is attributable to the testing methods. The experiments recommended in Chapter 4 should provide a much better estimate of the test process variation. As discussed in earlier sections, there are known but not well-quantified issues that relate to variations in the thermal and stress properties of the clay medium itself, variations caused by different individuals handworking the clay as it is prepared for testing, variations in calibration, and other factors. Information on how the existing process performs will facilitate improving the process (minimizing excess variation, should it exist).

Finding. Using a statistically principled protocol enables decision makers to explicitly address the necessary and inherently unavoidable risk trade-offs that must be faced in testing. Furthermore, while additional research and coordination may be necessary to finalize the protocol design, and continuing review will likely be required as manufacturing conditions and plate designs change over time, a statistically principled protocol ensures that decision makers have sound information about body armor performance in order to ensure the quality of a critical soldier safety item.

RECOMMENDATIONS

The committee unequivocally supports the implementation of a statistically principled test protocol that explicitly and scientifically acknowledges and addresses the testing risks described in this report. A statistically principled test protocol is critical because it is the only way to rigorously characterize body armor performance under a variety of threat conditions and operating environments to better inform DoD decisions. Because there is variation in manufactured body armor, testing alone cannot ensure that body armor is 100 percent effective. One can, however, develop higher confidence in the effectiveness of the body armor by using a statistically principled and rigorous assessment with sufficient sample size. The committee commends DOT&E for its leadership in establishing such statistically principled protocols for body armor first article testing and lot acceptance testing.

Any test protocol involves some risk that bad body armor will pass the test and good body armor will fail. In setting the standards within the protocols, the DoD has a responsibility to be explicit about these risks and to design a test protocol that balances cost, performance, ability to execute, fairness to the manufacturer, and risk to the soldier. Trade-offs can be made to result in statistically principled protocols that are both scientifically rigorous and practical in application. This conceptual approach is supported by the current DOT&E protocol.

Due diligence and deliberate caution are warranted during the change from the old test protocols to the new protocols. In particular, because manufacturers have strong incentives to build armor that has a high chance of passing FAT and LAT, there is some chance that the change in test protocol could have unintended impacts on body armor design and/or performance. Given the success of the current body armor in the field, changes in testing protocols should be made with deliberate caution to ensure that plate performance is maintained (or improved) while also ensuring that the best science is brought to bear on testing body armor.

The committee commends DOT&E for its ongoing discussions with the Army, USSOCOM, and other stakeholders and its willingness to reconsider and revise the confidence bounds and tolerance interval levels of the proposed protocols as appropriate and necessary. Within these discussions, the committee recommends that the following three considerations continue to be explicitly addressed.

• First, it is important to reach consensus on what constitutes a BFD failure and how such failure relates to soldier injury or death. Accordingly, Chapter 8 highlights the need for research to quantify the medical results of blunt force trauma on tissue and to use those results to underpin a scientifically based BFD standard.

• Second, the current clay-based test methodology is probably introducing extra variation into the test results. In particular, as described in the Chapter 4 section “Roadmap for Improving the Testing Process,” replacing Roma Plastilina #1 with a backing material that can be calibrated at room temperature has the potential to eliminate substantial variation. Thus, Recommendation 4-1, to expedite development of a standard replacement that can be used at room temperature, is critical for improving both the testing process and the statistical assessment of body armor performance.

• Third, it is important that the proposed statistically principled protocol be seen not just as another in a long line of standards but as an improvement that incorporates input from all of the stakeholders and that embodies the best science. In so doing, it is particularly important to develop broad-based support for the statistically principled protocol and to ensure that its adoption will neither undo many years of successful body armor engineering nor result in other undesirable outcomes.

In terms of the DOT&E FAT and LAT protocols for body armor, the committee has four specific recommendations.

Recommendation 6-1: The Office of the Director, Operational Test and Evaluation (DOT&E) should continue to conduct due diligence to carefully and completely assess the effects, large and small, of its statistical protocol as it is adopted across the body armor testing community. In particular, DOT&E should continue to

• Collaborate with the Army and the United States Special Operations Command (USSOCOM) to revise the test protocol as necessary, based on the results of Army and USSOCOM “for government reference” first article testing and other empirical evidence, to ensure that currently acceptable plate designs are not eliminated under the new protocol; and

• Regularly assess the impacts of the new protocol on plate design, particularly plate weight, to ensure that the test protocol results in body armor that achieves the requisite soldier safety without negatively, inappropriately, or inadvertently affecting plate design.

Recommendation 6-2: The Office of the Director, Operational Test and Evaluation, should consider modifying the first article testing protocol to

• Generalize the description of the backface deformation (BFD) upper tolerance interval calculation to allow for nonnormal BFD distributions;

• Specify a confidence interval calculation methodology that has better coverage properties, such as the Agresti-Coull interval recommended by Brown et al. (2001) and described in detail in Agresti and Coull (1998); and

• Specify guidelines that will accommodate deviations in environmental conditions and/or plate size from the current 60-plate design matrix.

For example, DOT&E could revise the current protocol to specify that if a procurement contract does not require testing under one or more of the environmental conditions listed in the design matrix, the plates listed under that condition would then be tested under ambient conditions.
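The first two items in Recommendation 6-2 can be illustrated with a short sketch. The first function computes the two-sided Agresti-Coull interval as described by Agresti and Coull (1998). The second shows one possible distribution-free way to form an upper tolerance bound on BFD, using order statistics instead of a normality assumption. The function names, the default 90 percent levels, and the choice of the order-statistic method are illustrative assumptions, not the DOT&E protocol.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence, Tuple


def agresti_coull_interval(successes: int, trials: int,
                           confidence: float = 0.90) -> Tuple[float, float]:
    """Two-sided Agresti-Coull interval for a binomial proportion.

    Adds z**2 / 2 pseudo-successes and z**2 / 2 pseudo-failures, then applies
    the Wald formula to the adjusted counts.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2.0) / n_adj
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clip to [0, 1]; the adjusted interval can overshoot near 0 or 1.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


def upper_tolerance_bound_nonparametric(
        values: Sequence[float], coverage: float = 0.90,
        confidence: float = 0.90) -> Optional[float]:
    """Distribution-free upper tolerance bound from order statistics.

    The m-th smallest of n i.i.d. continuous observations lies above at least a
    fraction `coverage` of the population with probability
    P(Binomial(n, coverage) <= m - 1). Returns the smallest order statistic
    that reaches `confidence`, or None if even the sample maximum does not.
    """
    x = sorted(values)
    n = len(x)
    cumulative = 0.0  # running value of P(Binomial(n, coverage) <= m - 1)
    for m in range(1, n + 1):
        k = m - 1
        cumulative += math.comb(n, k) * coverage**k * (1.0 - coverage) ** (n - k)
        if cumulative >= confidence:
            return x[m - 1]
    return None


# Illustrative use with hypothetical counts: 3 perforations in 60 shots.
print(agresti_coull_interval(3, 60, confidence=0.90))
```

For a one-sided bound, replace the two-sided quantile with `NormalDist().inv_cdf(confidence)`. The order-statistic bound needs no distributional assumption, but it does need enough plates. At 90 percent coverage and 90 percent confidence, at least 22 observations are required before even the sample maximum qualifies.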

Recommendation 6-3: The Office of the Director, Operational Test and Evaluation, and the Army should continue to consult and engage statisticians throughout the process of assessing and revising protocols, comparing the performance of the new and old protocols, assessing the effects of the new protocols, and considering possible changes.

Testers and statisticians should continue to work together as a team (1) to quantify in a statistically rigorous manner the portion of variation in BFD attributable to the testing process and that attributable to the plates and (2) to ensure these results are appropriately reflected in an updated protocol. In particular, the statisticians involved with developing and implementing the statistically principled protocol should be involved with the experimentation recommended in Chapter 4.
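A common starting point for task (1) is a variance-components analysis. The sketch below fits a balanced one-way random-effects model by the method of moments (ANOVA estimators). In this model, BFD = overall mean + plate effect + within-plate error. For illustration, the sketch assumes equal numbers of shots per plate and treats differences among shots on the same plate as reflecting the testing process. In practice, within-plate variation also includes shot-location and other effects. Cleanly separating testing variation from plate variation therefore requires a design in which the two are not confounded, such as the experimentation recommended in Chapter 4. The function name and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def variance_components(bfd_by_plate: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Method-of-moments variance components for a balanced one-way random
    effects model y_ij = mu + plate_i + e_ij, with r shots on each plate."""
    a = len(bfd_by_plate)
    r = len(bfd_by_plate[0]) if a else 0
    if a < 2 or r < 2 or any(len(g) != r for g in bfd_by_plate):
        raise ValueError("need >= 2 plates, each with the same number (>= 2) of shots")
    plate_means = [mean(g) for g in bfd_by_plate]
    grand_mean = mean(plate_means)  # equals the overall mean in a balanced design
    ss_between = r * sum((m - grand_mean) ** 2 for m in plate_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(bfd_by_plate, plate_means) for y in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (r - 1))
    # E[MS_between] = sigma_within^2 + r * sigma_plate^2; truncate negative estimates at 0.
    var_plate = max(0.0, (ms_between - ms_within) / r)
    total = var_plate + ms_within
    return {
        "plate": var_plate,
        "within_plate": ms_within,
        "plate_fraction": var_plate / total if total > 0 else 0.0,
    }


# Hypothetical BFD values (mm), two shots on each of four plates; not measurements.
example = [[38.2, 40.1], [42.5, 41.7], [36.9, 39.4], [44.0, 43.1]]
print(variance_components(example))
```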

Over the course of the committee’s research and deliberations, DOT&E, the Army, and USSOCOM have endeavored to establish statistically principled test standards that are realistically achievable with current body armor designs.


Recommendation 6-4: The Office of the Director, Operational Test and Evaluation, the Army, and the United States Special Operations Command should work together to arrive at an acceptable set of test standards for lot acceptance testing that is both statistically principled and realistically achievable with current body armor designs.

REFERENCES

Agresti, A., and B. Coull. 1998. Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician 52(2):119-126.

American Society for Quality. 2008. American National Standard Sampling Procedures and Tables for Inspection by Attributes. ANSI/ASQ Z1.4-2008. Milwaukee, Wis.: American Society for Quality.

Brown, L., T. Cai, and A. DasGupta. 2001. Interval estimation for binomial proportions. Statistical Science 16(2):101-117.

DoD (U.S. Department of Defense). 1996. DoD Preferred Methods for Acceptance of Product. MIL-STD-1916. Arlington, Va.: DoD.

DoD. 2009. DoD Testing Requirements for Body Armor. Report Number D-2009-047. Arlington, Va.: DoD Inspector General.

DoD. 2010. Memorandum for National Research Committee. Arlington, Va.: Department of Defense Inspector General.

DOT&E (Director of Operational Test and Evaluation). 2010a. Standardization of hard body armor testing. Memorandum dated April 27, 2010.

DOT&E. 2010b. Memorandum: Standard for lot acceptance ballistic testing of hard body armor. July 2, 2010.

Dunn, N. 2010. ATEC proposed Army protocol and background for National Academy of Science Statisticians. Aberdeen Proving Ground, Md.: U.S. Army Test and Evaluation Command.

NIST (National Institute of Standards and Technology). 2010. Engineering Statistics Handbook. Available online at http://www.itl.nist.gov/div898/handbook/. Last accessed March 15, 2011.

NRC (National Research Council). 2010. Phase II Report on Review of the Testing of Body Armor Materials for Use by the U.S. Army. Washington, D.C.: National Academies Press.


OTA (Office of Technology Assessment). 1992. Police Body Armor Standards and Testing, Volume II: Appendices. OTA-ISC-535. Washington, D.C.: Office of Technology Assessment.

PEO-S (Program Executive Officer, Soldier). 2010. Interceptor body armor (IBA). Available online at https://peosoldier.army.mil/factsheets/SEQ_SSV_IBA.pdf. Accessed January 11, 2011.

RDECOM (U.S. Army Research, Development and Engineering Command). 2009. Amendment of solicitation/modification of contract W91CRB-09-D-0001/P00004. Aberdeen Proving Ground, Md.: RDECOM.

USSOCOM (U.S. Special Operations Command). 2010. USSOCOM first article testing protocol. Aberdeen Proving Ground, Md.: U.S. Special Operations Command.
