The results of some of the measurements described in previous reports and outlined in Chapter 6 are summarized in Tables D.1 through D.3. The measured kerma was reported by three groups, Johns Hopkins University/Applied Physics Laboratory (JHU/APL),1 National Institute of Standards and Technology (NIST),2 and American Association of Physicists in Medicine (AAPM),3 as described in Table D.1. Aluminum half-value layers were reported by the same three groups, as described in Table D.2. Effective dose was estimated by the same three groups using the reference effective dose formula and by the U.S. Army Public Health Command (USAPHC)4 using the deep dose equivalent from optically stimulated luminescence (OSL) dosimeter readings, as described in Table D.3.
1 JHU/APL, Radiation Safety Engineering Assessment Report for the Rapiscan Secure 1000 in Single Pose Configuration, NSTD-09-1085, Version 2, Laurel, Md., August 2010; hereinafter referred to as the JHU/APL report.
2 J.L. Glover, R. Minniti, L.T. Hudson, and N. Paulter, Assessment of the Rapiscan Secure 1000 Single Pose (ATR version) for Conformance with National Radiological Safety Standards, NIST report for the TSA, inter-agency agreement HSHQDC-11-X-00585, April 19, 2012; hereinafter referred to as the NIST report.
3 AAPM, Radiation Dose from Airport Scanners: Report of AAPM Task Group 217, College Park, Md., 2013; hereinafter referred to as the AAPM report.
4 USAPHC, Radiation Protection Consultation No. 26-MF-0E7K-11, Rapiscan Secure 1000 Single Pose Dosimetry Study, Aberdeen Proving Ground, Md., 2012.
TABLE D.1 Measured Kerma
| Report | KStd (nGy) per Scan | Kmax (nGy) per Scan |
|---|---|---|
| JHU/APL | 41 ± 1 | Not reported |
| NIST | 47 ± 1a | 95 ± 6 |
| AAPM | 46 ± 3 | Not reported |
a The value 47 ± 1 is from the updated September 28, 2012, NIST report; however, the committee initially had access to a preliminary report, dated April 19, 2012, where the value was 48.6 ± 1.
TABLE D.2 Measured Aluminum Half-Value Layers
| Report | HVL1 (mm of Al) |
|---|---|
NOTE: JHU/APL dual values are for a primary and secondary unit.
TABLE D.3 Estimated Effective Dose per Screening
THE PRIMARY RESULTS OF THE THREE RAPISCAN STUDIES: A STATISTICAL OVERVIEW
The exposure results in the JHU/APL, NIST, and AAPM reports are summarized in Table D.1. This summary accurately describes what the studies say, but the studies are cryptic in discussing how they reached margins of error for their estimates of average exposure. At a minimum, all the studies deviate from standard statistical methods for describing the uncertainty in key parameters. Here, the committee elaborates on this point one study at a time and then considers how one might synthesize the results of the three studies in the spirit of meta-analysis.
The AAPM Report
Having obtained data about nine Rapiscan machines (six of them deployed at the Los Angeles International Airport), the authors report on page 9 that
The energy-corrected measurements at reference point averaged for all tested units was 0.046 µGy with a standard deviation of 0.003 µGy and a range of 0.04 µGy to 0.052 µGy.
To generalize from the study, a statistician would typically make the tacit assumption that the nine units tested are a random sample of all Rapiscan units used at airports and would use the observed results to obtain both a point estimate of average exposure and a 95 percent confidence interval for average exposure. This confidence interval would be construed as the range of plausible values for average exposure over all Rapiscan machines and would have a 95 percent probability of including the all-Rapiscan average. The statement quoted above, however, does not provide the 95 percent confidence interval; nor is it obvious how to construct that interval from the information the report provides.
Much of the problem relates to ambiguity over whether the authors are reporting a standard deviation (the phrase they use) or instead a standard error. The standard deviation is a familiar measure of spread among the original measurements, in this case nine. By contrast, the standard error reflects the uncertainty in using the average of the original measurements as a proxy for the overall mean. As the sample size increases, the standard deviation among the measurements would not be expected to change because the early pattern of spread would tend to be replicated among later measurements; however, the standard error would decrease because bigger samples produce more accurate results. Indeed, the standard deviation (SD) and the standard error (SE) are related by the formula:

SE = SD/√n

where n is the sample size.
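The SD-to-SE relationship is easy to check numerically. In the sketch below, the nine measurement values are hypothetical placeholders chosen for illustration, not the actual AAPM data:

```python
import math
import statistics

# Hypothetical nine measurements (placeholders, not the actual AAPM data), in µGy.
measurements = [0.044, 0.046, 0.048, 0.043, 0.047, 0.049, 0.045, 0.046, 0.046]

n = len(measurements)
sd = statistics.stdev(measurements)  # spread among the individual measurements
se = sd / math.sqrt(n)               # uncertainty in the mean: SE = SD / sqrt(n)

print(f"n = {n}, SD = {sd:.4f}, SE = {se:.4f}")
```

With n = 9, the SE is exactly one-third of the SD, which is the factor at issue in interpreting the AAPM statement.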
To find the 95 percent confidence interval for the mean, based on nine observations, the usual formula would be:

X ± 2.306 × SE

where X is the sample mean (here 0.046) and SE is the standard error. If the reported value of 0.003 is itself the standard error, then SE = 0.003; if instead it is the standard deviation, then SE = 0.003/√9 = 0.001. This factor-of-three difference arises because n = 9 in the SE versus SD formula above. The factor 2.306 arises from use of the t-probability distribution with eight degrees of freedom, which is the usual distribution applied to a random sample of size nine.
Depending on which of the two values is used for the SE, we reach a 95 percent confidence interval of either (.0391, .0529) with SE = .003 or (.0437, .0483) with SE = .001.
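Both candidate intervals can be reproduced with a short calculation; the t critical value of 2.306 is taken from the discussion above rather than computed:

```python
mean = 0.046   # AAPM average kerma, in µGy
t_crit = 2.306 # t-distribution, 8 degrees of freedom, 97.5th percentile

# SE = 0.003 if the reported figure is already a standard error;
# SE = 0.003 / sqrt(9) = 0.001 if it is instead a standard deviation.
for se in (0.003, 0.001):
    lo, hi = mean - t_crit * se, mean + t_crit * se
    print(f"SE = {se}: 95% CI = ({lo:.4f}, {hi:.4f})")
# SE = 0.003 gives (0.0391, 0.0529); SE = 0.001 gives (0.0437, 0.0483)
```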
The best guess, given what the authors said, is that they were reporting the standard deviation, because they state that the range from the smallest to largest observations extended from .04 µGy to .052 µGy. For that reason, (.0437, .0483) is the more plausible 95 percent confidence interval, which implies that the point estimate of .046 suffers a “margin of error” of .0023. Expressed in nGy, the interval is (43.7, 48.3).
The NIST Report
The NIST report statistic that is comparable to that from the AAPM report is “47 ± 1 nGy.” However, interpreting the uncertainty range ±1 nGy is very difficult given what is reported. The authors do not tell us the size of the sample; rather, they report on page 21 that
The air kerma was measured on several occasions and in each case the ion chamber was repositioned multiple times in order to provide an estimate of the uncertainty associated with positioning the chamber.
Needless to say, one cannot apply sampling formulas to vague formulations like “several occasions” or “multiple times.” It is conceivable that the authors correctly present the 95 percent confidence interval for mean exposure, which extends from 47 – 1 to 47 + 1, or from 46 to 48. But they offer no reason to be confident that they have done so.
In any case, the authors report that “only one system was tested.” Thus, while their results say something about within-machine variability for a particular unit, they say nothing about cross-machine variability in mean exposure. Yet the AAPM results suggest that cross-machine variation might be considerably greater than cross-scan variations for a single machine. The appearance of greater precision for the NIST measurement might be illusory, even if its confidence interval for mean exposure is narrower than that of AAPM, because NIST had only one data point as compared to nine for AAPM.
The JHU/APL Report
The JHU/APL report is more specific than the other two are in that it offers the original results in Tables 8-1 and 8-2. It appears that only one machine was tested, with five scans performed for both the secondary unit and the primary unit. There was very little variability across the five scans: the coefficient of variation among the five scans was on the order of 1 percent, meaning that the standard deviation SD was only 1 percent of the mean. For that reason, the standard error SE = SD/√5 was only about ½ of 1 percent of the mean. With n = 5, the 95 percent confidence interval for the overall mean would follow:

X ± 2.776 × SE

where 2.776 comes from the t-probability distribution with four degrees of freedom. Given an X of 41 nGy and an SE of 0.005 × 41 ≈ 0.205 nGy, the confidence interval would extend from 40.43 to 41.57.
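A minimal check of this interval, using the rounded SE of ½ of 1 percent of the mean:

```python
mean = 41.0        # JHU/APL mean kerma, in nGy
se = 0.005 * mean  # ≈ 0.205 nGy, about half of 1 percent of the mean
t_crit = 2.776     # t-distribution, 4 degrees of freedom, 97.5th percentile

lo, hi = mean - t_crit * se, mean + t_crit * se
print(f"95% CI = ({lo:.2f}, {hi:.2f}) nGy")  # (40.43, 41.57)
```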
Again, however, the results offer no indication of cross-machine variability, a key consideration in making inferences about mean exposure for all Rapiscan machines.
SYNTHESIZING THE RESULTS
It appears that the AAPM report results are statistically consistent with those from the JHU/APL report. After all, JHU/APL got a mean exposure of 41 nGy for one machine, which fell within the range of 40 nGy to 52 nGy that AAPM observed over nine machines.
However, without knowing more about the measurement procedures, there is no way to combine the results of the three studies that is manifestly correct. One approach might assume each of the three studies yielded an average exposure that differs from μ, the true average exposure for all Rapiscan machines, by an amount that follows a zero-mean bell-shaped normal curve, with the standard deviation σ of all three curves being the same. That assumption implies that, a priori, all three studies are equally accurate in estimating μ.
Under that assumption, combining the results is mathematically tractable. Given that the three studies yielded mean exposures of 46, 41, and 47, the point estimate of average exposure based on “one study, one vote” would be (46 + 41 + 47)/3 ≈ 45 nGy. The common standard deviation σ for the measurement error affecting each study’s result would be approximated as 3 for these three observations. A
95 percent confidence interval for μ would extend from 38 to 50. If that interval seems large, it is because the estimate of σ based on the three key numbers is highly unstable: when one of the three results exceeds another by 6 (i.e., 47 − 41), then the true standard deviation σ could be larger than 3. The confidence interval takes account of worst-case possibilities as well as more typical ones.
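The one-study-one-vote arithmetic can be verified directly. The code below reproduces the point estimate and the σ estimate; the width of the interval then follows from pairing that σ with the t-distribution on two degrees of freedom, whose large critical value (4.303) reflects the instability of a three-point estimate of σ:

```python
import statistics

study_means = [46, 41, 47]  # AAPM, JHU/APL, NIST mean kerma (nGy)

point_estimate = statistics.mean(study_means)  # ≈ 44.7, reported as ≈ 45 nGy
sigma_hat = statistics.stdev(study_means)      # ≈ 3.2, approximated as 3

print(f"point estimate = {point_estimate:.1f} nGy, sigma = {sigma_hat:.1f} nGy")
```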
Another approach would follow the principle of “one machine, one vote,” and would combine results for the 11 machines tested, nine of them by AAPM. The sample mean for these 11 data points would be (9 × 46 + 41 + 47)/11 ≈ 46 nGy, while their standard deviation would be approximately one and a half. Working from there to the standard error of the estimate 46 nGy and applying the appropriate t-distribution formula yields a 95 percent confidence interval for μ that extends from 42 to 49 nGy. That this interval is narrower than 38 to 50 reflects the fact that random samples of size 11 are considered almost an order of magnitude more reliable than random samples of size 3. Of course, this method effectively gives 9/11 = 82 percent of the weight to the AAPM study, which would be unwarranted if the procedures followed in that study were less reliable than were those in the others. Judging the relative plausibility of the studies cannot be accomplished without examining the methodologies used to measure the radiation emitted from the AIT systems and to calculate the doses to the persons being screened and others.
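The one-machine-one-vote figures can be approximated as follows. The individual AAPM machine values are not reproduced in this appendix, so as a rough stand-in the sketch places each of the nine AAPM units at that study's mean of 46 nGy; this assumption recovers the sample mean of roughly 46 nGy and the standard deviation of approximately one and a half cited above:

```python
import statistics

# The nine individual AAPM machine values are not available here; as a
# rough stand-in, place each AAPM unit at that study's mean of 46 nGy.
machines = [46] * 9 + [41, 47]  # nine AAPM units, then JHU/APL and NIST

mean = statistics.mean(machines)  # ≈ 45.6 nGy, rounded to 46 in the text
sd = statistics.stdev(machines)   # ≈ 1.57 nGy ("approximately one and a half")

print(f"mean = {mean:.1f} nGy, SD = {sd:.2f} nGy")
```

A confidence interval built from these stand-in values would understate the true spread, since the real AAPM machines varied from 40 to 52 nGy; the 42 to 49 nGy interval in the text rests on the original per-machine data.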