Appendix C
Committee Evaluation of Statistical Analysis Report
The Statistical Analysis Report (B2M10) was submitted under a contract with the FBI to analyze the results of the assays on the 1,070 FBI Repository (FBIR) samples and to determine whether those results could be related to the results obtained on the evidentiary material. If such a relationship could be identified, a secondary goal was to develop a measure of its “statistical strength.” Two of the attack letters assayed positive for the four mutations A1, A3, D, and E. Results on 1,059 of the 1,070 samples were tabulated in the Statistical Analysis Report. Eight samples tested positive for all four mutations. Seven of these eight came from one institution (USAMRIID), and the remaining sample came from a different institution (Battelle Memorial Institute [BMI]). A table of documented transfers of samples between institutions showed a transfer of sample material from the first institution (USAMRIID) to the second (BMI). This Appendix discusses the validity of the inferences and calculations in the Statistical Analysis Report submitted to the FBI.
As noted in Chapter 6, the statistical analyses used in the report (e.g., 95 percent confidence interval for the proportion of samples with four mutations, chi-squared tests of independence) require two key assumptions to be valid:
1. Representativeness: The 1,059 samples are assumed to be a representative and random collection of samples from some well-defined population of samples.
2. Independence: The 1,059 samples are assumed to be independent of one another (i.e., have no connection with each other, beyond that they all come from the same population).
The Statistical Analysis Report acknowledges that neither assumption can be validated from these data, and the committee agrees with this assessment. As a consequence, the results of many of the statistical methods applied to these data cannot be validated. The violations of these assumptions and their consequences are discussed below.
1. FBIR is not a representative and random collection of samples from a well-defined population of B. anthracis samples.
The 1,059 samples do not appear to satisfy assumption 1. They were obtained in response to a request from the FBI, and no information is available on samples in the population that were not submitted. In fact, the “target population” seems not to have been defined: it could be the population of all unique preparations of B. anthracis Ames in the United States, in the world, or at selected institutions. Without a well-defined population, it is difficult to assess the representativeness of the collection. The elimination of samples that had “inconclusive” assay results also appears to be nonrandom, as some institutions had many more “inconclusive” assays than others.
2. The 1,059 samples in the FBIR are not independent.
The FBI submitted to the committee a table of known transfers of samples between institutions. Samples derived from a common transferred source are connected to one another, so the second assumption is violated. Thus, the results of the chi-squared tests for independence of the mutations that are calculated in the report are not meaningful. Further, the confidence interval for the proportion 8/947 is not appropriate, because the correct denominator for this proportion is likely not 947. A more accurate numerator and denominator might count the number of known independent preparations rather than the number of samples, but such information may not be possible to obtain.
3. Violation of assumptions renders invalid the inferences from the statistical analyses.
Because the FBIR is not a representative and random collection of independent samples, the assay results from the repository may be biased. Virtually all statistical procedures assume that the units on which measurements are made form a random, representative collection from the target population. (The effects of biased sampling on inferences are well documented; see, e.g., Freedman et al., 2007.) Without an appropriate model that characterizes the nonrepresentativeness of the samples and the degree of dependence among them, it is not possible to calculate a meaningful measure of “statistical significance” for the results.
4. Results on 112 samples beyond the 947 samples
The Statistical Analysis Report eliminated from most of its tables the assay results on 112 samples that were “inconclusive” for A1, A3, MRI-D, or E. Twenty-one of these 112 eliminated samples assayed positive for 1, 2, or 3 mutations; Table C-1 lists them. (Five samples, 005-022, 049-014, 053-014, 053-068, and 054-008, are listed twice because they were reported as “inconclusive” or “variant” on two assays.)
TABLE C-1 Samples with Positive and “Inconclusive” or “Variant” Assays
FBIR Number | A1 | A3 | MRI-D | IITRI-D | E | Positive Mutations |
039-010 | inc | + | − | − | − | A3 |
044-034 | var | − | − | + | − | IITRI-D |
049-014 | inc | var | + | + | − | MRI-D, IITRI-D |
053-004 | var | + | − | − | + | A3, E |
053-010 | var | + | + | + | + | A3, MRI-D, IITRI-D, E |
053-014 | var | inc | − | − | + | E |
053-068 | inc | inc | + | − | − | MRI-D |
054-008 | inc | + | + | inc | + | A3, MRI-D, E |
061-030 | inc | − | + | + | − | MRI-D, IITRI-D |
066-015 | inc | inc | + | + | − | MRI-D, IITRI-D |
005-022 | + | var | − | inc | + | A1, E |
017-006 | − | var | + | + | − | MRI-D, IITRI-D |
049-014 | inc | var | + | + | − | MRI-D, IITRI-D |
049-018 | − | var | + | + | − | MRI-D, IITRI-D |
053-014 | var | inc | − | − | + | E |
053-068 | inc | inc | + | − | − | MRI-D |
054-066 | + | inc | + | + | + | A1, MRI-D, IITRI-D, E |
054-068 | − | var | + | − | + | MRI-D, E |
005-020 | − | − | + | inc | − | MRI-D |
005-022 | + | var | − | inc | + | A1, E |
043-016 | − | − | + | inc | − | MRI-D |
044-020 | − | + | − | inc | + | A3, E |
052-026 | + | + | + | inc | − | A1, A3, MRI-D |
054-008 | inc | + | + | inc | + | A3, MRI-D, E |
054-022 | − | − | + | inc | − | MRI-D |
057-036 | − | − | + | inc | − | MRI-D |
inc = inconclusive
IITRI = Illinois Institute of Technology Research Institute
MRI = Midwest Research Institute
var = variant
In addition to the two 3-positive samples (+++) among the 947 samples, the four samples below also tested positive for 3 mutations (ordered by FBIR number):
052-026 | + | + | + | inc | − | A1, A3, MRI-D |
053-010 | var | + | + | + | + | A3, MRI-D, IITRI-D, E |
054-008 | inc | + | + | inc | + | A3, MRI-D, E |
054-066 | + | inc | + | + | + | A1, MRI-D, IITRI-D, E |
In addition to the 11 two-positive samples among the 947 samples, the following four samples tested positive for 2 of the 4 mutations (ordered by FBIR number):
005-022 | + | var | − | inc | + | A1, E |
044-020 | − | + | − | inc | + | A3, E |
053-004 | var | + | − | − | + | A3, E |
054-068 | − | var | + | − | + | MRI-D, E |
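The classification above can be checked mechanically. The Python sketch below (an illustration only; the row encoding and helper names are hypothetical) transcribes the 26 rows of Table C-1, keeps one copy of each of the five samples listed twice, and counts distinct positive mutations per sample. It assumes, as the counts in the text imply, that D counts as a single mutation that is positive when either MRI-D or IITRI-D is positive.

```python
from collections import defaultdict

# Table C-1 rows: (FBIR number, A1, A3, MRI-D, IITRI-D, E).
# "+" positive, "-" negative, "inc" inconclusive, "var" variant.
TABLE_C1 = [
    ("039-010", "inc", "+", "-", "-", "-"),
    ("044-034", "var", "-", "-", "+", "-"),
    ("049-014", "inc", "var", "+", "+", "-"),
    ("053-004", "var", "+", "-", "-", "+"),
    ("053-010", "var", "+", "+", "+", "+"),
    ("053-014", "var", "inc", "-", "-", "+"),
    ("053-068", "inc", "inc", "+", "-", "-"),
    ("054-008", "inc", "+", "+", "inc", "+"),
    ("061-030", "inc", "-", "+", "+", "-"),
    ("066-015", "inc", "inc", "+", "+", "-"),
    ("005-022", "+", "var", "-", "inc", "+"),
    ("017-006", "-", "var", "+", "+", "-"),
    ("049-014", "inc", "var", "+", "+", "-"),
    ("049-018", "-", "var", "+", "+", "-"),
    ("053-014", "var", "inc", "-", "-", "+"),
    ("053-068", "inc", "inc", "+", "-", "-"),
    ("054-066", "+", "inc", "+", "+", "+"),
    ("054-068", "-", "var", "+", "-", "+"),
    ("005-020", "-", "-", "+", "inc", "-"),
    ("005-022", "+", "var", "-", "inc", "+"),
    ("043-016", "-", "-", "+", "inc", "-"),
    ("044-020", "-", "+", "-", "inc", "+"),
    ("052-026", "+", "+", "+", "inc", "-"),
    ("054-008", "inc", "+", "+", "inc", "+"),
    ("054-022", "-", "-", "+", "inc", "-"),
    ("057-036", "-", "-", "+", "inc", "-"),
]

def positive_mutations(row):
    """Set of mutations (A1, A3, D, E) that assayed positive.

    Assumption: D is positive if either D assay is positive, so a sample
    positive by both MRI-D and IITRI-D counts D only once.
    """
    _, a1, a3, mri_d, iitri_d, e = row
    found = set()
    if a1 == "+":
        found.add("A1")
    if a3 == "+":
        found.add("A3")
    if "+" in (mri_d, iitri_d):
        found.add("D")
    if e == "+":
        found.add("E")
    return found

def group_by_count(rows):
    """Deduplicate by FBIR number, then group samples by number of positives."""
    unique = {}
    for row in rows:
        unique.setdefault(row[0], row)  # repeated listings carry identical results
    groups = defaultdict(list)
    for fbir in sorted(unique):
        groups[len(positive_mutations(unique[fbir]))].append(fbir)
    return len(rows), len(unique), dict(groups)

total, distinct, groups = group_by_count(TABLE_C1)
print(f"{total} rows, {distinct} distinct samples")
for n_pos in sorted(groups):
    print(n_pos, groups[n_pos])
```

The 3-positive and 2-positive groups it prints should match the two short lists above.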
DILUTION EXPERIMENTS
Dilution experiments were conducted to assess the sensitivity of the assays to various concentrations. Thirty samples were prepared from RMR-1029 at dilution 10.0. As with the other samples, some of the assays were “inconclusive.” Genotype E tested positive in all 30 samples, and all 4 mutations tested positive in 16 samples. In the remaining 14 samples, however, the assays for one or more of the genotypes were negative. One sample tested negative for A1, A3, and D and was positive only for E; five samples were positive for only two mutations (A3 and E); and eight samples were positive for only three of the four mutations (seven for A3, D, and E; one for A1, A3, and E). Thus, 6 of the 30 replicate samples (20 percent) tested positive for only 1 or 2 of the mutations. Given that 50 of the 947 FBIR samples showed only 1 positive and 11 showed only 2 positives, this variability indicates that some of the FBIR samples may have harbored mutations that went undetected. Absent any repeat testing of these samples, however, it is difficult to know how such false negatives might have affected the inferences.
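The replicate counts in the preceding paragraph can be tallied directly. This minimal Python sketch encodes each reported outcome pattern for the 30 RMR-1029 replicates as a set of positive mutations with its count (the encoding and variable names are illustrative, not from the report) and tabulates the replicates by number of positives.

```python
from collections import Counter

# Outcomes reported for the 30 RMR-1029 replicates at dilution 10.0:
# (set of mutations that tested positive, number of replicates).
REPLICATE_PATTERNS = [
    (frozenset({"A1", "A3", "D", "E"}), 16),
    (frozenset({"A3", "D", "E"}), 7),
    (frozenset({"A1", "A3", "E"}), 1),
    (frozenset({"A3", "E"}), 5),
    (frozenset({"E"}), 1),
]

by_positives = Counter()
for positives, count in REPLICATE_PATTERNS:
    by_positives[len(positives)] += count

n_replicates = sum(by_positives.values())
low_share = (by_positives[1] + by_positives[2]) / n_replicates
all_e = all("E" in positives for positives, _ in REPLICATE_PATTERNS)

print(dict(sorted(by_positives.items())))  # {1: 1, 2: 5, 3: 8, 4: 16}
print(f"{n_replicates} replicates; {low_share:.0%} had only 1 or 2 positives")
print("E positive in every replicate:", all_e)
```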
Additional experiments were conducted on RMR-1029 and another sample, “SPS.266 Tube#5,” at 10 dilution levels (10.1, …, 10.10). The results of the three replicates at each dilution level, for each of the five genotypes, for samples from both RMR-1029 and SPS.266 Tube#5 were reported in Chapter 6. Variability in the results on replicates, even from the same sample at the same dilution level, demonstrates the value of, and need for, replicate testing. For example, the results on the three replicates from RMR-1029 at dilution 10.1, ordered as A1, A3, MRI-D, IITRI-D, E, were (− + + + +), (− + + + −), and (+ + + + −). Clearly, dilution affects the assay result: the greater the dilution, the more likely the assay is negative. Moreover, and perhaps unexpectedly, greater dilutions sometimes gave positive results when not all replicates at lesser dilutions did so.
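To make the replicate variability concrete, the short sketch below (illustrative only; the string encoding with "-" for negative is an assumption) takes the three RMR-1029 replicate results at dilution 10.1 quoted above and reports, for each assay, how many replicates were positive and whether all three agreed.

```python
ASSAYS = ("A1", "A3", "MRI-D", "IITRI-D", "E")

# Three replicates from RMR-1029 at dilution 10.1, one character per assay,
# in the order of ASSAYS ("+" positive, "-" negative).
REPLICATES = ["-++++", "-+++-", "++++-"]

def per_assay_summary(replicates):
    """Map each assay to (number of positive replicates, all replicates agree?)."""
    summary = {}
    for i, assay in enumerate(ASSAYS):
        calls = [rep[i] for rep in replicates]
        summary[assay] = (calls.count("+"), len(set(calls)) == 1)
    return summary

for assay, (n_pos, agree) in per_assay_summary(REPLICATES).items():
    status = "consistent" if agree else "inconsistent"
    print(f"{assay:8s} {n_pos}/{len(REPLICATES)} positive  {status}")
```

In this example the A1 and E calls disagree across replicates while A3 and both D assays agree.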
CONCORDANCE OF TESTS FROM IITRI-D AND MRI-D
The FBI retained both the Illinois Institute of Technology Research Institute (IITRI) and the Midwest Research Institute (MRI) to conduct the D assays. Because the assays on the 1,059 samples can be considered independent between IITRI and MRI, the Statistical Analysis Report (Table 3, p. 7, reproduced below) tabulates the results of the D assays from the two facilities:
MRI-D (rows) \ IITRI-D (columns) | Inconclusive | Negative | No growth | Pending | Positive | Total |
Inconclusive | 0 | 22 | 12 | 0 | 0 | 34 |
Negative | 17 | 909 | 1 | 1 | 12 | 940 |
Negative-u | 1 | 20 | 0 | 0 | 0 | 21 |
Positive | 6 | 12 | 0 | 0 | 46 | 64 |
TOTAL | 24 | 963 | 13 | 1 | 58 | 1,059 |
The Statistical Analysis Report combined the “negative-u” results with the “negative” results and eliminated the 13 samples that showed “no growth” by IITRI-D (12 of them “inconclusive” and one “negative” by MRI-D), as well as the one “pending” sample, to yield the following table:
MRI-D (rows) \ IITRI-D (columns) | Inconclusive | Negative | Positive | Total |
Inconclusive | 0 | 22 | 0 | 22 |
Negative | 18 | 929 | 12 | 959 |
Positive | 6 | 12 | 46 | 64 |
TOTAL | 24 | 963 | 58 | 1,045 |
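The reduction from the first table to the second is a simple bookkeeping step, sketched below in Python. This is an illustration, not the report's code. It takes rows as MRI-D results and columns as IITRI-D results (the orientation implied by the text's description of the 12 “no growth” and the 6 positive/inconclusive samples), merges the “negative-u” row into “negative,” drops the “no growth” and “pending” columns, and checks the new totals.

```python
# Table 3 of the Statistical Analysis Report (orientation assumed):
# rows are MRI-D results, columns are IITRI-D results.
COLUMNS = ("Inconclusive", "Negative", "No growth", "Pending", "Positive")
FULL = {
    "Inconclusive": (0, 22, 12, 0, 0),
    "Negative": (17, 909, 1, 1, 12),
    "Negative-u": (1, 20, 0, 0, 0),
    "Positive": (6, 12, 0, 0, 46),
}

def collapse(full):
    """Merge 'Negative-u' into 'Negative' and drop 'No growth' and 'Pending'."""
    keep = [COLUMNS.index(c) for c in ("Inconclusive", "Negative", "Positive")]
    merged = {}
    for row, counts in full.items():
        target = "Negative" if row == "Negative-u" else row
        old = merged.get(target, [0] * len(keep))
        merged[target] = [o + counts[j] for o, j in zip(old, keep)]
    return merged

reduced = collapse(FULL)
dropped = sum(map(sum, FULL.values())) - sum(map(sum, reduced.values()))
for row, counts in reduced.items():
    print(f"{row:12s} {counts} total={sum(counts)}")
print("samples dropped:", dropped)  # the "no growth" and "pending" samples
```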
Eliminating the 14 “no growth” and “pending” samples, the concordance rate (the proportion of samples on the diagonal: 0 inconclusive by both facilities, 929 negative by both, and 46 positive by both) is 975/1045 = 0.933, with a 95 percent confidence interval of (0.916, 0.947). Thus, the agreement between the facilities is unlikely to be lower than 91.6 percent and likely does not exceed 94.7 percent. Of greater interest, however, are the 12 samples that were positive by IITRI-D but negative by MRI-D, the 12 samples that were negative by IITRI-D but positive by MRI-D, and the six samples that were positive by MRI-D but inconclusive by IITRI-D. While concordance is informative, these 30 samples with discordant results might reveal more about the samples and the assay process. On the other hand, we know from the repeated assays of the dilution series that some discordance arises from replicate variation even when the same assay procedure is used.
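The concordance rate and its interval can be recomputed from the reduced table. The text does not say which interval method produced (0.916, 0.947); the sketch below assumes the Wilson score interval, a standard choice for a binomial proportion, which reproduces those bounds to three decimals. Like any such interval, it treats the samples as independent.

```python
import math

# Reduced D-assay table: rows are MRI-D results, columns are IITRI-D
# results, each in the order (inconclusive, negative, positive).
REDUCED = [
    [0, 22, 0],
    [18, 929, 12],
    [6, 12, 46],
]

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n = sum(map(sum, REDUCED))
agree = sum(REDUCED[i][i] for i in range(3))  # same call from both facilities
lo, hi = wilson_interval(agree, n)
print(f"concordance {agree}/{n} = {agree / n:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")

# Discordant cells singled out in the text: positive by one facility and
# negative by the other (both directions), plus MRI-D positive / IITRI-D inconclusive.
print("discordant samples of interest:", REDUCED[2][1] + REDUCED[1][2] + REDUCED[2][0])
```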
In any case, because genotype D is the only one of the four genotypes that was subjected to independent testing by a second organization, one cannot say whether the results on the other genotypes might have differed had they also been subjected to independent testing.
“SIGNIFICANCE” OF SEVEN (++++) SAMPLES FROM INSTITUTION F
The Statistical Analysis Report notes in its conclusions:
“In summation, though the random chance of occurrence of the sample type (++++) is 8 out of 947 (i.e., 0.84%) with exact 95% confidence interval of 0.0037 to 0.0166 (I.e., from 1 in 270 to 1 in 60), this sample type has been found in only two institutions thus far sampled (USAMRIID and BMI), and its occurrence in BMI is explained by a recent sample transfer from USAM to BMI, since there is no documented record of sample transfers in the other direction.” (p. 2)
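The “exact” interval quoted from the report is presumably the Clopper-Pearson interval, which is the usual meaning of that term for a binomial proportion; that identification is an assumption. The standard-library sketch below finds the Clopper-Pearson bounds by bisection on the binomial tail probabilities and gives bounds close to the quoted (0.0037, 0.0166) for 8 of 947. As discussed earlier in this Appendix, the calculation presumes independent, representative samples.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect(f, target, increasing, tol=1e-12):
    """Find p in [0, 1] where the monotone function f(p) equals target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move the lower end up while f(mid) is still on the low side of target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided interval for x successes in n trials."""
    # Lower bound: P(X >= x | p) = alpha/2; this tail grows with p.
    if x == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= x | p) = alpha/2; this tail shrinks as p grows.
    if x == n:
        upper = 1.0
    else:
        upper = _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper

lo, hi = clopper_pearson(8, 947)
print(f"8/947 = {8 / 947:.4f}, exact 95% CI ({lo:.5f}, {hi:.5f})")
```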
As noted in Chapter 6, 598 of the 947 samples (63 percent) came from Institution F. (Twelve of the institutions submitted 6 or fewer samples; 4 institutions submitted 15-31 samples, and 4 institutions submitted 49-74 samples.) Therefore, one would not be surprised to find more “mutation-positive” samples from Institution F than, say, from Institution B (which contributed only one sample). One might naturally ask: How unusual is the occurrence of seven “4-mutation” samples (or even all eight) from Institution F? Given that Institution F contributed almost two-thirds of the 947 samples, how many of the 4-positive (++++) samples would one expect Institution F to have if the 4-mutation samples were distributed completely at random?
The answer to this question is given by the probabilities of observing 0, 1, 2, ..., or 8 of the eight (++++) samples in Institution F, given that Institution F submitted 598 of the 947 samples that yielded definitive results on the A1, A3, MRI-D, and E assays. These probabilities, from the hypergeometric distribution (Johnson et al., 2005), are shown in Table C-2.
TABLE C-2 Probabilities of k 4-Mutation Samples in Institution F
k = | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Probability | 0.0003 | 0.0045 | 0.0276 | 0.0955 | 0.2058 | 0.2826 | 0.2415 | 0.1174 | 0.0248 |
This table shows that the chance of Institution F ending up with seven or eight of the eight (++++) samples is 0.1174 + 0.0248 = 0.1422, or about 1/7. Therefore, while the observation that seven of the eight (++++) samples came from Institution F is not entirely typical, it can hardly be considered extreme.
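Table C-2 can be reproduced with the Python standard library. This sketch assumes the setup described in the text: 947 samples, 598 of them from Institution F, with the eight (++++) samples treated as a random draw without replacement. It computes the hypergeometric probabilities, the expected number for Institution F under random allocation, and the probability of seven or more. Its values agree with Table C-2 to within 0.0001 in every entry. The constant names are illustrative.

```python
from math import comb

N_TOTAL = 947    # samples with definitive A1, A3, MRI-D, and E results
N_F = 598        # samples submitted by Institution F
N_POSITIVE = 8   # (++++) samples

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k of the draws come from the 'successes' group), without replacement."""
    return comb(successes, k) * comb(population - successes, draws - k) / comb(population, draws)

probs = [hypergeom_pmf(k, N_TOTAL, N_F, N_POSITIVE) for k in range(N_POSITIVE + 1)]
expected = N_POSITIVE * N_F / N_TOTAL
tail = probs[7] + probs[8]

for k, p in enumerate(probs):
    print(f"k = {k}: {p:.4f}")
print(f"expected count for Institution F: {expected:.2f}")
print(f"P(7 or 8 in Institution F) = {tail:.4f}")
```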