Forensic Analysis: Weighing Bullet Lead Evidence

H
Principal Components Analysis: How Many Elements Should Be Measured?

The number of elements measured in bullet lead has ranged from three to seven, and sometimes the concentration of a measured element is so small as to be undetectable. The optimal number of elements to measure is unclear. An unambiguous way to determine it is to calculate, using two-sample equivalence t tests, the probability of a false match on the 1,837-bullet data set as described in Chapter 3. Recall that the equivalence t test requires specification of a value δ/RE, where RE is the relative error, and a value α denoting the expected probability of a false match. Each simulation run would use a different combination of the elements: there are 35 possible subsets of three of the seven elements, 35 of four, 21 of five, and seven of six, plus one run that uses all seven elements. Among the three-element subsets, the subset with the lowest false match probability would be selected, and a similar process would occur for the four-, five-, and six-element subsets. One could then plot the false match probability as a function of δ/RE over a range of δ/RE values and determine the reduction in false match probability in moving from three to seven elements for testing purposes. The results of such a calculation may well differ if applied to the full (71,000-bullet) data set.
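
The subset counts quoted above are binomial coefficients, and the list of simulation runs can be enumerated mechanically. The minimal Python sketch below does only that bookkeeping. The element symbols follow the order given in the table captions of this appendix, and the variable names are illustrative assumptions, not part of the committee's analysis. It does not perform the equivalence t tests themselves.

```python
from itertools import combinations

# Element symbols in the order used by the table captions of this appendix.
ELEMENTS = ("As", "Sb", "Sn", "Cu", "Bi", "Ag", "Cd")

# One simulation run per subset of size 3, 4, 5, 6, or 7
# (illustrative bookkeeping only; no t tests are run here).
subsets_by_size = {
    size: list(combinations(ELEMENTS, size))
    for size in range(3, len(ELEMENTS) + 1)
}

counts = {size: len(subsets) for size, subsets in subsets_by_size.items()}
total_runs = sum(counts.values())

for size, n in counts.items():
    print(f"{size}-element subsets: {n}")
print("total simulation runs:", total_runs)
```

Each tuple in `subsets_by_size` could then be handed to whatever routine estimates the false match probability for that combination of elements.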

An alternative approach, easier to apply but less direct, is to characterize the variability among the bullets using all seven elements. To avoid the problem of many missing values of elemental concentrations in the 1,837-bullet data set, we will use the 1,373-bullet subset, for which all seven elemental concentrations are available (after imputing some values for Cd). The variability can then be compared with the variability obtained using all possible three-, four-, five-, and six-element subsets. It is likely that the false match probability will be higher in subsets that comprise lesser amounts of the total variability and lower in subsets that comprise nearly all of the variability in the data set.

Variability can be characterized by using principal components analysis (PCA). Consider, for example, a PCA using the first three elements (As, Sb, and Sn; elements “123”), which yields 104.564 as the total variation in the data. PCA decomposes this variation of 104.564 sequentially into three linear combinations of the three elements: the first linear combination explains the most variation (76.892); the second, independent of the first, explains the next-most (19.512); and the third accounts for the remainder (8.160). The total variation in all seven elements is 136.944. Thus, this three-element subset accounts for (104.564/136.944) × 100%, or 76.4%, of the total variation.

The results of PCA on all 35 three-element subsets are shown in Table H.1; they illustrate that subset “237” (Sb, Sn, and Cd) appears to be best for characterizing the total variability in the set, accounting for (114.503/136.944) × 100% = 83.6% of the variability. Subset “137” (As, Sn, and Cd) is almost as good at (113.274/136.944) × 100% = 82.7%. PCA is then applied to all 35 possible four-element subsets; the one that accounts for the most variation, (131.562/136.944) × 100% = 96.1%, is subset “1237” (As, Sb, Sn, and Cd). Among the five-element subsets, subset “12357” (As, Sb, Sn, Cu, and Cd) explains the greatest proportion of the variance: (134.419/136.944) × 100% = 98.2%, or about 2.1 percentage points more than the subset without Cu. The five-element subset containing Bi instead of Cu is nearly as efficient: (133.554/136.944) × 100% = 97.5%. Finally, among the six-element subsets, “123457” (all but Ag) comes very close to explaining the variation using all seven elements: (136.411/136.944) × 100% = 99.6%. Measuring all elements except Bi is nearly as efficient, explaining (134.951/136.944) × 100% = 98.5% of total variation.

The values obtained for each three-, four-, five-, six-, and seven-element subset PCA are found in Tables H.1, H.3, H.5, H.7, and H.9 below. The corresponding variances, in order of increasing percentage, are found in Tables H.2, H.4, H.6, and H.8. This calculation may not directly relate to results obtained by simulating the false match probability as described above, but it does give some indication of the contribution of the different elements, and the results appear to be consistent with the impressions of the scientists who have been measuring bullets and making comparisons (Refs. 1-3).
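
The percentages quoted in the text are simple ratios of a subset's total variation to the seven-element total of 136.944. The short Python sketch below redoes that arithmetic for the best subset of each size and also reports the gain from each additional element. The totals are copied from the text; the dictionary layout and variable names are illustrative assumptions.

```python
# Total variation reported in the text for the best subset of each size
# (1,373-bullet subset); 136.944 is the reported seven-element total.
TOTAL_ALL_SEVEN = 136.944

best_by_size = {
    3: ("237", 114.503),
    4: ("1237", 131.562),
    5: ("12357", 134.419),
    6: ("123457", 136.411),
    7: ("1234567", 136.944),
}

# Share of the seven-element total captured by each best subset, in percent.
percent = {
    size: 100.0 * total / TOTAL_ALL_SEVEN
    for size, (_, total) in best_by_size.items()
}

# Gain, in percentage points, from measuring one more element.
gain = {size: percent[size] - percent[size - 1] for size in range(4, 8)}

for size, (label, _) in best_by_size.items():
    note = f" (+{gain[size]:.2f} points)" if size in gain else ""
    print(f"{size} elements, subset {label}: {percent[size]:.2f}%{note}")
```

The gains shrink with each added element, which is the pattern the text describes: most of the variation is already captured by four elements.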

TABLE H.1 Principal Components Analysis on All Three-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, and 3 represent the first through third principal components, and the rows show the cumulative variation accounted for through each successive component of the subset.

PC      123      124      125      126      127      134      135      136      137
1    76.892   26.838   27.477   26.829   28.109   73.801   73.957   73.786   74.254
2    96.404   35.383   36.032   35.373   53.809   86.312   86.730   86.294  100.820
3   104.564   37.340   38.204   35.879   62.344   88.269   89.133   86.808  113.274

PC      145      146      147      156      157      167      234      235      236
1    17.553   17.110   27.027   17.534   27.071   27.027   71.675   71.838   71.661
2    20.218   19.223   44.089   19.991   44.535   44.074   87.537   88.137   87.529
3    21.909   19.584   46.049   20.448   46.914   44.589   89.498   90.362   88.037

PC      237      245      246      247      256      257      267      345      346
1    72.186   18.941   18.335   27.146   18.938   27.216   27.146   69.371   69.243
2    98.651   21.493   20.457   45.309   21.220   45.926   45.308   72.353   71.377
3   114.503   23.138   20.813   47.278   21.677   48.143   45.818   74.067   71.742

PC      347      356      357      367      456      457      467      567
1    69.771   69.357   69.891   69.758    3.272   27.030   26.998   27.030
2    96.234   72.149   96.367   96.221    5.039   30.136   29.156   29.929
3    98.208   72.606   99.072   96.747    5.382   31.847   29.522   30.387

TABLE H.2 Total Variance (Compare with 136.944 Total Variance) for Three-Element Subsets, in Order of Increasing Variance.

Subset        456      146      156      246      256      145      245      467      567      457
Variance    5.382   19.584   20.448   20.813   21.677   21.909   23.138   29.522   30.387   31.847

Subset        126      124      125      167      267      147      157      247      257      127
Variance   35.879   37.340   38.204   44.589   45.818   46.049   46.914   47.278   48.143   62.344

Subset        346      356      345      136      236      134      135      234      235      367
Variance   71.742   72.606   74.067   86.808   88.037   88.269   89.133   89.498   90.362   96.747

Subset        347      357      123      137      237
Variance   98.208   99.072  104.564  113.274  114.503
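
For readers who want to see how entries of the kind shown in Table H.1 arise, the minimal Python sketch below computes the cumulative principal-component variation for every three-element subset of a small covariance matrix. The 4 × 4 matrix `COV` is hypothetical (it is not the bullet data), and the function names are illustrative. The eigenvalues are obtained with a plain Jacobi rotation routine so that the example needs only the standard library. The sketch assumes PCA on the covariance matrix, in which case the last cumulative value for a subset equals the sum of the variances of its elements.

```python
import math
from itertools import combinations

def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix, largest first, by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def cumulative_variation(cov, subset):
    """Cumulative variation through each successive principal component of a subset."""
    sub = [[cov[i][j] for j in subset] for i in subset]
    cumulative, running = [], 0.0
    for eigenvalue in jacobi_eigenvalues(sub):
        running += eigenvalue
        cumulative.append(running)
    return cumulative

# HYPOTHETICAL covariance matrix for four variables (not the bullet data).
COV = [
    [5.0, 2.0, 0.5, 0.1],
    [2.0, 4.0, 1.0, 0.2],
    [0.5, 1.0, 3.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
FULL_TOTAL = sum(COV[i][i] for i in range(len(COV)))  # total variation, all variables

columns = {s: cumulative_variation(COV, s) for s in combinations(range(len(COV)), 3)}
shares = {s: cum[-1] / FULL_TOTAL for s, cum in columns.items()}
best = max(shares, key=shares.get)

for subset, cum in columns.items():
    print(subset, [round(v, 3) for v in cum], f"{100 * shares[subset]:.1f}%")
print("best three-variable subset:", best)
```

Each printed list plays the role of one column of Table H.1, and the ranking by `shares` mirrors the ordering in Table H.2.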

TABLE H.3 Principal Components Analysis on All Four-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, and 4 represent the first through fourth principal components, and the rows show the cumulative variation accounted for through each successive component of the subset.

PC     1234     1235     1236     1237     1245     1246     1247     1256     1257
1    76.918   77.133   76.903   77.362   27.517   26.865   28.126   27.506   28.599
2    96.441   97.085   96.430  103.955   36.072   35.410   53.844   36.061   54.501
3   104.603  105.249  104.590  123.430   38.556   37.516   62.380   38.279   63.047
4   106.557  107.421  105.096  131.562   40.197   37.872   64.337   38.736   65.202

PC     1267     1345     1346     1347     1356     1357     1367     1456     1457
1    28.122   73.982   73.810   74.278   73.966   74.440   74.263   17.575   27.071
2    53.835   86.772   86.330  100.843   86.751  101.012  100.828   20.366   44.575
3    62.371   89.436   88.440  113.309   89.208  113.752  113.291   22.099   47.221
4    62.877   91.126   88.801  115.267   89.665  116.131  113.806   22.441   48.906

PC     1467     1567     2345     2346     2347     2356     2357     2367     2456
1    27.027   27.071   71.861   71.683   72.209   71.847   72.378   72.195   18.969
2    44.108   44.556   88.174   87.562   98.673   88.164   98.855   98.660   21.650
3    46.221   46.989   90.710   89.674  114.534   90.437  115.149  114.526   23.328
4    46.581   47.446   92.355   90.030  116.495   90.894  117.360  115.035   23.670

PC     2457     2467     2567     3456     3457     3467     3567     4567
1    27.217   27.146   27.217   69.378   69.911   69.777   69.898   27.031
2    45.955   45.333   45.952   72.492   96.387   96.241   96.374   30.276
3    48.496   47.454   48.218   74.257   99.355   98.375   99.147   32.037
4    50.135   47.810   48.675   74.599  101.065   98.740   99.604   32.380

TABLE H.4 Total Variance (Compare with 136.944 Total Variance) for Four-Element Subsets, in Order of Increasing Variance.

Subset       1456     2456     4567     1246     1256     1245     1467     1567     2467     2567
Variance   22.441   23.670   32.380   37.872   38.736   40.197   46.581   47.446   47.810   48.675

Subset       1457     2457     1267     1247     1257     3456     1346     1356     2346     2356
Variance   48.906   50.135   62.877   64.337   65.202   74.599   88.801   89.665   90.030   90.894

Subset       1345     2345     3467     3567     3457     1236     1234     1235     1367     2367
Variance   91.126   92.355   98.740   99.604  101.065  105.096  106.557  107.421  113.806  115.035

Subset       1347     1357     2347     2357     1237
Variance  115.267  116.131  116.495  117.360  131.562
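
The totals in Tables H.2 and H.4 can be cross-checked against one another. If the PCA was run on the covariance matrix (an assumption here, though the additive pattern of the totals suggests it), a subset's total variance is the sum of the variances of its elements. Subtracting a three-element total from the four-element total that adds one element should therefore give that element's own variance. The Python sketch below does this for subset “1237”; the values are copied from the two tables, and the names are illustrative.

```python
# Total variances copied from Table H.2 (three-element subsets of "1237")
# and Table H.4 (subset "1237").
three_element_totals = {"123": 104.564, "127": 62.344, "137": 113.274, "237": 114.503}
TOTAL_1237 = 131.562

# Element numbering as given in the table captions.
ELEMENT_NAME = {"1": "As", "2": "Sb", "3": "Sn", "7": "Cd"}

# Variance implied for the one element that each three-element subset leaves out.
implied_variance = {}
for label, total in three_element_totals.items():
    missing = (set("1237") - set(label)).pop()
    implied_variance[ELEMENT_NAME[missing]] = TOTAL_1237 - total

for name in ("As", "Sb", "Sn", "Cd"):
    print(f"{name}: {implied_variance[name]:.3f}")

# Consistency check: the four implied variances should add back up to the
# four-element total, apart from rounding in the published tables.
reconstructed = sum(implied_variance.values())
print(f"sum of implied variances: {reconstructed:.3f} (table value {TOTAL_1237})")
```

The implied variances sum to within rounding of 131.562, and Sn is by far the largest contributor, which is consistent with every high-ranking subset in the tables containing element 3.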

TABLE H.5 Principal Components Analysis on All Five-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, and 5 represent the first through fifth principal components, and the rows show the cumulative variation accounted for through each successive component of the subset.

PC    12345    12346    12347    12356    12357    12367    12456    12457    12467
1    77.160   76.930   77.388   77.144   77.608   77.373   27.547   28.624   28.140
2    97.127   96.468  103.981   97.114  104.205  103.966   36.103   54.541   53.871
3   105.292  104.630  123.467  105.278  124.130  123.456   38.716   63.088   62.408
4   107.775  106.733  131.600  107.496  132.265  131.588   40.387   65.560   64.514
5   109.414  107.089  133.554  107.953  134.419  132.094   40.729   67.194   64.869

PC    12567    13456    13457    13467    13567    14567    23456    23457    23467
1    28.617   73.991   74.464   74.286   74.448   27.072   71.870   72.401   72.217
2    54.530   86.795  101.037  100.852  101.021   44.598   88.203   98.878   98.682
3    63.076   89.584  113.794  113.328  113.773   47.372   90.867  115.186  114.559
4    65.277   91.316  116.440  115.438  116.206   49.096   92.546  117.714  116.617
5    65.734   91.658  118.124  115.799  116.663   49.439   92.887  119.353  117.028

PC    23567    24567    34567
1    72.387   27.218   69.918
2    98.864   45.984   96.394
3   115.177   48.655   99.495
4   117.435   50.326  101.254
5   117.892   50.667  101.597

TABLE H.6 Total Variance (Compare with 136.944 Total Variance) for Five-Element Subsets, in Order of Increasing Variance. The third row gives each variance as a percentage of 136.944.

Subset      12456    14567    24567    12467    12567    12457    13456    23456    34567    12346
Variance    40.73    49.44    50.67    64.87    65.73    67.19    91.66    92.89   101.60   107.09
Percent     29.74    36.10    37.00    47.37    48.00    49.07    66.93    67.83    74.19    78.20

Subset      12356    12345    13467    13567    23467    23567    13457    23457    12367    12347
Variance   107.95   109.41   115.80   116.66   117.03   117.89   118.12   119.35   132.09   133.55
Percent     78.83    79.90    84.56    85.19    85.46    86.09    86.26    87.15    96.46    97.53

Subset      12357
Variance   134.42
Percent     98.16

TABLE H.7 Principal Components Analysis on All Six-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, and 6 represent the first through sixth principal components, and the rows show the cumulative variation accounted for through each successive component of the subset.

PC   123456   123457   123467   123567   124567   134567   234567
1    77.172   77.635   77.399   77.620   28.643   74.472   72.411
2    97.157  104.232  103.993  104.216   54.571  101.046   98.887
3   105.322  124.172  123.494  124.159   63.118  113.817  115.215
4   107.934  132.307  131.628  132.294   65.721  116.590  117.872
5   109.605  134.779  133.731  134.494   67.385  118.314  119.543
6   109.946  136.411  134.087  134.951   67.726  118.656  119.885

TABLE H.8 Total Variance (Compare with 136.944 Total Variance) for Six-Element Subsets, in Order of Increasing Variance.

Subset     124567   123456   134567   234567   123467   123567   123457
Variance   67.726  109.946  118.656  119.885  134.087  134.951  136.411
Percent    49.45%   80.28%   86.65%   87.54%   97.91%   98.54%   99.61%

TABLE H.9 Principal Components Analysis on the Full Seven-Element Set of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1 through 7 represent the first through seventh principal components, and the rows show the cumulative variation accounted for through each successive component.

PC     1234567
1     77.64703
2    104.24395
3    124.20241
4    132.33795
5    134.94053
6    136.60234
7    136.94360

Summary:

3 elements: 237 (83.6% of total variance)
4 elements: 1237 (96.07% of total variance)
5 elements: 12357 (98.16% of total variance) or 12347 (97.52%)
6 elements: 123457 (99.61% of total variance) or 123567 (98.54%) (Bi-Ag correlation)
7 elements: 1234567 (100.00% of total variance)

REFERENCES

1. Koons, R. D. and Grant, D. M. J. Foren. Sci. 2002, 47(5), 950.
2. Randich, E.; Duerfeldt, W.; McLendon, W.; and Tobin, W. Foren. Sci. Int. 2002, 127, 174–191.
3. Peele, E. R.; Havekost, D. G.; Peters, C. A.; and Riley, J. P. USDOJ (ISBN 0-932115-12-8), 57, 1991.