H
Principal Components Analysis: How Many Elements Should Be Measured?
The number of elements in bullet lead that have been measured has ranged from three to seven, and sometimes the concentration of a measured element is so small as to be undetectable. The optimal number of elements to measure is unclear. A direct way to determine it is to calculate, using two-sample equivalence t tests, the probability of a false match on the 1,837-bullet data set as described in Chapter 3. Recall that the equivalence t test requires specification of a value δ/RE, where RE = relative error, and a value α denoting the expected probability of a false match. Each simulation run would use a different combination of the elements: there are 35 possible subsets of three of the seven elements, 35 possible subsets of four, 21 possible subsets of five, seven possible subsets of six, and one run corresponding to using all seven elements. Among the three-element subsets, the subset with the lowest false match probability would be selected, and a similar process would be followed for the four-, five-, and six-element subsets. One could then plot the false match probability as a function of δ/RE for each subset size and determine the reduction in false match probability in moving from three to seven elements for testing purposes. The results of such a calculation may well differ if it is applied to the full (71,000-bullet) data set.
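The subset counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the element order is taken from the table captions in this appendix, and the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

# Element order as given in the table captions (1 = As, ..., 7 = Cd).
ELEMENTS = ["As", "Sb", "Sn", "Cu", "Bi", "Ag", "Cd"]

# Number of simulation runs needed for each subset size from three to seven.
runs_by_size = {size: comb(len(ELEMENTS), size) for size in range(3, 8)}
total_runs = sum(runs_by_size.values())

# Subset labels in the style used by the tables, e.g. "123" = As, Sb, Sn.
three_element_labels = [
    "".join(str(index + 1) for index in subset)
    for subset in combinations(range(len(ELEMENTS)), 3)
]

print(runs_by_size, total_runs)
print(three_element_labels[:5])
```

Enumerating the labels this way also gives a convenient loop variable for any simulation that iterates over the subsets.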
An alternative approach, easier to apply but less direct, is to characterize the variability among the bullets using all seven elements. To avoid the problem of many missing values of elemental concentrations in the 1,837-bullet data set, we will use the 1,373-bullet subset, for which all seven elemental concentrations are available (after imputing some values for Cd). The variability can then be compared with the variability obtained using all possible three-, four-, five-, and six-element subsets. It is likely that the false match probability will be higher in subsets that account for less of the total variability and lower in subsets that account for nearly all of the variability in the data set. Variability can be characterized by using principal components analysis (PCA).
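A minimal sketch of the subset comparison follows, under two assumptions that the text does not confirm: that the PCA is carried out on the sample covariance matrix (rather than the correlation matrix), and that the data can be held in a bullets-by-elements array. The data below are synthetic and purely illustrative, and the function names are hypothetical. With a covariance-based PCA, the total variation of a subset equals the sum of the variances of its elements, which is the last entry of the cumulative eigenvalue sum.

```python
from itertools import combinations

import numpy as np


def cumulative_pc_variance(data, columns):
    """Cumulative variance captured by successive principal components
    of the selected columns (PCA on the sample covariance matrix)."""
    cov = np.cov(data[:, columns], rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # largest eigenvalue first
    return np.cumsum(eigenvalues)


def rank_subsets(data, size):
    """Return (label, total variance, percent of full-set variance) for every
    subset of the given size, sorted by increasing total variance."""
    full_total = np.trace(np.cov(data, rowvar=False))
    rows = []
    for columns in combinations(range(data.shape[1]), size):
        total = cumulative_pc_variance(data, list(columns))[-1]
        label = "".join(str(c + 1) for c in columns)
        rows.append((label, total, 100.0 * total / full_total))
    return sorted(rows, key=lambda row: row[1])


# Synthetic stand-in for a 1,373 x 7 concentration matrix (NOT the real data).
rng = np.random.default_rng(0)
column_scales = np.array([4.0, 8.0, 8.0, 1.5, 1.5, 0.6, 5.0])  # arbitrary values
synthetic = rng.normal(size=(1373, 7)) * column_scales

ranked = rank_subsets(synthetic, 3)
best_label, best_total, best_percent = ranked[-1]
print(best_label, round(best_total, 3), round(best_percent, 1))
```

Sorting by increasing total variance mirrors the ordering used in the even-numbered tables of this appendix.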
Consider, for example, a PCA using the first three elements (As, Sb, and Sn; elements “123”), which yields 104.564 as the total variation in the data. PCA decomposes this variation sequentially into three uncorrelated linear combinations of the three elements: the first linear combination explains the most variation (76.892); the second, independent of the first, explains the next-most (19.512); and the third accounts for the remainder (8.160). The total variation in all seven elements is 136.944. Thus, this three-element subset accounts for (104.564/136.944) × 100%, or 76.4% of the total variation. The results of PCA on all 35 three-element subsets are shown in Table H.1; they illustrate that subset “237” (Sb, Sn, and Cd) appears to be best for characterizing the total variability in the set, accounting for (114.503/136.944) × 100% = 83.6% of the variability. Subset “137” (As, Sn, and Cd) is almost as good at (113.274/136.944) × 100% = 82.7%.
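The arithmetic in this paragraph can be reproduced directly from the Table H.1 entries. The sketch below uses only numbers quoted in this appendix; the variable names are illustrative.

```python
TOTAL_VARIANCE = 136.944  # total variation in all seven elements

# Total variation of three subsets, copied from Table H.1.
subset_totals = {"123": 104.564, "237": 114.503, "137": 113.274}

# Share of the seven-element variation captured by each subset, in percent.
percent_of_total = {
    label: round(100 * total / TOTAL_VARIANCE, 1)
    for label, total in subset_totals.items()
}

# Cumulative variation for subset "123" by successive principal component.
cumulative_123 = [76.892, 96.404, 104.564]

# Variation attributable to each component on its own.
increments = [
    round(current - previous, 3)
    for previous, current in zip([0.0] + cumulative_123[:-1], cumulative_123)
]

print(percent_of_total)
print(increments)
```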
PCA is then applied to all 35 possible four-element subsets; the one that accounts for the most variation, (131.562/136.944) × 100% = 96.1%, is subset “1237” (As, Sb, Sn, and Cd). Among the five-element subsets, subset “12357” (As, Sb, Sn, Bi, and Cd) explains the greatest proportion of the variance: (134.419/136.944) × 100% = 98.2%, or about 2.1 percentage points more than the subset without Bi. The five-element subset containing Cu instead of Bi (“12347”) is nearly as efficient: (133.554/136.944) × 100% = 97.5%. Finally, among the six-element subsets, “123457” (all but Ag) comes very close to explaining the variation using all seven elements: (136.411/136.944) × 100% = 99.6%. Measuring all elements except Cu (“123567”) is nearly as efficient, explaining (134.951/136.944) × 100% = 98.5% of total variation. The values obtained for each three-, four-, five-, six-, and seven-element subset PCA are found in Tables H.1, H.3, H.5, H.7, and H.9 below. The corresponding total variances, in order of increasing variance, are found in Tables H.2, H.4, H.6, and H.8.
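As a rough check on the progression from three to six elements, the following sketch picks the larger of the two leading subsets of each size and computes the gain from adding one more element. The totals are copied from Tables H.2, H.4, H.6, and H.8; only the two largest entries per size are included, and the names are illustrative.

```python
TOTAL_VARIANCE = 136.944  # total variation in all seven elements

# Two largest total variances for each subset size (Tables H.2, H.4, H.6, H.8).
top_subsets = {
    3: {"237": 114.503, "137": 113.274},
    4: {"1237": 131.562, "2357": 117.360},
    5: {"12357": 134.419, "12347": 133.554},
    6: {"123457": 136.411, "123567": 134.951},
}

# Best subset per size and its share of the seven-element variation.
best = {}
for size, candidates in top_subsets.items():
    label = max(candidates, key=candidates.get)
    best[size] = (label, round(100 * candidates[label] / TOTAL_VARIANCE, 2))

# Gain, in percentage points, from measuring one additional element.
gains = {size: round(best[size][1] - best[size - 1][1], 2) for size in (4, 5, 6)}

print(best)
print(gains)
```

The gains shrink quickly after the fourth element, which is the pattern the paragraph describes.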
This calculation may not relate directly to results obtained by simulating the false match probability as described above, but it does give some indication of the contribution of the different elements, and the results appear to be consistent with the impressions of the scientists who have been measuring bullets and making comparisons (Refs. 1–3).
TABLE H.1 Principal Components Analysis on All Three-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, and 3 denote the first through third principal components; each row gives the cumulative variation accounted for through that component, so the last row is the total variation of the subset.
| PC | 123 | 124 | 125 | 126 | 127 | 134 | 135 | 136 | 137 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 76.892 | 26.838 | 27.477 | 26.829 | 28.109 | 73.801 | 73.957 | 73.786 | 74.254 |
| 2 | 96.404 | 35.383 | 36.032 | 35.373 | 53.809 | 86.312 | 86.730 | 86.294 | 100.820 |
| 3 | 104.564 | 37.340 | 38.204 | 35.879 | 62.344 | 88.269 | 89.133 | 86.808 | 113.274 |

| PC | 145 | 146 | 147 | 156 | 157 | 167 | 234 | 235 | 236 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 17.553 | 17.110 | 27.027 | 17.534 | 27.071 | 27.027 | 71.675 | 71.838 | 71.661 |
| 2 | 20.218 | 19.223 | 44.089 | 19.991 | 44.535 | 44.074 | 87.537 | 88.137 | 87.529 |
| 3 | 21.909 | 19.584 | 46.049 | 20.448 | 46.914 | 44.589 | 89.498 | 90.362 | 88.037 |

| PC | 237 | 245 | 246 | 247 | 256 | 257 | 267 | 345 | 346 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 72.186 | 18.941 | 18.335 | 27.146 | 18.938 | 27.216 | 27.146 | 69.371 | 69.243 |
| 2 | 98.651 | 21.493 | 20.457 | 45.309 | 21.220 | 45.926 | 45.308 | 72.353 | 71.377 |
| 3 | 114.503 | 23.138 | 20.813 | 47.278 | 21.677 | 48.143 | 45.818 | 74.067 | 71.742 |

| PC | 347 | 356 | 357 | 367 | 456 | 457 | 467 | 567 |
|---|---|---|---|---|---|---|---|---|
| 1 | 69.771 | 69.357 | 69.891 | 69.758 | 3.272 | 27.030 | 26.998 | 27.030 |
| 2 | 96.234 | 72.149 | 96.367 | 96.221 | 5.039 | 30.136 | 29.156 | 29.929 |
| 3 | 98.208 | 72.606 | 99.072 | 96.747 | 5.382 | 31.847 | 29.522 | 30.387 |
TABLE H.2 Total Variance (Compare with 136.944 Total Variance) for Three-Component Subsets, in Order of Increasing Variance.
| Subset | 456 | 146 | 156 | 246 | 256 | 145 | 245 | 467 | 567 | 457 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 5.382 | 19.584 | 20.448 | 20.813 | 21.677 | 21.909 | 23.138 | 29.522 | 30.387 | 31.847 |

| Subset | 126 | 124 | 125 | 167 | 267 | 147 | 157 | 247 | 257 | 127 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 35.879 | 37.340 | 38.204 | 44.589 | 45.818 | 46.049 | 46.914 | 47.278 | 48.143 | 62.344 |

| Subset | 346 | 356 | 345 | 136 | 236 | 134 | 135 | 234 | 235 | 367 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 71.742 | 72.606 | 74.067 | 86.808 | 88.037 | 88.269 | 89.133 | 89.498 | 90.362 | 96.747 |

| Subset | 347 | 357 | 123 | 137 | 237 |
|---|---|---|---|---|---|
| Variance | 98.208 | 99.072 | 104.564 | 113.274 | 114.503 |
TABLE H.3 Principal Components Analysis on All Four-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, and 4 denote the first through fourth principal components; each row gives the cumulative variation accounted for through that component, so the last row is the total variation of the subset.
| PC | 1234 | 1235 | 1236 | 1237 | 1245 | 1246 | 1247 | 1256 | 1257 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 76.918 | 77.133 | 76.903 | 77.362 | 27.517 | 26.865 | 28.126 | 27.506 | 28.599 |
| 2 | 96.441 | 97.085 | 96.430 | 103.955 | 36.072 | 35.410 | 53.844 | 36.061 | 54.501 |
| 3 | 104.603 | 105.249 | 104.590 | 123.430 | 38.556 | 37.516 | 62.380 | 38.279 | 63.047 |
| 4 | 106.557 | 107.421 | 105.096 | 131.562 | 40.197 | 37.872 | 64.337 | 38.736 | 65.202 |

| PC | 1267 | 1345 | 1346 | 1347 | 1356 | 1357 | 1367 | 1456 | 1457 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28.122 | 73.982 | 73.810 | 74.278 | 73.966 | 74.440 | 74.263 | 17.575 | 27.071 |
| 2 | 53.835 | 86.772 | 86.330 | 100.843 | 86.751 | 101.012 | 100.828 | 20.366 | 44.575 |
| 3 | 62.371 | 89.436 | 88.440 | 113.309 | 89.208 | 113.752 | 113.291 | 22.099 | 47.221 |
| 4 | 62.877 | 91.126 | 88.801 | 115.267 | 89.665 | 116.131 | 113.806 | 22.441 | 48.906 |

| PC | 1467 | 1567 | 2345 | 2346 | 2347 | 2356 | 2357 | 2367 | 2456 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 27.027 | 27.071 | 71.861 | 71.683 | 72.209 | 71.847 | 72.378 | 72.195 | 18.969 |
| 2 | 44.108 | 44.556 | 88.174 | 87.562 | 98.673 | 88.164 | 98.855 | 98.660 | 21.650 |
| 3 | 46.221 | 46.989 | 90.710 | 89.674 | 114.534 | 90.437 | 115.149 | 114.526 | 23.328 |
| 4 | 46.581 | 47.446 | 92.355 | 90.030 | 116.495 | 90.894 | 117.360 | 115.035 | 23.670 |

| PC | 2457 | 2467 | 2567 | 3456 | 3457 | 3467 | 3567 | 4567 |
|---|---|---|---|---|---|---|---|---|
| 1 | 27.217 | 27.146 | 27.217 | 69.378 | 69.911 | 69.777 | 69.898 | 27.031 |
| 2 | 45.955 | 45.333 | 45.952 | 72.492 | 96.387 | 96.241 | 96.374 | 30.276 |
| 3 | 48.496 | 47.454 | 48.218 | 74.257 | 99.355 | 98.375 | 99.147 | 32.037 |
| 4 | 50.135 | 47.810 | 48.675 | 74.599 | 101.065 | 98.740 | 99.604 | 32.380 |
TABLE H.4 Total Variance (Compare with 136.944 Total Variance) for Four-Component Subsets, in Order of Increasing Variance.
| Subset | 1456 | 2456 | 4567 | 1246 | 1256 | 1245 | 1467 | 1567 | 2467 | 2567 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 22.441 | 23.670 | 32.380 | 37.872 | 38.736 | 40.197 | 46.581 | 47.446 | 47.810 | 48.675 |

| Subset | 1457 | 2457 | 1267 | 1247 | 1257 | 3456 | 1346 | 1356 | 2346 | 2356 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 48.906 | 50.135 | 62.877 | 64.337 | 65.202 | 74.599 | 88.801 | 89.665 | 90.030 | 90.894 |

| Subset | 1345 | 2345 | 3467 | 3567 | 3457 | 1236 | 1234 | 1235 | 1367 | 2367 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 91.126 | 92.355 | 98.740 | 99.604 | 101.065 | 105.096 | 106.557 | 107.421 | 113.806 | 115.035 |

| Subset | 1347 | 1357 | 2347 | 2357 | 1237 |
|---|---|---|---|---|---|
| Variance | 115.267 | 116.131 | 116.495 | 117.360 | 131.562 |
TABLE H.5 Principal Components Analysis on All Five-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, and 5 denote the first through fifth principal components; each row gives the cumulative variation accounted for through that component, so the last row is the total variation of the subset.
| PC | 12345 | 12346 | 12347 | 12356 | 12357 | 12367 | 12456 | 12457 | 12467 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 77.160 | 76.930 | 77.388 | 77.144 | 77.608 | 77.373 | 27.547 | 28.624 | 28.140 |
| 2 | 97.127 | 96.468 | 103.981 | 97.114 | 104.205 | 103.966 | 36.103 | 54.541 | 53.871 |
| 3 | 105.292 | 104.630 | 123.467 | 105.278 | 124.130 | 123.456 | 38.716 | 63.088 | 62.408 |
| 4 | 107.775 | 106.733 | 131.600 | 107.496 | 132.265 | 131.588 | 40.387 | 65.560 | 64.514 |
| 5 | 109.414 | 107.089 | 133.554 | 107.953 | 134.419 | 132.094 | 40.729 | 67.194 | 64.869 |

| PC | 12567 | 13456 | 13457 | 13467 | 13567 | 14567 | 23456 | 23457 | 23467 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 28.617 | 73.991 | 74.464 | 74.286 | 74.448 | 27.072 | 71.870 | 72.401 | 72.217 |
| 2 | 54.530 | 86.795 | 101.037 | 100.852 | 101.021 | 44.598 | 88.203 | 98.878 | 98.682 |
| 3 | 63.076 | 89.584 | 113.794 | 113.328 | 113.773 | 47.372 | 90.867 | 115.186 | 114.559 |
| 4 | 65.277 | 91.316 | 116.440 | 115.438 | 116.206 | 49.096 | 92.546 | 117.714 | 116.617 |
| 5 | 65.734 | 91.658 | 118.124 | 115.799 | 116.663 | 49.439 | 92.887 | 119.353 | 117.028 |

| PC | 23567 | 24567 | 34567 |
|---|---|---|---|
| 1 | 72.387 | 27.218 | 69.918 |
| 2 | 98.864 | 45.984 | 96.394 |
| 3 | 115.177 | 48.655 | 99.495 |
| 4 | 117.435 | 50.326 | 101.254 |
| 5 | 117.892 | 50.667 | 101.597 |
TABLE H.6 Total Variance (Compare with 136.944 Total Variance) for Five-Component Subsets, in Order of Increasing Variance.
| Subset | 12456 | 14567 | 24567 | 12467 | 12567 | 12457 | 13456 | 23456 | 34567 | 12346 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 40.73 | 49.44 | 50.67 | 64.87 | 65.73 | 67.19 | 91.66 | 92.89 | 101.60 | 107.09 |
| % of total | 29.74 | 36.10 | 37.00 | 47.37 | 48.00 | 49.07 | 66.93 | 67.83 | 74.19 | 78.20 |

| Subset | 12356 | 12345 | 13467 | 13567 | 23467 | 23567 | 13457 | 23457 | 12367 | 12347 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance | 107.95 | 109.41 | 115.80 | 116.66 | 117.03 | 117.89 | 118.12 | 119.35 | 132.09 | 133.55 |
| % of total | 78.83 | 79.90 | 84.56 | 85.19 | 85.46 | 86.09 | 86.26 | 87.15 | 96.46 | 97.53 |

| Subset | 12357 |
|---|---|
| Variance | 134.42 |
| % of total | 98.16 |
TABLE H.7 Principal Components Analysis on All Six-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, and 6 denote the first through sixth principal components; each row gives the cumulative variation accounted for through that component, so the last row is the total variation of the subset.
| PC | 123456 | 123457 | 123467 | 123567 | 124567 | 134567 | 234567 |
|---|---|---|---|---|---|---|---|
| 1 | 77.172 | 77.635 | 77.399 | 77.620 | 28.643 | 74.472 | 72.411 |
| 2 | 97.157 | 104.232 | 103.993 | 104.216 | 54.571 | 101.046 | 98.887 |
| 3 | 105.322 | 124.172 | 123.494 | 124.159 | 63.118 | 113.817 | 115.215 |
| 4 | 107.934 | 132.307 | 131.628 | 132.294 | 65.721 | 116.590 | 117.872 |
| 5 | 109.605 | 134.779 | 133.731 | 134.494 | 67.385 | 118.314 | 119.543 |
| 6 | 109.946 | 136.411 | 134.087 | 134.951 | 67.726 | 118.656 | 119.885 |
TABLE H.8 Total Variance (Compare with 136.944 Total Variance) for Six-Component Subsets, in Order of Increasing Variance
| Subset | 124567 | 123456 | 134567 | 234567 | 123467 | 123567 | 123457 |
|---|---|---|---|---|---|---|---|
| Variance | 67.726 | 109.946 | 118.656 | 119.885 | 134.087 | 134.951 | 136.411 |
| % of total | 49.45% | 80.28% | 86.65% | 87.54% | 97.91% | 98.54% | 99.61% |
TABLE H.9 Principal Components Analysis on the Full Seven-Element Set of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, 6, and 7 denote the first through seventh principal components; each row gives the cumulative variation accounted for through that component, so the last row is the total variation.
| PC | 1234567 |
|---|---|
| 1 | 77.64703 |
| 2 | 104.24395 |
| 3 | 124.20241 |
| 4 | 132.33795 |
| 5 | 134.94053 |
| 6 | 136.60234 |
| 7 | 136.94360 |
Summary:

| Number of elements: best subset | Share of total variance |
|---|---|
| 3 elements: 237 | (83.6% of total variance) |
| 4 elements: 1237 | (96.07% of total variance) |
| 5 elements: 12357 | (98.16% of total variance) or 12347 (97.52%) |
| 6 elements: 123457 | (99.61% of total variance) or 123567 (98.54%) (Bi-Ag correlation) |
| 7 elements: 1234567 | (100.00% of total variance) |
REFERENCES
1. Koons, R. D. and Grant, D. M. J. Foren. Sci. 2002, 47(5), 950.
2. Randich, E.; Duerfeldt, W.; McLendon, W.; and Tobin, W. Foren. Sci. Int. 2002, 127, 174–191.
3. Peele, E. R.; Havekost, D. G.; Peters, C. A.; and Riley, J. P. USDOJ (ISBN 0-932115-12-8), 57, 1991.