H
Principal Components Analysis: How Many Elements Should Be Measured?
The number of elements measured in bullet lead has ranged from three to seven, and sometimes the concentration of a measured element is too small to be detected. The optimal number of elements to measure is unclear. An unambiguous way to determine it is to calculate, using two-sample equivalence t tests, the probability of a false match on the 1,837-bullet data set described in Chapter 3. Recall that the equivalence t test requires specification of a value δ/RE, where RE is the relative error, and a value α denoting the expected probability of a false match. Each simulation run would use a different combination of the elements: there are 35 possible subsets of three of the seven elements, 35 of four, 21 of five, and seven of six, plus one simulation run using all seven elements. Among the three-element subsets, the subset with the lowest false match probability would be selected, and a similar process would be followed for the four-, five-, and six-element subsets. One could then plot the false match probability as a function of δ/RE and determine the reduction in false match probability in moving from three to seven elements for testing purposes. Such a calculation may well give different results if applied to the full (71,000-bullet) data set.
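The run counts quoted above are the numbers of k-element subsets of seven elements. A minimal Python sketch, assuming the element order given in the table captions (the names ELEMENTS and subsets_by_size are illustrative and do not come from the source), enumerates the subsets that such a simulation would loop over:

```python
from itertools import combinations

# Element order as listed in the table captions of this appendix.
ELEMENTS = ["As", "Sb", "Sn", "Cu", "Bi", "Ag", "Cd"]


def subsets_by_size(elements, min_size=3):
    """Map each subset size k to the list of all k-element subsets."""
    return {
        k: list(combinations(elements, k))
        for k in range(min_size, len(elements) + 1)
    }


runs = subsets_by_size(ELEMENTS)
counts = {k: len(subsets) for k, subsets in runs.items()}
total_runs = sum(counts.values())

print(counts)      # number of subsets for each size
print(total_runs)  # number of simulation runs in all
```

Each tuple in `runs[k]` would correspond to one simulation run; the false match calculation itself is not sketched here.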
An alternative approach, easier to apply but less direct, is to characterize the variability among the bullets using all seven elements. To avoid the problem of many missing values of elemental concentrations in the 1,837-bullet data set, we will use the 1,373-bullet subset, for which all seven elemental calculations exist (after imputing some values for Cd). The variability can then be compared with the variability obtained using all possible three-, four-, five-, and six-element subsets. It is likely that the false match probability will be higher in subsets that account for less of the total variability and lower in subsets that account for nearly all of the variability in the data set. Variability can be characterized by using principal components analysis (PCA).
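As an illustration of the idea, the sketch below computes the cumulative variation explained by successive principal components for a chosen subset of columns. It assumes PCA on the sample covariance matrix, which the source does not state explicitly, and it uses synthetic random data as a stand-in for the 1,373-bullet subset; the function name and the column scales are hypothetical.

```python
import numpy as np


def cumulative_pca_variance(data, columns):
    """Cumulative variance explained by successive principal components
    of the selected columns, using the sample covariance matrix."""
    sub = data[:, list(columns)]
    cov = np.cov(sub, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse them so the
    # first principal component (largest variance) comes first.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return np.cumsum(eigenvalues)


# Synthetic stand-in data: 200 "bullets" by 7 "elements" (NOT the real data).
rng = np.random.default_rng(0)
scales = np.array([4.0, 4.0, 8.0, 1.5, 1.5, 0.7, 5.0])
data = rng.normal(size=(200, 7)) * scales

cum3 = cumulative_pca_variance(data, (0, 1, 2))  # a three-element subset
cum7 = cumulative_pca_variance(data, range(7))   # all seven elements
share = cum3[-1] / cum7[-1]                      # fraction of total variation

print(cum3)
print(f"{100 * share:.1f}% of total variation")
```

Under this assumption, the last cumulative value for a subset equals the sum of the variances of the included elements, so comparing subsets amounts to comparing those sums against the seven-element total.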
Consider, for example, a PCA using the first three elements (As, Sb, and Sn; elements “123”), which yields 104.564 as the total variation in the data. PCA decomposes this variation of 104.564 into three linear combinations of the three elements in a sequential fashion: the first linear combination explains the most variation (76.892); the second, independent of the first, explains the next most (19.512); and the third accounts for the remainder (8.160). The total variation in all seven elements is 136.944. Thus, this three-element subset accounts for (104.564/136.944) × 100%, or 76.4%, of the total variation. The results of PCA on all 35 three-element subsets are shown in Table H.1; they illustrate that subset “237” (Sb, Sn, and Cd) appears to be best for characterizing the total variability in the set, accounting for (114.503/136.944) × 100% = 83.6% of the variability. Subset “137” (As, Sn, and Cd) is almost as good at (113.274/136.944) × 100% = 82.7%.
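The arithmetic in this example can be checked directly from the reported values. The short Python sketch below uses only numbers quoted in the text and in Table H.1; the helper names are illustrative.

```python
TOTAL_ALL_SEVEN = 136.944                  # total variation, all seven elements
CUMULATIVE_123 = [76.892, 96.404, 104.564]  # Table H.1, subset "123"


def increments(cumulative):
    """Variation explained by each component, from cumulative values."""
    return [cumulative[0]] + [
        later - earlier for earlier, later in zip(cumulative, cumulative[1:])
    ]


def percent_of_total(subset_total, total=TOTAL_ALL_SEVEN):
    """Share of the seven-element total variation, in percent."""
    return 100.0 * subset_total / total


per_component = increments(CUMULATIVE_123)
print([round(v, 3) for v in per_component])

for label, subset_total in [("123", 104.564), ("137", 113.274), ("237", 114.503)]:
    print(label, round(percent_of_total(subset_total), 1))
```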
PCA is then applied to all 35 possible four-element subsets; the one that accounts for the most variation, (131.562/136.944) × 100% = 96.1%, is subset “1237” (As, Sb, Sn, and Cd). Among the five-element subsets, subset “12357” (As, Sb, Sn, Cu, and Cd) explains the greatest proportion of the variance: (134.419/136.944) × 100% = 98.2%, or about 2.1% more than the subset without Cu. The five-element subset containing Bi instead of Cu is nearly as efficient: (133.554/136.944) × 100% = 97.5%. Finally, among the six-element subsets, “123457” (all but Ag) comes very close to explaining the variation using all seven elements: (136.411/136.944) × 100% = 99.6%. Measuring all elements except Bi is nearly as efficient, explaining (134.951/136.944) × 100% = 98.5% of total variation. The values obtained for each three-, four-, five-, six-, and seven-element subset PCA are found in Tables H.1, H.3, H.5, H.7, and H.9 below. The corresponding total variances, in order of increasing variance, are found in Tables H.2, H.4, H.6, and H.8.
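The selection rule used here (for each subset size, keep the subset with the largest total variance) can be sketched in a few lines of Python. The dictionary below holds only the totals quoted in the text, keyed by the subset labels used in the tables; a full version would load every entry of Tables H.2, H.4, H.6, and H.8. The function name is illustrative.

```python
TOTAL_ALL_SEVEN = 136.944

# Total variances quoted in the text, keyed by subset label.
reported_totals = {
    "123": 104.564,
    "137": 113.274,
    "237": 114.503,
    "1237": 131.562,
    "12347": 133.554,
    "12357": 134.419,
    "123457": 136.411,
    "123567": 134.951,
}


def best_by_size(totals):
    """For each subset size, return the (label, total) with the largest total."""
    best = {}
    for label, total in totals.items():
        size = len(label)
        if size not in best or total > best[size][1]:
            best[size] = (label, total)
    return best


for size, (label, total) in sorted(best_by_size(reported_totals).items()):
    print(size, label, f"{100 * total / TOTAL_ALL_SEVEN:.2f}%")
```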
This calculation may not directly relate to results obtained by simulating the false match probability as described above, but it does give some indication of the contribution of the different elements, and the results appear to be consistent with the impressions of the scientists who have been measuring bullets and making comparisons (Refs. 1-3).
TABLE H.1 Principal Components Analysis on All Three-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, and 3 denote the first through third principal components; each row shows the cumulative variation explained through that component.

PC      123      124      125      126      127      134      135      136      137
1    76.892   26.838   27.477   26.829   28.109   73.801   73.957   73.786   74.254
2    96.404   35.383   36.032   35.373   53.809   86.312   86.730   86.294  100.820
3   104.564   37.340   38.204   35.879   62.344   88.269   89.133   86.808  113.274

PC      145      146      147      156      157      167      234      235      236
1    17.553   17.110   27.027   17.534   27.071   27.027   71.675   71.838   71.661
2    20.218   19.223   44.089   19.991   44.535   44.074   87.537   88.137   87.529
3    21.909   19.584   46.049   20.448   46.914   44.589   89.498   90.362   88.037

PC      237      245      246      247      256      257      267      345      346
1    72.186   18.941   18.335   27.146   18.938   27.216   27.146   69.371   69.243
2    98.651   21.493   20.457   45.309   21.220   45.926   45.308   72.353   71.377
3   114.503   23.138   20.813   47.278   21.677   48.143   45.818   74.067   71.742

PC      347      356      357      367      456      457      467      567
1    69.771   69.357   69.891   69.758    3.272   27.030   26.998   27.030
2    96.234   72.149   96.367   96.221    5.039   30.136   29.156   29.929
3    98.208   72.606   99.072   96.747    5.382   31.847   29.522   30.387

TABLE H.2 Total Variance (Compare with 136.944 Total Variance) for Three-Component Subsets, in Order of Increasing Variance.

Subset          456      146      156      246      256      145      245      467      567      457
Variance      5.382   19.584   20.448   20.813   21.677   21.909   23.138   29.522   30.387   31.847

Subset          126      124      125      167      267      147      157      247      257      127
Variance     35.879   37.340   38.204   44.589   45.818   46.049   46.914   47.278   48.143   62.344

Subset          346      356      345      136      236      134      135      234      235      367
Variance     71.742   72.606   74.067   86.808   88.037   88.269   89.133   89.498   90.362   96.747

Subset          347      357      123      137      237
Variance     98.208   99.072  104.564  113.274  114.503

TABLE H.3 Principal Components Analysis on All Four-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, and 4 denote the first through fourth principal components; each row shows the cumulative variation explained through that component.

PC     1234     1235     1236     1237     1245     1246     1247     1256     1257
1    76.918   77.133   76.903   77.362   27.517   26.865   28.126   27.506   28.599
2    96.441   97.085   96.430  103.955   36.072   35.410   53.844   36.061   54.501
3   104.603  105.249  104.590  123.430   38.556   37.516   62.380   38.279   63.047
4   106.557  107.421  105.096  131.562   40.197   37.872   64.337   38.736   65.202

PC     1267     1345     1346     1347     1356     1357     1367     1456     1457
1    28.122   73.982   73.810   74.278   73.966   74.440   74.263   17.575   27.071
2    53.835   86.772   86.330  100.843   86.751  101.012  100.828   20.366   44.575
3    62.371   89.436   88.440  113.309   89.208  113.752  113.291   22.099   47.221
4    62.877   91.126   88.801  115.267   89.665  116.131  113.806   22.441   48.906

PC     1467     1567     2345     2346     2347     2356     2357     2367     2456
1    27.027   27.071   71.861   71.683   72.209   71.847   72.378   72.195   18.969
2    44.108   44.556   88.174   87.562   98.673   88.164   98.855   98.660   21.650
3    46.221   46.989   90.710   89.674  114.534   90.437  115.149  114.526   23.328
4    46.581   47.446   92.355   90.030  116.495   90.894  117.360  115.035   23.670

PC     2457     2467     2567     3456     3457     3467     3567     4567
1    27.217   27.146   27.217   69.378   69.911   69.777   69.898   27.031
2    45.955   45.333   45.952   72.492   96.387   96.241   96.374   30.276
3    48.496   47.454   48.218   74.257   99.355   98.375   99.147   32.037
4    50.135   47.810   48.675   74.599  101.065   98.740   99.604   32.380

TABLE H.4 Total Variance (Compare with 136.944 Total Variance) for Four-Component Subsets, in Order of Increasing Variance.

Subset         1456     2456     4567     1246     1256     1245     1467     1567     2467     2567
Variance     22.441   23.670   32.380   37.872   38.736   40.197   46.581   47.446   47.810   48.675

Subset         1457     2457     1267     1247     1257     3456     1346     1356     2346     2356
Variance     48.906   50.135   62.877   64.337   65.202   74.599   88.801   89.665   90.030   90.894

Subset         1345     2345     3467     3567     3457     1236     1234     1235     1367     2367
Variance     91.126   92.355   98.740   99.604  101.065  105.096  106.557  107.421  113.806  115.035

Subset         1347     1357     2347     2357     1237
Variance    115.267  116.131  116.495  117.360  131.562

TABLE H.5 Principal Components Analysis on All Five-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, and 5 denote the first through fifth principal components; each row shows the cumulative variation explained through that component.

PC    12345    12346    12347    12356    12357    12367    12456    12457    12467
1    77.160   76.930   77.388   77.144   77.608   77.373   27.547   28.624   28.140
2    97.127   96.468  103.981   97.114  104.205  103.966   36.103   54.541   53.871
3   105.292  104.630  123.467  105.278  124.130  123.456   38.716   63.088   62.408
4   107.775  106.733  131.600  107.496  132.265  131.588   40.387   65.560   64.514
5   109.414  107.089  133.554  107.953  134.419  132.094   40.729   67.194   64.869

PC    12567    13456    13457    13467    13567    14567    23456    23457    23467
1    28.617   73.991   74.464   74.286   74.448   27.072   71.870   72.401   72.217
2    54.530   86.795  101.037  100.852  101.021   44.598   88.203   98.878   98.682
3    63.076   89.584  113.794  113.328  113.773   47.372   90.867  115.186  114.559
4    65.277   91.316  116.440  115.438  116.206   49.096   92.546  117.714  116.617
5    65.734   91.658  118.124  115.799  116.663   49.439   92.887  119.353  117.028

PC    23567    24567    34567
1    72.387   27.218   69.918
2    98.864   45.984   96.394
3   115.177   48.655   99.495
4   117.435   50.326  101.254
5   117.892   50.667  101.597

TABLE H.6 Total Variance (Compare with 136.944 Total Variance) for Five-Component Subsets, in Order of Increasing Variance. The third row of each block gives the variance as a percentage of the total.

Subset        12456    14567    24567    12467    12567    12457    13456    23456    34567    12346
Variance      40.73    49.44    50.67    64.87    65.73    67.19    91.66    92.89   101.60   107.09
% of total    29.74    36.10    37.00    47.37    48.00    49.07    66.93    67.83    74.19    78.20

Subset        12356    12345    13467    13567    23467    23567    13457    23457    12367    12347
Variance     107.95   109.41   115.80   116.66   117.03   117.89   118.12   119.35   132.09   133.55
% of total    78.83    79.90    84.56    85.19    85.46    86.09    86.26    87.15    96.46    97.53

Subset        12357
Variance     134.42
% of total    98.16

TABLE H.7 Principal Components Analysis on All Six-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, and 6 denote the first through sixth principal components; each row shows the cumulative variation explained through that component.

PC   123456   123457   123467   123567   124567   134567   234567
1    77.172   77.635   77.399   77.620   28.643   74.472   72.411
2    97.157  104.232  103.993  104.216   54.571  101.046   98.887
3   105.322  124.172  123.494  124.159   63.118  113.817  115.215
4   107.934  132.307  131.628  132.294   65.721  116.590  117.872
5   109.605  134.779  133.731  134.494   67.385  118.314  119.543
6   109.946  136.411  134.087  134.951   67.726  118.656  119.885

TABLE H.8 Total Variance (Compare with 136.944 Total Variance) for Six-Component Subsets, in Order of Increasing Variance.

Subset       124567   123456   134567   234567   123467   123567   123457
Variance     67.726  109.946  118.656  119.885  134.087  134.951  136.411
% of total   49.45%   80.28%   86.65%   87.54%   97.91%   98.54%   99.61%

TABLE H.9 Principal Components Analysis on the Full Seven-Element Set of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1 through 7 denote the first through seventh principal components; each row shows the cumulative variation explained through that component.

PC    1234567
1    77.64703
2   104.24395
3   124.20241
4   132.33795
5   134.94053
6   136.60234
7   136.94360

Summary:

3 elements: 237 (83.6% of total variance)
4 elements: 1237 (96.07% of total variance)
5 elements: 12357 (98.16% of total variance) or 12347 (97.52%)
6 elements: 123457 (99.61% of total variance) or 123567 (98.54%) (Bi-Ag correlation)
7 elements: 1234567 (100.00% of total variance)

REFERENCES
1. Koons, R. D. and Grant, D. M. J. Foren. Sci. 2002, 47(5), 950.
2. Randich, E.; Duerfeldt, W.; McLendon, W.; and Tobin, W. Foren. Sci. Int. 2002, 127, 174–191.
3. Peele, E. R.; Havekost, D. G.; Peters, C. A.; and Riley, J. P. USDOJ (ISBN 0932115128), 57, 1991.