Forensic Analysis Weighing Bullet Lead Evidence
Appendix H
Principal Components Analysis: How Many Elements Should Be Measured?
The number of elements in bullet lead that have been measured has ranged from three to seven, and sometimes the concentration of a measured element is so small as to be undetectable. The optimal number of elements to measure is unclear. An unambiguous way to determine it is to calculate, using two-sample equivalence t tests, the probability of a false match on the 1,837-bullet data set as described in Chapter 3. Recall that the equivalence t test requires specification of a value δ/RE where RE = relative error and a value α denoting the expected probability of a false match. Each simulation run would use a different combination of the elements: there are 35 possible subsets of three of the seven elements, 35 possible subsets of four of the seven elements, 21 possible subsets of five of the seven elements, seven possible subsets of six of the seven elements, and one simulation run corresponding to using all seven elements. Among the three-element subsets, the subset with the lowest false match probability would be selected, and a similar process would occur for the four-, five-, and six-element subsets. One could then plot the false match probability as a function of δ/RE for various choices of δ/RE and determine the reduction in false match probability in moving from three to seven elements for testing purposes. Such a calculation may well differ if applied to the full (71,000-bullet) data set.
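The subset counts quoted above are the binomial coefficients C(7, k). As a minimal sketch, the subsets can be enumerated in Python using digit labels 1 through 7 for the seven elements, the same labeling used in the tables of this appendix. The helper name `element_subsets` is an illustrative assumption, not something taken from the report.

```python
from itertools import combinations
from math import comb

LABELS = "1234567"  # digit labels for the seven measured elements


def element_subsets(k):
    """Return every k-element subset as a label string such as '237'."""
    return ["".join(c) for c in combinations(LABELS, k)]


# Number of simulation runs needed for each subset size, three through seven.
counts = {k: len(element_subsets(k)) for k in range(3, 8)}
total_runs = sum(counts.values())

# Each count must equal the binomial coefficient C(7, k).
assert all(counts[k] == comb(7, k) for k in counts)

print(counts)      # {3: 35, 4: 35, 5: 21, 6: 7, 7: 1}
print(total_runs)  # 99
```

Under this enumeration, the full comparison would require 99 simulation runs for each choice of δ/RE and α.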
An alternative approach, easier to apply but less direct, is to characterize the variability among the bullets using all seven elements. To avoid the problem of many missing values of elemental concentrations in the 1,837-bullet data set, we will use the 1,373-bullet subset, for which all seven elemental concentrations are available (after imputing some values for Cd). The variability can then be compared with the variability obtained using all possible three-, four-, five-, and six-element subsets. It is likely that the false match probability will be higher in subsets that account for less of the total variability and lower in subsets that account for nearly all of the variability in the data set. Variability can be characterized by using principal components analysis (PCA).
Consider, for example, a PCA using the first three elements (As, Sb, and Sn; elements “123”), which yields 104.564 as the total variation in the data. PCA decomposes this variation of 104.564 into three linear combinations of the three elements in a sequential fashion: the first linear combination explains the most variation (76.892); the second, uncorrelated with the first, explains the next-most (19.512); and the third accounts for the remainder (8.160). The total variation in all seven elements is 136.944. Thus, this three-element subset accounts for (104.564/136.944) × 100%, or 76.4%, of the total variation. The results of PCA on all 35 three-element subsets are shown in Table H.1; they illustrate that subset “237” (Sb, Sn, and Cd) appears to be best for characterizing the total variability in the set, accounting for (114.503/136.944) × 100% = 83.6% of the variability. Subset “137” (As, Sn, and Cd) is almost as good at (113.274/136.944) × 100% = 82.7%.
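The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the PCA is carried out on the sample covariance matrix, it uses synthetic random data rather than the bullet measurements, and the function name `cumulative_pca_variance` is hypothetical. The cumulative sums it returns play the role of the running totals 76.892, 96.404, and 104.564 in the “123” example.

```python
import numpy as np


def cumulative_pca_variance(X):
    """Cumulative variance explained by successive principal components.

    X has one row per bullet and one column per element. The PCA is taken
    on the sample covariance matrix (an assumption), so the final entry
    equals the sum of the per-element variances.
    """
    cov = np.cov(X, rowvar=False)             # sample covariance (ddof=1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return np.cumsum(eigvals)


# Synthetic three-column data, NOT the bullet data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])

cum = cumulative_pca_variance(X)
total = X.var(axis=0, ddof=1).sum()  # total variation = trace of covariance

print(cum)
print(total)
```

Under the covariance-matrix assumption, the last cumulative value always equals the sum of the individual column variances, which is why the total variation of a subset can be compared directly with the seven-element total.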
PCA is then applied to all 35 possible four-element subsets; the one that accounts for the most variation, (131.562/136.944) × 100% = 96.1%, is subset “1237” (As, Sb, Sn, and Cd). Among the five-element subsets, subset “12357” (As, Sb, Sn, Cu, and Cd) explains the greatest proportion of the variance: (134.419/136.944) × 100% = 98.2%, or about 2.1 percentage points more than the subset without Cu. The five-element subset containing Bi instead of Cu is nearly as efficient: (133.554/136.944) × 100% = 97.5%. Finally, among the six-element subsets, “123457” (all but Ag) comes very close to explaining the variation using all seven elements: (136.411/136.944) × 100% = 99.6%. Measuring all elements except Bi is nearly as efficient, explaining (134.951/136.944) × 100% = 98.5% of total variation. The values obtained for each three-, four-, five-, six-, and seven-element subset PCA are found in Tables H.1, H.3, H.5, H.7, and H.9 below. The corresponding variances in order of increasing percentages are found in Tables H.2, H.4, H.6, and H.8.
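The percentages quoted for the best subset of each size can be reproduced with simple arithmetic. The sketch below copies the largest total variation for each subset size from Tables H.2, H.4, H.6, and H.8 and divides by the seven-element total of 136.944; the names `best_by_size` and `percent_of_total` are illustrative only.

```python
TOTAL_VARIANCE = 136.944  # total variation using all seven elements

# Largest total variation for each subset size, copied from
# Tables H.2, H.4, H.6, and H.8: size -> (subset label, total variation).
best_by_size = {
    3: ("237", 114.503),
    4: ("1237", 131.562),
    5: ("12357", 134.419),
    6: ("123457", 136.411),
}


def percent_of_total(variance, total=TOTAL_VARIANCE):
    """Share of the seven-element total variation, in percent."""
    return 100.0 * variance / total


shares = {k: round(percent_of_total(v), 1) for k, (_, v) in best_by_size.items()}
print(shares)  # {3: 83.6, 4: 96.1, 5: 98.2, 6: 99.6}
```

The largest gain comes from adding a fourth element; each further element adds only a few percentage points.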
This calculation may not relate directly to results obtained by simulating the false match probability as described above, but it does give some indication of the contribution of the different elements, and the results appear to be consistent with the impressions of the scientists who have been measuring bullets and making comparisons (Refs. 1-3).

TABLE H.1 Principal Components Analysis on All Three-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, and 3 represent the first principal components through third, and the rows show the total variation due to each successive element included in the subset.

Subset         123      124      125      126      127      134      135      136      137
1           76.892   26.838   27.477   26.829   28.109   73.801   73.957   73.786   74.254
2           96.404   35.383   36.032   35.373   53.809   86.312   86.730   86.294  100.820
3          104.564   37.340   38.204   35.879   62.344   88.269   89.133   86.808  113.274

Subset         145      146      147      156      157      167      234      235      236
1           17.553   17.110   27.027   17.534   27.071   27.027   71.675   71.838   71.661
2           20.218   19.223   44.089   19.991   44.535   44.074   87.537   88.137   87.529
3           21.909   19.584   46.049   20.448   46.914   44.589   89.498   90.362   88.037

Subset         237      245      246      247      256      257      267      345      346
1           72.186   18.941   18.335   27.146   18.938   27.216   27.146   69.371   69.243
2           98.651   21.493   20.457   45.309   21.220   45.926   45.308   72.353   71.377
3          114.503   23.138   20.813   47.278   21.677   48.143   45.818   74.067   71.742

Subset         347      356      357      367      456      457      467      567
1           69.771   69.357   69.891   69.758    3.272   27.030   26.998   27.030
2           96.234   72.149   96.367   96.221    5.039   30.136   29.156   29.929
3           98.208   72.606   99.072   96.747    5.382   31.847   29.522   30.387

TABLE H.2 Total Variance (Compare with 136.944 Total Variance) for Three-Component Subsets, in Order of Increasing Variance.

Subset         456      146      156      246      256      145      245      467      567      457
Variance     5.382   19.584   20.448   20.813   21.677   21.909   23.138   29.522   30.387   31.847

Subset         126      124      125      167      267      147      157      247      257      127
Variance    35.879   37.340   38.204   44.589   45.818   46.049   46.914   47.278   48.143   62.344

Subset         346      356      345      136      236      134      135      234      235      367
Variance    71.742   72.606   74.067   86.808   88.037   88.269   89.133   89.498   90.362   96.747

Subset         347      357      123      137      237
Variance    98.208   99.072  104.564  113.274  114.503

TABLE H.3 Principal Components Analysis on All Four-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, and 4 represent the first principal component through fourth, and the rows show the total variation due to each successive element included in the subset.

Subset        1234     1235     1236     1237     1245     1246     1247     1256     1257
1           76.918   77.133   76.903   77.362   27.517   26.865   28.126   27.506   28.599
2           96.441   97.085   96.430  103.955   36.072   35.410   53.844   36.061   54.501
3          104.603  105.249  104.590  123.430   38.556   37.516   62.380   38.279   63.047
4          106.557  107.421  105.096  131.562   40.197   37.872   64.337   38.736   65.202

Subset        1267     1345     1346     1347     1356     1357     1367     1456     1457
1           28.122   73.982   73.810   74.278   73.966   74.440   74.263   17.575   27.071
2           53.835   86.772   86.330  100.843   86.751  101.012  100.828   20.366   44.575
3           62.371   89.436   88.440  113.309   89.208  113.752  113.291   22.099   47.221
4           62.877   91.126   88.801  115.267   89.665  116.131  113.806   22.441   48.906

Subset        1467     1567     2345     2346     2347     2356     2357     2367     2456
1           27.027   27.071   71.861   71.683   72.209   71.847   72.378   72.195   18.969
2           44.108   44.556   88.174   87.562   98.673   88.164   98.855   98.660   21.650
3           46.221   46.989   90.710   89.674  114.534   90.437  115.149  114.526   23.328
4           46.581   47.446   92.355   90.030  116.495   90.894  117.360  115.035   23.670

Subset        2457     2467     2567     3456     3457     3467     3567     4567
1           27.217   27.146   27.217   69.378   69.911   69.777   69.898   27.031
2           45.955   45.333   45.952   72.492   96.387   96.241   96.374   30.276
3           48.496   47.454   48.218   74.257   99.355   98.375   99.147   32.037
4           50.135   47.810   48.675   74.599  101.065   98.740   99.604   32.380

TABLE H.4 Total Variance (Compare with 136.944 Total Variance) for Four-Component Subsets, in Order of Increasing Variance.

Subset        1456     2456     4567     1246     1256     1245     1467     1567     2467     2567
Variance    22.441   23.670   32.380   37.872   38.736   40.197   46.581   47.446   47.810   48.675

Subset        1457     2457     1267     1247     1257     3456     1346     1356     2346     2356
Variance    48.906   50.135   62.877   64.337   65.202   74.599   88.801   89.665   90.030   90.894

Subset        1345     2345     3467     3567     3457     1236     1234     1235     1367     2367
Variance    91.126   92.355   98.740   99.604  101.065  105.096  106.557  107.421  113.806  115.035

Subset        1347     1357     2347     2357     1237
Variance   115.267  116.131  116.495  117.360  131.562

TABLE H.5 Principal Components Analysis on All Five-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, and 5 represent the first principal components through fifth, and the rows show the total variation due to each successive element included in the subset.

Subset       12345    12346    12347    12356    12357    12367    12456    12457    12467
1           77.160   76.930   77.388   77.144   77.608   77.373   27.547   28.624   28.140
2           97.127   96.468  103.981   97.114  104.205  103.966   36.103   54.541   53.871
3          105.292  104.630  123.467  105.278  124.130  123.456   38.716   63.088   62.408
4          107.775  106.733  131.600  107.496  132.265  131.588   40.387   65.560   64.514
5          109.414  107.089  133.554  107.953  134.419  132.094   40.729   67.194   64.869

Subset       12567    13456    13457    13467    13567    14567    23456    23457    23467
1           28.617   73.991   74.464   74.286   74.448   27.072   71.870   72.401   72.217
2           54.530   86.795  101.037  100.852  101.021   44.598   88.203   98.878   98.682
3           63.076   89.584  113.794  113.328  113.773   47.372   90.867  115.186  114.559
4           65.277   91.316  116.440  115.438  116.206   49.096   92.546  117.714  116.617
5           65.734   91.658  118.124  115.799  116.663   49.439   92.887  119.353  117.028

Subset       23567    24567    34567
1           72.387   27.218   69.918
2           98.864   45.984   96.394
3          115.177   48.655   99.495
4          117.435   50.326  101.254
5          117.892   50.667  101.597

TABLE H.6 Total Variance (Compare with 136.944 Total Variance) for Five-Component Subsets, in Order of Increasing Variance.

Subset       12456    14567    24567    12467    12567    12457    13456    23456    34567    12346
Variance     40.73    49.44    50.67    64.87    65.73    67.19    91.66    92.89   101.60   107.09
Percent      29.74    36.10    37.00    47.37    48.00    49.07    66.93    67.83    74.19    78.20

Subset       12356    12345    13467    13567    23467    23567    13457    23457    12367    12347
Variance    107.95   109.41   115.80   116.66   117.03   117.89   118.12   119.35   132.09   133.55
Percent      78.83    79.90    84.56    85.19    85.46    86.09    86.26    87.15    96.46    97.53

Subset       12357
Variance    134.42
Percent      98.16

TABLE H.7 Principal Components Analysis on All Six-Element Subsets of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, and 6 represent the first principal component through sixth, and the rows show the total variation due to each successive element included in the subset.

Subset      123456   123457   123467   123567   124567   134567   234567
1           77.172   77.635   77.399   77.620   28.643   74.472   72.411
2           97.157  104.232  103.993  104.216   54.571  101.046   98.887
3          105.322  124.172  123.494  124.159   63.118  113.817  115.215
4          107.934  132.307  131.628  132.294   65.721  116.590  117.872
5          109.605  134.779  133.731  134.494   67.385  118.314  119.543
6          109.946  136.411  134.087  134.951   67.726  118.656  119.885

TABLE H.8 Total Variance (Compare with 136.944 Total Variance) for Six-Component Subsets, in Order of Increasing Variance.

Subset      124567   123456   134567   234567   123467   123567   123457
Variance    67.726  109.946  118.656  119.885  134.087  134.951  136.411
Percent     49.45%   80.28%   86.65%   87.54%   97.91%   98.54%   99.61%

TABLE H.9 Principal Components Analysis on the Seven-Element Subset of 1,373-Bullet Subset. Elements 1, 2, 3, 4, 5, 6, and 7 are As, Sb, Sn, Cu, Bi, Ag, and Cd, respectively. Row labels 1, 2, 3, 4, 5, 6, and 7 represent the first principal component through seventh, and the rows show the total variation due to each successive element included in the subset.

Subset       1234567
1           77.64703
2          104.24395
3          124.20241
4          132.33795
5          134.94053
6          136.60234
7          136.94360

Summary:
3 elements: 237 (83.6% of total variance)
4 elements: 1237 (96.07% of total variance)
5 elements: 12357 (98.16% of total variance) or 12347 (97.52%)
6 elements: 123457 (99.61% of total variance) or 123567 (98.54%) (Bi-Ag correlation)
7 elements: 1234567 (100.00% of total variance)

REFERENCES
1. Koons, R. D. and Grant, D. M. J. Foren. Sci. 2002, 47(5), 950.
2. Randich, E.; Duerfeldt, W.; McLendon, W.; and Tobin, W. Foren. Sci. Int. 2002, 127, 174–191.
3. Peele, E. R.; Havekost, D. G.; Peters, C. A.; and Riley, J. P. USDOJ (ISBN 0-932115-12-8), 57, 1991.