Suggested Citation:"Appendix E: Basic Principles of Statistics." National Research Council. 2004. Forensic Analysis: Weighing Bullet Lead Evidence. Washington, DC: The National Academies Press. doi: 10.17226/10924.

E
Basic Principles of Statistics¹

All measurements are subject to error. Analytical chemical measurements often have the property that the error is proportional to the value. Denote the i^th measurement on bullet k as X_ik (we will consider only one element in this discussion and hence drop the subscript j utilized in Chapter 3). Let µ_xk denote the mean of all measurements that could ever be taken on this bullet, and let ε_ik denote the error associated with this measurement. A typical model for analytical measurement error might be

$$X_{ik} = \mu_{xk}\,\varepsilon_{ik}.$$

Likewise, for a given PS bullet measurement, Y_ik, with mean µ_yk and error in measurement η_ik,

$$Y_{ik} = \mu_{yk}\,\eta_{ik}.$$

Notice that if we take logarithms of each equation, these equations become additive rather than multiplicative in the error term:

$$\log X_{ik} = \log\mu_{xk} + \log\varepsilon_{ik}, \qquad \log Y_{ik} = \log\mu_{yk} + \log\eta_{ik}.$$

Models with additive rather than multiplicative error are the basis for most statistical procedures. In addition, as discussed below, the logarithmic transformation yields more normally distributed data as well as transformed measurements with constant variance. That is, an estimate of log(µ_xk) is the logarithm of the sample average of the three measurements on bullet k, and a plot of these log(averages) shows more normally distributed values than a plot of the averages alone. Two sources of variation are involved: the between-bullet variation of the bullet means and the within-bullet variation of the error terms ε_ik and η_ik. It is likely that the between-bullet variation is the same for the populations of both the CS and the PS bullets, so a single between-bullet variance can serve for both. Similarly, if the measurements on both the CS and PS bullets were taken at the same time, their errors should also have the same variance; we will denote this within-bullet variance as σ² when we are concentrating on just the within-bullet (measurement) variability.

¹	Note that the notation used in this Appendix differs from that used in the body of the report.

Thus, for three reasons—the nature of the error in chemical measurements, the approximate normality of the distributions, and the more constant variance (that is, the variance is not a function of the magnitude of the measurement itself)—logarithmic transformation of the measurements is advisable. In what follows, we will assume that x_i denotes the logarithm of the i^th measurement on a given CS bullet and one particular element, µ_x denotes the mean of these log(measurement) values, and ε_i denotes the error in this i^th measurement. Similarly, let y_i denote the logarithm of the i^th measurement on a given PS bullet and the same element, µ_y denote the mean of these log(measurement) values, and η_i denote the error in this i^th measurement.
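The first and third reasons can be illustrated with a small simulation. This is only a sketch under an assumed error model (a multiplicative error of the form exp(z) with z normal), and the means, relative error, and sample sizes are hypothetical values chosen for illustration. In the raw scale the standard deviation grows with the mean; in the log scale it stays roughly constant.

```python
import math
import random
from statistics import stdev


def simulate_measurements(mu, rel_sd, n, seed):
    """Draw n measurements X = mu * eps with a multiplicative error eps.

    eps = exp(z), z ~ Normal(0, rel_sd), is an assumed error model used
    here for illustration only.
    """
    rng = random.Random(seed)
    return [mu * math.exp(rng.gauss(0.0, rel_sd)) for _ in range(n)]


# Two hypothetical bullets whose true means differ by a factor of 100.
low = simulate_measurements(mu=10.0, rel_sd=0.03, n=5000, seed=1)
high = simulate_measurements(mu=1000.0, rel_sd=0.03, n=5000, seed=2)

# Raw scale: the standard deviation is proportional to the mean.
raw_sd_ratio = stdev(high) / stdev(low)

# Log scale: the standard deviation no longer depends on the mean.
log_sd_low = stdev([math.log(v) for v in low])
log_sd_high = stdev([math.log(v) for v in high])

print(f"raw SD ratio (high/low): {raw_sd_ratio:.1f}")
print(f"log-scale SDs: {log_sd_low:.4f} vs {log_sd_high:.4f}")
```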

NORMAL (GAUSSIAN) MODEL FOR MEASUREMENT ERROR

All measurements are subject to measurement error:

$$x_i = \mu_x + \varepsilon_i, \qquad y_i = \mu_y + \eta_i.$$

Ideally, ε_i and η_i are small, but in all instances they are unknown and vary from measured replicate to replicate. If the measurement technique is unbiased, we expect the mean of the measurement errors to be zero. Let σ_x² and σ_y² denote the measurement errors’ variances. Because µ_x and µ_y are assumed to be constant, and hence have variance 0, Var(x_i) = σ_x² and Var(y_i) = σ_y². The distribution of measurement errors is often (not always) assumed to be normal (Gaussian). That assumption is often the basis of a convenient model for the measurements and implies that

$$P\{\mu_x - 1.96\,\sigma_x \le x_i \le \mu_x + 1.96\,\sigma_x\} = 0.95 \tag{E.1}$$

if µ_x and σ_x are known (and likewise for y_i, using µ_y and σ_y). (The value 1.96 is often conveniently rounded to 2.) Moreover, the sample mean x̄ of n replicates will also be normally distributed, also with mean µ_x but with a smaller variance, σ_x²/n; therefore

$$P\{\mu_x - 1.96\,\sigma_x/\sqrt{n} \le \bar{x} \le \mu_x + 1.96\,\sigma_x/\sqrt{n}\} = 0.95.$$

Referring to Part (b) of the Federal Bureau of Investigation (FBI) protocol for forming “compositional groups” (see Chapter 3), its calculation of the standard deviation of the group is actually a standard deviation of averages of three measurements, or an estimate of σ_x/√3 in our notation, not of σ_x. In practice, however, µ_x and σ_x are unknown, and interest centers not on an individual x_i but rather on µ_x, the mean of the distribution of the measured replicates. If we estimate µ_x and σ_x using x̄ and s_x from only three replicates as in the current FBI procedure but still assume that the measurement error is normally distributed, then a 95 percent confidence interval for the true µ_x can be derived from Equation E.1 by rearranging the inequalities using the correct multiplier, not from the Gaussian distribution (that is, not 1.96 in Equation E.1) but rather from Student’s t distribution on two degrees of freedom, and the correct standard deviation s_x/√3 instead of s_x:

$$P\{\bar{x} - 4.303\,s_x/\sqrt{3} \le \mu_x \le \bar{x} + 4.303\,s_x/\sqrt{3}\} = P\{\bar{x} - 2.484\,s_x \le \mu_x \le \bar{x} + 2.484\,s_x\} = 0.95.$$

Use of the multiplier 2 instead of 2.484 yields a confidence coefficient of 0.926, not 0.95.
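The multiplier 2.484 and the coverage 0.926 can be checked with a few lines of code. The sketch below relies on the closed form of Student’s t distribution on two degrees of freedom, P(|T| ≤ t) = t/√(2 + t²), which holds only for that special case; the function names and the three log-scale replicates are hypothetical.

```python
import math
from statistics import mean, stdev


def t2_two_sided_quantile(conf):
    """Return t such that P(|T| <= t) = conf for Student's t on 2 df.

    Inverts the closed form P(|T| <= t) = t / sqrt(2 + t^2), which is
    specific to 2 degrees of freedom (three replicates).
    """
    return math.sqrt(2.0 * conf**2 / (1.0 - conf**2))


def t2_coverage(t):
    """Return P(|T| <= t) for Student's t on 2 df."""
    return t / math.sqrt(2.0 + t * t)


def mean_confidence_interval(replicates, conf=0.95):
    """Confidence interval for the mean from exactly three replicates."""
    if len(replicates) != 3:
        raise ValueError("this sketch assumes exactly three replicates (2 df)")
    centre = mean(replicates)
    half_width = t2_two_sided_quantile(conf) * stdev(replicates) / math.sqrt(3)
    return centre - half_width, centre + half_width


# Multiplier applied directly to s_x: t(2 df, 0.975) / sqrt(3).
multiplier = t2_two_sided_quantile(0.95) / math.sqrt(3)

# Coverage actually achieved by the interval x-bar +/- 2 * s_x.
coverage_with_2 = t2_coverage(2.0 * math.sqrt(3))

log_replicates = [-3.52, -3.50, -3.48]  # hypothetical log(measurements)
lower, upper = mean_confidence_interval(log_replicates)

print(f"multiplier of s_x: {multiplier:.3f}")
print(f"coverage with multiplier 2: {coverage_with_2:.3f}")
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```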

CLASSICAL HYPOTHESIS-TESTING: TWO-SAMPLE t STATISTIC

The present situation involves the comparison between the sample means x̄ and ȳ from two bullets. Classical hypothesis-testing states the null and alternative hypotheses as H₀: µ_x = µ_y vs H₁: µ_x ≠ µ_y (reversed from our situation), and states that the two samples of observations (here, x₁, x₂, x₃ and y₁, y₂, y₃) are normally distributed as N(µ_x, σ²) and N(µ_y, σ²), respectively. Under those conditions, x̄, ȳ, and s_p are highly efficient estimates of µ_x, µ_y, and σ, respectively, where s_p is a pooled estimate of the standard deviation that is based on both samples:

$$s_p = \sqrt{\frac{(n_x - 1)\,s_x^2 + (n_y - 1)\,s_y^2}{n_x + n_y - 2}} \tag{E.2}$$

Evidence in favor of H₁:µ_x ≠ µ_y occurs when x̄ and ȳ are “far apart.” Formally, “far apart” is determined when the so-called two-sample t statistic (which, under H₀, has a central Student’s t distribution on n_x + n_y − 2 = 3 + 3 − 2 = 4 degrees of freedom) exceeds a critical point from this Student’s t₄ distribution. To ensure a false null hypothesis rejection probability of no more than 100α%, where α is the probability of rejecting H₀ when it is correct (that is, claiming “different” when the means are equal), we reject H₀ in favor of H₁ if

$$\frac{|\bar{x} - \bar{y}|}{s_p\sqrt{1/n_x + 1/n_y}} > t_{n_x + n_y - 2,\,\alpha/2} \tag{E.3}$$

where t_{n_x + n_y − 2, α/2} is the value beyond which only 100 · α/2% of the Student’s t distribution (on n_x + n_y − 2 degrees of freedom) lies.

When n_x = n_y = 3 and α = 0.05, Equation E.3 reduces to:

$$\frac{|\bar{x} - \bar{y}|}{s_p\sqrt{2/3}} > 2.776 \tag{E.4}$$

This procedure for testing H₀ versus H₁ has the following property: among all possible tests of H₀ whose false rejection probability does not exceed α, this two-sample Student’s t test has the maximum probability of rejecting H₀ when H₁ is true (that is, has the highest power to detect when µ_x and µ_y are unequal). If the two-sample t statistic is less than this critical value (2.776 for α = 0.05), the interpretation is that the data do not support the hypothesis of different means. A larger critical value would reject the null hypothesis (“same means”) less often.
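A minimal sketch of the pooled two-sample t test for two sets of three replicates follows. The critical value 2.776 is the one quoted in the text for α = 0.05 and four degrees of freedom; the function names and the replicate values are hypothetical.

```python
import math
from statistics import mean, stdev

T4_CRITICAL_05 = 2.776  # two-sided 5% point of Student's t on 4 df (from the text)


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (Equation E.2)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return math.sqrt(pooled_var)


def two_sample_t(x, y):
    """Absolute value of the pooled two-sample t statistic (Equation E.3)."""
    nx, ny = len(x), len(y)
    return abs(mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / nx + 1 / ny))


# Hypothetical log(measurements) for one element on three bullets.
cs_bullet = [0.00, 0.01, 0.02]
ps_close = [0.01, 0.02, 0.03]
ps_far = [0.10, 0.11, 0.12]

t_close = two_sample_t(cs_bullet, ps_close)
t_far = two_sample_t(cs_bullet, ps_far)

for label, t in (("close", t_close), ("far", t_far)):
    verdict = "reject H0 (means differ)" if t > T4_CRITICAL_05 else "do not reject H0"
    print(f"{label}: t = {t:.3f} -> {verdict}")
```

With only three replicates per bullet, the pooled standard deviation rests on four degrees of freedom, which is why the critical value is 2.776 rather than the Gaussian 1.96.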

The FBI protocol effectively uses s_x + s_y in the denominator instead of s_p√(2/3) and uses a “critical value” of 2 instead of 2.776. Simulation suggests that the distribution of the ratio (s_x + s_y)/s_p has a mean of 1.334 (10%, 25%, 75%, and 90% quantiles are 1.198, 1.288, 1.403, and 1.413, respectively). Substituting s_x + s_y ≈ 1.334 s_p suggests that the approximate error in rejecting H₀ when it is true for the FBI statistic, |x̄ − ȳ|/(s_x + s_y), would also be 0.05 if it used a “critical point” of 2.776√(2/3)/1.334 = 1.699. Replacing 1.334 with the quantiles 1.198, 1.288, 1.403, and 1.413 yields values of 1.892, 1.760, 1.616, and 1.604, respectively, all smaller than the FBI value of 2. The FBI value of 2 would correspond to an approximate error of 0.03. A larger critical value (smaller error) leads to fewer rejections of the null hypothesis; that is, the test is more likely to claim “equality” and less likely to claim “different” when the means are the same.
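The arithmetic behind these critical points can be reproduced directly. The sketch below takes the quoted ratio values as given (it does not repeat the simulation) and uses the closed-form distribution function of Student’s t on four degrees of freedom; the function names are hypothetical.

```python
import math

T4_CRITICAL_05 = 2.776  # two-sided 5% point of Student's t on 4 df (from the text)


def fbi_equivalent_critical_point(ratio):
    """Critical point for |xbar - ybar| / (s_x + s_y) that matches Equation E.4,
    assuming s_x + s_y is replaced by ratio * s_p."""
    return T4_CRITICAL_05 * math.sqrt(2.0 / 3.0) / ratio


def t4_two_sided_tail(t):
    """P(|T| > t) for Student's t on 4 df, using its closed-form CDF."""
    w = 1.0 + t * t / 4.0
    cdf = 0.5 + 0.375 * (t / math.sqrt(w)) * (1.0 - (t * t / w) / 12.0)
    return 2.0 * (1.0 - cdf)


mean_point = fbi_equivalent_critical_point(1.334)
quantile_points = [fbi_equivalent_critical_point(r) for r in (1.198, 1.288, 1.403, 1.413)]

# Error rate implied by the FBI "critical value" of 2: convert it back to the
# scale of the pooled t statistic, then look up the t4 tail area.
implied_t = 2.0 * 1.334 / math.sqrt(2.0 / 3.0)
implied_error = t4_two_sided_tail(implied_t)

print(f"critical point at mean ratio: {mean_point:.3f}")
print("critical points at quantiles:", [f"{c:.3f}" for c in quantile_points])
print(f"error implied by critical value 2: {implied_error:.3f}")
```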

If the null hypothesis is H₀:µ_x − µ_y = δ (δ ≠ 0), the two-sample t statistic in Equation E.4 has a noncentral t distribution with noncentrality parameter (δ/σ)√(n_x n_y/(n_x + n_y)), which reduces to (δ/σ)√(n/2) when n_x = n_y = n. When the null hypothesis is H₀: |µ_x − µ_y| = δ, the square of the pooled two-sided two-sample t statistic (Equation E.4) has a noncentral F distribution with 1 and n_x + n_y − 2 = 2(n − 1) degrees of freedom and noncentrality parameter (δ/σ)² n_x n_y/(n_x + n_y) = (δ/σ)²(n/2).

The use of Student’s t statistic is valid (that is, the probability of falsely rejecting H₀ when the means µ_x and µ_y are truly equal is α) only when the x’s and y’s are normally distributed. The appropriate critical value (here, 2.776 for α = 0.05 and δ = 0) is different if the distributions are not normal, or if σ_x ≠ σ_y, or if H₀: | µ_x − µ_y | ≥ δ ≠ 0, or if (s_x + s_y)/2 is used instead of s_p (Equation E.2), as is used currently in the FBI’s statistical method. When its assumptions hold, the two-sample t test also has the highest power (highest probability of claiming H₁ when in fact µ_x ≠ µ_y), subject to the condition that the probability of erroneously rejecting H₀ is no more than α.

The assumption “σ_x = σ_y” is probably reasonably valid if the measurement process is consistent from bullet sample to bullet sample: one would expect the error in measuring the concentration of a particular element for the crime scene (CS) bullet (σ_x) to be the same as that in measuring the concentration of the same element in the potential suspect (PS) bullet (σ_y). However, the normality assumption may be questionable here; as noted in Ref. 1, average concentrations for different bullets tend to be lognormally distributed. That means that log(As average) is approximately normal, as it is for all six other elements. When the measurement uncertainty is very small (say, σ_x < 0.2), the lognormal distribution differs little from the normal distribution (Ref. 2), so these assumptions will be reasonably well satisfied for precise measurement processes. Only a few of the standard deviations in the datasets were greater than 0.2 (see the section titled “Description of Data Sets” in Chapter 3).

The case of CABL differs from the classical situation primarily in the reversal of the null and alternative hypotheses of interest. That is, the null hypothesis here is H₀:µ_x ≠ µ_y vs H₁:µ_x = µ_y. We accommodate the difference by stating a specific relative difference between µ_x and µ_y, |µ_x − µ_y|, and rely on the noncentral F distribution as mentioned above.

EQUIVALENCE t TESTS²

An equivalence t test is designed to handle our situation:

H₀: means are different.

H₁: means are similar.

Those hypotheses are quantified more precisely as

$$H_0: |\mu_x - \mu_y| \ge \delta \quad \text{vs} \quad H_1: |\mu_x - \mu_y| < \delta.$$

We must choose a value of δ that adequately reflects the condition that “two bullets came from the same compositionally indistinguishable volume of material (CIVL), subject to specification limits on the element given by the manufacturer.” For example, if the manufacturer claims that the Sb concentrations in a given lot of material are 5% ± 0.20%, a value of δ = 0.20 might be deemed reasonable. The test statistic is still the two-sample t as before, but now we reject H₀ if x̄ and ȳ are too close. As before, we ensure that the false match probability cannot exceed a particular value by choosing a critical value so that the probability of falsely rejecting H₀ (falsely claiming a “match”) is no greater than α (here, we will choose α = 1/2,500 = 0.0004, for example). The equivalence test has the property that, subject to false match probability ≤ α = 0.0004, the probability of correctly rejecting H₀ (that is, claiming that two bullets match when the means of the batches from which the bullets came differ by less than δ) is maximized. The left panel of Figure E.1 shows a graph of the distribution of the difference under the null hypothesis that δ/σ = 0.25 (that is, either µ_x − µ_y = −0.25σ, or µ_x − µ_y = +0.25σ) and n = 100 fragment averages in each sample, subject to false match probability ≤ 0.05: the equivalence test in this case rejects H₀ when the observed difference is sufficiently close to zero. The right panel of Figure E.1 shows the power of this test: when the true difference equals zero, the probability of correctly rejecting the null hypothesis (“means differ by more than 0.25”) is about 0.60, whereas the probability of rejecting the null hypothesis when the true difference is δ = 0.25 is only 0.05 (as it should be, given the specifications of the test). Figure E.1 is based on the information given in Wellek (Ref. 3); similar figures apply for the case when α = 0.0004, n = 3 measurements in each sample, and δ/σ = 1 or 2.

²	Note that the form of this test is referred to as successive t-test statistics in Chapter 3. In that description, the setting of error rates is not prescribed.
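The logic of an equivalence test (reject H₀ when the statistic is small, with the cutoff set by the error rate at the boundary of H₀) can be sketched with a large-sample normal approximation. This is an assumption made for simplicity: the exact test uses the noncentral t or F distribution, and the boundary noncentrality value below is a hypothetical input rather than a quantity taken from the report.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rejection_probability(cutoff, noncentrality):
    """P(|T| < cutoff) when T ~ Normal(noncentrality, 1).

    This is the chance of declaring a "match" for a given true standardized
    difference, under the normal approximation assumed in this sketch.
    """
    return _STD_NORMAL.cdf(cutoff - noncentrality) - _STD_NORMAL.cdf(-cutoff - noncentrality)


def equivalence_cutoff(boundary_noncentrality, alpha, tol=1e-12):
    """Find the cutoff whose false-match probability at the H0 boundary is alpha.

    The rejection probability increases with the cutoff, so bisection works.
    """
    lo, hi = 0.0, abs(boundary_noncentrality) + 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if rejection_probability(mid, boundary_noncentrality) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


BOUNDARY = 2.5  # hypothetical standardized difference at the edge of H0
ALPHA = 0.05    # allowed false-match probability

cutoff = equivalence_cutoff(BOUNDARY, ALPHA)
false_match_rate = rejection_probability(cutoff, BOUNDARY)
power_at_zero = rejection_probability(cutoff, 0.0)

print(f"declare a match when |t| < {cutoff:.3f}")
print(f"false-match probability at the boundary: {false_match_rate:.3f}")
print(f"power when the true difference is zero: {power_at_zero:.3f}")
```

Lowering ALPHA (for example to 0.0004) shrinks the cutoff and therefore the power, which is the trade-off the choice of δ and α has to balance.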

DIGRESSION: LOGNORMAL DISTRIBUTIONS

This section explains two benefits of transforming measurements via logarithms for the statistical analysis.

FIGURE E.1 The left panel shows a picture of the distribution of the difference under the null hypothesis that δ/σ = 0.25 and n = 100 fragment averages in each sample, subject to false match probability ≤ 0.05: the equivalence test in this case rejects H₀ when the observed difference is sufficiently close to zero. The right panel shows the power of this test: when the true difference equals zero, the probability of correctly rejecting the null hypothesis is about 0.60, whereas the probability of rejecting the null hypothesis when the true difference is δ = 0.25 is only 0.05. Figure is based on information given in Wellek (Ref. 3).

The standard deviations of measurements made with inductively coupled plasma-optical emission spectroscopy are generally proportional to their means; hence, one typically refers to relative error, or coefficient of variation, sometimes expressed as a percentage, 100(σ_x/µ_x)%, where σ_x and µ_x here denote the standard deviation and mean in the original scale. When the measurements are transformed first via logarithms, the standard deviation of the log(measurements) is approximately, and conveniently, equal to the coefficient of variation (COV), sometimes called relative error (RE), in the original scale. This can be seen easily through standard propagation-of-error formulas (Ref. 4, 5), which rely on a first-order Taylor series expansion for the transformation (here, the natural logarithm) about the mean in the original scale,

$$f(X) \approx f(\mu_x) + (X - \mu_x)\,f'(\mu_x), \quad\text{so that}\quad \operatorname{Var}[f(X)] \approx [f'(\mu_x)]^2\,\operatorname{Var}(X) = [f'(\mu_x)]^2\,\sigma_x^2,$$

because the variance of a constant (such as µ_x) is zero. Letting f(X) = log(X), so that f′(µ_x) = 1/µ_x, it follows that

$$\operatorname{SD}[\log X] \approx \sigma_x/\mu_x = \text{COV}.$$

Moreover, the distribution of the logarithms for each element tends to be more normal than that of the raw data. Thus, to obtain more-normally distributed data and as a by-product a simple calculation of the COV, the data should first be transformed via logarithms. Approximate confidence intervals are calculated in the log scale and then can be transformed back to the original scale via the antilogarithm, exp(·).
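The approximation SD[log X] ≈ COV can be checked numerically. The following sketch assumes normally distributed measurements with a hypothetical mean of 100 and a hypothetical COV of 5%; the seed and sample size are arbitrary.

```python
import math
import random
from statistics import mean, stdev


def cov_and_log_sd(mu, cov, n=20000, seed=0):
    """Simulate measurements with the given mean and COV, then return the
    sample COV in the raw scale and the sample SD in the log scale."""
    rng = random.Random(seed)
    raw = [rng.gauss(mu, cov * mu) for _ in range(n)]
    logged = [math.log(v) for v in raw]
    return stdev(raw) / mean(raw), stdev(logged)


sample_cov, log_sd = cov_and_log_sd(mu=100.0, cov=0.05)

print(f"COV in the original scale: {sample_cov:.4f}")
print(f"SD of log(measurements):   {log_sd:.4f}")
```

The agreement weakens as the COV grows, because the first-order Taylor expansion ignores higher-order terms.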

DIGRESSION: ESTIMATING σ² WITH POOLED VARIANCES

The FBI protocol for statistical analysis estimates the variances of the triplicate measurements in each bullet with only three observations, which leads to highly variable estimates (a range of a factor of 10, 20, or even more). Assuming that the measurement variation is the same for both the PS and CS bullets, the classical two-sample t statistic pools the variances into s_p² (Equation E.2), which has four degrees of freedom and is thus more stable than either individual s_x or s_y alone (each based on only two degrees of freedom). The pooled variance need not rely on only the six observations from the two samples if the within-replicate variance is the same for several bullets. Certainly, that condition is likely to hold if bullets are analyzed with a consistent measurement process. If three measurements are used to calculate each within-replicate standard deviation from each of, say, B bullets, a better, more stable estimate of σ² is

$$s_p^2 = \frac{1}{B}\sum_{k=1}^{B} s_k^2,$$

where s_k is the within-replicate standard deviation for bullet k.

Such an estimate of σ² is now based on not just 2(2) = 4 degrees of freedom, but rather 2B degrees of freedom. A stable and repeatable measurement process offers many estimates of σ² from many bullets analyzed by the laboratory over several years. The within-replicate variances may be used in the above equation. To verify the stability of the measurement process, standard deviations should be plotted in a control-chart format (s-chart) (Ref. 7) with limits that, if exceeded, indicate a change in precision. Standard deviations that fall within the limits should be pooled as in the equation above. Using pooled standard deviations guards against the possibility of claiming a match simply because the measurement variability on a particular day happened to be large by chance, creating wider intervals and hence greater chances of overlap.
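Pooling across bullets amounts to averaging the within-bullet variances when every bullet has the same number of replicates. A minimal sketch, with a hypothetical function name and hypothetical triplicate log(measurements):

```python
from statistics import variance


def pooled_variance(triplicates):
    """Pool within-bullet variances from B bullets with three replicates each.

    Returns the pooled variance and its degrees of freedom (2 per bullet).
    """
    if any(len(t) != 3 for t in triplicates):
        raise ValueError("this sketch assumes three replicates per bullet")
    variances = [variance(t) for t in triplicates]
    n_bullets = len(variances)
    return sum(variances) / n_bullets, 2 * n_bullets


# Hypothetical log(measurements) for one element on four bullets.
bullets = [
    [-3.52, -3.50, -3.48],
    [-2.91, -2.90, -2.86],
    [-3.10, -3.13, -3.11],
    [-3.30, -3.27, -3.31],
]

s2_pooled, dof = pooled_variance(bullets)
print(f"pooled variance = {s2_pooled:.6f} on {dof} degrees of freedom")
```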

To determine whether a given standard deviation, say, s_g, might be larger than the s_p determined from measurements on B previous bullets, one can compare the ratio s_g²/s_p² with an F distribution on 2 and 2B degrees of freedom. Assuming that the FBI has as many as 500 estimates, the 5% critical point from an F distribution on two and 1,000 degrees of freedom is 3.005. Thus, if a given standard deviation is more than √3.005 = 1.73 times the pooled standard deviation for that element, one should consider remeasuring that element, in that the measurement variability may be larger than expected by chance alone (5% of the time).
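The critical point 3.005 can be reproduced without statistical tables, because an F distribution with 2 numerator degrees of freedom has a closed-form upper tail, P(F > x) = (1 + 2x/m)^(−m/2) for m denominator degrees of freedom. The sketch below uses that special case; the function names and the example standard deviations are hypothetical.

```python
import math


def f_critical_2_m(m, alpha=0.05):
    """Upper-alpha critical point of an F distribution on 2 and m df.

    Inverts the closed-form tail P(F > x) = (1 + 2x/m) ** (-m/2), which is
    specific to 2 numerator degrees of freedom.
    """
    return (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)


def should_remeasure(s_g, s_p, n_bullets, alpha=0.05):
    """Flag a triplicate SD s_g that is large relative to the pooled SD s_p
    estimated from n_bullets previous bullets (2 * n_bullets df)."""
    return (s_g / s_p) ** 2 > f_critical_2_m(2 * n_bullets, alpha)


critical_f = f_critical_2_m(1000)
sd_ratio_limit = math.sqrt(critical_f)

print(f"F critical point (2, 1000 df): {critical_f:.3f}")
print(f"flag when s_g / s_p exceeds:   {sd_ratio_limit:.2f}")
print(should_remeasure(s_g=0.018, s_p=0.010, n_bullets=500))  # ratio 1.8
print(should_remeasure(s_g=0.017, s_p=0.010, n_bullets=500))  # ratio 1.7
```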

REFERENCES

1. Carriquiry, A.; Daniels, M.; and Stern, H. “Statistical Treatment of Case Evidence: Analysis of Bullet Lead”, Unpublished report, 2002.

2. Antle, C.E. “Lognormal distribution” in Encyclopedia of Statistical Sciences, Vol 5, Kotz, S.; Johnson, N. L.; and Read, C. B., Eds.; Wiley: New York, NY, 1985, pp. 134–136.

3. Wellek, S. Testing Statistical Hypotheses of Equivalence; Chapman and Hall: New York, NY, 2003.

4. Ku, H.H. Notes on the use of propagation of error formulas, Journal of Research of the National Bureau of Standards-C. Engineering and Instrumentation, 70C(4), 263–273. Reprinted in Precision Measurement and Calibration: Selected NBS Papers on Statistical Concepts and Procedures, NBS Special Publication 300, Vol. 1, H.H. Ku, Ed., 1969, 331–341.

5. Cameron, J.E. “Error analysis” in Encyclopedia of Statistical Sciences, Vol 2, Kotz, S.; Johnson, N. L.; and Read, C. B., Eds.; Wiley: New York, NY, 1982, pp. 545–551.

6. Mood, A.; Graybill, F.; and Boes, D. Introduction to the Theory of Statistics, Third Edition; McGraw-Hill: New York, NY, 1974.

7. Vardeman, S. B. and Jobe, J. M. Statistical Quality Assurance Methods for Engineers, Wiley: New York, NY, 1999.