5 Statistical Issues
Pages 125-165

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 125...
... Although the formulae might provide good estimates of the match probability for the average member of the population, they might not be appropriate for a member of an unusual ... [Footnote 1] Some references for general background that are pertinent to this chapter or parts of it are Aldous (1989), Finkelstein and Levin (1990)
From page 126...
... DATA SOURCES A simple random sample of a given size from a population is one chosen so that each possible sample has an equal chance of being selected. Ideally, the reference data set from which genotype frequencies are calculated would be a simple random sample or a stratified or otherwise scientifically structured random sample from the relevant population.
From page 127...
... MATCH PROBABILITY AND LIKELIHOOD RATIO Suppose that a DNA sample from a crime scene and one from a suspect are compared, and the two profiles match at every locus tested.
From page 128...
... Under some circumstances, such as if the match window is small, the probability of a match between two samples from the same person might be less than 1. In principle, this could change the likelihood ratio (Kaye 1995b)
From page 129...
... This procedure recommended in the 1992 report would calculate the match probability as $2(p_1p_2 + p_1p_3 + p_1p_4 + p_2p_3 + p_2p_4 + p_3p_4)$, that is, the probability that a randomly selected person would have two alleles from the set of possibilities $(A_1, A_2, A_3, A_4)$. As above, the reciprocal of this probability can be interpreted as a likelihood ratio.
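The pairwise sum in the excerpt above can be checked with a short sketch. The allele frequencies are hypothetical values chosen only for illustration, and the function name is not from the report.

```python
from itertools import combinations


def mixture_match_probability(freqs):
    """Return 2 * (sum of p_i * p_j over all unordered pairs i < j)."""
    return 2 * sum(p * q for p, q in combinations(freqs, 2))


# Hypothetical frequencies for alleles A1..A4 (illustrative only).
allele_freqs = [0.10, 0.05, 0.20, 0.15]

match_prob = mixture_match_probability(allele_freqs)
likelihood_ratio = 1 / match_prob  # reciprocal, as described in the text

print(f"match probability = {match_prob:.4f}")
print(f"likelihood ratio  = {likelihood_ratio:.2f}")
```

With four alleles there are six unordered pairs, which is why the formula has six terms.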
From page 130...
... (1996). Bayes's Theorem The likelihood ratio and the match probability, being reciprocals, contain the same information.
From page 131...
... are 1:2, and the LR is 10,000, the posterior odds are 5,000:1. Many statisticians and forensic scientists prefer to use the likelihood ratio rather than the match probability (Berry 1991a; Berry et al.
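The arithmetic in this excerpt follows from the odds form of Bayes's theorem: posterior odds equal prior odds times the likelihood ratio. A minimal sketch, with function names that are illustrative rather than taken from the report:

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's theorem: posterior = prior * LR."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favor (x:1) to a probability."""
    return odds / (1 + odds)


prior = 1 / 2   # prior odds of 1:2
lr = 10_000     # likelihood ratio from the excerpt

post = posterior_odds(prior, lr)
print(f"posterior odds = {post:,.0f}:1")
print(f"posterior probability = {odds_to_probability(post):.6f}")
```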
From page 132...
... A better procedure, used by some laboratories, is to use an empirically determined prior probability or to give several posterior probabilities corresponding to a range of prior probabilities. With the high LRs typically found when DNA markers are used (and the putative father has not been excluded)
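Reporting several posterior probabilities for a range of priors, as the excerpt suggests, might look like the sketch below. The likelihood ratio and the prior values are hypothetical, chosen only to show how the posterior depends on the prior when the LR is large.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Posterior probability from a prior probability and a likelihood ratio."""
    prior_odds = prior_probability / (1 - prior_probability)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr = 10_000  # hypothetical likelihood ratio

# Hypothetical range of prior probabilities.
for prior in (0.001, 0.01, 0.1, 0.5):
    post = posterior_probability(prior, lr)
    print(f"prior = {prior:<6} posterior = {post:.5f}")
```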
From page 133...
... SUSPECT IDENTIFIED BY A DNA DATABASE SEARCH Thus far, we have assumed that the suspect was identified by evidence other than DNA, such as testimony of an eyewitness or circumstantial evidence. In that case, the DNA is tested and the match probability or likelihood ratio is computed for the event that a person selected at random from some population
From page 134...
... A correction to account for the database search can be made in computing the match probability. Let Mi denote the event that the i-th DNA profile in the database matches the evidence sample.
From page 135...
... Equation 5.3 motivates the simple rule sometimes suggested by forensic scientists: multiply the match probability by the size of the database searched (or that part of the database that is relevant; for example, males in a search for a match to a semen sample)
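The multiply-by-database-size rule can be illustrated as follows. The match probability and database size are hypothetical. The second function is an added assumption for comparison only: it computes the chance of at least one match if the database profiles were independent, which is not something the excerpt states.

```python
import math


def database_adjusted_match_probability(match_probability, database_size):
    """Simple rule from the text: multiply the match probability by N."""
    return database_size * match_probability


def prob_at_least_one_match(match_probability, database_size):
    """1 - (1 - p)^N, ASSUMING independent profiles (illustrative assumption)."""
    return -math.expm1(database_size * math.log1p(-match_probability))


p = 1e-6   # hypothetical single-profile match probability
n = 1_000  # hypothetical number of relevant profiles searched

simple_rule = database_adjusted_match_probability(p, n)
independent = prob_at_least_one_match(p, n)
print(f"N * p           = {simple_rule:.6e}")
print(f"1 - (1 - p)^N   = {independent:.6e}")
```

Under the independence assumption the second value never exceeds N times p, so the simple rule errs on the high side.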
From page 136...
... The match probability computed in forensic analysis refers to a particular evidentiary profile. That profile might be said to be unique if it is so rare that it becomes unreasonable to suppose that a second person in the population might have the same profile.
From page 137...
... The sharper approximate upper bound derived in Appendix 5C is shown in Figure 5.1.
From page 138...
... In some cases, the population of interest might be limited to a particular geographic area. For an area with a population of one million, six loci with geometric-mean homozygosities of 0.05 would yield an approximate upper bound of 0.008.
From page 139...
... One would seek the relative likelihoods that, with a suitable measure of distance, the bands would be as similar to one another as was observed. At present, however, most presentations of DNA evidence use some form of grouping of alleles.
From page 140...
... The match window should not be set so small that true matches are missed. At the same time, the window should not be so wide that bands that are clearly different will be declared to match.
From page 141...
... The possibility of coincidental matches for all bands in a multilocus analysis is extremely remote, as the very small match probabilities associated with such a profile indicate (see Chapter 4 for details). The size of the match window should be defined in the laboratory protocol, and not vary from case to case.
From page 142...
... As was stated in Chapter 3, if for any reason the analyst by visual inspection overrides the conclusion from the measurements, that should be clearly stated and reasons given. Binning Once a match has been declared and confirmed by measurement, it is necessary to estimate the probability of a match on the assumption that the suspect sample and the evidence sample are not from the same source in order to calculate the match probability or likelihood ratio.
From page 143...
... To calculate an upper bound, an analyst must add the frequencies of all fixed bins overlapped by the match window, as recommended by the 1992 NRC report. Thus, fixed bins, when used with the
From page 144...
... To approximate the floating-bin match probability, we recommend using the fixed bin with the largest frequency among those overlapped by the match window. That approach is based on the observations that both floating and fixed bins are about 10% wide and that bands generally do not cluster around fixed-bin boundaries (Budowle, Giusti et al.
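The two ways of using fixed bins mentioned in these excerpts (summing every bin overlapped by the match window, or taking only the largest overlapped bin) can be compared in a small sketch. The bin boundaries, bin frequencies, and the 2.5% half-width of the match window are all hypothetical values assumed here for illustration.

```python
# Hypothetical fixed bins: (lower_bp, upper_bp, frequency). Illustrative only.
FIXED_BINS = [
    (1000, 1100, 0.04),
    (1100, 1210, 0.09),
    (1210, 1330, 0.12),
    (1330, 1465, 0.07),
]


def overlapped_bins(band_size, half_width=0.025, bins=FIXED_BINS):
    """Return the bins that overlap the window band_size * (1 +/- half_width)."""
    low = band_size * (1 - half_width)
    high = band_size * (1 + half_width)
    return [b for b in bins if b[0] < high and b[1] > low]


def largest_bin_frequency(band_size):
    """Largest frequency among overlapped bins (raises if no bin overlaps)."""
    return max(freq for _, _, freq in overlapped_bins(band_size))


def summed_bin_frequency(band_size):
    """Sum of the frequencies of all overlapped bins (an upper bound)."""
    return sum(freq for _, _, freq in overlapped_bins(band_size))


band = 1200  # hypothetical measured band size; its window straddles two bins
print(f"largest overlapped bin: {largest_bin_frequency(band):.2f}")
print(f"sum of overlapped bins: {summed_bin_frequency(band):.2f}")
```

When the window falls inside a single bin, the two functions return the same value; they differ only when the window straddles a bin boundary.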
From page 146...
... The product form of the relation suggests that it is most convenient to find a confidence interval for the natural logarithm of the probability and then transform it back to the probability, as is often done in data analysis (see Sokal and Rohlf 1981). The contribution to the match probability of a single homozygous locus is $p_i^2$ (or $2p_i$ if a conservative estimate is desired)
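The log-then-back-transform idea can be sketched as below. This is not necessarily the chapter's own formula: the variances are standard delta-method approximations, assuming multinomial sampling of 2N alleles per locus, the same database size at every locus, independence across loci, and genotype probabilities of p² (homozygote) and 2·p_i·p_j (heterozygote). All frequencies and the database size are hypothetical.

```python
import math


def match_probability_interval(loci, n_people, z=1.96):
    """Return (lower, point, upper) for a multilocus match probability.

    loci: one tuple per locus; (p,) for a homozygote, (p_i, p_j) for a
    heterozygote. Variances of the log terms are delta-method approximations.
    """
    n_alleles = 2 * n_people
    log_p = 0.0
    variance = 0.0
    for alleles in loci:
        if len(alleles) == 1:
            (p,) = alleles
            log_p += math.log(p * p)
            variance += 4 * (1 - p) / (n_alleles * p)
        else:
            p_i, p_j = alleles
            log_p += math.log(2 * p_i * p_j)
            variance += (p_i + p_j - 4 * p_i * p_j) / (n_alleles * p_i * p_j)
    half_width = z * math.sqrt(variance)
    # Build the interval on the log scale, then transform back.
    return (math.exp(log_p - half_width),
            math.exp(log_p),
            math.exp(log_p + half_width))


# Hypothetical three-locus profile and database of 500 people.
profile = [(0.10, 0.20), (0.05,), (0.08, 0.12)]
lower, point, upper = match_probability_interval(profile, n_people=500)
print(f"point estimate = {point:.3e}")
print(f"approx. 95% interval = ({lower:.3e}, {upper:.3e})")
```

Because the interval is symmetric on the log scale, it is asymmetric on the probability scale, with the point estimate as the geometric mean of the two limits.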
From page 147...
... We can also write confidence intervals for values calculated with Equations 4.10. [Footnote 7] If Equations 4.4 or 4.10 are used to evaluate match probabilities, a prescription for calculating confidence intervals can be similarly derived, although the detailed formulae will be somewhat different. Since knowledge of the range of reasonable values of θ is obtained from an accumulating body of population-genetics studies, one might give a range of confidence intervals based on a range of values of θ.
From page 148...
... In actual populations, we expect Dij to be positive unless it is very close to zero. INDIVIDUAL VARIABILITY AND EMPIRICAL COMPARISONS Confidence intervals derived from the simplifying assumptions of sampling theory do not take account of all possible sources of uncertainty that can affect the accuracy of a match probability or likelihood ratio.
From page 149...
... Assume that the source of evidence DNA from a particular crime in Georgia is known to be black. To make the most appropriate estimate of the probability that a profile from a randomly selected black person from this area would match the evidence profile, we would use the Georgia database.
From page 150...
... Therefore, we believe that the uncertainties caused by deviations from HW and LE expectations are much less than those caused by differences in allele frequencies in different subgroups. The FBI compendia (FBI 1993b; Budowle, Monson, Giusti, and Brown 1994a, 1994b)
From page 151...
... That suggests a conservative procedure that can be used if it is not known whether the perpetrator is black or white: a match probability could be calculated from both databases and the higher of the two values used. If only one database is used, it might be the wrong one, and the result might be misleading.
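The conservative procedure described here (compute the match probability from each database and report the higher value) might be sketched as follows. The sketch assumes the product rule under HW and LE proportions; the database names, allele labels, and frequencies are hypothetical.

```python
def profile_probability(profile, allele_freqs):
    """Product-rule profile probability, assuming HW and LE proportions.

    profile: list of (allele_a, allele_b) pairs, one per locus.
    allele_freqs: list of dicts (one per locus) mapping allele -> frequency.
    """
    prob = 1.0
    for (a, b), freqs in zip(profile, allele_freqs):
        if a == b:
            prob *= freqs[a] ** 2             # homozygote
        else:
            prob *= 2 * freqs[a] * freqs[b]   # heterozygote
    return prob


def conservative_match_probability(profile, databases):
    """Return the highest per-database probability and all per-database values."""
    per_db = {name: profile_probability(profile, freqs)
              for name, freqs in databases.items()}
    return max(per_db.values()), per_db


# Hypothetical two-locus profile and two hypothetical databases.
profile = [("A1", "A2"), ("B3", "B3")]
databases = {
    "database_a": [{"A1": 0.10, "A2": 0.20}, {"B3": 0.05}],
    "database_b": [{"A1": 0.15, "A2": 0.10}, {"B3": 0.10}],
}

reported, per_db = conservative_match_probability(profile, databases)
for name, value in per_db.items():
    print(f"{name}: {value:.2e}")
print(f"reported (higher) value: {reported:.2e}")
```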
From page 152...
... FIGURE 5.4 A scatter plot for the white population.
From page 153...
... Several other studies have used deliberately wrong or artificially stratified databases and showed that such manipulations do not produce grossly wrong results (Evett and Pinchin 1991; Berry, Evett, and Pinchin 1992). As mentioned earlier, in the data compiled by TWGDAM there was only one four-locus match in the white population and one in the Hispanic population among 58 million pairwise comparisons.
From page 154...
... If we assume that the evidence DNA is from a white person, and if we falsely assume that the pooled mixture of whites and blacks is in HW and LE proportions, then the graph shows the range of error that would exist if the pooled database were used instead of the more appropriate database for whites. In the top graph in the figure, a point that is below and to the right of the diagonal line overestimates the true probability
From page 155...
... FIGURE 5.6 A scatter plot comparing the white population (abscissa) with an equal mixture of whites and blacks (ordinate)
From page 156...
... We further note that when fixed bins are used in the manner recommended in the section in this chapter on statistical aspects of VNTR analysis, the procedure is usually conservative, and that applying the 2p rule increases the conservatism of the method. Finally, using Equations 4.10 with fixed bins adds to the conservatism.
From page 157...
... The 1992 report recommended that until the ceiling principle could be put into effect, the interim ceiling principle be applied. In contrast to the ceiling principle, the interim ceiling principle has been widely used and sometimes misused.
From page 158...
... The current methods employed by forensic scientists have been demonstrated to be robust scientifically" (p 899). If the interim ceiling principle is used despite that recommendation, TWGDAM recommends an approach intended to overcome some of the criticisms of the 1992 NRC report.
From page 159...
... Furthermore, the purpose of the ceiling principles is to allow for differences in allele frequencies in different subgroups, for which HW and LE are insensitive measures. Therefore, we believe that all loci in the selected databases should be used in the calculation.
From page 160...
... However, such a method is inappropriate, and we do not recommend it. It gives an approximate upper bound to the mean value, but not conditioned on the particular profile in question; it does not answer the question we are most interested in, the probability of a match of the particular evidence and suspect profile.
From page 161...
... The methods generally used, however, involve taking measurement uncertainty into account by determining a match window. Two procedures for determining match probabilities are the floating-bin and fixed-bin methods.
From page 162...
... If fixed bins are employed, then the fixed bin that has the largest frequency among those overlapped by the match window should be used. Ceiling Principles The abundance of data in different ethnic groups within the major races and the genetically and statistically sound methods recommended in this report imply that both the ceiling principle and the interim ceiling principle are unnecessary.
From page 163...
... In the simplest case, in which the database contains N persons, all with the same ethnic background, the effect is just to multiply an individual match probability by N, leading to Equation 5.3.
From page 164...
... Since the posterior odds equal the prior odds times the likelihood ratio, the likelihood ratio is 1/P(M)
From page 165...
... If the homozygosity of some loci is moderate or high, as for some PCR loci, the following refinement of our approximate upper bound can be useful because it shows that a smaller number of loci may yield uniqueness at each given probability level. In the above derivation, instead of dropping $\sum_i p_i^4$, note from Jensen's inequality (see, for example, James and James 1959)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.