Page 125

Statistical Issues

In Chapter 4, we presented ways to estimate the frequencies of genotypes and profiles in the population. In this chapter, we consider how to interpret frequencies as probabilities and likelihood ratios and how to make adjustments when a suspect is found through a database search. We also discuss the degree of uncertainty of such estimates according to statistical theory and empirical tests that use different databases. Finally, we ask how many loci would be needed to establish a profile as unique. The chapter includes a discussion of the statistics of matching and binning of VNTRs.1

Two major issues regarding uncertainty must be addressed in the statistical evaluation of DNA evidence. One is associated with the characteristics of a database, such as its size and whether it is representative of the appropriate population. The other might be called the subpopulation problem. In the first instance, inferences based on values in a database might be uncertain because the database is not compiled from a sample of the most relevant population or the sample is not representative. If the database is small, the values derived from it can be uncertain even if it is compiled from a scientifically drawn sample; this can be addressed by providing confidence intervals on the estimates. The second issue, the subpopulation problem, is broader than the first. Although the formulae might provide good estimates of the match probability for the average member of the population, they might not be appropriate for a member of an unusual subgroup. Our approach is empirical: we compare different subpopulations and also, to mimic a worst-case scenario, perform sample calculations deliberately using an inappropriate database.

^{1} Some references for general background that are pertinent to this chapter or parts of it are Aldous (1989), Finkelstein and Levin (1990), Aitken and Stoney (1991), Aitken (1995).

Data Sources
A simple random sample of a given size from a population is one chosen so that each possible sample has an equal chance of being selected. Ideally, the reference data set from which genotype frequencies are calculated would be a simple random sample or a stratified or otherwise scientifically structured random sample from the relevant population. Several conditions make the actual situation less than ideal. One is a lack of agreement as to what the relevant population is (should it be the whole population or only young males? should it be local or national?) and the consequent need to consider several possibilities. A second is that we are forced to rely on convenience samples, chosen not at random but because of availability or cost. It is difficult, expensive, and impractical to arrange a statistically valid random-sampling scheme. The saving point is that the features in which we are interested are believed theoretically and found empirically to be essentially uncorrelated with the means by which samples are chosen. Comparison of estimated profile frequencies from different data sets shows relative insensitivity to the source of the data, as we document later in the chapter. Furthermore, the VNTRs and STRs used in forensic analysis are usually not associated with any known function and therefore should not be correlated with occupation or behavior. So those convenience samples are effectively random.
The convenience samples from which the databases are derived come from various sources. Some data come from blood banks. Some come from genetic-counseling and disease-screening centers. Others come from mothers and putative fathers in paternity tests. The data summarized in FBI (1993b), which we have used in previous chapters and will use again in this chapter, are from a variety of sources around the world, from blood banks, paternity-testing centers, molecular-biology and human-genetics laboratories, hospitals and clinics, law-enforcement officers, and criminal records.
As mentioned previously, most markers used for DNA analysis, VNTRs and STRs in particular, are from regions of DNA that have no known function. They are not related in any obvious way to gene-determined traits2, and there is no reason to suspect that persons who contribute to blood banks or who have been involved in paternity suits or criminal proceedings differ from a random sample of the population with respect to DNA markers. In addition, there is empirical evidence against such a difference: if we compare samples chosen in different ways, the results from calculations made from the different databases are quite similar.
2 Some loci used in PCR-based typing are associated with genes. It is important to determine if a particular forensic allele is associated with a disease state and hence subject to selection. A forensic marker might happen to be closely linked to an important gene, such as one causing some observable trait, and could conceivably be in strong linkage disequilibrium. As the number of mapped genes increases, this will become increasingly common. But for that to affect the reliability of a database, the trait would have to appear disproportionately in the populations that contribute to the database.
Although most of the data that we are concerned with are from persons in the United States, there are increasing numbers from elsewhere in the world, and these can be used for comparison. The 1993 FBI compendium includes samples from whites in the United States (Arizona, California, Florida, Georgia, Illinois, Kentucky, Maryland, Michigan, Minnesota, Missouri, Nevada, North Carolina, Oregon, South Carolina, Vermont, Virginia, and Washington), France, Israel, Spain, Switzerland, Denmark, England, Germany, Finland, Italy, and Tasmania. Data on blacks come from the United States (California, Florida, Georgia, Kentucky, Maryland, Michigan, Minnesota, Nevada, North Carolina, Oregon, South Carolina, Virginia, and Washington), Haiti, Togo, and England. Data on Hispanics come from several states in the United States. The FBI places data from eastern and western US Hispanics into separate databases because of the somewhat different origins of these populations.
American Indians present a special difficulty because they have more population subdivision, as demonstrated by higher values of θ (see Chapter 4), than populations of whites, blacks, or Hispanics. The data are increasing rapidly, and substantial numbers are available from Arizona, Minnesota, North Carolina, Oregon, Ontario, and Saskatchewan, as well as from particular tribes (Sioux, Navaho).
Match Probability and Likelihood Ratio
Suppose that a DNA sample from a crime scene and one from a suspect are compared, and the two profiles match at every locus tested. Either the suspect left the DNA or someone else did. We want to evaluate the probability of finding this profile in the "someone else" case. That person is assumed to be a random member of the population of possible suspects. So we calculate the frequency of the profile in the most relevant population or populations. The frequency can be called the random-match probability, and it can be regarded as an estimate of the answer to the question: What is the probability that a person other than the suspect, randomly selected from the population, will have this profile? The smaller that probability, the greater the likelihood that the two DNA samples came from the same person. Alternatively stated, if the probability is very small, we can say that either the two samples came from the same person or a very unlikely coincidence has occurred. (As in Chapter 4, the calculations in this chapter assume that no error has occurred in the determination of the DNA profiles.)
An alternative is to calculate the likelihood ratio (LR), a measure of the strength of the evidence regarding the hypothesis that the two profiles came from the same source. Suppose we find that the profiles of the person contributing the evidence DNA (E) and of the suspect (S) are both x. We consider two hypotheses: (1) the source of the evidence and the suspect are the same person, (2) the source of the evidence is a randomly selected person unrelated to the suspect. Although there are other possible hypotheses, it is usually sufficient to consider only these two. The likelihood ratio is the probability under hypothesis (1) that the suspect profile and the evidence-sample profile will both be x, divided by the corresponding probability under hypothesis (2). The greater the likelihood ratio, the stronger is the evidence in favor of the hypothesis corresponding to the numerator, that the source of the evidence-sample DNA and the suspect are the same person.
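To write that symbolically, we let $\Pr_1$ and $\Pr_2$ indicate probabilities calculated under hypotheses 1 and 2. The LR for this simple comparison is
$$\mathrm{LR} = \frac{\Pr_1(E = x \text{ and } S = x)}{\Pr_2(E = x \text{ and } S = x)} \tag{5.1a}$$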
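Using a vertical line to indicate conditioning (statements to the left of the vertical line are conditional on statements to the right; for example, $\Pr(E = x \mid S = x)$ is the probability that the evidence sample will have profile x given that the suspect has profile x), we note that
$$\Pr(E = x \text{ and } S = x) = \Pr(E = x \mid S = x)\,\Pr(S = x) = \Pr(S = x \mid E = x)\,\Pr(E = x)$$
The probability that the suspect has profile x, and likewise the probability that the evidence source has profile x, is simply the population frequency of x under either hypothesis, so the factor $\Pr(S = x)$ or $\Pr(E = x)$, respectively, cancels between numerator and denominator. We can then rewrite Equation 5.1a in two algebraically equivalent forms:
$$\mathrm{LR} = \frac{\Pr_1(E = x \mid S = x)}{\Pr_2(E = x \mid S = x)} \tag{5.1b}$$
$$\mathrm{LR} = \frac{\Pr_1(S = x \mid E = x)}{\Pr_2(S = x \mid E = x)} \tag{5.1c}$$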
Unless an error has occurred in the DNA typing, the numerator in Equations 5.1b and 5.1c will always equal one if the profiles match. Suppose that the population frequency of x is P(x), and assume that the persons who contributed E and S, if different, are unrelated (and E and S are therefore statistically independent). Then the denominator is P(x), so
$$\mathrm{LR} = \frac{1}{P(x)} \tag{5.2}$$
Therefore, in the usual case, the likelihood ratio is the reciprocal of the probability of a random match.3
The likelihood ratio, then, is a way of summarizing the DNA evidence. If the LR is 1,000, the probability that the profiles are the same is 1,000 times as great if the samples came from the same person as it is if they came from different persons.
3 Under some circumstances, such as if the match window is small, the probability of a match between two samples from the same person might be less than 1. In principle, this could change the likelihood ratio (Kaye 1995b); in practice, the possible error is minuscule in comparison with uncertainties in the denominator. The effect of the size of the match window on the probability of a false negative is discussed later in the chapter.
In the situation described above and reflected in the notation E and S, we imagine an evidence sample left at the crime scene by the putative perpetrator and a suspect with a matching profile. Although that is conceptually the simplest scenario and is used throughout this report for illustrative purposes, the mathematical formalism is valid more generally. For example, if a suspect is apprehended with blood on his clothes, this blood is the evidence sample to be matched against the genotypic profile of the victim. The LR given above is still valid, although for the most direct interpretation one would use Equation 5.1b, with S denoting the profile of the victim and E the evidence-sample profile.
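Equation 5.2 can be sketched in a few lines of Python. This is a minimal illustration, not the report's procedure: it assumes the simple product rule with Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote, multiplied across loci assumed to be independent) and ignores the subpopulation corrections of Chapter 4. The function names and the three-locus profile with its allele frequencies are hypothetical.

```python
from math import prod

def genotype_frequency(p: float, q: float, homozygous: bool) -> float:
    """Single-locus Hardy-Weinberg frequency: p^2 if homozygous, else 2pq."""
    return p * p if homozygous else 2 * p * q

def random_match_probability(profile: list[tuple[float, float, bool]]) -> float:
    """Product rule across loci (assumes the loci are independent)."""
    return prod(genotype_frequency(p, q, hom) for p, q, hom in profile)

def likelihood_ratio(profile: list[tuple[float, float, bool]]) -> float:
    """Equation 5.2: LR = 1 / P(x), assuming no typing error and an unrelated alternative source."""
    return 1.0 / random_match_probability(profile)

# Hypothetical profile: (frequency of allele 1, frequency of allele 2, homozygous?)
profile = [
    (0.10, 0.05, False),  # 2 * 0.10 * 0.05 = 0.01
    (0.20, 0.20, True),   # 0.20 ** 2 = 0.04
    (0.02, 0.25, False),  # 2 * 0.02 * 0.25 = 0.01
]

P = random_match_probability(profile)
print(f"P(x) = {P:.2e}, LR = {likelihood_ratio(profile):,.0f}")  # P(x) = 4.00e-06, LR = 250,000
```

A subpopulation correction such as the one developed in Chapter 4 would change only `genotype_frequency`; the reciprocal relation between the LR and the match probability is unaffected.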
Mixed Samples
Mixed samples are sometimes found in crime situations: for instance, blood from two or more persons at the scene of a crime, victim and assailant samples on a vaginal swab, and material from multiple sexual assailants. In many cases, one of the contributors (for example, the victim) is known, and the genetic profile of the unknown is readily inferred. In some cases, it might be possible to distinguish the genetic profiles of the contributors to a mixture from differences in intensities of bands in an RFLP pattern or dots in a dot-blot typing; in either case, the analysis is similar to the unmixed case. However, when the contributors to a mixture are not known or cannot otherwise be distinguished, a likelihood-ratio approach offers a clear advantage and is particularly suitable.
Consider a simple case of a VNTR analysis in which, for a particular locus, there are four bands in the lane, known to be contributed by two persons. If the alleles from the two persons are known and correspond to the set of four in the lane, there is usually no problem of interpretation, since two of the bands will match one suspect and the other two bands will match the other. However, two of the bands might match the alleles of only one suspect, and the source of the other two might be unknown. The 1992 report (NRC 1992, p 59) says: "If a suspect's pattern is found within the mixed pattern, the appropriate frequency to assign such a 'match' is the sum of the frequencies of all genotypes that are contained within (i.e., that are a subset of) the mixed pattern." Suppose the four bands correspond to alleles (bins) A1, A2, A3, and A4, whose frequencies are p1, p2, p3, and p4. The procedure recommended in the 1992 report would calculate the match probability as
$$P = (p_1 + p_2 + p_3 + p_4)^2$$
that is, the probability that a randomly selected person would have two alleles from the set of possibilities (A1, A2, A3, A4), obtained by summing $p_i^2$ over the four homozygotes and $2p_ip_j$ over the six heterozygotes. As above, the reciprocal of this probability can be interpreted as a likelihood ratio.

That calculation is hard to justify, because it does not make use of some of the information available, namely, the genotype of the suspect. The correct procedure, we believe, was described by Evett et al. (1991). Suppose that the suspect's genotype is A1A2. The hypothesis we wish to test is that the samples came from the suspect and one other person. The probability under this hypothesis of finding the profile shown by the evidence sample is 2p3p4, because under this hypothesis it is certain that two of the bands are A1 and A2. If the samples came from two randomly selected persons, the probability of any particular pair of profiles, such as A1A3 and A2A4, is (2p1p3)(2p2p4) = 4p1p2p3p4. There are six possible (ordered) pairs of two-band profiles corresponding to the four bands, so the total probability is 24p1p2p3p4. The likelihood ratio, analogous to Equations 5.1, is
$$\mathrm{LR} = \frac{2p_3p_4}{24p_1p_2p_3p_4} = \frac{1}{12p_1p_2}$$
This LR, compared with that derived from the recommendation of the 1992 NRC report, is larger when the suspect bands are relatively rare and smaller when the suspect bands are relatively common. The reason is that we have taken account of the information in the genotype of the suspect rather than averaging over the set of possible genotypes consistent with the four-band evidence-sample profile.
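The two mixed-sample calculations can be compared numerically. The sketch below assumes Hardy-Weinberg proportions, exactly two contributors, a suspect of genotype A1A2, and no missing bands; the function names and example frequencies are hypothetical. It also checks by enumeration that the 1992 report's sum over contained genotypes equals (p1 + p2 + p3 + p4)².

```python
from itertools import combinations_with_replacement

def contained_genotype_sum(freqs: list[float]) -> float:
    """Sum of HW frequencies of every genotype formed from the alleles in the mixture."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        total += freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
    return total

def lr_1992(p1: float, p2: float, p3: float, p4: float) -> float:
    """1992 NRC approach: reciprocal of the probability that a random
    person's genotype lies within {A1, A2, A3, A4}."""
    return 1.0 / contained_genotype_sum([p1, p2, p3, p4])

def lr_evett(p1: float, p2: float, p3: float, p4: float) -> float:
    """Evett et al. (1991) approach for a suspect of genotype A1A2."""
    numerator = 2 * p3 * p4               # the other contributor must be A3A4
    denominator = 24 * p1 * p2 * p3 * p4  # 6 ordered genotype pairs, each 4*p1*p2*p3*p4
    return numerator / denominator        # simplifies to 1 / (12 * p1 * p2)

for label, freqs in [("rare suspect alleles", (0.01, 0.01, 0.20, 0.20)),
                     ("common suspect alleles", (0.30, 0.30, 0.01, 0.01))]:
    print(f"{label}: 1992 LR = {lr_1992(*freqs):.2f}, Evett LR = {lr_evett(*freqs):.2f}")
```

With these inputs the Evett et al. LR is the larger of the two when the suspect's alleles are rare and the smaller when they are common, which is the pattern described in the text.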
There might be fewer than four bands, or multiple suspects might be identified. These and other, more complex cases can be analyzed in a similar manner (Evett et al. 1991). Some cases are treated in Appendix 5A and summarized in Table 5.1.
We have considered only simple cases. With VNTRs, it is possible, though very unlikely, that the four bands were contributed by more than two persons, who either were homozygous or shared rare alleles. With multiple loci, it will usually be evident if the sample was contributed by more than two persons. Calculations taking those possibilities into account could be made if there were reason to believe that more than two persons contributed to the sample.
Mixed samples are often difficult to analyze in systems where several loci are analyzed at once. Mixed samples can also lead to more complicated calculations with DQA, where some alleles are inferred by subtraction. (For example, there is no specific probe for the allele 1.2; the presence or absence of this allele is inferred from the reaction of DNA probes with the product of the combination 1.2, 1.3, and 4, but not with the products of 1.3 and 4 individually.)
The problem is complex, and some forensic experts follow the practice of making several reasonable assumptions and then using the calculation that is most conservative. For a fuller treatment of mixed samples, see Weir et al. (1996).
Bayes's Theorem
The likelihood ratio and the match probability, being reciprocals, contain the same information. The LR, however, has a property that makes it especially useful, provided that prior odds are available on the hypothesis that the two DNA profiles have the same source. (Prior odds are the odds that the two DNA samples came from the same person on the basis of information other than the DNA. Posterior odds are the odds when the DNA information is included in the analysis.) That property can be stated this way:
The posterior odds are the prior odds multiplied by LR.4
In everyday words: Whatever are the odds that the two samples came from the same person in the absence of DNA evidence, the odds when the DNA evidence is included are LR times as great. That statement is an instance of Bayes's theorem.
For example, if there is reason to think that the prior odds that two DNA samples came from the same person (however this is determined) are 1:2, and the LR is 10,000, the posterior odds are 5,000:1. Many statisticians and forensic scientists prefer to use the likelihood ratio rather than the match probability (Berry 1991a; Berry et al. 1992; Evett et al. 1992; Balding and Nichols 1994; Collins and Morton 1994) because it admits an inferential interpretation that the simple match probability does not. Odds can be converted into a probability by the relation Prob = Odds/(Odds + 1), or Odds = Prob/(1-Prob). Thus, a likelihood ratio, which is not a probability, can be used to obtain a probability.
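A minimal sketch of the odds form of Bayes's theorem and the odds/probability conversions given above, reproducing the example with prior odds of 1:2 and an LR of 10,000. The function names are illustrative only.

```python
def posterior_odds(prior_odds: float, lr: float) -> float:
    """Posterior odds = prior odds x LR (two simple hypotheses)."""
    return prior_odds * lr

def odds_to_prob(odds: float) -> float:
    """Prob = Odds / (Odds + 1)."""
    return odds / (odds + 1)

def prob_to_odds(prob: float) -> float:
    """Odds = Prob / (1 - Prob); undefined when prob == 1."""
    return prob / (1 - prob)

# Prior odds of 1:2 are expressed as the number 1/2.
post = posterior_odds(1 / 2, 10_000)
print(f"posterior odds {post:,.0f}:1, posterior probability {odds_to_prob(post):.4f}")
```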
Paternity testing
The relation between posterior and prior odds is routinely used in paternity analysis (Walker 1983; AABB 1994). If the putative father is not excluded by blood-group, enzyme, and DNA evidence, a "paternity index" is calculated. The paternity index PI is a likelihood ratio: the probability of the mother-child-father profile combination if the putative father is the true father divided by the probability of this combination if a randomly selected man is the father. Customarily, the calculations make use of a database or databases appropriate to the race(s) of the persons involved.
If the prior odds are 1:1—that is, if the putative father is assumed to be equally likely to be and not to be the true father—the posterior odds are the same as the likelihood ratio; but for other prior odds, that is not the case. Suppose that PI is calculated to be 1,000. If the prior odds (from evidence other than that from DNA) in favor of the putative father's being the true father are judged to be 10:1, the posterior odds are 10 times PI, or 10,000:1. If the prior odds of his being the father are 1:10, the posterior odds are 0.1 times PI, or 100:1.
In routine parentage testing, probabilities are used instead of odds. As mentioned earlier, odds are converted into a probability by the relation Prob = Odds/(Odds + 1). In this example, the posterior probabilities that the putative father is the true father are 10,000/10,001 = 0.9999 and 100/101 = 0.9901, for prior probabilities 10/11 (odds 10:1) and 1/11 (odds 1:10).
4 This supposes that two simple hypotheses are being compared. When more complicated hypotheses are being compared (when the alternative hypothesis consists of different possibilities with different a priori probabilities), a Bayes factor, essentially a weighted LR, plays the role of the LR (Kass and Raftery 1995).
If the prior odds are assumed to be 1:1 (making the prior probability 1/2), the posterior probability is simply PI/(PI + 1). Thus, a paternity index of 1,000 corresponds to a posterior probability of 1,000/1,001, or 0.9990. This posterior probability is routinely called the "probability of paternity." We emphasize that it is a true probability of paternity only if the prior probability is 1/2, an assumption that should be clearly stated. It is sometimes justified on the grounds that it gives equal weight to the two parties, the mother and the putative father, in a paternity dispute, although, naturally, this justification has been criticized. A better procedure, used by some laboratories, is to use an empirically determined prior probability or to give several posterior probabilities corresponding to a range of prior probabilities.
With the high LRs typically found when DNA markers are used (and the putative father has not been excluded), a wide range of prior probabilities makes little difference. In our example, where the paternity index is 1,000, the posterior probabilities for the three prior probabilities, 10/11, 1/2, and 1/11, are 0.9999, 0.9990, and 0.9901. The high LR has made a 100-fold difference in prior odds largely irrelevant.
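Presenting posterior probabilities for a range of priors can be reproduced exactly with rational arithmetic. This sketch, with a hypothetical helper name, recomputes the posterior probabilities for PI = 1,000 and the three prior probabilities used in the example.

```python
from fractions import Fraction

def posterior_probability(prior_prob: Fraction, pi: int) -> Fraction:
    """Posterior probability of paternity from a prior probability and paternity index PI."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * pi
    return post_odds / (post_odds + 1)

PI = 1_000
for prior in (Fraction(10, 11), Fraction(1, 2), Fraction(1, 11)):
    post = posterior_probability(prior, PI)
    print(f"prior {prior!s}: posterior {post!s} = {float(post):.4f}")
# prior 10/11: posterior 10000/10001 = 0.9999
# prior 1/2: posterior 1000/1001 = 0.9990
# prior 1/11: posterior 100/101 = 0.9901
```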
Bayes's Theorem in Criminal Cases
What we would like to know and could most easily interpret is the probability that the suspect contributed the DNA in the evidence sample. To find that probability, we need to use a prior probability and Bayes's theorem. Despite the regular use of Bayes's theorem in genetic counseling and in paternity testing, it has been only rarely used in criminal cases in the United States. The main difficulty is probably an unwillingness of the courts to ask juries to assign odds on the basis of non-DNA evidence. It is difficult even for experts to express complex nonscientific evidence in terms of quantitative odds, and some commentators have regarded assigning prior odds to the probability that the evidence and suspect DNA came from the same person as a violation of the presumption of innocence (see Chapter 6). In many cases, however, the prior odds, within a wide range, are not important to a decision. With a four- or five-locus match, whether the prior odds are 1:20 or 20:1 will usually have no important effect on the posterior probability; if the LR is 100 million, multiplying it by 20 or 1/20 is not likely to change the conclusion. The procedure of presenting posterior probabilities for a range of assumed prior probabilities has found favor among some legal scholars. Various approaches for use in the courts are discussed in Chapter 6.
There are two additional reasons for presenting posterior probabilities corresponding to a range of priors. First, a prior probability that might be used with Bayes's theorem would properly be assessed by jurors, not an expert witness or an officer of the court. A prior probability might reflect subjective assessments of the evidence presented. Such assessments would presumably be done separately by each juror in light of that juror's experience. Second, there is no logical reason that non-DNA evidence has to be presented first. It might be confusing for a juror to hear prior odds assigned by one expert, then hear a likelihood ratio from that expert or another, followed by more non-DNA evidence. It might not be feasible to present the information to a jury in the order most easily incorporated into a Bayesian probability. For all those reasons, we believe it best, if Bayes's theorem is used, to present posterior probabilities (or odds) for a range of priors.
Two Fallacies
Two widely recognized fallacies should be avoided (Thompson and Schumann 1987; Balding and Donnelly 1994b). The "prosecutor's fallacy" (also called the fallacy of the transposed conditional) is to confuse two conditional probabilities. Let P equal the probability of a match, given the evidence genotype. The fallacy is to say that P is also the probability that the DNA at the crime scene came from someone other than the defendant. An LR of 1,000 says that the match is 1,000 times as probable if the evidence and the suspect samples that share the same profile are from the same person as it is if the samples are from different persons. It does not say that the odds that the suspect contributed the evidence DNA are 1,000:1. To obtain such a probability requires using Bayes's theorem and a prior probability that is assumed or estimated on the basis of non-DNA evidence. As stated earlier, only if that prior probability is 1/2 will the posterior odds equal the LR.
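The gap between the two conditional probabilities can be made concrete. In this hypothetical sketch the LR is 1,000, as in the text; the fallacious reading treats the LR itself as the posterior odds, whereas Bayes's theorem yields that value only when the prior odds are 1:1. The 1:999 prior is an invented illustration (for instance, the suspect being one of 1,000 equally plausible sources), and the function name is hypothetical.

```python
def source_probability(lr: float, prior_odds: float) -> float:
    """P(suspect is the source | match), via posterior odds = prior odds x LR."""
    post_odds = prior_odds * lr
    return post_odds / (post_odds + 1)

LR = 1_000
fallacious = LR / (LR + 1)  # treats the LR itself as the posterior odds
for label, prior_odds in [("1:1", 1.0), ("1:999 (hypothetical)", 1 / 999)]:
    print(f"prior odds {label}: P(source) = {source_probability(LR, prior_odds):.4f}"
          f" vs fallacious {fallacious:.4f}")
```

With even prior odds the two numbers agree; with the hypothetical 1:999 prior the posterior probability falls to about one-half, even though the match probability is unchanged.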
The "defendant's fallacy" is to assume that in a given population, anyone with the same profile as the evidence sample is as likely to have left the sample as the suspect. For example, if 100 persons in a metropolitan area are expected to have the same DNA profile as the evidence sample, it is a fallacy to conclude that the probability that the suspect contributed the sample is only 0.01. The suspect was originally identified by other evidence, and such evidence is very unlikely to exist for the 99 other persons expected to have the same profile. Only if the suspect was found through a search of a DNA database might this kind of reasoning apply, and then only with respect to other contributors to the database, as we now discuss.
Suspect Identified by a DNA Database Search
Thus far, we have assumed that the suspect was identified by evidence other than DNA, such as testimony of an eyewitness or circumstantial evidence. In that case, the DNA is tested and the match probability or likelihood ratio is computed for the event that a person selected at random from some population will have the genotypic profile of the evidence sample. There is an important difference between that situation and one in which the suspect is initially identified by searching a database to find a DNA profile matching that left at a crime scene. In the latter case, the calculation of a match probability or LR should take into account the search process.
As the number and size of DNA databanks increase, the identification of suspects by this means will become more common. Already, more than 20 suspects have been identified by searches through databases maintained by various states. The number and sizes of these databases are sure to increase.
To see the logical difference between the two situations described above, observe that if we toss 20 reputedly unbiased coins once each, there is roughly one chance in a million that all 20 will show heads. According to standard statistical logic, the occurrence of this highly unlikely event would be regarded as evidence discrediting the hypothesis that the coins are unbiased. But if we repeat this experiment of 20 tosses a large enough number of times, there will be a high probability that all 20 coins will show heads in at least one experiment. In that case, an event of 20 heads would not be unusual and would not in itself be judged as evidence that the coins are biased. The initial identification of a suspect through a search of a DNA database is analogous to performing the coin-toss experiment many times: A match by chance alone is more likely the larger the number of profiles examined.
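The coin-toss analogy is easy to check numerically. This sketch assumes fair, independent coins and independent repetitions of the experiment; the numbers of repetitions are arbitrary, and the function names are illustrative.

```python
def prob_all_heads(n_coins: int = 20) -> float:
    """Probability that n fair coins all land heads in one experiment."""
    return 0.5 ** n_coins

def prob_at_least_once(p_single: float, n_experiments: int) -> float:
    """Probability that an event with per-trial probability p occurs in at least one of n independent trials."""
    return 1 - (1 - p_single) ** n_experiments

p = prob_all_heads()
print(f"single experiment: {p:.3g}")  # 2**-20, about one chance in a million
for n in (1_000, 100_000, 1_000_000, 5_000_000):
    print(f"{n:>9,} experiments: P(at least one all-heads run) = {prob_at_least_once(p, n):.3f}")
```

In the database setting, the number of profiles compared plays the role of the number of repeated experiments.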
There are different ways to take the search process into account. The 1992 NRC report recommends that the markers used to evaluate a match probability be different from those used to identify a suspect initially. In that case, the database search is much like identifying the suspect from non-DNA evidence, and the methods of Chapter 4 apply. However, the procedure might be difficult to implement. To avoid identifying several suspects who must then be investigated, one might need to use a large number of markers in the database search. Then, according to that procedure, those markers could not also be used in further analysis. If the amount of DNA in the evidence sample is too small, following the recommendation in the 1992 report could leave too few additional loci for computing a match probability or LR.
A correction to account for the database search can be made in computing the match probability. Let Mi denote the event that the i-th DNA profile in the database matches the evidence sample. To decide if the database search itself has contributed to obtaining a match (much as the repeated experiments might be held responsible for producing the 20 heads in the example given above), an event of interest is M, that at least one of the database profiles matches the evidence sample. Suppose that we hypothesize that the evidence sample was not left by someone whose DNA profile is in the database (or a close relative of such a person) and find that under this hypothesis P(M) is small. The usual statistical logic then leads to rejection of that hypothesis in favor of the alternative that (one of) the matching profile(s) in the database comes from the person who left the evidence sample.
Under the hypothesis that the person leaving the evidence sample is not represented in the database of N persons, a simple upper bound on the probability of M is given by
$$P(M) \le \sum_{i=1}^{N} P(M_i) = N\,P(M_i) \tag{5.3}$$
The equality in Equation 5.3 holds if the database is homogeneous, that is, if P(Mi) is the same for all profiles in the database (see Appendix 5B).
Equation 5.3 motivates the simple rule sometimes suggested by forensic scientists: multiply the match probability by the size of the database searched (or that part of the database that is relevant, for example, males in a search for a match to a semen sample). Suppose that $P(M_i) = 10^{-6}$ and $N = 1{,}000$. Then $P(M) \le 0.001$.
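Equation 5.3 and the multiply-by-N rule can be sketched as follows. The bound itself needs no independence assumption; the second function is an addition not taken from the text and gives the exact P(M) only under the extra assumption that the N comparisons are independent. The inputs reproduce the example above, and the function names are hypothetical.

```python
from math import prod

def search_bound(match_probs: list[float]) -> float:
    """Equation 5.3: P(M) <= sum of P(M_i) over the profiles searched."""
    return sum(match_probs)

def search_prob_if_independent(match_probs: list[float]) -> float:
    """P(M) = 1 - prod(1 - P(M_i)); valid only if the events M_i are independent (an assumption)."""
    return 1 - prod(1 - p for p in match_probs)

N, p = 1_000, 1e-6  # homogeneous database: every P(M_i) equals p
probs = [p] * N
print(f"bound N*p = {search_bound(probs):.4e}")
print(f"independent-comparison value = {search_prob_if_independent(probs):.4e}")
```

For a heterogeneous database, passing a list of unequal P(M_i) values gives the general sum on the right of the inequality; the two results differ only slightly when N·P(M_i) is small.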
In a computerized database search, the computer output ordinarily lists all profiles in the database that match the evidence-sample profile. It is also possible to search profiles one by one in one or more databases until one or more matches are obtained. If that procedure is followed, the appropriate database for computing the match probability is the complete set of profiles that are actually compared with the evidence sample. Other situations might not be so simple.5
Very Small Probabilities
Some commentators have stated that very small probabilities are suspect because they are outside the range of previous experience. They argue that a probability of, say, one in 10 billion is not to be trusted simply because it is so small. However, it is not the magnitude of the number that is at issue but rather the reliability of the assumptions on which the calculation is based. The relevant issues are the reliability of the database and the appropriateness of the population genetics model, and these are the same for large as well as small probabilities.
5 If all potential suspects in a specific crime are asked to submit to DNA profiling, the situation is similar to the database searches described in the text, but it is more difficult to interpret. If all possible suspects are indeed tested (although it would be difficult or even impossible to show that this is the case), a match probability can be computed with the procedures in this chapter. However, the time and expense involved in DNA profiling may lead those doing the testing to terminate the search as soon as a match is obtained. Although the probability of obtaining a random match within the group tested is no different in this case than it would be if the database had been assembled before the suspect(s) had been identified, the obvious motivation of a perpetrator to avoid or delay testing weakens the statistical logic that a small match probability is evidence in favor of the hypothesis that the person who left the evidence sample and the person providing the matching sample are the same. If such a procedure for generating a DNA database is used, testing should continue through the whole database.
Figure 5.6 A scatter plot comparing the white population (abscissa) with an equal mixture of whites and blacks (ordinate). In the upper graph, the HW rule was used; in the lower, Equations 4.10 with θ = 0.01. The dashed lines represent deviations by a factor of 5 and the dotted lines by a factor of 15. Data from Lifecodes (Roeder et al. submitted).
of a match (that is, it errs in favor of the defendant). The majority of the points in the regions of higher probabilities are above the 45° line; that is, they are biased against the defendant. That is the effect that was of concern in the 1992 NRC report, but with respect to ethnic differences within racial groups rather than between racial groups. We have chosen an extreme example for illustration. Even so, all the points are within 15-fold of the 45° diagonal line in the graph.
The bottom graph in Figure 5.6 shows the effect of using Equations 4.10 for the mixed population rather than the HW formula, Equation 4.1. The value of θ was taken to be 0.01, which is the value estimated from this data set (Roeder et al. 1995). It is clear from the graph that using Equations 4.10 usually leads to a conservative estimate, except for the higher probabilities shown in the lower left part of the graph. That makes sense, for it is clear from Equations 4.10 that when p is large, θ has little effect on profile-frequency estimates.
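The last point can be illustrated numerically. The sketch below assumes that the heterozygote formula in Equations 4.10 (given in Chapter 4, not in this section) has the Balding-Nichols form $2[\theta + (1-\theta)p_i][\theta + (1-\theta)p_j]/[(1+\theta)(1+2\theta)]$; that form, the function names, and the allele frequencies tried are assumptions for illustration and should be checked against Chapter 4 before reuse.

```python
def hw_het(p_i: float, p_j: float) -> float:
    """Hardy-Weinberg heterozygote frequency 2 p_i p_j (Equation 4.1)."""
    return 2 * p_i * p_j


def theta_het(p_i: float, p_j: float, theta: float) -> float:
    """ASSUMED Balding-Nichols form of the heterozygote formula in Equations 4.10."""
    num = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return num / ((1 + theta) * (1 + 2 * theta))


theta = 0.01  # value used for the lower graph of Figure 5.6
for p in (0.5, 0.1, 0.01):
    ratio = theta_het(p, p, theta) / hw_het(p, p)
    print(f"p = {p}: theta-corrected / HW = {ratio:.2f}")
```

With θ = 0.01 the corrected value is about 1% below the HW value when both alleles have frequency 0.5, but roughly 3.8 times the HW value when both have frequency 0.01, so θ matters mostly for rare alleles.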
We conclude that, even if an artificial, intentionally inappropriate database of mixed profiles from whites and blacks is used, Equations 4.10 are conservative. We further note that when fixed bins are used in the manner recommended in the section in this chapter on statistical aspects of VNTR analysis, the procedure is usually conservative, and that applying the 2p rule increases the conservatism of the method. Finally, using Equations 4.10 with fixed bins adds to the conservatism.
To summarize: Within a racial group, geographic origin and ethnic composition have very little effect on the frequencies of forensic DNA profiles, although there are larger differences between major groups (races). It is probably safe to assume that within a race, the uncertainty of a value calculated from adequate databases (at least several hundred persons) by the product rule is within a factor of about 10 above and below the true value. If the calculated profile probability is very small, the uncertainty can be larger, but even a large relative error will not change the conclusion. If there is good reason to think that the suspect and the source of the evidence are from the same subpopulation, Equations 4.10 can be used.
The Ceiling Principles The 1992 NRC report assumed that population substructure might exist and recommended procedures for calculating profile frequencies that could be expected to be sufficiently conservative to accommodate the presence of substructure. Two such procedures are recommended in the 1992 report, the "ceiling principle" and the "interim ceiling principle."
The ceiling principle (NRC 1992, p 82-85) places a lower limit on the size of the profile frequency by giving thresholds for the allele-frequency values used in the calculation. To determine the thresholds, the report recommended that 100 persons be sampled from each of 15-20 genetically homogeneous populations spanning the racial and ethnic diversity of groups represented in the United States. For each allele, the highest value among the groups sampled, or 5%, whichever was larger, would be used. Then the product rule would be applied to those values to determine the profile frequency. The choice and sampling of the 15-20 populations was to be supervised by the NCFDT (see Chapter 3), which has not come into being. The necessary groundwork for applying the ceiling principle has not been done, and there have been few attempts to apply it. We share the view of many experts who have criticized it on practical and statistical grounds and who see no scientific justification for its use.
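The ceiling calculation described above can be sketched as follows for a profile that is heterozygous at every locus. The group labels and allele frequencies are hypothetical placeholders, not data from any report; only the 5% floor and the "highest value, then product rule" logic come from the text.

```python
# Hypothetical frequencies: locus -> allele -> sampled group -> frequency
FREQS = {
    "locus_1": {"a": {"g1": 0.02, "g2": 0.04, "g3": 0.07},
                "b": {"g1": 0.10, "g2": 0.08, "g3": 0.06}},
    "locus_2": {"c": {"g1": 0.01, "g2": 0.03, "g3": 0.02},
                "d": {"g1": 0.05, "g2": 0.09, "g3": 0.12}},
}
FLOOR = 0.05  # the 5% minimum of the ceiling principle


def ceiling_value(by_group, floor=FLOOR):
    """Highest frequency among the sampled groups, or the floor if that is larger."""
    return max(max(by_group.values()), floor)


def ceiling_profile_freq(freqs, floor=FLOOR):
    """Product rule on ceiling values; assumes exactly two alleles (heterozygote) per locus."""
    total = 1.0
    for alleles in freqs.values():
        p, q = (ceiling_value(g, floor) for g in alleles.values())
        total *= 2 * p * q  # heterozygote frequency 2pq at this locus
    return total


print(ceiling_profile_freq(FREQS))  # (2)(0.07)(0.10) x (2)(0.05)(0.12), about 1.68e-4
```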
The 1992 report recommended that until the ceiling principle could be put into effect, the interim ceiling principle be applied. In contrast to the ceiling principle, the interim ceiling principle has been widely used and sometimes misused. The rule (NRC 1992, p 14-15, 91-93) is: "In applying the multiplication rule, the 95% upper confidence limit of the frequency of each allele should be calculated for separate US 'racial' groups and the highest of these values or 10% (whichever is the larger) should be used. Data on at least three major 'races' (e.g., whites, blacks, Hispanics, east Asians, and American Indians) should be analyzed." The report also stated that the multiplication (that is, product) rule should be applied only when there is no significant departure from HW and LE, even though the ceiling principle was introduced specifically to accommodate deviations from HW and LE.
If the interim ceiling principle is applied to four loci, the minimum probability, assuming that there are no single bands, is $[2(0.1)(0.1)]^4 = (1/50)^4 = 1/6{,}250{,}000$. With five loci the minimum probability becomes about one in 300 million. But if the 2p rule is used for single bands and any locus found to depart from HW proportions is not used, the probability can be much larger. For example, if only three loci are used and one is homozygous, the minimum is $2(0.1)(1/50)^2 = 1/12{,}500$.
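These minima can be reproduced with exact rational arithmetic. The sketch simply applies the 10% ceiling, the heterozygote product 2pq, and the 2p rule for a single band, as described in the text; the variable names are illustrative.

```python
from fractions import Fraction

CEIL = Fraction(1, 10)  # interim ceiling: no allele frequency below 10%

het = 2 * CEIL * CEIL   # two-band locus, 2pq with p = q = 0.1, gives 1/50
single_2p = 2 * CEIL    # single band under the 2p rule gives 1/5

four_loci = het ** 4
five_loci = het ** 5
three_loci_one_single = single_2p * het ** 2

print(four_loci)              # 1/6250000
print(five_loci)              # 1/312500000
print(three_loci_one_single)  # 1/12500
```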
Is the interim ceiling principle logical? Is it unnecessarily conservative? In view of all the accumulated data we have discussed, is it needed? The interim ceiling principle has the advantage that in any particular case it gives the same answer irrespective of the racial group. That is also a disadvantage, because it does not permit the use of well-established differences in frequencies in different races; it is inflexible and cannot be adjusted to the circumstances of a particular case.
The ceiling principles have been widely discussed, usually critically (Chakraborty and Kidd 1991; Cohen 1992; Morton 1992, 1995; Evett, Scranage, and Pinchin 1993; Kaye 1993, 1995a; Lempert 1993; Weir 1993a; Balding and Nichols 1994; Devlin, Risch, and Roeder 1994; Lander and Budowle 1994; TWGDAM 1994c; Morton 1995). Here are some of those criticisms:
· The 10% value is completely arbitrary, and there is no scientific justification of its choice as a ceiling value.
· Although calculation of an upper 95% confidence limit for an individual allele is justified as a standard statistical procedure, multiplication of those values is not.
· The ceiling principles do not make use of the large amount of allele-frequency data now available from different groups and subgroups.
· They do not make use of standard procedures long used by population geneticists to study subdivided populations.
· They are excessively conservative. (Actually they are not always conservative, for one can contrive examples in which they are not [Slimowitz and Cohen 1993], but in realistic examples they are conservative.)
· The report's lack of specific instructions as to which population groups should be included has led some experts and attorneys to focus on extreme examples, perhaps involving small databases with large sampling errors or irrelevant populations; that practice was not foreseen by the writers of the 1992 report.
We agree with the criticisms listed above. Our view is that sufficient data have been gathered that neither ceiling principle is needed. We have suggested alternative procedures, all of which are conservative to some degree. We believe that estimates based on the formulae outlined in Chapter 4—and with proper attention to uncertainties—are now appropriate. In special cases in which there is no appropriate database, such as for some American Indian tribes, the estimates (based on the methods in this report) for several related groups should be used.
TWGDAM (1994c) has recently issued a report on the ceiling principle. TWGDAM "cannot recommend the application of the ceiling principle. The basis for the need for a ceiling principle is flawed. . . . The current methods employed by forensic scientists have been demonstrated to be robust scientifically" (p 899).
If the interim ceiling principle is used despite that recommendation, TWGDAM recommends an approach intended to overcome some of the criticisms of the 1992 NRC report. The recommended approach differs from that in the 1992 report in several ways:
· When the measurement error spans a fixed-bin border, take the frequency of the most frequent of the overlapped bins; the 1992 report instead recommended summing the frequencies of the overlapped bins.
· Native-American databases are not to be used to generate values for the ceiling; the groups to be used are whites, blacks, Hispanics, and east Asians.
· The multiple of the standard deviation for an upper 95% confidence limit should be 1.64, not 1.96, which was given in a footnote on page 92 of the 1992 report. NRC (1992) confused one-tailed and two-tailed confidence coefficients.
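The difference between the two multipliers, and the resulting interim-ceiling value for one allele, can be sketched as below. The standard error $\sqrt{\hat p(1-\hat p)/2n}$ for an allele frequency estimated from n persons (2n alleles) is the usual binomial one and is an assumption here, as are the group sizes and frequencies, which are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z_ONE_SIDED = NormalDist().inv_cdf(0.95)   # about 1.645 (TWGDAM's 1.64)
Z_TWO_SIDED = NormalDist().inv_cdf(0.975)  # about 1.960 (the 1992 footnote)


def upper_limit(p_hat, n_persons, z=Z_ONE_SIDED):
    """Upper confidence limit for an allele frequency estimated from 2n alleles."""
    return p_hat + z * sqrt(p_hat * (1 - p_hat) / (2 * n_persons))


def interim_ceiling(by_group, z=Z_ONE_SIDED, floor=0.10):
    """Largest upper limit across groups, or the 10% floor, whichever is larger.

    by_group maps a group name to (estimated allele frequency, number of persons).
    """
    return max(floor, max(upper_limit(p, n, z) for p, n in by_group.values()))


# Hypothetical allele frequencies and database sizes for one allele
ALLELE = {
    "whites": (0.08, 500),
    "blacks": (0.11, 300),
    "Hispanics": (0.09, 200),
    "east Asians": (0.05, 150),
}
print(round(interim_ceiling(ALLELE), 4))                 # one-sided multiplier
print(round(interim_ceiling(ALLELE, z=Z_TWO_SIDED), 4))  # two-sided multiplier, larger
```

Using 1.96 where a one-sided 95% limit is intended inflates every allele value slightly, and the inflation compounds when the values are multiplied across loci.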
We agree with the TWGDAM recommendations and add the following interpretations, which we believe are consistent with the 1992 report.
· The ceiling principles are intended for criminal, not civil, cases. They are therefore inappropriate for paternity testing, unless that is part of a criminal proceeding.
· The ceiling principles were intended for VNTRs with many alleles, none of which has a very high frequency. They are not applicable to PCR-based systems, which ordinarily have few alleles. For example, applying the upper 95% confidence limit produces allele frequencies that add up to more than one and, with two alleles, heterozygote frequencies that can be greater than the HW maximum of 1/2.
· As originally presented (NRC 1992, p 91), the ceiling principle would use only those loci not differing significantly from HW and LE. But populations with the least reliable numbers (that is, the smallest databases) are the very ones most likely not to show a statistically significant departure from HW and LE. Thus, an analyst who uses the interim ceiling principle will often be forced to reject more reliable loci in favor of less reliable ones. Furthermore, the purpose of the ceiling principles is to allow for differences in allele frequencies in different subgroups, for which HW and LE are insensitive measures. Therefore, we believe that all loci in the selected databases should be used in the calculation.
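The PCR problem noted in the list above can be checked with a hypothetical two-allele locus. The sketch assumes equal allele frequencies of 0.5, a database of 100 persons, the binomial standard error for 2n alleles, and the 1.64 multiplier recommended by TWGDAM; all of these values are illustrative.

```python
from math import sqrt

Z = 1.64                      # TWGDAM one-sided multiplier
N_PERSONS = 100               # hypothetical database size
P_HAT = {"A": 0.5, "B": 0.5}  # hypothetical two-allele PCR locus


def upper_limit(p_hat, n_persons, z=Z):
    """Upper confidence limit for an allele frequency estimated from 2n alleles."""
    return p_hat + z * sqrt(p_hat * (1 - p_hat) / (2 * n_persons))


ucl = {allele: upper_limit(p, N_PERSONS) for allele, p in P_HAT.items()}
total = sum(ucl.values())
heterozygote = 2 * ucl["A"] * ucl["B"]
print(f"sum of upper limits: {total:.3f}")                   # exceeds 1
print(f"AB frequency: {heterozygote:.3f} (HW maximum 0.5)")  # exceeds 0.5
```

Here the upper limits sum to about 1.12 and the implied heterozygote frequency is about 0.62, both impossible values for a two-allele locus.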
In summary, the procedures we have recommended in Chapter 4 are based on population genetics and empirical data and can encompass suitable degrees of conservatism. With such procedures available, we believe that the interim ceiling principle is not needed and can be abandoned.
Direct Count from a Database The 1992 NRC report stated (p 91) that "the testing laboratory should check to see that the observed multilocus genotype matches any sample in the population database. Assuming that it does not, it should report that the DNA pattern was compared to a database of N individuals from the population and no match was observed, indicating its rarity in the population." The Committee noted that if there were no occurrences of a profile in 100 samples, the upper confidence limit is 3%. It went on to say (p 76) that "such estimates produced by straightforward counting have the virtue that they do not depend on theoretical assumptions, but simply on the sample's having been randomly drawn from the appropriate population. However, such estimates do not take advantage of the full potential of the genetic approach."
The ceiling method uses random-mating theory but does not make full use of population data. The counting method does not even combine allele frequencies and thereby loses even more information. In addition, very small probabilities cannot be estimated accurately from samples of realistic size; modeling is required. In fact, most profiles are not found in any database, so there must be a convention as to how to handle zeros. Since we believe that the abundant data make the ceiling principles unnecessary, this is true a fortiori for the direct counting method.
Some statisticians and others have questioned the accuracy of using population-genetics theory that incorporates estimated allele distributions in forensic calculations. Somewhat comparable calculations are available that do not use this information. For a Poisson distribution, an upper $100(1-\alpha)\%$ confidence limit $L$ for the expected number of events when zero events have been observed is $L = -\ln\alpha$. For a 95% confidence limit, $\alpha = 0.05$ and $L \approx 3$. For illustration, the TWGDAM data included about 7,000 persons in the white database. That yields (7,000)(6,999)/2 ≈ 24.5 million pairs of profiles. Only one four-locus match was found and none for five or more. Let us assume that all persons were tested at five loci (the same five loci) and regard these pairs as a random sample. Then a Poisson approximation similar to that leading to Equation 5.5 (but without assuming HW and LE) leads to the conclusion that an upper 95% confidence limit for the probability of a match between suspect and evidence DNA at those five loci is 3/(24.5 million), or about 1 in 8 million. This calculation illustrates the possibility of procedures that do not employ estimated allele distributions and population-genetics theory but still give very small match probabilities, provided that sufficiently large databases of genotype profiles are available.
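The numbers in this passage can be reproduced in a few lines. The helper name is hypothetical; note that $-\ln 0.05 \approx 2.996$, which the text rounds to 3, and that treating the 24.5 million pairs as a random sample is the simplifying assumption stated in the text.

```python
from math import log


def zero_count_upper_limit(alpha: float) -> float:
    """Upper 100(1 - alpha)% limit for a Poisson mean when zero events are observed."""
    return -log(alpha)


L = zero_count_upper_limit(0.05)
print(f"L = {L:.3f}")                            # 2.996, i.e., about 3

# 1992 report example: a profile not seen among 100 samples
print(f"upper limit per sample: {L / 100:.1%}")  # about 3%

# TWGDAM white database: about 7,000 persons, every pair compared
n = 7_000
pairs = n * (n - 1) // 2                         # 24,496,500 pairs
one_in = pairs / 3                               # the text rounds L to 3
print(f"{pairs:,} pairs; upper limit about 1 in {one_in:,.0f}")
```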
However, such a method is inappropriate, and we do not recommend it. It gives an approximate upper bound to the mean value, but not conditioned on the particular profile in question; it does not answer the question we are most interested in, the probability of a match of the particular evidence and suspect profile. It has not been demonstrated to be robust with respect to various database problems; for example, the same loci may not always have been tested. Also, it does not use available information about allele frequencies and thus does not permit sharper inferences conditional upon that information. Finally, the value is strongly dependent on the size of the database.
The population-genetic assumptions that we use are robust, are accurate within the limits discussed elsewhere in this report, and make sensible use of information about allele frequencies.
Conclusions and Recommendations Statistical Issues Confidence limits for profile probabilities, based on allele frequencies and the size of the database, can be calculated by methods explained in this report. We recognize, however, that confidence limits address only part of the uncertainty. For a more realistic estimate, we examined empirical data from the comparison of different subpopulations and of subpopulations with the whole. The empirical studies show that the differences between the frequencies of the individual profiles estimated by the product rule from different adequate subpopulation databases (at least several hundred persons) are within a factor of about 10 of each other, and that provides a guide to the uncertainty of the determination for a single profile. For very small estimated profile frequencies, the uncertainty can be greater, both because of the greater relative uncertainty of individually small probabilities and because more loci are likely to be multiplied. But with very small probabilities, a large relative error is not likely to change the conclusion.

Database Searches If the suspect is identified through a DNA database search, the interpretation of the match probability and likelihood ratio given in Chapter 4 should be modified.
Recommendation 5.1: When the suspect is found by a search of DNA databases, the random-match probability should be multiplied by N, the number of persons in the database.
If one wishes to describe the impact of the DNA evidence under the hypothesis that the source of the evidence sample is someone in the database, then the likelihood ratio should be divided by N. As database searches become more extensive, another problem may arise. If the database searched includes a large proportion of the population, the analysis must take this into account. In the extreme case a search of the whole population should, of course, provide a definitive answer.
Uniqueness With an increasing number of loci available for forensic analysis, we are approaching the time when each person's profile will be unique (except for identical twins and possibly other close relatives). Suppose that, in a population of N unrelated persons, a given DNA profile has probability P. The probability (before a suspect has been profiled) that the particular profile observed in the evidence sample is not unique is at most NP.
A lower bound on the probability that every person is unique depends on the population size, the number of loci, and the heterozygosity of the individual loci. Neglecting population structure and close relatives, 10 loci with a geometric mean heterozygosity of 95% give a probability greater than about 0.999 that no two unrelated persons in the world have the same profile. Once it is decided what level of probability constitutes uniqueness, appropriate calculations can readily be made.
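The figure of about 0.999 can be checked with the approximate upper bound of Equation 5.13 in Appendix 5C, which, assuming the birthday-problem form of Equation 5.12, reads $P\{M \ge 1\} \approx 1 - \exp(-2^{K-1} N^2 f^{2K})$ with f the geometric mean homozygosity. This sketch assumes a population of $5 \times 10^9$ (the value used in Appendix 5C) and that every locus has heterozygosity exactly 0.95, that is, homozygosity 0.05; the function name is hypothetical.

```python
from math import exp


def prob_some_pair_matches(n_people: float, k_loci: int, homozygosity: float) -> float:
    """Approximate upper bound (Equation 5.13) on P(at least one pairwise match)."""
    expected = 2 ** (k_loci - 1) * n_people ** 2 * homozygosity ** (2 * k_loci)
    return 1 - exp(-expected)


# Assumed: 5 x 10^9 people, 10 loci, each with heterozygosity 0.95 (homozygosity 0.05)
p_any_match = prob_some_pair_matches(5e9, 10, 0.05)
print(f"P(all profiles unique) is at least about {1 - p_any_match:.4f}")
```

Under these assumptions the bound gives a probability of about 0.9999 that all profiles are unique, comfortably above 0.999.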
In any particular case, the chance that the DNA profile for the evidence sample is unique is of more concern than the chance that all DNA profiles are unique. Hence, the calculation in the first paragraph will be the one more often employed.
Matching and Binning VNTR data are essentially continuous, and, in principle, a continuous model should be used to analyze them. The methods generally used, however, involve taking measurement uncertainty into account by determining a match window. Two procedures for determining match probabilities are the floating-bin and fixed-bin methods. The floating-bin method is statistically preferable but requires access to a computerized database. The fixed-bin method is more widely used and understood, and the necessary data tables are widely and readily available. When our fixed-bin recommendation is followed, the two methods lead to very similar results. Both methods are acceptable.
Recommendation 5.2. If floating bins are used to calculate the random-match probabilities, each bin should coincide with the corresponding match window. If fixed bins are employed, then the fixed bin that has the largest frequency among those overlapped by the match window should be used.
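Recommendation 5.2 can be sketched as follows. The bin boundaries, their frequencies, the ±2.5% match window, and the database of allele sizes are all hypothetical values chosen for illustration; real laboratories use their own validated bins and measurement-error windows.

```python
from bisect import bisect_left, bisect_right

# Hypothetical fixed bins: (lower bound bp, upper bound bp, allele frequency)
FIXED_BINS = [
    (1000, 1200, 0.06),
    (1200, 1400, 0.09),
    (1400, 1600, 0.12),
    (1600, 1800, 0.07),
]
WINDOW = 0.025  # hypothetical match window of +/-2.5% of the measured size


def match_window(band_bp, window=WINDOW):
    """Interval of sizes considered to match the measured band."""
    return band_bp * (1 - window), band_bp * (1 + window)


def fixed_bin_freq(band_bp, bins=FIXED_BINS, window=WINDOW):
    """Largest frequency among the fixed bins overlapped by the match window."""
    lo, hi = match_window(band_bp, window)
    overlapped = [freq for low, high, freq in bins if low < hi and lo < high]
    if not overlapped:
        raise ValueError("match window lies outside all fixed bins")
    return max(overlapped)


def floating_bin_freq(band_bp, db_alleles, window=WINDOW):
    """Fraction of database alleles inside a floating bin equal to the match window."""
    lo, hi = match_window(band_bp, window)
    sizes = sorted(db_alleles)
    inside = bisect_right(sizes, hi) - bisect_left(sizes, lo)
    return inside / len(sizes)


db = [1350, 1360, 1380, 1420, 1430, 1500]  # hypothetical database allele sizes
print(fixed_bin_freq(1390))         # window overlaps the 1200-1400 and 1400-1600 bins: 0.12
print(floating_bin_freq(1390, db))  # 3 of 6 alleles fall in the window: 0.5
```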
Ceiling Principles The abundance of data in different ethnic groups within the major races and the genetically and statistically sound methods recommended in this report imply that both the ceiling principle and the interim ceiling principle are unnecessary.
Further Research The rapid rate of discovery of new markers in connection with human gene-mapping should lead to many new markers that are highly polymorphic, mutable, and selectively neutral, but which, unlike VNTRs, can be amplified by PCR and for which individual alleles can usually be distinguished unambiguously with none of the statistical problems associated with matching and binning. Furthermore, radioactive probes need not be used with many other markers, so identification can be prompt and problems associated with using radioactive materials can be avoided. It should soon be possible to have systems so powerful that no statistical and population analyses will be needed, and (except possibly for close relatives) each person in a population can be uniquely identified.
Recommendation 5.3. Research into the identification and validation of more and better marker systems for forensic analysis should continue with a view to making each profile unique.
Appendix 5A Mixed stains introduce a number of complexities. We limit our consideration to cases in which the stain comes from two persons, but only one suspect is identified. The case where four bands are observed, two of which match the suspect, was given in the text. Here we consider circumstances in which fewer than four bands are found in the evidentiary DNA. This may mean that either the suspect or the other contributor to the stain produced a single band. Thus, the 2p rule may be needed. It is also possible that there are only two bands, but other loci indicate that the stain is mixed. These cases are summarized in Table 5.1.

TABLE 5.1 Likelihood Ratios for Mixed Stains^a
Crime scene | Suspect | Rule | Likelihood ratio
A1A2A3A4 | A1A2 | n/a |
A1A2A3 | A2A3 | 2p |
A1A2A3 | A2A3 | p² |
A1A2A3 | A1 | 2p |
A1A2A3 | A1 | p² |
A1A2 | A1A2 | 2p |
A1A2 | A1A2 | p² |
A1A2 | A1 | 2p |
A1A2 | A1 | p² |
^a For each combination of crime-scene and suspect genotypes, the likelihood ratio is given for each of two rules for dealing with single bands (or homozygotes).
Appendix 5B If the database is not homogeneous, that is, if $P(M_i)$ is different for different values of $i$, then the inequality in Equation 5.3 is still valid, so
$$P(M) \le \sum_{i=1}^{N} P(M_i) \qquad (5.9)$$
where $P(M_i)$ can be evaluated by the methods of Chapter 4. Many of the terms in the sum will be the same. In the simplest case, in which the database contains N persons, all with the same ethnic background, the effect is just to multiply an individual match probability by N, leading to Equation 5.3.
If we assume that the database consists of $n_1$ whites and $n_2$ blacks, then Equation 5.9 simplifies to
$$P(M) \le n_1 P(W) + n_2 P(B) \qquad (5.10)$$
where W denotes the event that a randomly selected white profile matches the evidence-sample profile and B denotes the same match event for a randomly selected black profile.
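For the two-group case, the bound $P(M) \le n_1 P(W) + n_2 P(B)$ is just a weighted sum. The sketch below uses hypothetical group sizes and match probabilities and also shows that applying the larger of the two probabilities to every profile would overstate the bound.

```python
def heterogeneous_bound(groups):
    """Sum of n_g * P(match for group g) over groups; Equation 5.10 has two groups."""
    return sum(n * p for n, p in groups)


# Hypothetical database: (number of profiles, match probability for that group)
GROUPS = [(60_000, 2e-9),   # n1 white profiles, P(W)
          (40_000, 5e-9)]   # n2 black profiles, P(B)

bound = heterogeneous_bound(GROUPS)
total_n = sum(n for n, _ in GROUPS)
worst_case = total_n * max(p for _, p in GROUPS)
print(f"{bound:.2e}")       # 3.20e-04
print(f"{worst_case:.2e}")  # 5.00e-04
```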

Remark. An approximation that will often be somewhat closer to P(M) than Equation 5.9 is
(5.11)
That approximation will give approximately the same answer as Equation 5.9 when $\sum_i P(M_i)$ is small, and that is the case of practical importance.
The event M involves all markers tested, both those employed for identification purposes and any additional markers used for confirmation. If we let $M_{i,1}$ denote the event of a match of the $i$-th profile on the initial batch of markers tested for the purpose of the database search, and $M_{i,2}$ the event of a match of the $i$-th profile on the subsequent markers tested, then under the assumption of linkage equilibrium, $P(M_i) = P(M_{i,1})P(M_{i,2})$. That same factorization would hold under the assumption of linkage equilibrium for an arbitrary division of the markers into two subsets.
From a Bayesian viewpoint, there are other methods to deal with database searches, although the final result is much the same as that given above (see also Balding and Donnelly 1995). Although the assignment of prior probabilities is problematic and appears to have been used rarely if at all in criminal forensic investigations in the United States, some related ideas can be useful in clarifying certain issues. Let Q be the probability of the event E that some person whose profile is in the database left the evidence sample, with 1 - Q being the probability of the event $E^c$ that the evidence sample was left by someone whose profile is not in the database. Suppose that there is a match between the evidence-sample profile and at least one profile in the database. Assuming that $P(M \mid E) = 1$ and $P(M \mid E^c) = P(M)$, where $P(M)$ is evaluated as above, we find the posterior odds that the evidence sample was left by someone whose profile is in the database to be $Q/[(1 - Q)P(M)]$. Since the posterior odds equal the prior odds times the likelihood ratio, the likelihood ratio is $1/P(M)$, as above.
If there is a unique match in the database, the preceding argument, which implicates the database as a whole, would, of course, implicate the person with the unique matching profile. The following alternative argument focuses directly on such a person. Let $E_i$ denote the event that the $i$-th person whose profile is in the database left the evidence sample, and let $M_i$ be the event that the profile of the $i$-th person matches that of the evidence sample. Let $U_i$ be the event that the $i$-th person has a unique matching profile. Let $q_i$ denote the prior probability of $E_i$, so that $Q = \sum_i q_i$, and let $p = P(M_i \mid E^c)$ be the conditional probability of a random match under the condition $E^c$ that no one profiled in the database left the evidence sample. The posterior odds implicating the $i$-th person as the source of the evidence sample are $P(E_i \mid U_i)/P(E_i^c \mid U_i)$. Under the assumption that all possible sources of the evidence sample are unrelated, it can be shown that this ratio equals $q_i/[(1 - Q)p]$, and even without that assumption, the same expression is a lower bound for the posterior odds. In the special case in which $q_i = Q/N$, where N is the size of the database, the formula becomes $Q/[(1 - Q)Np]$; when $Np$ is small, this is essentially the same as the preceding case. A Bayesian analysis is particularly well suited to deal with the case where the database can be expected to contain almost all reasonable suspects. In that case, the prior odds, $Q/(1 - Q)$, would be large.
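The special case $q_i = Q/N$ can be evaluated directly. The values of N, Q, and p below are hypothetical and serve only to show how the expression $Q/[(1 - Q)Np]$ behaves; as the text notes, assigning Q is the problematic step in practice.

```python
def posterior_odds_lower_bound(q_i: float, big_q: float, p: float) -> float:
    """q_i / ((1 - Q) p): posterior odds (or a lower bound on them) for person i."""
    return q_i / ((1 - big_q) * p)


N_DB = 100_000   # hypothetical database size
Q = 0.5          # hypothetical prior probability that the source is in the database
P_RANDOM = 1e-9  # hypothetical random-match probability p

odds = posterior_odds_lower_bound(Q / N_DB, Q, P_RANDOM)  # the case q_i = Q/N
print(f"{odds:,.0f} to 1")  # equals Q / ((1 - Q) N p), here 10,000 to 1
```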
Appendix 5C Equation 5.5 can be derived as follows. Let M denote the number of pairwise matches in the population when K loci are typed. To evaluate $P\{M \ge 1\}$, we let $E = \sum_j P_j^2$, where $P_j$ is the probability of the $j$-th genotype in some fixed enumeration of the set of all possible genotypes. As an application of the "birthday problem" with unequal probabilities (Aldous 1989, p 109), we have
$$P\{M \ge 1\} \approx 1 - \exp\left(-\tfrac{1}{2} N^2 E\right) \qquad (5.12)$$
if N is large and $\max_j P_j$ is small. The contribution to E of a single locus, expressed in terms of the allele frequencies $p_i$ and the homozygosity $f = \sum_i p_i^2$ at that locus, is
$$\sum_i p_i^4 + \sum_{i<j} (2 p_i p_j)^2 = 2f^2 - \sum_i p_i^4 \le 2f^2.$$
Taking the product over all loci, we find that an upper bound for E is $\prod_L 2 f_L^2$. Hence, a simple approximate upper bound for the desired probability is
$$P\{M \ge 1\} \lesssim 1 - \exp\left(-2^{K-1} N^2 f^{2K}\right) \qquad (5.13)$$
where $f = \left(\prod_L f_L\right)^{1/K}$, the geometric mean of the homozygosities.
If the homozygosity of some loci is moderate or high, as for some PCR loci, the following refinement of our approximate upper bound can be useful because it shows that a smaller number of loci may yield uniqueness at each given probability level. In the above derivation, instead of dropping $\sum_i p_i^4$, note from Jensen's inequality (see, for example, James and James 1959) that $\sum_i p_i^4 \ge \left(\sum_i p_i^2\right)^3 = f^3$. That leads to the approximate upper bound obtained by setting
$$E = \prod_L \left(2 f_L^2 - f_L^3\right) \qquad (5.14)$$
in Equation 5.12.
As an example, suppose $N = 5 \times 10^9$ and $f_L = 0.5$ for every $L$. If we insist that the probability of simultaneous uniqueness of all profiles exceed 0.99, then Equation 5.13 requires 71 loci, whereas Equations 5.12 and 5.14 show that 50 actually suffice.
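The two loci counts can be reproduced by increasing K until the approximate probability that all profiles are unique exceeds 0.99. This sketch assumes that Equation 5.12 has the birthday-problem form $P\{M \ge 1\} \approx 1 - \exp(-N^2 E/2)$, so that Equation 5.13 becomes $1 - \exp(-2^{K-1} N^2 f^{2K})$ and the refinement of Equation 5.14 uses $E = \prod_L (2f_L^2 - f_L^3)$; the function names are hypothetical.

```python
from math import exp

N = 5e9        # population size in the example
F = 0.5        # homozygosity f_L, the same at every locus
TARGET = 0.99  # required probability that all profiles are unique


def p_match_513(k):
    """Equation 5.13 bound on P{M >= 1}: 1 - exp(-2^(K-1) N^2 f^(2K))."""
    return 1 - exp(-(2 ** (k - 1)) * N ** 2 * F ** (2 * k))


def p_match_514(k):
    """Equation 5.12 with the refined E = prod_L (2 f_L^2 - f_L^3) of Equation 5.14."""
    e = (2 * F ** 2 - F ** 3) ** k
    return 1 - exp(-(N ** 2) * e / 2)


def loci_needed(p_match):
    """Smallest K for which 1 - P{M >= 1} exceeds TARGET."""
    k = 1
    while 1 - p_match(k) <= TARGET:
        k += 1
    return k


print(loci_needed(p_match_513), loci_needed(p_match_514))  # 71 50
```

The refinement helps because $2f^2 - f^3$ is noticeably smaller than $2f^2$ when f is as large as 0.5, so each added locus reduces E by a larger factor.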