Page 9

Overview

This overview describes the essentials of the subject with a minimum of jargon, statistics, and technical details. The aim is to present technical information in nontechnical language, but without distorting the meaning by oversimplifying. Although this overview is intended to be self-contained, we shall refer to relevant sections in the main report for fuller explanations, corroborative details, and justification of recommended procedures. We have included an illustrative example at the end of the overview. The glossary and the list of abbreviations at the end of the report may be useful.

Introduction

DNA typing, with its extremely high power to differentiate one human being from another, is based on a large body of scientific principles and techniques that are universally accepted. These newer molecular techniques permit the study of human variability at the most basic level, that of the genetic material itself, DNA. Standard techniques of population genetics and statistics can be used to interpret the results of forensic DNA typing. Because of the newness of the techniques and their exquisite discriminating power, the courts have subjected DNA evidence to extensive scrutiny. What at first seemed like daunting complexity in the interpretation of DNA tests has sometimes inhibited the full use of such evidence. An objective of this report is to clarify and explain how DNA evidence can be used in the courtroom.

If the array of DNA markers used for comparison is large enough, the chance that two different persons will share all of them becomes vanishingly small. With



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Page 10

appropriate DNA test systems, the uniqueness of any individual on the planet (except an identical twin) is likely to be demonstrable in the near future. In the meantime, the justification for an inference that two identical DNA profiles come from the same person rests on probability calculations that employ principles of population genetics. Such calculations are, of course, subject to uncertainty. When in doubt, we err on the side of conservatism (that is, in favor of the defendant). We also discuss ways of keeping laboratory and other errors to a minimum. We emphasize that DNA analysis, when properly carried out and interpreted, is a very powerful forensic tool.

Our Assignment

This committee was asked to update an earlier report, prepared for the National Research Council (NRC) in 1992. There are two principal reasons why such an update is needed. First, forensic science and techniques have progressed rapidly in recent years. Laboratory standards are higher, and new DNA markers are rapidly being introduced. An abundance of new data on DNA markers in different population groups is now available, allowing estimates of the frequencies of those markers in various populations to be made with greater confidence. Second, some of the statements in the first report have been misinterpreted or misapplied in the courts.

This report deals mainly with two subjects:

The first involves the laboratory determination of DNA profiles. DNA can be obtained in substantial amounts and in good condition, as when blood or tissue is obtained from a person, or it can be in limited amounts, degraded, or contaminated, as in some samples from crime scenes. Even with the best laboratory technique, there is intrinsic, unavoidable variability in the measurements; that introduces uncertainty that can be compounded by poor laboratory technique, faulty equipment, or human error. We consider how such uncertainty can be reduced and the risk of error minimized.
The second subject is the interpretation of a finding that the DNA profile of a suspect (or sometimes a victim) matches that of the evidence DNA, usually taken from the crime scene. The match might happen because the two samples are from the same person. Alternatively it might be that the samples are from different persons and that an error has occurred in the gathering of the evidence or in the laboratory. Finally, it might be that the samples are from different people who happen to have the same DNA profile; the probability of that event can be calculated. If the probability is very low, then either the DNA samples are from the same person or a very unlikely coincidence has occurred.

The interpretation of a matching profile involves at least two types of uncertainty. The first arises because the US population is not homogeneous. Rather it consists of different major races (such as black and white), within which there

Page 11

are various subgroups (e.g., persons of Italian and Finnish ancestry) that are not completely mixed in the "melting pot." The extent of such population structure and how it can be taken into account are in the province of population genetics.

The second uncertainty is statistical. Any calculation depends on the numbers in available databases. How reliable are those numbers and how accurate are the calculations based on them and on population genetic theory? We discuss these questions and give answers based on statistical theory and empirical observations.

Finally, some legal issues are discussed. We consider how the courts have reacted to this new technology, especially since the 1992 NRC report.

That earlier report considered a number of issues that are outside our province. Issues such as confidentiality and security, storage of samples for possible future use, legal aspects of data banks on convicted felons, non-DNA information in data banks, availability and costs of experts, economic and ethical aspects of new DNA information, accountability and public scrutiny, and international exchange of information are not in our charge.

As this report will reveal, we agree with many recommendations of the earlier one but disagree with others. Since we make no attempt to review all the statements and recommendations in the 1992 report, the lack of discussion of such an item should not be interpreted as either endorsing or rejecting it.

DNA Typing

DNA typing for forensic purposes is based on the same fundamental principles and uses the same techniques that are routinely employed in a wide variety of medical and genetic situations, such as diagnosis and gene mapping. Those methods analyze the DNA itself. That means that a person's genetic makeup can be determined directly, not indirectly through gene products, as was required by earlier methods. DNA is also resistant to many conditions that destroy most other biological compounds, such as proteins.
Furthermore, only small amounts of DNA are required; that is especially true if PCR (polymerase chain reaction) methods, to be described later, are employed. For those reasons, direct DNA determinations often give useful results when older methods, such as those employing blood groups and enzymes, do not.

We emphasize that one of the most important benefits of DNA technology is the clearing of falsely accused innocent suspects. According to the FBI, about a third of those named as the primary suspect in rape cases are excluded by DNA evidence. Cases in which DNA analysis provides evidence of innocence ordinarily do not reach the courts and are therefore less widely known. Prompt exclusions can eliminate a great deal of wasted effort and human anguish.

Before describing the techniques of DNA identification, we first provide some necessary genetic background and a minimum vocabulary.

Page 12

Basic Genetic Principles

Each human body contains an enormous number of cells, all descended by successive divisions from a single fertilized egg. The genetic material, DNA, is in the form of microscopic chromosomes, located in the inner part of the cell, the nucleus. A fertilized egg has 23 pairs of chromosomes, one member of each pair having come from the mother and the other from the father. The two members of a pair are said to be homologous. Before cell division, each chromosome splits into two. Because of the precision of chromosome distribution in the cell-division process, each daughter cell receives identical chromosomes, duplicates of the 46 in the parent cell. Thus, each cell in the body should have the same chromosome makeup. This means that cells from various tissues, such as blood, hair, skin, and semen, have the same DNA content and therefore provide the same forensic information.

There are some exceptions to the rule of identical chromosomes in every cell, but they do not affect the conclusion that diverse tissues provide the same information. The most important exception occurs when sperm and eggs are formed. In this process, each reproductive cell receives at random one representative of each pair, or 23 in all. The double number, 46, is restored by fertilization. With the exception of the sex chromosomes, X and Y (the male-determining Y is smaller than the X), the two members of a pair are identical in size and shape. (It might seem puzzling that sperm cells, with only half of the chromosomes, can provide the same information as blood or saliva. The reason is that DNA from many sperm cells is analyzed at once, and collectively all the chromosomes are represented.)

A chromosome is a very thin thread of DNA, surrounded by other materials, mainly protein. (DNA stands for deoxyribonucleic acid.) The DNA in a single chromosome, if stretched out, would be an inch or more in length.
Remarkably, all that length is packed into a cell nucleus some 1/1,000 inch in diameter. The DNA is compacted by coils within coils.

The DNA thread is actually double, consisting of two strands twisted to form a helix (Figure 0.1). Each strand consists of a string of bases held together by a sugar-phosphate backbone. The four bases are abbreviated A, T, G, and C (these stand for adenine, thymine, guanine, and cytosine, but we shall employ only the abbreviations). In double-stranded DNA, the bases line up in pairs, an A opposite a T and a G opposite a C:

C A T T A G A C T G A T
G T A A T C T G A C T A

Thus, if the sequence of bases on one strand is known, the other is determined. Prior to cell division, the double strand splits into two single strands, each containing a single base at each position. There are free-floating bases in the cell nucleus, and these attach to each single strand according to the A-T, G-C pairing rule. Then they are tied together and zipped up by enzymes. In this way, each

Page 13

DNA double helix makes a copy of itself. There are then two identical double strands, each half old and half new, and one goes to each daughter cell. That accounts for the uniformity of DNA makeup throughout the body. The total number of base pairs in a set of 23 chromosomes is about 3 billion.

Figure 0.1 Diagram of a chromosome, with a small region expanded to show the double-helical structure of DNA. The "steps" of the twisted ladder are four kinds of base pairs, AT, TA, GC, or CG. From NRC (1992).

A gene is a stretch of DNA, ranging from a few thousand to tens of thousands of base pairs, that produces a specific product, usually a protein. The order of the four kinds of bases within the gene determines its function. The specific base sequence acts as an encoded message written in three-letter words, each specifying an amino acid (a protein building block). In the diagram above, CAT specifies one amino acid, TAG another, ACT a third, and so on. These amino acids are joined together to make a chain, which folds in various ways to make a three-dimensional protein. The gene product may be detected by laboratory methods, as with blood groups, or by some visible manifestation, such as eye color.

The position that a gene occupies along the DNA thread is its locus. In chemical composition, a gene is no different from the rest of the DNA in the chromosome. Only its having a specific sequence of bases, enabling it to encode a specific protein, makes each gene unique. Genes are interspersed among the rest of the DNA and actually compose only a small fraction of the total. Most of the rest has no known function.
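The pairing rule and the three-letter reading of a strand described above can be illustrated with a short sketch. It is only an illustration in Python, not part of the report; the function names `complement` and `triplets` are hypothetical, and the input is the upper strand of the twelve-base diagram.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)


def triplets(strand: str) -> list[str]:
    """Split a strand into consecutive three-letter words, ignoring an incomplete tail."""
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]


upper = "CATTAGACTGAT"        # upper strand of the diagram
print(complement(upper))      # GTAATCTGACTA, the lower strand of the diagram
print(triplets(upper))        # ['CAT', 'TAG', 'ACT', 'GAT']
```

Applying `complement` twice returns the original strand, which is the sense in which knowing one strand determines the other.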

Page 14

Alternative forms of a gene, for example those producing normal and sickle-cell hemoglobin, are called alleles. The word genotype refers to the gene makeup. A person has two genes at each locus, one maternal, one paternal. If there are two alleles, A and a, at a locus, there are three genotypes, AA, Aa, and aa. The word genotype can be extended to any number of loci. In forensic work, the genotype for the group of analyzed loci is called the DNA profile. (We avoid the word fingerprint to prevent confusion with dermal fingerprints.) If the same allele is present in both chromosomes of a pair, the person with that pair is homozygous. If the two are different, the person is heterozygous. (The corresponding nouns are homozygote and heterozygote.) Thus, genotypes AA and aa are homozygous and Aa is heterozygous.

Genes on the same chromosome are said to be linked, and they tend to be inherited together. They can become unlinked, however, by the process of crossing over, which involves breakage of two homologous chromosomes at corresponding sites and exchange of partners (Figure 0.2). Genes that are on nonhomologous chromosomes are inherited independently, as are genes far apart on the same chromosome.

Occasionally, an allele may mutate; that is, it may suddenly change to another allele, with a changed or lost function. When the gene mutates, the new form is copied as faithfully as the original gene, so a mutant gene is as stable as the gene before it mutated. Most genes mutate very rarely, typically only once in some 100,000 generations, but the rates for different genes differ greatly. Mutations can occur in any part of the body, but our concern is those that occur in the reproductive system and therefore can be transmitted to future generations.

Forensic DNA Identification

VNTRs

One group of DNA loci that are used extensively in forensic analysis are those containing Variable Numbers of Tandem Repeats (VNTRs).
These are not genes, since they produce no product, and those that are used for forensic determinations have no known effect on the person. That is an advantage, for it means that VNTRs are less likely to be influenced by natural selection, which could lead to different frequencies in different populations. For example, several genes that cause malaria resistance are more common in people of Mediterranean or African ancestry, where malaria has been common.

A typical VNTR region consists of 500 to 10,000 base pairs, comprising many tandemly repeated units, each some 15 to 35 base pairs in length. The exact number of repeats, and hence the length of the VNTR region, varies from one allele to another, and different alleles can be identified by their lengths. VNTR loci are particularly convenient as markers for human identification because they have a very large number of different alleles, often a hundred or more, although

Page 15

only 15 to 25 can be distinguished practically, as we explain later. (The word allele is traditionally applied to alternative forms of a gene; here we extend the word to include nongenic regions of DNA, such as VNTRs.)

Figure 0.2 Diagram of crossing over. The chromosomes pair (upper diagram), break at corresponding points (middle), and exchange parts. The result is that alleles A and B, which were formerly on the same chromosome, are now on different chromosomes.

VNTRs also have a very high mutation rate, leading to changes in length. An individual mutation usually changes the length by only one or a few repeating units. The result is a very large number of alleles, no one of which is common. The number of possible genotypes (pairs of alleles) at a locus is much larger than the number of alleles, and when several different loci are combined, the total number of genotypes becomes enormous.

To get an idea of the amount of genetic variability with multiple alleles and multiple loci, consider first a locus with three alleles, A1, A2, and A3. There are three homozygous genotypes, A1A1, A2A2, and A3A3, and three heterozygous ones, A1A2, A1A3, and A2A3. In general, if there are n alleles, there are n homozygous genotypes and n(n − 1)/2 heterozygous ones. For example, if there are 20 alleles, there are 20 + (20 × 19)/2 = 210 genotypes. Four loci with 20 alleles each would have 210 × 210 × 210 × 210, or about 2 billion possible genotypes.

For a genetic system to be useful for identification, it is not enough that it yield a large number of genotypes. The relative frequencies of the genotypes are also important. The more nearly equal the different frequencies are, the greater the discriminatory power. VNTRs exhibit both characteristics.

DNA Profiling

Genetic types at VNTR loci are determined by a technique called VNTR profiling. Briefly, the technique is as follows (Figure 0.3). First, the DNA is extracted from whatever material is to be examined.
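The genotype arithmetic given earlier on this page (n homozygous plus n(n − 1)/2 heterozygous genotypes per locus, multiplied across loci) can be checked with a minimal Python sketch. The function names are hypothetical, and the sketch only reproduces the counts quoted in the text.

```python
def genotypes_per_locus(n_alleles: int) -> int:
    """Count genotypes at one locus: n homozygous plus n(n - 1)/2 heterozygous."""
    homozygous = n_alleles
    heterozygous = n_alleles * (n_alleles - 1) // 2
    return homozygous + heterozygous


def genotypes_across_loci(alleles_per_locus: list[int]) -> int:
    """Multiply the per-locus counts to get the number of multilocus genotypes."""
    total = 1
    for n_alleles in alleles_per_locus:
        total *= genotypes_per_locus(n_alleles)
    return total


print(genotypes_per_locus(3))                     # 6
print(genotypes_per_locus(20))                    # 210
print(genotypes_across_loci([20, 20, 20, 20]))    # 1944810000, about 2 billion
```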
The DNA is then cut by a specific enzyme into many small fragments, millions in each cell. A tiny fraction of those fragments includes the particular VNTR to be analyzed. The fragmented

Page 16

DNA is then placed in a small well at one edge of a semisolid gel. Each of the different DNA samples to be analyzed is placed in a different well. Additional wells receive various known DNA samples to serve as controls and fragment-size indicators. Then the gel is placed in an electric field and the DNA migrates away from the wells. The smaller the fragment, the more rapidly it moves. After a suitable time, the electric current is stopped, and the different fragments will have migrated different distances, the shorter ones for greater distances. In the process, the DNA fragments are denatured, meaning that the double strands in each fragment are separated into single strands. The fragments are then transferred by simple blotting to a nylon membrane, which is tougher and easier to handle than the gel and to which the single-stranded fragments adhere.

Then a radioactive probe is added. A probe is a short section of single-stranded DNA complementary to the specific VNTR of interest, meaning that it has a C where the VNTR has a G, an A where the VNTR has a T, and so on, so that the probe is specifically attracted to this particular VNTR. When the membrane is placed on a photographic film, the radioactive probes take a picture of themselves, producing dark spots on the film at positions corresponding to the particular DNA fragments to which the probe has attached. This photo is called an autoradiograph, or autorad for short.

Figure 0.3 An outline of the DNA profiling process.

The two DNA samples to be compared (usually from the evidence, E, and from a suspect, S) are placed in separate lanes in the gel, with DNA in several other lanes serving as different kinds of controls. Because of the large number of VNTR alleles, most loci are heterozygous, and there will usually be two bands

Page 17

in each lane. If the two DNA samples, E and S, came from the same individual, the two bands in each lane will be in the same, or nearly the same, positions; if the DNA came from different persons, they will usually be in quite different positions. The sizes of the fragments are estimated by comparison with a "ladder" in which the spots are of known size. Figure 0.4 shows an example.

Figure 0.4 An autorad from an actual case, illustrating fragment-length variation at the D1S7 locus. The lanes from left to right are: (1) standard DNA ladder, used to estimate sizes; (2) K562, a standard cell line with two bands of known size, used as a control; (3) within-laboratory quality control sample; (4) standard ladder; (5) DNA from blood at the crime scene; (6) standard ladder; (7) DNA from the first victim; (8) another sample from the first victim; (9) standard ladder; (10) DNA from the second victim; (11) DNA from the first suspect; (12) DNA from the second suspect; (13) standard ladder. Courtesy of the State of California Department of Justice DNA Laboratory.

In this case, the question is whether either of two victims, V1 and V2, match a blood stain, called E blood in the figure,

Page 18

that was found on the clothing of suspect S1. S2 is a second suspect in the case. The sizing ladders are in lanes 1, 4, 6, 9, and 13; these are repeated in several lanes to detect possible differences in the rate of migration in different lanes. K562 and QC are other controls. On looking at the figure, one sees that the evidence blood (E blood) is not from V2 (or from S1 or S2), since the bands are in quite different positions. However, it might well be from V1, since the bands in E and V1 are at the same position.

After such an analysis, the radioactive probe is washed off the membrane. Then a new probe, specific for another VNTR locus, is added and the whole process repeated. This is continued for several loci, usually four or more. There is a practical limit, however, since the washing operation may eventually remove some of the DNA fragments, making the bands on the autorad weak or invisible. In the example in Figure 0.4, testing at 9 additional loci gave consistent matches between E blood and Victim 1, leaving little doubt as to the source of the blood.

In most laboratories, the sizes of the fragments are measured by a computer, which also does the calculations that are described below. A DNA fragment from the evidence is declared to match the one from a suspect (or, in the case of Figure 0.4, from a victim) if they are within a predetermined relative distance. If the bands do not match, that is the end of the story: the DNA samples did not come from the same individual. If the DNA patterns do match, or appear to match, the analysis is carried farther, as described in the next section.

A difficulty with VNTRs using radioactive probes is the long time required to complete the analysis. One or two weeks are needed for sufficient radiation to make a clear autorad, and, as just described, the different loci are done in succession. As a result, the process takes several weeks. Some newer techniques use luminescent chemicals instead of radioactive ones.
As such techniques are perfected and come into wider use, the process will speed up considerably.

Matching and Binning of VNTRs

Because of measurement uncertainty, the estimates of fragment sizes are essentially continuous. The matching process consists of determining whether two bands are close enough to be within the limits of the measurement uncertainty. After the two bands have been determined to match, they are binned. In this process, the band is assigned to a size class, known as a bin.

Two analytical procedures are the fixed-bin and the floating-bin methods. The floating-bin method is statistically preferable, but it requires access to a computerized database. The fixed-bin method is simpler in some ways and easier for the average laboratory to use; hence, it is more widely employed. Only the fixed-bin method is described here, but the reader may refer to Chapter 5 (p. 142) for a description of floating-bin procedures.

Page 19

A match between two different DNA sources (e.g., evidence and suspect DNA) is typically determined in two stages. First is a visual examination. Usually the bands in the two lanes to be compared will be in very similar positions or in clearly different positions. In the latter case, there is no match, and the DNA samples are assumed to have come from different persons. In Figure 0.4, only the bands of V1 match the evidence blood. The role of a visual test is that of a preliminary screen, to eliminate obvious mismatches from further study and thereby save time and effort.

The second, measurement-confirmation step is based on the size of the fragment producing the band, as determined by size standards (the standard ladders) on the same autorad (Figure 0.4). The recorded size is subject to measurement uncertainty, which is roughly proportional to the fragment size. Based on duplicate measurements of the same sample in different laboratories, roughly 2/3 of the measurements are within 1% of the correct value. In practice, a value larger than 1%, usually 2.5% (although this varies in different laboratories), is used to prevent the possible error of classifying samples from the same person as being different. The measurement with 2.5% of its value added and subtracted yields an uncertainty window. Two bands, say from suspect and evidence, are declared to match if their uncertainty windows overlap; otherwise a nonmatch is declared. Compare the top two diagrams in Figure 0.5.

Figure 0.5 Diagrams showing the extent of the uncertainty windows (a, b) and the match window (c). In the top group, the uncertainty windows do not overlap; in the second they do. The bottom diagram shows the match window of a fragment along with the fixed bin. The match window overlaps bins 10 and 11.
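The overlap test for uncertainty windows described above can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory procedure: the function names and the fragment sizes in the example are hypothetical, and the 2.5% tolerance is simply the usual value quoted in the text, which varies between laboratories.

```python
def uncertainty_window(size_bp: float, tolerance: float = 0.025) -> tuple[float, float]:
    """Return (low, high): the measured size minus and plus a fraction of itself."""
    return size_bp * (1 - tolerance), size_bp * (1 + tolerance)


def bands_match(size_a: float, size_b: float, tolerance: float = 0.025) -> bool:
    """Declare a match when the two uncertainty windows overlap."""
    low_a, high_a = uncertainty_window(size_a, tolerance)
    low_b, high_b = uncertainty_window(size_b, tolerance)
    return low_a <= high_b and low_b <= high_a


# Hypothetical fragment sizes in base pairs, chosen only for illustration.
print(bands_match(1000, 1040))   # True: windows 975-1025 and 1014-1066 overlap
print(bands_match(1000, 1060))   # False: windows 975-1025 and 1033.5-1086.5 do not
```

Because each window scales with its own measurement, the test is symmetric in the two bands and becomes more permissive, in absolute terms, for larger fragments, which mirrors the statement that the uncertainty is roughly proportional to fragment size.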

Page 36

Chapter 5 that will make the principle more workable and less susceptible to creative misapplications.

DNA in the Courts

Prior to 1992, there was controversy over our two main issues, laboratory error and population substructure. The 1992 NRC report was intended to resolve the controversy, but the arguments went on. One reason is that the scientific community has not spoken with one voice; defense and prosecution witnesses have given highly divergent statistical estimates or have disagreed as to the validity of all estimates. For this reason, some courts have held that the analyses are not admissible in court.

The courts, however, have accepted the soundness of the typing procedures, especially for VNTRs. The major disagreement in the courts has been over population substructure and possible technical or human errors. The interim ceiling principle, in particular, has also been the subject of considerable disagreement. We hope that our report will ease the acceptance of DNA analysis in the courts and reduce the controversy.

We shall not summarize the various court findings and opinions here. The interested reader can find this information in Chapter 6, which also discusses the implications that our recommendations could have on the production and introduction of DNA evidence in court proceedings.

Conclusions and Recommendations

Conclusions and recommendations are given at the ends of the chapters in which the relevant subject is discussed. For convenience, they are repeated here.

Admissibility of DNA Evidence (Chapter 2)

DNA analysis is one of the greatest technical achievements for criminal investigation since the discovery of fingerprints. Methods of DNA profiling are firmly grounded in molecular technology. When profiling is done with appropriate care, the results are highly reproducible. In particular, the methods are almost certain to exclude an innocent suspect.

One of the most widely used techniques involves VNTRs.
These loci are extremely variable, but individual alleles cannot be distinguished, because of intrinsic measurement variability, and the analysis requires statistical procedures. The laboratory procedure involves radioactivity and requires a month or more for full analysis. PCR-based methods are prompt, require only a small amount of material, and can yield unambiguous identification of individual alleles.

The state of the profiling technology and the methods for estimating frequencies and related statistics have progressed to the point where the admissibility of properly collected and analyzed DNA data should not be in doubt. We expect

Page 37

continued development of new and better methods and hope for their prompt validation, so that they can quickly be brought into use.

Laboratory Errors (Chapter 3)

The occurrence of errors can be minimized by scrupulous care in evidence-collecting, sample-handling, laboratory procedures, and case review. Detailed guidelines for QC and QA (quality control and quality assurance), which are updated regularly, are produced by several organizations, including TWGDAM. ASCLD-LAB is established as an accrediting agency. The 1992 NRC report recommended that a National Committee on Forensic DNA Typing (NCFDT) be formed to oversee the setting of DNA-analysis standards. The DNA Identification Act of 1994 gives this responsibility to a DNA Advisory Board appointed by the FBI. We recognize the need for guidelines and standards, and for accreditation by appropriate organizations.

Recommendation 3.1: Laboratories should adhere to high quality standards (such as those defined by TWGDAM and the DNA Advisory Board) and make every effort to be accredited for DNA work (by such organizations as ASCLD-LAB).

Proficiency Tests

Regular proficiency tests, both within the laboratory and by external examiners, are one of the best ways of assuring high standards. To the extent that it is feasible, some of the tests should be blind.

Recommendation 3.2: Laboratories should participate regularly in proficiency tests, and the results should be available for court proceedings.

Duplicate Tests

We recognize that no amount of care and proficiency testing can eliminate the possibility of error. However, duplicate tests, performed as independently as possible, can reduce the risk of error enormously. The best protection that an innocent suspect has against an error that could lead to a false conviction is the opportunity for an independent retest.
Recommendation 3.3: Whenever feasible, forensic samples should be divided into two or more parts at the earliest practicable stage and the unused parts retained to permit additional tests. The used and saved portions should be stored and handled separately. Any additional tests should be performed independently of the first by personnel not involved in the first test and preferably in a different laboratory.

Page 38

Population Genetics (Chapter 4)

Sufficient data now exist for various groups and subgroups within the United States that analysts should present the best estimates for profile frequencies. For VNTRs, using the 2p rule for single bands and HW for double bands is generally conservative for an individual locus. For multiple loci, departures from linkage equilibrium are not great enough to cause errors comparable to those from uncertainty of allele frequencies estimated from databases.

With appropriate consideration of the data, the principles in this report can be applied to PCR-based tests. For those in which exact genotypes can be determined, the 2p rule should not be used. A conservative estimate is given by using the HW relation for heterozygotes and a conservative value of $\theta$ in Equation 4.4a for homozygotes.

Recommendation 4.1: In general, the calculation of a profile frequency should be made with the product rule. If the race of the person who left the evidence-sample DNA is known, the database for the person's race should be used; if the race is not known, calculations for all racial groups to which possible suspects belong should be made. For systems such as VNTRs, in which a heterozygous locus can be mistaken for a homozygous one, if an upper bound on the genotypic frequency at an apparently homozygous locus (single band) is desired, then twice the allele (bin) frequency, $2p$, should be used instead of $p^2$. For systems in which exact genotypes can be determined, $p^2 + p(1 - p)\theta$ should be used for the frequency at such a locus instead of $p^2$. A conservative value of $\theta$ for the US population is 0.01; for some small, isolated populations, a value of 0.03 may be more appropriate. For both kinds of systems, $2p_ip_j$ should be used for heterozygotes.

A more conservative value of $\theta = 0.03$ might be chosen for PCR-based systems in view of the greater uncertainty of calculations for such systems because of less extensive and less varied population data than for VNTRs.

Evidence DNA and Suspect from the Same Subgroup

Sometimes there is evidence that the suspect and other possible sources of the sample belong to the same subgroup. That can happen, e.g., if they are all members of an isolated village. In this case, a modification of the procedure is desirable.

Recommendation 4.2: If the particular subpopulation from which the evidence sample came is known, the allele frequencies for the specific subgroup should be used as described in Recommendation 4.1. If allele frequencies for the subgroup are not available, although data for the full population are, then the calculations should use the population-structure Equations 4.10 for each locus, and the resulting values should then be multiplied.
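
The single-locus rules in Recommendation 4.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the homozygote expression for exact-genotype systems is assumed to be the standard form p^2 + p(1 - p)θ, since Equation 4.4a is not reproduced in this overview.

```python
def single_band_vntr(p):
    """2p rule: upper bound for an apparently homozygous (single-band) VNTR locus."""
    return 2 * p


def heterozygote(p_i, p_j):
    """Hardy-Weinberg frequency 2 * p_i * p_j for a two-band locus."""
    return 2 * p_i * p_j


def homozygote_exact(p, theta=0.01):
    """Homozygote frequency when exact genotypes can be determined.

    ASSUMPTION: uses p^2 + p(1 - p) * theta; theta = 0.01 is the value the
    text calls conservative for the US population (0.03 for small,
    isolated populations).
    """
    return p * p + p * (1 - p) * theta


def profile_frequency(locus_frequencies):
    """Product rule: multiply the single-locus genotype frequencies."""
    result = 1.0
    for freq in locus_frequencies:
        result *= freq
    return result


# Illustrative allele frequencies (invented for this sketch, not from a database).
example = profile_frequency([
    heterozygote(0.10, 0.20),
    single_band_vntr(0.05),
    homozygote_exact(0.10, theta=0.03),
])
print(example)
```

For an allele frequency of 0.10, the 2p rule gives 0.20, whereas the exact-genotype expression with θ = 0.01 gives 0.0109, only slightly above the Hardy-Weinberg value of 0.01. This shows why the 2p rule is reserved for systems in which a second band might have been missed.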

Page 39

Insufficient Data

For some groups (several American Indian and Inuit tribes are in this category), there are insufficient data to estimate frequencies reliably, and even the overall average might be unreliable. In this case, data from other, related groups provide the best information. The groups chosen should be the most closely related for which adequate databases exist. These might be chosen because of geographical proximity, or a physical anthropologist might be consulted. There should be a limit on the number of such subgroups analyzed to prevent inclusion of more remote groups less relevant to the case.

Recommendation 4.3: If the person who contributed the evidence sample is from a group or tribe for which no adequate database exists, data from several other groups or tribes thought to be closely related to it should be used. The profile frequency should be calculated as described in Recommendation 4.1 for each group or tribe.

Dealing with Relatives

In some instances, there is evidence that one or more relatives of the suspect are possible perpetrators.

Recommendation 4.4: If the possible contributors of the evidence sample include relatives of the suspect, DNA profiles of those relatives should be obtained. If these profiles cannot be obtained, the probability of finding the evidence profile in those relatives should be calculated with Formulae 4.8 or 4.9.

Statistical Issues (Chapter 5)

Confidence limits for profile probabilities, based on allele frequencies and the size of the database, can be calculated by methods explained in this report. We recognize, however, that confidence limits address only part of the uncertainty. For a more realistic estimate, we examined empirical data from the comparison of different subpopulations and of subpopulations within the whole.
The empirical studies show that the differences between the frequencies of the individual profiles estimated by the product rule from different adequate subpopulation databases (at least several hundred persons) are within a factor of about 10 of each other, and that provides a guide to the uncertainty of the determination for a single profile. For very small estimated profile frequencies, the uncertainty can be greater, both because of the greater relative uncertainty of individually small probabilities and because more loci are likely to be multiplied. But with very small probabilities, even a larger relative error is not likely to change the conclusion.

Database Searches

If the suspect is identified through a DNA database search, the interpretation of the match probability and likelihood ratio given in Chapter 4 should be modified.

Page 40

Recommendation 5.1: When the suspect is found by a search of DNA databases, the random-match probability should be multiplied by N, the number of persons in the database. If one wishes to describe the impact of the DNA evidence under the hypothesis that the source of the evidence sample is someone in the database, then the likelihood ratio should be divided by N.

As databases become more extensive, another problem may arise. If the database searched includes a large proportion of the population, the analysis must take that into account. In the extreme case, a search of the whole population should, of course, provide a definitive answer.

Uniqueness

With an increasing number of loci available for forensic analysis, we are approaching the time when each person's profile is unique (except for identical twins and possibly other close relatives). Suppose that, in a population of N unrelated persons, a given DNA profile has probability P. The probability (before a suspect has been profiled) that the particular profile observed in the evidence sample is not unique is at most NP.

A lower bound on the probability that every person is unique depends on the population size, the number of loci, and the heterozygosity of the individual loci. Neglecting population structure and close relatives, 10 loci with a geometric mean heterozygosity of 95% give a probability greater than about 0.999 that no two unrelated persons in the world have the same profile. Once it is decided what level of probability constitutes uniqueness, appropriate calculations can readily be made. We expect that the calculation in the first paragraph will be the one more often employed.

Matching and Binning

VNTR data are essentially continuous, and, in principle, a continuous model should be used to analyze them. The methods generally used, however, involve taking measurement uncertainty into account by determining a match window.
Two procedures for determining match probabilities are the floating-bin and the fixed-bin methods. The floating-bin method is statistically preferable but requires access to a computerized database. The fixed-bin method is more widely used and understood, and the necessary data tables are widely and readily available. When our fixed-bin recommendation is followed, the two methods lead to very similar results. Both methods are acceptable.

Recommendation 5.2: If floating bins are used to calculate the random-match probabilities, each bin should coincide with the corresponding match window. If fixed bins are employed, then the fixed bin that has the largest frequency among those overlapped by the match window should be used.
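
The fixed-bin rule of Recommendation 5.2 amounts to a small lookup. The sketch below is a minimal illustration, not a laboratory procedure: the function name, the bin boundaries, and the first bin's frequency are invented for the example, and a real calculation would use the published bin table for the locus and database in question.

```python
def fixed_bin_frequency(match_window, bins):
    """Return the largest frequency among the fixed bins overlapped by the window.

    match_window: (lower, upper) limits in base pairs.
    bins: list of (lower, upper, frequency) tuples with inclusive limits.
    """
    lower, upper = match_window
    overlapped = [
        freq for (bin_lower, bin_upper, freq) in bins
        if bin_lower <= upper and lower <= bin_upper
    ]
    if not overlapped:
        raise ValueError("match window does not overlap any bin")
    return max(overlapped)


# HYPOTHETICAL bin table: the boundaries are invented for illustration only.
BINS = [
    (1500, 1799, 0.040),
    (1800, 1949, 0.083),
    (1950, 2100, 0.050),
]

# A match window that straddles the last two bins takes the larger frequency.
print(fixed_bin_frequency((1806, 1996), BINS))
```

Taking the maximum rather than the sum of the overlapped bins is what the recommendation specifies for fixed bins. A floating-bin calculation would instead count the database alleles falling inside the match window itself, which is why it requires access to the underlying database.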

Page 41

Ceiling Principles

The abundance of data in different ethnic groups within the major races and the genetically and statistically sound methods recommended in this report imply that both the ceiling principle and the interim ceiling principle are unnecessary.

Further Research

The rapid rate of discovery of new markers in connection with human gene-mapping should lead to many new markers that are highly polymorphic, mutable, and selectively neutral, but which, unlike VNTRs, can be amplified by PCR and for which individual alleles can usually be distinguished unambiguously with none of the statistical problems associated with matching and binning. Furthermore, radioactive probes need not be used with many other markers, so identification can be prompt and problems associated with using radioactive materials can be avoided. It should soon be possible to have systems so powerful that no statistical and population analyses will be needed, and (except possibly for close relatives) each person in a population can be uniquely identified.

Recommendation 5.3: Research into the identification and validation of more and better marker systems for forensic analysis should continue with a view to making each profile unique.

Legal Issues (Chapter 6)

In assimilating scientific developments, the legal system necessarily lags behind the scientific world. Before making use of evidence derived from scientific advances, courts must scrutinize the proposed testimony to determine its suitability for use at trial, and controversy within the scientific community often is regarded as grounds for the exclusion of the scientific evidence. Although some controversies that have come to closure in the scientific literature continue to limit the presentation of DNA evidence in some jurisdictions, courts are making more use of the ongoing research into the population genetics of DNA profiles. We hope that our review of the research will contribute to this process.
Our conclusions and recommendations for reducing the risk of laboratory error, for applying human population genetics to DNA profiles, and for handling uncertainties in estimates of profile frequencies and match probabilities might affect the application of the rules for the discovery and admission of evidence in court. Many suggestions can be offered to make our recommendations most effective: for example, that every jurisdiction should make it possible for all defendants to have broad discovery and independent experts; that accreditation, proficiency testing, and the opportunity for independent testing (whenever feasible) should be prerequisites to the admission of laboratory findings; that in resolving disputes over the adequacy or interpretation of DNA tests, the power of the court to appoint its own experts should be exercised more frequently; and that experts should not be barred from presenting any scientifically acceptable estimate of a random-match probability.

Page 42

We have chosen to make no formal recommendations on such matters of legal policy; we do, however, make a recommendation concerning scientific evidence, namely, the need for behavioral research that will assist legal decision makers in developing standards for communicating about DNA in the courtroom.

Recommendation 6.1: Behavioral research should be carried out to identify any conditions that might cause a trier of fact to misinterpret evidence on DNA profiling and to assess how well various ways of presenting expert testimony on DNA can reduce such misunderstandings.

We trust that our efforts to explain the state of the forensic science and some of the social-science findings that are pertinent to resolving these issues will contribute to better-informed judgments by courts and legislatures.

Illustrative Example

A Typical Case

As an illustration, we have chosen an example that involves VNTR loci. The methods used for the other systems are very similar, except that they usually do not involve the complications of matching and binning, so the more complicated situation is better for illustration. We shall analyze the same data in several ways.

Suppose that samples of blood are obtained from a crime scene and DNA from two suspects, 1 and 2. We should like to know whether the profile of either suspect matches the profile of the evidence DNA. First we isolate the DNA from the three samples, making sure that all three have been handled separately and that each step in the chain of custody has been checked and documented. The DNA is first cut into small segments by an enzyme, Hae III. The fragments from the evidence sample (E) and from the two suspects (S1 and S2) are placed in small wells in the gel, each sample in a separate lane.
Along with these three are a number of controls, as illustrated in Figure 0.4, each with its own lane. The laboratory has been careful not to put any of the three DNA samples into adjacent lanes to prevent possible leakage of DNA into the wrong lane. After being placed in an electric field for a carefully determined time, the DNA in all the lanes is transferred by blotting to a nylon membrane (stronger and easier to handle than the gel). Then a radioactive probe that is specific for locus D2S44 is flooded onto the membrane. The probe adheres to the corresponding region in the DNA sample, and any nonadhering probes are washed off. The membrane is then placed in contact with a photographic film to prepare an autorad. Figure 0.7 illustrates the result in this case.

Page 43

The rough size of the fragment can be determined from the scale in the figure. In practice, the scale is a ladder, a group of DNA fragments that differ from each other in increments of approximately 1,000 base pairs (the ladder can be seen in Figure 0.4). It is immediately apparent (Figure 0.7) that E and S1 match as far as the eye can tell, but that S2 is clearly different. That alone is sufficient to exclude S2 as a suspect. The sizes of the six bands are determined by comparison with the ladder. This operation is ordinarily done by a computer programmed to scan the autorad and measure the sizes of the bands.

Figure 0.7 Diagram of a hypothetical autorad for evidence DNA (E) and two suspects (S1 and S2). Note that E and S1 appear to match, whereas S2 is clearly not the source of the evidence DNA. The numbers at the two sides are numbers of base pairs.

Page 44

TABLE 0.2 The Uncertainty Windows for a VNTR Marker (Probe D2S44) in an Illustrative Example

Source  Band     Size   2.5% Uncertainty  Window
E       Larger   1,901  48                1,853-1,949
        Smaller  1,137  28                1,109-1,165
S1      Larger   1,876  47                1,829-1,923
        Smaller  1,125  28                1,097-1,153
S2      Larger   3,455  86                3,369-3,541
        Smaller  1,505  38                1,467-1,543

The calculations (or computer output) are shown in Table 0.2. The measured value of each band is given, along with upper and lower limits of the uncertainty window, which spans the range from 2.5% below to 2.5% above the measured value. Comparing the uncertainty windows of S1 and E for the smaller band, we see that the windows overlap; the upper limit of S1, 1,153, is within the range, 1,109 to 1,165, of E. Likewise, the uncertainty windows of the larger bands also overlap. In contrast, the uncertainty windows for the two bands from S2 do not overlap any of the evidence bands. So our visual impression is confirmed by the measurements. S2 is cleared, whereas S1 remains as a possible source of the evidence DNA.

The next step is to compute the size of the match window (Table 0.3), which will be used to find the frequency of this marker in a relevant database of DNA marker frequencies. This is the measurement E plus and minus 5% of its value.

TABLE 0.3 Match Windows and Frequencies for Several VNTR Markers in an Illustrative Example

Locus   Band     Size   5%  Match Window  Bin(s)  Freq.
D2S44   Larger   1,901  95  1,806-1,996   11, 12  0.083
        Smaller  1,137  57  1,080-1,194   6       0.024
D17S79  Larger   1,685  84  1,601-1,769   9, 10   0.263
        Smaller  1,120  56  1,064-1,176   4, 6    0.015
D1S7    Single                            14      0.068
D4S139  Larger                            10      0.072
        Smaller                           13      0.131
D10S28  Larger                            9       0.047
        Smaller                           16      0.065

Probability of a random match, 5 loci:
P = 2(0.083)(0.024) x 2(0.263)(0.015) x 2(0.068) x 2(0.072)(0.131) x 2(0.047)(0.065) = 1/(2 billion)
Uncertainty range: 1/(200 million) to 1/(20 billion)

Page 45

So for the larger band the limits are 1,901 - 95 and 1,901 + 95, or 1,806 to 1,996. We then look at a bin-frequency table, shown in Table 0.1 (p. 20). The table shows that the lower limit, 1,806, lies in bin 11, and the upper limit, 1,996, is in bin 12. Notice that the frequency of the alleles in bin 11 is 0.083 and that in bin 12 is 0.050, so we take the larger value, 0.083. This is shown as the frequency in the rightmost column of Table 0.3.

Continuing, we find the size of the smaller band of E is 1,137, and its lower and upper limits are 1,080 and 1,194. Both of these values are within bin 6 in Table 0.1. Its frequency is 0.024, shown in the right column of Table 0.3.

Now the membrane is "stripped," meaning that the probes are washed off. Then the membrane is flooded with a new set of probes, this time specific for locus D17S79. Assume that the measurements of E are 1,685 and 1,120, and that the uncertainty windows of E and S1 again overlap. The ±5% match window for the larger band is 1,601 to 1,769, and comparing this with Table 0.1 shows that the match window overlaps bins 9 and 10, of which 9 has the higher frequency, 0.263. In the same way, the match window for the smaller band overlaps bins 4 and 6, and the larger frequency is 0.015.

Again, the membrane is stripped and a new probe specific for D1S7 is added. This time, there is only one band. The individual is either homozygous, or heterozygous and the second band did not appear on the gel. So we apply the 2p rule, doubling the frequency from 0.068 to 0.136.

Now the process is continued through two more probes, D4S139 and D10S28, with the frequencies shown in Table 0.3. (If you wish, you may verify these numbers from Table 4.5, p. 101, which also shows frequencies for black and Southeastern Hispanic databases.)

The next step is to compute the probability that a randomly chosen person has the same profile as the evidence sample, E.
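
Before turning to that probability, the window arithmetic used so far in this example can be sketched in Python. The band sizes are those of Tables 0.2 and 0.3. The helper names are hypothetical, and rounding the half-width to the nearest base pair is an assumption inferred from the tabulated values rather than a stated laboratory rule.

```python
def window(size, percent):
    """Return (lower, upper) limits of a +/- percent window around a band size.

    ASSUMPTION: the half-width is rounded to the nearest base pair, which
    reproduces the values printed in Tables 0.2 and 0.3.
    """
    half_width = round(size * percent / 100)
    return size - half_width, size + half_width


def overlaps(a, b):
    """True if two (lower, upper) windows share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Measured band sizes for probe D2S44: (larger band, smaller band).
bands = {"E": (1901, 1137), "S1": (1876, 1125), "S2": (3455, 1505)}

# 2.5% uncertainty windows for every band (Table 0.2).
uncertainty = {
    name: [window(size, 2.5) for size in sizes]
    for name, sizes in bands.items()
}


def consistent_with_evidence(sample):
    """Compare larger with larger and smaller with smaller band windows."""
    return all(
        overlaps(s, e) for s, e in zip(uncertainty[sample], uncertainty["E"])
    )


print(uncertainty["E"])                   # windows for the evidence bands
print(consistent_with_evidence("S1"))     # S1 remains a possible source
print(consistent_with_evidence("S2"))     # S2 is excluded

# 5% match windows for the evidence bands (Table 0.3).
print(window(1901, 5), window(1137, 5))
```

The uncertainty window (2.5%) decides whether two samples match, whereas the wider match window (5%) is used only to look up how common the matching band is in the database.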
To compute this probability, we use the product rule, with the 2p rule for the single band. For each double band, we compute twice the product of the two frequencies. For the single band, we use twice the allele frequency. Thus, going down through the table, the probability is

$$P = 2(0.083)(0.024) \times 2(0.263)(0.015) \times 2(0.068) \times 2(0.072)(0.131) \times 2(0.047)(0.065) \approx \frac{1}{2\ \text{billion}}.$$

The maximum uncertainty of this estimate is about 10-fold in either direction, so the true value is estimated to lie between 1 in 200 million and 1 in 20 billion.

Suspect Found by Searching a Database

In the example above, we assumed that the suspect was found through an eyewitness, circumstantial evidence, or some other information linking him to the crime. Now assume that the suspect was found by searching a database. If the database consists of 10,000 profiles, we follow the rule of multiplying the calculated probability by that number. Thus, the match probability, instead of one in 2 billion, is 10,000 times greater, or one in 200,000.
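
The product-rule arithmetic and the database-search adjustment can be checked with a short script. The frequencies are those of Table 0.3. The variable and function names are hypothetical, and the script is only a sketch of the arithmetic, not a substitute for the full procedure.

```python
from math import prod

# Bin frequencies from Table 0.3; a one-element tuple marks a single band.
loci = {
    "D2S44": (0.083, 0.024),
    "D17S79": (0.263, 0.015),
    "D1S7": (0.068,),
    "D4S139": (0.072, 0.131),
    "D10S28": (0.047, 0.065),
}


def locus_frequency(freqs):
    """2p rule for a single band, 2 * p_i * p_j for a double band."""
    if len(freqs) == 1:
        return 2 * freqs[0]
    p_i, p_j = freqs
    return 2 * p_i * p_j


# Product rule across the five loci.
P = prod(locus_frequency(f) for f in loci.values())

# Roughly 10-fold uncertainty in either direction.
P_low, P_high = P / 10, P * 10

# Suspect found by searching a database of N profiles: multiply by N.
N = 10_000
P_search = N * P

print(f"random-match probability: 1 in {1 / P:,.0f}")
print(f"range: 1 in {1 / P_high:,.0f} to 1 in {1 / P_low:,.0f}")
print(f"after database search:    1 in {1 / P_search:,.0f}")
```

The unrounded product is close to 1 in 2.03 billion, which the text rounds to 1 in 2 billion. Multiplying by 10,000 gives roughly 1 in 200,000.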

Page 46

Suspect and Evidence from the Same Subpopulation

It might be that the crime took place in a very small, isolated village, and the source of the evidence and suspect are both known to be from that village. In that case, we use the modified Equation 0.2b. Consider first D2S44, in which $p_1 = 0.083$ and $p_2 = 0.024$. Suppose that the village is very small and that we wish to be very conservative, so we take $\theta = 0.03$. The probability from Equation 0.2b is

$$\frac{2[\theta + (1 - \theta)p_1][\theta + (1 - \theta)p_2]}{(1 + \theta)(1 + 2\theta)} = \frac{2[0.03 + (0.97)(0.083)][0.03 + (0.97)(0.024)]}{(1.03)(1.06)} \approx 0.0108.$$

Continuing in the same way through the other four loci, using Equation 0.2a for D1S7, and multiplying the results gives about 1/(600 million).

A PCR-Based System

We shall not give a specific example for a PCR-based system. The reason is that the situation is simpler, since there is usually no matching and binning. The detailed procedures are specific for each system and will not be repeated here. The techniques in general (e.g., for STRs) are the same as for VNTRs. They involve positions of bands in gels and photographs of the bands. The methods often use chemical stains rather than radioactive probes; that saves time. The allele frequency is determined directly from the database, and the calculations of match probabilities and likelihood ratios are exactly the same as those just illustrated.
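
The subpopulation calculation above can be sketched as follows. Equations 0.2a and 0.2b are not reproduced in this part of the overview, so the two functions below assume the usual population-structure forms for homozygotes and heterozygotes. That assumption, the value θ = 0.03, and all function names are ours; the assumed forms do reproduce the quoted result of about 1/(600 million) from the Table 0.3 frequencies.

```python
from math import prod


def heterozygote_structured(p_i, p_j, theta):
    """ASSUMED form of Equation 0.2b (two-band locus, same subpopulation)."""
    numerator = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return numerator / ((1 + theta) * (1 + 2 * theta))


def homozygote_structured(p, theta):
    """ASSUMED form of Equation 0.2a (single-band locus, same subpopulation)."""
    numerator = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return numerator / ((1 + theta) * (1 + 2 * theta))


THETA = 0.03  # assumed conservative value for a very small, isolated village

# Bin frequencies from Table 0.3; D1S7 shows a single band.
per_locus = [
    heterozygote_structured(0.083, 0.024, THETA),   # D2S44
    heterozygote_structured(0.263, 0.015, THETA),   # D17S79
    homozygote_structured(0.068, THETA),            # D1S7
    heterozygote_structured(0.072, 0.131, THETA),   # D4S139
    heterozygote_structured(0.047, 0.065, THETA),   # D10S28
]

P_village = prod(per_locus)

print(f"D2S44 alone: {per_locus[0]:.4f}")
print(f"five loci:   1 in {1 / P_village:,.0f}")
```

Allowing for population structure raises the match probability from about 1 in 2 billion to about 1 in 600 million, a change smaller than the 10-fold uncertainty already attached to the estimate.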