The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.


OCR for page 129
Reference Guide on DNA Identification Evidence
DAVID H. KAYE AND GEORGE SENSABAUGH

David H. Kaye, M.A., J.D., is Distinguished Professor of Law, Weiss Family Scholar, and Graduate Faculty Member, Forensic Science Program, The Pennsylvania State University, University Park, and Regents’ Professor Emeritus, Arizona State University Sandra Day O’Connor College of Law and School of Life Sciences, Tempe.

George Sensabaugh, D.Crim., is Professor of Biomedical and Forensic Sciences, School of Public Health, University of California, Berkeley.

Contents
I. Introduction, 131
  A. Summary of Contents, 131
  B. A Brief History of DNA Evidence, 132
  C. Relevant Expertise, 134
II. Variation in Human DNA and Its Detection, 135
  A. What Are DNA, Chromosomes, and Genes? 136
  B. What Are DNA Polymorphisms and How Are They Detected? 139
    1. Sequencing, 139
    2. Sequence-specific probes and SNP chips, 140
    3. VNTRs and RFLP testing, 140
    4. STRs, 141
    5. Summary, 142
  C. How Is DNA Extracted and Amplified? 143
  D. How Is STR Profiling Done with Capillary Electrophoresis? 144
  E. What Can Be Done to Validate a Genetic System for Identification? 148
  F. What New Technologies Might Emerge? 148
    1. Miniaturized “lab-on-a-chip” devices, 148
    2. High-throughput sequencing, 149
    3. Microarrays, 150
    4. What questions do the new technologies raise? 150
III. Sample Collection and Laboratory Performance, 151
  A. Sample Collection, Preservation, and Contamination, 151
    1. Did the sample contain enough DNA? 151
    2. Was the sample of sufficient quality? 152

Reference Manual on Scientific Evidence
  B. Laboratory Performance, 153
    1. What forms of quality control and assurance should be followed? 153
    2. How should samples be handled? 156
IV. Inference, Statistics, and Population Genetics in Human Nuclear DNA Testing, 159
  A. What Constitutes a Match or an Exclusion? 159
  B. What Hypotheses Can Be Formulated About the Source? 160
  C. Can the Match Be Attributed to Laboratory Error? 161
  D. Could a Close Relative Be the Source? 162
  E. Could an Unrelated Person Be the Source? 163
    1. Estimating allele frequencies from samples, 164
    2. The product rule for a randomly mating population, 165
    3. The product rule for a structured population, 166
  F. Probabilities, Probative Value, and Prejudice, 167
    1. Frequencies and match probabilities, 167
    2. Likelihood ratios, 172
    3. Posterior probabilities, 173
  G. Verbal Expressions of Probative Value, 174
    1. “Rarity” or “strength” testimony, 175
    2. Source or uniqueness testimony, 175
V. Special Issues in Human DNA Testing, 176
  A. Mitochondrial DNA, 176
  B. Y Chromosomes, 181
  C. Mixtures, 182
  D. Offender and Suspect Database Searches, 186
    1. Which statistics express the probative value of a match to a defendant located by searching a DNA database? 186
    2. Near-miss (familial) searching, 189
    3. All-pairs matching within a database to verify estimated random-match probabilities, 191
VI. Nonhuman DNA Testing, 193
  A. Species and Subspecies, 193
  B. Individual Organisms, 195
Glossary of Terms, 199
References on DNA, 210

I. Introduction

Deoxyribonucleic acid, or DNA, is a molecule that encodes the genetic information in all living organisms. Its chemical structure was elucidated in 1953. More than 30 years later, samples of human DNA began to be used in the criminal justice system, primarily in cases of rape or murder. The evidence has been the subject of extensive scrutiny by lawyers, judges, and the scientific community. It is now admissible in all jurisdictions, but there are many types of forensic DNA analysis, and still more are being developed. Questions of admissibility arise as advancing methods of analysis and novel applications of established methods are introduced.1

This reference guide addresses technical issues that are important when considering the admissibility of and weight to be accorded analyses of DNA, and it identifies legal issues whose resolution requires scientific information. The goal is to present the essential background information and to provide a framework for resolving the possible disagreements among scientists or technicians who testify about the results and import of forensic DNA comparisons.

A. Summary of Contents

Section I provides a short history of DNA evidence and outlines the types of scientific expertise that go into the analysis of DNA samples.

Section II provides an overview of the scientific principles behind DNA typing. It describes the structure of DNA and how this molecule differs from person to person. These are basic facts of molecular biology. The section also defines the more important scientific terms and explains at a general level how DNA differences are detected. These are matters of analytical chemistry and laboratory procedure. Finally, the section indicates how it is shown that these differences permit individuals to be identified. This is accomplished with the methods of probability and statistics.
Section III considers issues of sample quantity and quality as well as laboratory performance. It outlines the types of information that a laboratory should produce to establish that it can analyze DNA reliably and that it has adhered to established laboratory protocols. Section IV examines issues in the interpretation of laboratory results. To assist the courts in understanding the extent to which the results incriminate the defen- dant, it enumerates the hypotheses that need to be considered before concluding that the defendant is the source of the crime scene samples, and it explores the 1. For a discussion of other forensic identification techniques, see Paul C. Giannelli et al., Ref- erence Guide on Forensic Identification Expertise, in this manual. See also David H. Kaye et al., The New Wigmore, A Treatise on Evidence: Expert Evidence (2d ed. 2011). 131

Reference Manual on Scientific Evidence issues that arise in judging the strength of the evidence. It focuses on questions of statistics, probability, and population genetics.2 Section V describes special issues in human DNA testing for identification. These include the detection and interpretation of mixtures, Y-STR testing, mitochondrial DNA testing, and the evidentiary implications of DNA database searches of various kinds. Finally, Section VI discusses the forensic analysis of nonhuman DNA. It iden- tifies questions that can be useful in judging whether a new method or application of DNA science has the scientific merit and power claimed by the proponent of the evidence. A glossary defines selected terms and acronyms encountered in genetics, molecular biology, and forensic DNA work. B. A Brief History of DNA Evidence “DNA evidence” refers to the results of chemical or physical tests that directly reveal differences in the structure of the DNA molecules found in organisms as diverse as bacteria, plants, and animals.3 The technology for establishing the iden- tity of individuals became available to law enforcement agencies in the mid to late 1980s.4 The judicial reception of DNA evidence can be divided into at least five phases.5 The first phase was one of rapid acceptance. Initial praise for RFLP (restriction fragment length polymorphism) testing in homicide, rape, paternity, and other cases was effusive. Indeed, one judge proclaimed “DNA fingerprinting” to be “the single greatest advance in the ‘search for truth’ . . . since the advent of cross-examination.”6 In this first wave of cases, expert testimony for the prosecu- tion rarely was countered, and courts readily admitted DNA evidence. In a second wave of cases, however, defendants pointed to problems at two levels—controlling the experimental conditions of the analysis and interpreting the results. 
Some scientists questioned certain features of the procedures for extracting and analyzing DNA employed in forensic laboratories, and it became apparent 2. For a broader discussion of statistics, see David H. Kaye & David A. Freedman, Reference Guide on Statistics, in this manual. 3. Differences in DNA also can be revealed by differences in the proteins that are made accord- ing to the “instructions” in a DNA molecule. Blood group factors, serum enzymes and proteins, and tissue types all reveal information about the DNA that codes for these chemical structures. Such immunogenetic testing predates the “direct” DNA testing that is the subject of this chapter. On the nature and admissibility of the “indirect” DNA testing, see, for example, David H. Kaye, The Double Helix and the Law of Evidence 5–19 (2010); 1 McCormick on Evidence § 205(B) (Kenneth Broun ed., 6th ed. 2006). 4. The first reported appellate opinion is Andrews v. State, 533 So. 2d 841 (Fla. Dist. Ct. App. 1988). 5. The description that follows is adapted from 1 McCormick on Evidence, supra note 3, § 205(B). 6. People v. Wesley, 533 N.Y.S.2d 643, 644 (Alb. County. Ct. 1988). 132

Reference Guide on DNA Identification Evidence that declaring matches or nonmatches in the DNA variations being compared was not always trivial. Despite these concerns, most cases continued to find the DNA analyses to be generally accepted, and a number of states provided for admissibility of DNA tests by legislation. Concerted attacks by defense experts of impressive credentials, however, produced a few cases rejecting specific proffers on the ground that the testing was not sufficiently rigorous.7 A different attack on DNA profiling begun in cases during this period proved far more successful and led to a third wave of cases in which many courts held that estimates of the probability of a coincidentally matching DNA profile were inadmissible. These estimates relied on a simple population genetics model for the frequencies of DNA profiles, and some prominent scientists claimed that the appli- cability of the mathematical model had not been adequately verified. A heated debate on this point spilled over from courthouses to scientific journals and con- vinced the supreme courts of several states that general acceptance was lacking. A 1992 report of the National Academy of Sciences proposed a more “conservative” computational method as a compromise,8 and this seemed to undermine the claim of scientific acceptance of the less conservative procedure that was in general use. In response to the population genetics criticism and the 1992 report came an outpouring of critiques of the report and new studies of the distribution of the DNA variations in many populations. 
Relying on the burgeoning literature, a second National Academy panel concluded in 1996 that the usual method of estimating fre- quencies in broad racial groups generally was sound, and it proposed improvements and additional procedures for estimating frequencies in subgroups within the major population groups.9 In the corresponding fourth phase of judicial scrutiny of DNA evidence, the courts almost invariably returned to the earlier view that the statistics associated with DNA profiling are generally accepted and scientifically valid. In the fifth phase of the judicial evaluation of DNA evidence, results obtained with the newer “PCR-based methods” entered the courtroom. Once again, courts considered whether the methods rested on a solid scientific foundation and were generally accepted in the scientific community. The opinions are practically unanimous in holding that the PCR-based procedures satisfy these standards. Before long, forensic scientists settled on the use of one type of DNA variation (known as short tandem repeats, or STRs) to include or exclude individuals as the source of crime scene DNA. 7. Moreover, a minority of courts, perhaps concerned that DNA evidence might be conclusive in the minds of jurors, added a “third prong” to the general-acceptance standard of Frye v. United States, 293 F. 1013 (D.C. Cir. 1923). This augmented Frye test requires not only proof of the general acceptance of the ability of science to produce the type of results offered in court, but also of the proper application of an approved method on the particular occasion. For criticism of this approach, see David H. Kaye et al., supra note 1, § 6.3.3(a)(2). 8. National Research Council, DNA Technology in Forensic Science (1992) [hereinafter NRC I]. 9. National Research Council, The Evaluation of Forensic DNA Evidence (1996) [hereinafter NRC II]. 133

Reference Manual on Scientific Evidence Throughout these phases, DNA tests also exonerated an increasing number of men who had been convicted of capital and other crimes, posing a challenge to traditional postconviction remedies and raising difficult questions of postcon- viction access to DNA samples.10 The value of DNA evidence in solving older crimes also prompted extensions of some statutes of limitations.11 In sum, in little more than a decade, forensic DNA typing made the transition from a novel set of methods for identification to a relatively mature and well- studied forensic technology. However, one should not lump all forms of DNA identification together. New techniques and applications continue to emerge, ranging from the use of new genetic systems and new analytical procedures to the typing of DNA from plants and animals. Before admitting such evidence, courts normally inquire into the biological principles and knowledge that would justify inferences from these new technologies or applications. As a result, this guide describes not only the predominant STR technology, but also newer analytical techniques that can be used for forensic DNA identification. C. Relevant Expertise Human DNA identification can involve testimony about laboratory findings, about the statistical interpretation of those findings, and about the underlying principles of molecular biology. Consequently, expertise in several fields might be required to establish the admissibility of the evidence or to explain it adequately to the jury. The expert who is qualified to testify about laboratory techniques might not be qualified to testify about molecular biology, to make estimates of popula- tion frequencies, or to establish that an estimation procedure is valid.12 10. See, e.g., Osborne v. District Attorney’s Office for Third Judicial District, 129 S. Ct. 
2308 (2009) (narrowly rejecting a convicted offender’s claim of a due process right to DNA testing at his expense, enforceable under 42 U.S.C. § 1983, to establish that he is probably innocent of the crime for which he was convicted after a fair trial, when (1) the convicted offender did not seek extensive DNA testing before trial even though it was available, (2) he had other opportunities to prove his innocence after a final conviction based on substantial evidence against him, (3) he had no new evidence of innocence (only the hope that more extensive DNA testing than that done before the trial would exonerate him), and (4) even a finding that he was not the source of the DNA would not conclusively demonstrate his innocence); Skinner v. Switzer, 131 S. Ct. 1289 (2011); Brandon L. Garrett, Judging Innocence, 108 Colum. L. Rev. 55 (2008); Brandon L. Garrett, Claiming Innocence, 92 Minn. L. Rev. 1629 (2008).
11. See, e.g., Veronica Valdivieso, DNA Warrants: A Panacea for Old, Cold Rape Cases? 90 Geo. L.J. 1009 (2002).
12. Nonetheless, if previous cases establish that the testing and estimation procedures are legally acceptable, and if the computations are essentially mechanical, then highly specialized statistical expertise might not be essential. Reasonable estimates of DNA characteristics in major population groups can be obtained from standard references, and many quantitatively literate experts could use the appropriate formulae to compute the relevant profile frequencies or probabilities. NRC II, supra note 9, at 170. Limitations in the knowledge of a technician who applies a generally accepted statistical procedure can be explored on cross-examination. See Kaye et al., supra note 1, § 2.2. Accord Roberson v. State, 16 S.W.3d 156, 168 (Tex. Crim. App. 2000).

Reference Guide on DNA Identification Evidence Trial judges ordinarily are accorded great discretion in evaluating the qualifi- cations of a proposed expert witness, and the decisions depend on the background of each witness. Courts have noted the lack of familiarity of academic experts— who have done respected work in other fields—with the scientific literature on forensic DNA typing and on the extent to which their research or teaching lies in other areas.13 Although such concerns may affect the persuasiveness of particular testimony, they rarely result in exclusion on the grounds that the witness simply is not qualified as an expert. The scientific and legal literature on the objections to DNA evidence is extensive. By studying the scientific publications, or perhaps by appointing a spe- cial master or expert adviser to assimilate this material, a court can ascertain where a party’s expert falls within the spectrum of scientific opinion. Furthermore, an expert appointed by the court under Federal Rule of Evidence 706 could testify about the scientific literature generally or even about the strengths or weaknesses of the particular arguments advanced by the parties. Given the great diversity of forensic questions to which DNA testing might be applied, it is not feasible to list the specific scientific expertise appropriate to all applications. Assessing the value of DNA analyses of a novel application involv- ing unfamiliar species can be especially challenging. If the technology is novel, expertise in molecular genetics or biotechnology might be necessary. If testing has been conducted on a particular organism or category of organisms, expertise in that area of biology may be called for. If a random-match probability has been presented, one might seek expertise in statistics as well as the population biology or population genetics that goes with the organism tested. 
Given the penetration of molecular technology into all areas of biological inquiry, it is likely that indi- viduals can be found who know both the technology and the population biology of the organism in question. Finally, when samples come from crime scenes, the expertise and experience of forensic scientists can be crucial. Just as highly focused specialists may be unaware of aspects of an application outside their field of exper- tise, so too scientists who have not previously dealt with forensic samples can be unaware of case-specific factors that can confound the interpretation of test results. II. Variation in Human DNA and Its Detection DNA is a complex molecule that contains the “genetic code” of organisms as diverse as bacteria and humans. Although the DNA molecules in human cells are 13. E.g., State v. Copeland, 922 P.2d 1304, 1318 n.5 (Wash. 1996) (noting that defendant’s statistical expert “was also unfamiliar with publications in the area,” including studies by “a leading expert in the field” whom he thought was “a ‘guy in a lab somewhere’”). 135

largely identical from one individual to another, there are detectable variations: except for identical twins, every two human beings have some differences in the detailed structure of their DNA. This section describes the basic features of DNA and some ways in which it can be analyzed to detect these differences.

A. What Are DNA, Chromosomes, and Genes?

The DNA molecule is made of subunits that include four chemical structures known as nucleotide bases. The names of these bases (adenine, thymine, guanine, and cytosine) usually are abbreviated as A, T, G, and C. The physical structure of DNA is often described as a double helix because the molecule has two spiraling strands connected to each other by weak bonds between the nucleotide bases. As shown in Figure 1, A pairs only with T and G only with C. Thus, the order of the single bases on either strand reveals the order of the pairs from one end of the molecule to the other, and the DNA molecule could be said to be like a long sequence of As, Ts, Gs, and Cs.

Figure 1. Sketch of a small part of a double-stranded DNA molecule. Nucleotide bases are held together by weak bonds. A pairs with T; C pairs with G.

Most human DNA is tightly packed into structures known as chromosomes, which come in different sizes and are located in the nuclei of cells. The chromosomes are numbered (in descending order of size) 1 through 22, with the remaining chromosome being an X or a much smaller Y. If the bases are like letters, then each chromosome is like a book written in this four-letter alphabet, and the nucleus is like a bookshelf in the interior of the cell. All the cells in one

Reference Guide on DNA Identification Evidence individual contain identical copies of the same collection of books. The sequence of the As, Ts, Gs, and Cs that constitutes the “text” of these books is referred to as the individual’s nuclear genome. All told, the genome comprises more than three billion “letters” (As, Ts, Gs, and Cs). If these letters were printed in books, the resulting pile would be as high as the Washington Monument. About 99.9% of the genome is identical between any two individuals. This similarity is not really surprising—it accounts for the common features that make humans an identifiable species (and for features that we share with many other species as well). The remaining 0.1% is particular to an individual. This variation makes each person (other than identical twins) geneti- cally unique. This small percentage may not sound like a lot, but it adds up to some three million sites for variation among individuals. The process that gives rise to this variation among people starts with the pro- duction of special sex cells—sperm cells in males and egg cells in females. All the nucleated cells in the body other than sperm and egg cells contain two versions of each of the 23 chromosomes—two copies of chromosome 1, two copies of chromo- some 2, and so on, for a total of 46 chromosomes. The X and Y chromosomes are the sex-determining chromosomes. Cells in females contain two X chromosomes, and cells in males contain one X and one Y chromosome. An egg cell, however, contains only 23 chromosomes—one chromosome 1, one chromosome 2, . . . , and one X chromosome—each selected at random from the woman’s full complement of 23 chromosome pairs. Thus, each egg carries half the genetic information present in the mother’s 23 chromosome pairs, and because the assortment of the chromo- somes is random, each egg carries a different complement of genetic information. The same situation exists with sperm cells. 
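The arithmetic behind the figures in this passage can be checked with a short Python sketch. The genome size and the 0.1% figure are the rounded values quoted in the text. The count of chromosome assortments is not stated in the guide; it is an added illustration that assumes one chromosome is drawn independently from each of the 23 pairs and that ignores the recombination described later, which would only increase the number of possibilities.

```python
# Rounded figures quoted in the text.
GENOME_LETTERS = 3_000_000_000   # "more than three billion" letters
VARIABLE_PER_THOUSAND = 1        # 0.1% of the genome differs between two people

# 0.1% of three billion letters: roughly three million variable sites.
variable_sites = GENOME_LETTERS * VARIABLE_PER_THOUSAND // 1000

# Assumption (not from the guide): one chromosome is chosen independently
# from each of the 23 pairs, and recombination is ignored.
CHROMOSOME_PAIRS = 23
egg_assortments = 2 ** CHROMOSOME_PAIRS      # the same count applies to sperm

# A fertilized egg combines one egg assortment with one sperm assortment.
fertilized_combinations = egg_assortments ** 2

print(f"variable sites: {variable_sites:,}")
print(f"assortments per egg or sperm: {egg_assortments:,}")
print(f"combinations per couple: {fertilized_combinations:,}")
```

Even under these simplifying assumptions, a single couple could produce trillions of distinct chromosome combinations, which is consistent with the text's point that each egg and each sperm carries a different complement of genetic information.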
Each sperm cell contains a single copy of each of the 23 chromosomes selected at random from a man’s 23 pairs, and each sperm differs in the assortment of the 23 chromosomes it carries. Fertilization of an egg by a sperm therefore restores the full number of 46 chromosomes, with the 46 chromosomes in the fertilized egg being a new combination of those in the mother and father. The process resembles taking two decks of cards (a male and a female deck) and shuffling a random half from the male deck into a random half from the female deck, to produce a new deck. During pregnancy, the fertilized cell divides to form two cells, each of which has an identical copy of the 46 chromosomes. The two then divide to form four, the four form eight, and so on. As gestation proceeds, various cells specialize (“differentiate”) to form different tissues and organs. Although cell differentiation yields many different kinds of cells, the process of cell division results in each prog- eny cell having the same genomic complement as the cell that divided. Thus, each of the approximately 100 trillion cells in the adult human body has the same DNA text as was present in the original 23 pairs of chromosomes from the fertilized egg, one member of each pair having come from the mother and one from the father. A second mechanism operating during the chromosome reduction process in sperm and egg cells further shuffles the genetic information inherited from mother 137

Reference Manual on Scientific Evidence and father. In the first stage of the reduction process, each chromosome of a chromosome pair aligns with its partner. The maternally inherited chromosome 1 aligns with the paternally inherited chromosome 1, and so on through the 22 pairs; X chromosomes align with each other as well, but X and Y chromosomes do not. While the chromosome pairs are aligned, they exchange pieces to create new com- binations. The recombined chromosomes are passed on in the sperm and eggs. As a consequence, the chromosomes we inherit from our parents are not exact copies of their chromosomes, but rather are mosaics of these parental chromosomes. The swapping of material between chromosome pairs (as they align in the emerging sex cells) and the random selection (of half of each parent’s 46 chromo- somes) in making sex cells is called recombination. Recombination is the principal source of diversity in individual human genomes. The diverse variations occur both within the genes and in the regions of DNA sequences between the genes. A gene can be defined as a segment of DNA, usually from 1000 to 10,000 base pairs long, that “codes” for a protein. The cell produces specific proteins that correspond to the order of the base pairs (the “letters”) in the coding part of the gene.14 Human genes also contain noncoding sequences that regulate the cell type in which a protein will be synthesized and how much protein will be produced.15 Many genes contain interspersed non- coding, nonregulatory sequences that no longer participate in protein synthesis. These sequences, which have no apparent function, constitute about 23% of the base pairs within human genes.16 In terms of the metaphor of DNA as text, the gene is like an important paragraph in the book, often with some gibberish in it. Proteins perform all sorts of functions in the body and thus produce observ- able characteristics. 
For example, a tiny part of the sequence that directs the production of the human group-specific component protein (a protein that binds to vitamin D and transports it to certain tissues) is G C A A A A T T G C C T G A T G C C A C A C C C A A G G A A C T G G C A.
14. The sequence in which the building blocks (amino acids) of a protein are arranged corresponds to the sequence of base pairs within a gene. (A sequence of three base pairs specifies a particular one of the 20 possible amino acids in the protein. The mapping of a set of three nucleotide bases to a particular amino acid is the genetic code. The cell makes the protein through intermediate steps involving coding RNA transcripts.) About 1.5% of the human genome codes for the amino acid sequences.
15. These noncoding but functional sequences include promoters, enhancers, and repressors.
16. This gene-related DNA consists of introns (which interrupt the coding sequences, called exons, in genes and which are edited out of the RNA transcript for the protein), pseudogenes (evolutionary remnants of once-functional genes), and gene fragments. The idea of a gene as a block of DNA (some of which is coding, some of which is regulatory, and some of which is functionless) is an oversimplification, but it is useful enough here. See, e.g., Mark B. Gerstein et al., What Is a Gene, Post-ENCODE? History and Updated Definition, 17 Genome Res. 669 (2007).
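The pairing rule described earlier (A with T, G with C) and the three-base grouping mentioned in note 14 can be illustrated with the fragment quoted above. The following is a minimal Python sketch, not a laboratory or bioinformatics procedure. The function names are hypothetical, and the choice to start the three-base groups at the first letter of the fragment is an assumption, because the guide does not state the reading frame. The sketch does not translate the groups into amino acids.

```python
# Pairing rule from the guide: A pairs only with T, and G only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Fragment quoted in the text, with the spaces removed.
FRAGMENT = "GCAAAATTGCCTGATGCCACACCCAAGGAACTGGCA"


def partner_strand(strand: str) -> str:
    """Return the base that pairs with each position of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)


def triplets(strand: str) -> list[str]:
    """Split a strand into consecutive three-base groups, dropping leftover bases.

    Assumption: the grouping starts at the first base of the fragment.
    """
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]


partner = partner_strand(FRAGMENT)
groups = triplets(FRAGMENT)

print(partner)   # the bases on the opposite strand, position by position
print(groups)    # three-base groups under the assumed reading frame
```

Because each base has exactly one partner, applying `partner_strand` twice returns the original fragment, which is the sense in which the order of bases on either strand reveals the order of the pairs.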

This gene always is located at the same position, or locus, on chromosome 4. As we have seen, most individuals have two copies of each gene at a given locus, one from the father and one from the mother.

A locus where almost all humans have the same DNA sequence is called monomorphic (“of one form”). A locus where the DNA sequence varies among significant numbers of individuals (more than 1% or so of the population possesses the variant) is called polymorphic (“of many forms”), and the alternative forms are called alleles. For example, the GC protein gene sequence has three common alleles that result from substitutions in a base at a given point. Where an A appears in one allele, there is a C in another. The third allele has the A, but at another point a G is swapped for a T. These changes are called single nucleotide polymorphisms (SNPs, pronounced “snips”).

If a gene is like a paragraph in a book, a SNP is a change in a letter somewhere within that paragraph (a substitution, a deletion, or an insertion), and the two versions of the gene that result from this slight change are the alleles. An individual who inherits the same allele from both parents is called a homozygote. An individual with distinct alleles is a heterozygote.

DNA sequences used for forensic analysis usually are not genes. They lie in the vast regions between genes (about 75% of the genome is extragenic) or in the apparently nonfunctional regions within genes. These extra- and intragenic regions of DNA have been found to contain considerable sequence variation, which makes them particularly useful in distinguishing individuals. Although the terms “locus,” “allele,” “homozygous,” and “heterozygous” were developed to describe genes, the nomenclature has been carried over to describe all DNA variation, coding and noncoding alike. Both types are inherited from mother and father in the same fashion.

B. What Are DNA Polymorphisms and How Are They Detected?

By determining which alleles are present at strategically chosen loci, the forensic scientist ascertains the genetic profile, or genotype, of an individual (at those loci). Although the differences among the alleles arise from alterations in the order of the ATGC letters, genotyping does not necessarily require “reading” the full DNA sequence. Here we outline the major types of polymorphisms that are (or could be) used in identity testing and the methods for detecting them.

1. Sequencing

Researchers are investigating radically new and efficient technologies to sequence entire genomes, one base pair at a time, but the direct sequencing methods now in existence are technically demanding, expensive, and time-consuming for whole-genome sequencing. Therefore, most genetic typing focuses on identifying only

autosome. A chromosome other than the X and Y sex chromosomes.
band. See autoradiograph.
band shift. Movement of DNA fragments in one lane of a gel at a different rate than fragments of an identical length in another lane, resulting in the same pattern “shifted” up or down relative to the comparison lane. Band shift does not necessarily occur at the same rate in all portions of the gel.
base pair (bp). Two complementary nucleotides bonded together at the matching bases (A and T or C and G) along the double helix “backbone” of the DNA molecule. The length of a DNA fragment often is measured in numbers of base pairs (1 kilobase (kb) = 1000 bp); base-pair numbers also are used to describe the location of an allele on the DNA strand.
Bayes’ theorem. A formula that relates certain conditional probabilities. It can be used to describe the impact of new data on the probability that a hypothesis is true. See the chapter on statistics in this manual.
bin, fixed. In VNTR profiling, a bin is a range of base pairs (DNA fragment lengths). When a database is divided into fixed bins, the proportion of bands within each bin is determined and the relevant proportions are used in estimating the profile frequency.
binning. Grouping VNTR alleles into sets of similar sizes because the alleles’ lengths are too similar to differentiate.
bins, floating. In VNTR profiling, a bin is a range of base pairs (DNA fragment lengths). In a floating bin method of estimating a profile frequency, the bin is centered on the base-pair length of the allele in question, and the width of the bin can be defined by the laboratory’s matching rule (e.g., ±5% of band size).
blind proficiency test. See proficiency test.
capillary electrophoresis. A method for separating DNA fragments (including STRs) according to their lengths.
A long, narrow tube is filled with an entangled polymer or comparable sieving medium, and an electric field is applied to pull DNA fragments placed at one end of the tube through the medium. The procedure is faster and uses smaller samples than gel electro- phoresis, and it can be automated. ceiling principle. A procedure for setting a minimum DNA profile frequency proposed in 1992 by a committee of the National Academy of Sciences. One hundred persons from each of 15 to 20 genetically homogeneous populations spanning the range of racial groups in the United States are sampled. For each allele, the higher frequency among the groups sampled (or 5%, whichever is larger) is used in calculating the profile frequency. Compare interim ceiling principle. chip. A miniaturized system for genetic analysis. One such chip mimics capil- lary electrophoresis and related manipulations. DNA fragments, pulled by 200

OCR for page 129
Reference Guide on DNA Identification Evidence small voltages, move through tiny channels etched into a small block of glass, silicon, quartz, or plastic. This system should be useful in analyzing STRs. Another technique mimics reverse dot blots by placing a large array of oligo- nucleotide probes on a solid surface. Such hybridization arrays are useful in identifying SNPs and in sequencing mitochondrial DNA. chromosome. A rodlike structure composed of DNA, RNA, and proteins. Most normal human cells contain 46 chromosomes, 22 autosomes and a sex chromosome (X) inherited from the mother, and another 22 autosomes and one sex chromosome (either X or Y) inherited from the father. The genes are located along the chromosomes. See also homologous chromosomes. coding and noncoding DNA. The sequence in which the building blocks (amino acids) of a protein are arranged corresponds to the sequence of base pairs within a gene. (A sequence of three base pairs specifies a particular one of the 20 possible amino acids in the protein. The mapping of a set of three nucleotide bases to a particular amino acid is the genetic code. The cell makes the protein through intermediate steps involving coding RNA transcripts.) About 1.5% of the human genome codes for the amino acid sequences. Another 23.5% of the genome is classified as genetic sequence but does not encode proteins. This portion of the noncoding DNA is involved in regulating the activity of genes. It includes promoters, enhancers, and repressors. Other gene-related DNA consists of introns (that interrupt the coding sequences, called exons, in genes and that are edited out of the RNA transcript for the protein), pseudogenes (evolutionary remnants of once- functional genes), and gene fragments. The remaining, extragenic DNA (about 75% of the genome) also is noncoding. CODIS (combined DNA index system). A collection of databases on STR and other loci of convicted felons, maintained by the FBI. complementary sequence. 
The sequence of nucleotides on one strand of DNA that corresponds to the sequence on the other strand. For example, if one sequence is CTGAA, the complementary bases are GACTT. control region. See D-loop. cytoplasm. A jelly-like material (80% water) that fills the cell. cytosine (C). One of the four bases, or nucleotides, that make up the DNA double helix. Cytosine binds only to guanine. See nucleotide. database. A collection of DNA profiles. degradation. The breaking down of DNA by chemical or physical means. denature, denaturation. The process of splitting, as by heating, two comple- mentary strands of the DNA double helix into single strands in preparation for hybridization with biological probes. 201
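The pairing rule behind the complementary sequence entry can be sketched in a few lines of Python. This is only an illustration, not part of the guide: the names `PAIRS` and `complement` are assumptions introduced here. The sketch pairs bases position by position, as the glossary example does, and does not reverse the strand.

```python
# Base-pairing rules stated in the glossary: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence: str) -> str:
    """Return the position-by-position complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in sequence.upper())


# The glossary's own example: CTGAA pairs with GACTT.
print(complement("CTGAA"))  # GACTT
```

Because the two strands of a double helix run in opposite directions, software that writes both strands in the same reading direction usually also reverses the result. That step is left out here to mirror the glossary example.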

deoxyribonucleic acid (DNA). The molecule that contains genetic information. DNA is composed of nucleotide building blocks, each containing a base (A, C, G, or T), a phosphate, and a sugar. These nucleotides are linked together in a double helix: two strands of DNA molecules paired up at complementary bases (A with T, C with G). See adenine, cytosine, guanine, thymine.
diploid number. See haploid number.
D-loop. A portion of the mitochondrial genome known as the “control region” or “displacement loop” instrumental in the regulation and initiation of mtDNA gene products. Two short “hypervariable” regions within the D-loop do not appear to be functional and are the sequences used in identity or kinship testing.
DNA polymerase. The enzyme that catalyzes the synthesis of double-stranded DNA.
DNA probe. See probe.
DNA profile. The alleles at each locus. For example, a VNTR profile is the pattern of band lengths on an autorad. A multilocus profile represents the combined results of multiple probes. See genotype.
DNA sequence. The ordered list of base pairs in a duplex DNA molecule or of bases in a single strand.
DQ. The antigen that is the product of the DQA gene. See DQA, human leukocyte antigen.
DQA. The gene that codes for a particular class of human leukocyte antigen (HLA). This gene has been sequenced completely and can be used for forensic typing. See human leukocyte antigen.
EDTA. A preservative added to blood samples.
electropherogram. The PCR products separated by capillary electrophoresis can be labeled with a dye that glows at a given wavelength in response to light shined on it. As the tagged fragments pass the light source, an electronic camera records the intensity of the fluorescence. Plotting the intensity as a function of time produces a series of peaks, with the shorter fragments producing peaks sooner. The intensity is measured in relative fluorescent units and is proportional to the number of glowing fragments passing by the detector. The graph of the intensity over time is an electropherogram.
electrophoresis. See capillary electrophoresis, gel electrophoresis.
endonuclease. An enzyme that cleaves the phosphodiester bond within a nucleotide chain.
environmental insult. Exposure of DNA to external agents such as heat, moisture, and ultraviolet radiation, or chemical or bacterial agents. Such exposure can interfere with the enzymes used in the testing process or otherwise make DNA difficult to analyze.
enzyme. A protein that catalyzes (speeds up or slows down) a reaction.
epigenetic. Heritable changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the underlying DNA sequence. Epigenetic marks are molecules attached to DNA that can determine whether genes are active and used by the cell.
ethidium bromide. A molecule that can intercalate into DNA double helices when the helix is under torsional stress. Used to identify the presence of DNA in a sample by its fluorescence under ultraviolet light.
exon. See coding and noncoding DNA.
fallacy of the transposed conditional. See transposition fallacy.
false match. Two samples of DNA that have different profiles could be declared to match if, instead of measuring the distinct DNA in each sample, there is an error in handling or preparing samples such that the DNA from a single sample is analyzed twice. The resulting match, which does not reflect the true profiles of the DNA from each sample, is a false match. Some people use “false match” more broadly, to include cases in which the true profiles of each sample are the same, but the samples come from different individuals. Compare true match. See also match, random match.
gel, agarose. A semisolid medium used to separate molecules by electrophoresis.
gel electrophoresis. In RFLP analysis, the process of sorting DNA fragments by size by applying an electric current to a gel. The different-size fragments move at different rates through the gel.
gene. A set of nucleotide base pairs on a chromosome that contains the “instructions” for controlling some cellular function such as making an enzyme. The gene is the fundamental unit of heredity; each simple gene “codes” for a specific biological characteristic.
gene frequency. The relative frequency (proportion) of an allele in a population.
genetic drift. Random fluctuation in a population’s allele frequencies from generation to generation.
genetics. The study of the patterns, processes, and mechanisms of inheritance of biological characteristics.
genome. The complete genetic makeup of an organism, including roughly 23,000 genes and many other DNA sequences in humans. Over three billion nucleotide base pairs comprise the haploid human genome.
genotype. The particular forms (alleles) of a set of genes possessed by an organism (as distinguished from phenotype, which refers to how the genotype expresses itself, as in physical appearance). In DNA analysis, the term is applied to the variations within all DNA regions (whether or not they constitute genes) that are analyzed.
genotype, multilocus. The alleles that an organism possesses at several sites in its genome.
genotype, single locus. The alleles that an organism possesses at a particular site in its genome.
guanine (G). One of the four bases, or nucleotides, that make up the DNA double helix. Guanine binds only to cytosine. See nucleotide.
haploid number. Human sex cells (egg and sperm) contain 23 chromosomes each. This is the haploid number. When a sperm cell fertilizes an egg cell, the number of chromosomes doubles to 46. This is the diploid number.
haplotype. A specific combination of linked alleles at several loci.
Hardy-Weinberg equilibrium. A condition in which the allele frequencies within a large, random, intrabreeding population are unrelated to patterns of mating. In this condition, the occurrence of alleles from each parent will be independent and have a joint frequency estimated by the product rule. See independence, linkage disequilibrium.
heteroplasmy, heteroplasty. The condition in which some copies of mitochondrial DNA in the same individual have different base pairs at certain points.
heterozygous. Having a different allele at a given locus on each of a pair of homologous chromosomes. See allele. Compare homozygous.
homologous chromosomes. The 44 autosomes (nonsex chromosomes) in the normal human genome are in homologous pairs (one from each parent) that share an identical set of genes, but may have different alleles at the same loci.
homozygous. Having the same allele at a given locus on each of a pair of homologous chromosomes. See allele. Compare heterozygous.
human leukocyte antigen (HLA). Antigen (foreign body that stimulates an immune system response) located on the surface of most cells (excluding red blood cells and sperm cells). HLAs differ among individuals and are associated closely with transplant rejection. See DQA.
hybridization. Pairing up of complementary strands of DNA from different sources at the matching base-pair sites. For example, a primer with the sequence AGGTCT would bond with the complementary sequence TCCAGA on a DNA fragment.
independence. Two events are said to be independent if one is neither more nor less likely to occur when the other does.
interim ceiling principle. A procedure proposed in 1992 by a committee of the National Academy of Sciences for setting a minimum DNA profile frequency. For each allele, the highest frequency (adjusted upward for sampling error) found in any major racial group (or 10%, whichever is higher) is used in product-rule calculations. Compare ceiling principle.
intron. See coding and noncoding DNA.
kilobase (kb). A measure of DNA length (1000 bases).
likelihood ratio. A measure of the support that an observation provides for one hypothesis as opposed to an alternative hypothesis. The likelihood ratio is computed by dividing the conditional probability of the observation given that one hypothesis is true by the conditional probability of the observation given the alternative hypothesis. For example, the likelihood ratio for the hypothesis that two DNA samples with the same STR profile originated from the same individual (as opposed to originating from two unrelated individuals) is the reciprocal of the random-match probability. Legal scholars have introduced the likelihood ratio as a measure of the probative value of evidence. Evidence that is 100 times more probable to be observed when one hypothesis is true as opposed to another has more probative value than evidence that is only twice as probable.
linkage. The inheritance together of two or more genes on the same chromosome.
linkage equilibrium. A condition in which the occurrence of alleles at different loci is independent.
locus. A location in the genome, that is, a position on a chromosome where a gene or other structure begins.
mass spectroscopy. The separation of elements or molecules according to their molecular weight. In the version being developed for DNA analysis, small quantities of PCR-amplified fragments are irradiated with a laser to form gaseous ions that traverse a fixed distance. Heavier ions have longer times of flight, and the process is known as matrix-assisted laser desorption-ionization time-of-flight mass spectroscopy. MALDI-TOF-MS, as it is abbreviated, may be useful in analyzing STRs.
match. The presence of the same allele or alleles in two samples. Two DNA profiles are declared to match when they are indistinguishable in genetic type. For loci with discrete alleles, two samples match when they display the same set of alleles. For RFLP testing of VNTRs, two samples match when the pattern of the bands is similar and the positions of the corresponding bands at each locus fall within a preset distance. See match window, false match, true match.
match window. If two RFLP bands lie within a preset distance, called the match window, that reflects normal measurement error, they can be declared to match.
microsatellite. Another term for an STR.
minisatellite. Another term for a VNTR.
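The match and match window entries describe a comparison that can be sketched as code. The following Python is a hypothetical illustration only. It assumes the window is expressed as a fraction of the mean of the two band sizes, and it borrows the 5% figure that the glossary gives as an example of a laboratory matching rule. The function names, locus names, and band sizes are invented for the example, and an actual laboratory rule may be defined differently.

```python
def bands_match(size_a_bp, size_b_bp, window=0.05):
    """Return True when two band sizes fall within the match window.

    Assumption: the window is a fraction of the mean of the two sizes.
    """
    mean_size = (size_a_bp + size_b_bp) / 2
    return abs(size_a_bp - size_b_bp) <= window * mean_size


def profiles_match(profile_a, profile_b, window=0.05):
    """Compare two RFLP profiles given as {locus: [band sizes in bp]}."""
    if profile_a.keys() != profile_b.keys():
        return False
    for locus, bands_a in profile_a.items():
        bands_b = profile_b[locus]
        if len(bands_a) != len(bands_b):
            return False
        # Pair bands by size order, then test each pair against the window.
        pairs = zip(sorted(bands_a), sorted(bands_b))
        if not all(bands_match(a, b, window) for a, b in pairs):
            return False
    return True


# Hypothetical band sizes (bp) at two made-up loci.
evidence = {"locus_1": [1000, 2000], "locus_2": [3500]}
suspect = {"locus_1": [2050, 1010], "locus_2": [3450]}
print(profiles_match(evidence, suspect))  # True
```

A declared match under such a rule says only that the bands are indistinguishable within measurement error. Whether it is a true match or a false match, in the glossary's sense, is a separate question.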

mitochondria. A structure (organelle) within nucleated (eukaryotic) cells that is the site of the energy-producing reactions within the cell. Mitochondria contain their own DNA (often abbreviated as mtDNA), which is inherited only from mother to child.
molecular weight. The weight in grams of 1 mole (approximately 6.02 × 10²³ molecules) of a pure, molecular substance.
monomorphic. A gene or DNA characteristic that is almost always found in only one form in a population. Compare polymorphism.
multilocus probe. A probe that marks multiple sites (loci). RFLP analysis using a multilocus probe will yield an autorad showing a striped pattern of 30 or more bands. Such probes are no longer used in forensic applications.
multilocus profile. See DNA profile.
multiplexing. Typing several loci simultaneously.
mutation. The process that produces a gene or chromosome set differing from the type already in the population; the gene or chromosome set that results from such a process.
nanogram (ng). A billionth of a gram.
nucleic acid. RNA or DNA.
nucleotide. A unit of DNA consisting of a base (A, C, G, or T) attached to a phosphate and a sugar group; the basic building block of nucleic acids. See deoxyribonucleic acid.
nucleus. The membrane-covered portion of a eukaryotic cell containing most of the DNA and found within the cytoplasm.
oligonucleotide. A synthetic polymer made up of fewer than 100 nucleotides; used as a primer or a probe in PCR. See primer.
paternity index. A number (technically, a likelihood ratio) that indicates the support that the paternity test results lend to the hypothesis that the alleged father is the biological father as opposed to the hypothesis that another man selected at random is the biological father. Assuming that the observed phenotypes correctly represent the phenotypes of the mother, child, and alleged father tested, the number can be computed as the ratio of the probability of the phenotypes under the first hypothesis to the probability under the second hypothesis. Large values indicate substantial support for the hypothesis of paternity; values near zero indicate substantial support for the hypothesis that someone other than the alleged father is the biological father; and values near unity indicate that the results do not help in determining which hypothesis is correct.
pH. A measure of the acidity of a solution.
phenotype. A trait, such as eye color or blood group, resulting from a genotype.
point mutation. See SNP.

polymarker. A commercially marketed set of PCR-based tests for protein polymorphisms.
polymerase chain reaction (PCR). A process that mimics DNA’s own replication processes to make up to millions of copies of short strands of genetic material in a few hours.
polymorphism. The presence of several forms of a gene or DNA characteristic in a population.
population genetics. The study of the genetic composition of groups of individuals.
population structure. When a population is divided into subgroups that do not mix freely, that population is said to have structure. Significant structure can lead to allele frequencies being different in the subpopulations.
primer. An oligonucleotide that attaches to one end of a DNA fragment and provides a point for more complementary nucleotides to attach and replicate the DNA strand. See oligonucleotide.
probe. In forensics, a short segment of DNA used to detect certain alleles. The probe hybridizes, or matches up, to a specific complementary sequence. Probes allow visualization of the hybridized DNA, either by a radioactive tag (usually used for RFLP analysis) or a biochemical tag (usually used for PCR-based analyses).
product rule. When alleles occur independently at each locus (Hardy-Weinberg equilibrium) and across loci (linkage equilibrium), the proportion of the population with a given genotype is the product of the proportion of each allele at each locus, times factors of two for heterozygous loci.
proficiency test. A test administered at a laboratory to evaluate its performance. In a blind proficiency study, the laboratory personnel do not know that they are being tested.
prosecutor’s fallacy. See transposition fallacy.
protein. A class of biologically important molecules made up of a linear string of building blocks called amino acids. The order in which these components are arranged is encoded in the DNA sequence of the gene that expresses the protein. See coding and noncoding DNA.
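The arithmetic in the product rule entry can be made concrete with a short sketch. This is a hypothetical illustration that assumes the independence conditions the entry names. The locus names, allele labels, and allele proportions below are invented, not drawn from any database, and the function names are assumptions introduced here.

```python
from math import prod


def locus_frequency(allele_1, allele_2, freqs):
    """Genotype proportion at one locus under Hardy-Weinberg equilibrium."""
    p, q = freqs[allele_1], freqs[allele_2]
    # Homozygous: p * p. Heterozygous: 2 * p * q (the "factor of two").
    return p * p if allele_1 == allele_2 else 2 * p * q


def profile_frequency(genotype, allele_freqs):
    """Multiply single-locus proportions across loci (linkage equilibrium)."""
    return prod(
        locus_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in genotype.items()
    )


# Hypothetical allele proportions at two made-up loci.
allele_freqs = {"locus_1": {"12": 0.1, "14": 0.2}, "locus_2": {"9": 0.3}}
genotype = {"locus_1": ("12", "14"), "locus_2": ("9", "9")}

frequency = profile_frequency(genotype, allele_freqs)
print(frequency)  # approximately 0.0036 = (2 * 0.1 * 0.2) * (0.3 * 0.3)
```

With these made-up numbers the profile would be expected in roughly 1 of every 278 unrelated people. Real calculations use more loci and typically apply adjustments, such as those for population structure, that this sketch omits.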
pseudogenes. Genes that have been so disabled by mutations that they can no longer produce proteins. Some pseudogenes can still produce noncoding RNA.
quality assurance. A program conducted by a laboratory to ensure accuracy and reliability.
quality audit. A systematic and independent examination and evaluation of a laboratory’s operations.

quality control. Activities used to monitor the ability of DNA typing to meet specified criteria.
random match. A match in the DNA profiles of two samples of DNA, where one is drawn at random from the population. See also random-match probability.
random-match probability. The chance of a random match. As it is usually used in court, the random-match probability refers to the probability of a true match when the DNA being compared to the evidence DNA comes from a person drawn at random from the population. This random true match probability reveals the probability of a true match when the samples of DNA come from different, unrelated people.
random mating. The members of a population are said to mate randomly with respect to particular genes or DNA characteristics when the choice of mates is independent of the alleles.
recombination. In general, any process in a diploid or partially diploid cell that generates new gene or chromosomal combinations not found in that cell or in its progenitors.
reference population. The population to which the perpetrator of a crime is thought to belong.
relative fluorescent unit (RFU). See electropherogram.
replication. The synthesis of new DNA from existing DNA. See polymerase chain reaction.
restriction enzyme. Protein that cuts double-stranded DNA at specific base-pair sequences (different enzymes recognize different sequences). See restriction site.
restriction fragment length polymorphism (RFLP). Variation among people in the length of a segment of DNA cut at two restriction sites.
restriction fragment length polymorphism (RFLP) analysis. Analysis of individual variations in the lengths of DNA fragments produced by digesting sample DNA with a restriction enzyme.
restriction site. A sequence marking the location at which a restriction enzyme cuts DNA into fragments. See restriction enzyme.
reverse dot blot. A detection method used to identify SNPs in which DNA probes are affixed to a membrane, and amplified DNA is passed over the probes to see if it contains the complementary sequence.
ribonucleic acid (RNA). A single-stranded molecule “transcribed” from DNA. “Coding” RNA acts as a template for building proteins according to the sequences in the coding DNA from which it is transcribed. Other RNA transcripts can be a sensor for detecting signals that affect gene expression, a switch for turning genes off or on, or they may be functionless.
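The restriction enzyme, restriction site, and RFLP entries above describe cutting DNA at a recognition sequence and comparing fragment lengths. A toy Python sketch of that idea follows. It is a simplification that treats DNA as a single text string and ignores the second strand. The recognition sequence GAATTC, cut after the first G, is that of the enzyme EcoRI. The helper names, the flanking bases, and the "CAT" repeat unit are invented for illustration.

```python
def fragment_lengths(dna, site="GAATTC", cut_offset=1):
    """Cut `dna` at every occurrence of `site` and return fragment lengths.

    `cut_offset` is how many bases into the site the cut falls.
    """
    cut_points = []
    position = dna.find(site)
    while position != -1:
        cut_points.append(position + cut_offset)
        position = dna.find(site, position + 1)
    bounds = [0] + cut_points + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def toy_allele(repeat_count):
    """Build a made-up allele: two restriction sites flanking a tandem repeat."""
    return "AAAA" + "GAATTC" + "CAT" * repeat_count + "GAATTC" + "TT"


# Alleles with different repeat counts give middle fragments of different length.
print(fragment_lengths(toy_allele(2)))  # [5, 12, 7]
print(fragment_lengths(toy_allele(5)))  # [5, 21, 7]
```

Only the middle fragment, the one bounded by the two restriction sites, changes length with the number of repeats. That length difference is the polymorphism RFLP analysis measures.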

sequence-specific oligonucleotide (SSO) probe. Also, allele-specific oligonucleotide (ASO) probe. Oligonucleotide probes used in a PCR-associated detection technique to identify the presence or absence of certain base-pair sequences identifying different alleles. The probes are visualized by an array of dots rather than by the electrophoretograms associated with STR analysis.
sequencing. Determining the order of base pairs in a segment of DNA.
short tandem repeat (STR). See variable number tandem repeat.
single-locus probe. A probe that only marks a specific site (locus). RFLP analysis using a single-locus probe will yield an autorad showing one band if the individual is homozygous, two bands if heterozygous. Likewise, the probe will produce one or two peaks in an STR electrophoretogram.
SNP (single nucleotide polymorphism). A substitution, insertion, or deletion of a single base pair at a given point in the genome.
SNP chip. See chip.
Southern blotting. Named for its inventor, a technique by which processed DNA fragments, separated by gel electrophoresis, are transferred onto a nylon membrane in preparation for the application of biological probes.
thymine (T). One of the four bases, or nucleotides, that make up the DNA double helix. Thymine binds only to adenine. See nucleotide.
transposition fallacy. Also called the prosecutor’s fallacy, the transposition fallacy confuses the conditional probability of A given B [P(A|B)] with that of B given A [P(B|A)]. Few people think that the probability that a person speaks Spanish (A) given that he or she is a citizen of Chile (B) equals the probability that a person is a citizen of Chile (B) given that he or she speaks Spanish (A). Yet, many court opinions, newspaper articles, and even some expert witnesses speak of the probability of a matching DNA genotype (A) given that someone other than the defendant is the source of the crime scene DNA (B) as if it were the probability of someone else being the source (B) given the matching profile (A). Transposing conditional probabilities correctly requires Bayes’ theorem.
true match. Two samples of DNA that have the same profile should match when tested. If there is no error in the labeling, handling, and analysis of the samples and in the reporting of the results, a match is a true match. A true match establishes that the two samples of DNA have the same profile. Unless the profile is unique, however, a true match does not conclusively prove that the two samples came from the same source. Some people use “true match” more narrowly, to mean only those matches among samples from the same source. Compare false match. See also match, random match.
variable number tandem repeat (VNTR). A class of RFLPs resulting from multiple copies of virtually identical base-pair sequences, arranged in succession at a specific locus on a chromosome. The number of repeats varies from individual to individual, thus providing a basis for individual recognition. VNTRs are longer than STRs.
window. See match window.
X chromosome. See chromosome.
Y chromosome. See chromosome.

References on DNA
Forensic DNA Interpretation (John Buckleton et al. eds., 2005).
John M. Butler, Fundamentals of Forensic DNA Typing (2010).
Ian W. Evett & Bruce S. Weir, Interpreting DNA Evidence: Statistical Genetics for Forensic Scientists (1998).
William Goodwin et al., An Introduction to Forensic Genetics (2d ed. 2011).
David H. Kaye, The Double Helix and the Law of Evidence (2010).
National Research Council Committee on DNA Forensic Science: An Update, The Evaluation of Forensic DNA Evidence (1996).
National Research Council Committee on DNA Technology in Forensic Science, DNA Technology in Forensic Science (1992).
The President’s DNA Initiative, Forensic DNA Resources for Specific Audiences, available at www.dna.gov/audiences/.