The Evaluation of Forensic DNA Evidence (1996)
Commission on Life Sciences (CLS)


"5 Statistical Issues." The Evaluation of Forensic DNA Evidence. Washington, DC: The National Academies Press, 1996.






Page 126

subgroup. Our approach is empirical: we compare different subpopulations and also, to mimic a worst case scenario, perform sample calculations deliberately using an inappropriate database.

Data Sources

A simple random sample of a given size from a population is one chosen so that every possible sample of that size has an equal chance of being selected. Ideally, the reference data set from which genotype frequencies are calculated would be a simple random sample, or a stratified or otherwise scientifically structured random sample, from the relevant population. Several conditions make the actual situation less than ideal. One is a lack of agreement as to what the relevant population is (should it be the whole population or only young males? should it be local or national?) and the consequent need to consider several possibilities. A second is that we are forced to rely on convenience samples, chosen not at random but because of availability or cost; arranging a statistically valid random-sampling scheme is difficult, expensive, and impractical. The saving point is that the features in which we are interested are believed theoretically, and found empirically, to be essentially uncorrelated with the means by which samples are chosen. Comparison of estimated profile frequencies from different data sets shows relative insensitivity to the source of the data, as we document later in the chapter. Furthermore, the VNTRs and STRs used in forensic analysis are usually not associated with any known function and therefore should not be correlated with occupation or behavior. For the purpose of estimating genotype frequencies, therefore, these convenience samples are effectively random.
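The argument that a convenience sample behaves like a random one, provided the way people enter the sample is independent of their genotype, can be illustrated with a small simulation. This is a hypothetical sketch, not a procedure from this report: the allele names, allele frequencies, population size, and 10% "blood donor" rate are invented for illustration, and genotypes are assumed to follow Hardy-Weinberg proportions. It compares allele-frequency estimates from a simple random sample with those from a sample drawn only from donors.

```python
import random
from collections import Counter


def make_population(size, allele_freqs, donor_rate, rng):
    """Build a hypothetical population of (genotype, is_donor) pairs.

    Genotypes are drawn under Hardy-Weinberg proportions, and donor status
    is drawn independently of genotype (the key assumption being illustrated).
    """
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    population = []
    for _ in range(size):
        genotype = tuple(rng.choices(alleles, weights=weights, k=2))
        is_donor = rng.random() < donor_rate
        population.append((genotype, is_donor))
    return population


def allele_frequencies(people):
    """Estimate allele frequencies by counting both alleles of each person."""
    counts = Counter(a for genotype, _ in people for a in genotype)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


rng = random.Random(1996)
freqs = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical allele frequencies
population = make_population(100_000, freqs, donor_rate=0.1, rng=rng)

# Simple random sample: every subset of 2,000 people is equally likely.
srs = rng.sample(population, 2_000)
# Convenience sample: the first 2,000 donors available, not chosen at random.
donors = [person for person in population if person[1]]
convenience = donors[:2_000]

srs_est = allele_frequencies(srs)
conv_est = allele_frequencies(convenience)
for allele in sorted(freqs):
    print(allele, freqs[allele], round(srs_est[allele], 3), round(conv_est[allele], 3))
```

Both sets of estimates should fall close to the population values. If donor status were instead made to depend on genotype, the convenience estimates would drift; it is the independence assumption, not the sampling method as such, that makes the convenience sample usable.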

The convenience samples from which the databases are derived come from various sources. Some data come from blood banks. Some come from genetic-counseling and disease-screening centers. Others come from mothers and putative fathers in paternity tests. The data summarized in FBI (1993b), which we have used in previous chapters and will again in this chapter, are from a variety of sources around the world, from blood banks, paternity-testing centers, molecular-biology and human-genetics laboratories, hospitals and clinics, law-enforcement officers, and criminal records.

As mentioned previously, most markers used for DNA analysis, VNTRs and STRs in particular, are from regions of DNA that have no known function. They are not related in any obvious way to gene-determined traits², and there is no reason to suspect that persons who contribute to blood banks or who have been

2 Some loci used in PCR-based typing are associated with genes. It is important to determine whether a particular forensic allele is associated with a disease state and hence subject to selection. A forensic marker might happen to be closely linked to an important gene, such as one causing some observable trait, and could conceivably be in strong linkage disequilibrium with it. As the number of mapped genes increases, such linkages will be found increasingly often. But for that to affect the reliability of a database, the trait would have to appear disproportionately in the populations that contribute to the database.
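The footnote's final point, that a linked marker distorts a database only if the trait is over-represented among contributors, can be checked with a short expected-value calculation. This is a hypothetical sketch, not a method from this report: it works at the level of single chromosomes, measures linkage disequilibrium with the usual coefficient D = P(MT) - p_M * p_T, and models biased recruitment (for example, through a disease-screening center) as a relative sampling weight on trait-carrying chromosomes. The function name and all numbers are illustrative.

```python
def sampled_marker_freq(p_marker, p_trait, d, weight):
    """Expected marker-allele frequency in a hypothetical database whose
    sampling over-represents trait-carrying chromosomes by `weight`.

    p_marker, p_trait: population frequencies of marker allele M and trait allele T
    d: linkage-disequilibrium coefficient, D = P(MT) - p_marker * p_trait
    weight: relative chance that a T-carrying chromosome is sampled (1 = unbiased)
    """
    p_mt = p_marker * p_trait + d            # P(M and T on one chromosome)
    p_m_not_t = p_marker - p_mt              # P(M without T)
    p_not_m_t = p_trait - p_mt               # P(T without M)
    p_neither = 1 - p_marker - p_trait + p_mt
    if min(p_mt, p_m_not_t, p_not_m_t, p_neither) < -1e-12:
        raise ValueError("D is outside the range allowed by these allele frequencies")
    # Reweight T-carrying chromosomes, then renormalize.
    numerator = weight * p_mt + p_m_not_t
    denominator = weight * p_trait + (1 - p_trait)
    return numerator / denominator


for d, w in [(0.0, 1.0), (0.0, 5.0), (0.04, 1.0), (0.04, 5.0)]:
    print(f"D={d:.2f} weight={w:.0f} -> {sampled_marker_freq(0.1, 0.05, d, w):.3f}")
```

With a population marker frequency of 0.1, the estimate stays at 0.100 when there is no disequilibrium (D = 0) or no over-representation (weight = 1); only when both are present (D = 0.04, weight = 5) does it rise, here to about 0.233.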
