matches at such alleles that might be accorded too much evidentiary weight, if the general population frequency were used in calculating the probability of a match.
Determining whether an allele has especially high frequency does not require a very large sample. A collection of 100 randomly chosen people provides a sample of 200 alleles, which is quite adequate for estimating allele frequencies.
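The adequacy of a 200-allele sample can be checked with a rough calculation. The sketch below is illustrative only: it assumes the sampled alleles are independent draws (simple binomial sampling), and the 10% frequency is a hypothetical value chosen for the example, not a figure from this report.

```python
import math

def allele_frequency_standard_error(frequency, n_people):
    """Binomial standard error of an estimated allele frequency.

    Assumes each person contributes two independently sampled alleles,
    which is an idealization made for this sketch.
    """
    n_alleles = 2 * n_people
    return math.sqrt(frequency * (1.0 - frequency) / n_alleles)

# Hypothetical allele at 10% frequency, sampled in 100 people (200 alleles).
standard_error = allele_frequency_standard_error(0.10, 100)
lower = 0.10 - 1.96 * standard_error
upper = 0.10 + 1.96 * standard_error
print(f"standard error: {standard_error:.3f}")
print(f"approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the standard error is about 0.021, so an allele that is truly common would be unlikely to be mistaken for a rare one in a sample of this size.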
Genetically homogeneous populations from various regions of the world should be examined to determine the extent of variation in allele frequency. Ideally, the populations should span the range of ethnic groups that are represented in the United States—e.g., English, Germans, Italians, Russians, Navahos, Puerto Ricans, Chinese, Japanese, Vietnamese, and West Africans. Some populations will be easy to sample through arrangements with blood banks in the appropriate country; other populations might be studied by sampling recent immigrants to the United States. The choice and sampling of the 15-20 populations should be supervised by the National Committee on Forensic DNA Typing (NCFDT) described in Chapter 2.
We emphasize, however, that it is not necessary to be comprehensive. The goal is not to ensure that the ethnic background of every particular defendant is represented, but rather to define the likely range of allele frequency variation.
Because only a limited number of populations can be sampled, it is necessary to make some allowance for unexamined populations. As usual, the problem is rare alleles. Genetic drift has the greatest proportional effect on rare alleles and may cause substantial variation in their frequency. Even if an allele is observed at a frequency of 1% in several ethnic populations, one cannot safely rule out a frequency five-fold higher in some subgroup.
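The claim that drift has the greatest proportional effect on rare alleles can be illustrated with the standard result for an idealized Wright-Fisher population, in which the variance of allele frequency after t generations is p(1 - p)[1 - (1 - 1/(2N))^t]. The sketch below is not the committee's own calculation; the function names, the effective population size of 1,000, and the 50 generations are hypothetical values chosen only for illustration.

```python
import math

def drift_variance(p0, n_eff, generations):
    """Variance of allele frequency among subpopulations after drift.

    Uses the idealized Wright-Fisher result
    p0 * (1 - p0) * (1 - (1 - 1/(2N))**t), with no selection,
    mutation, or migration (assumptions of this sketch).
    """
    accumulated = 1.0 - (1.0 - 1.0 / (2.0 * n_eff)) ** generations
    return p0 * (1.0 - p0) * accumulated

def relative_spread(p0, n_eff, generations):
    """Standard deviation of the drifted frequency as a multiple of p0."""
    return math.sqrt(drift_variance(p0, n_eff, generations)) / p0

# Hypothetical scenario: N = 1,000 and 50 generations of isolation.
rare = relative_spread(0.01, 1000, 50)
common = relative_spread(0.50, 1000, 50)
print(f"relative spread, 1% allele:  {rare:.2f}")
print(f"relative spread, 50% allele: {common:.2f}")
```

In this model the relative spread is proportional to the square root of (1 - p)/p, so for the same population history a 1% allele varies roughly ten times more, in proportion to its frequency, than a 50% allele.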
To overcome this problem, we recommend that ceiling frequencies be 5% or higher. We selected this threshold because we concluded that allele frequency estimates that were substantially lower would not provide sufficiently reliable predictors for other, unsampled subgroups. Our reasoning was based on population genetic theory and computational results, and we aimed at accounting for the effects of sampling error and for genetic drift. The latter consideration was especially important, because it scales inversely with effective population size (i.e., small populations have larger drift) and because it accumulates over generations. The use of such a ceiling frequency would correspond to a lower bound of 5% on allele frequencies. Even if one observed allele frequencies of about 1%, one would guard against the possibility that the frequency in a subpopulation had drifted higher by using the lower bound of 5%. Thus, the lowest frequency attrib-