ferentiation. Together with regions conserved between D. melanogaster and Drosophila pseudoobscura, we tag 5.3 kb of noncoding DNA as potentially regulatory. Ninety-seven of the 408 common noncoding SNPs surveyed are within putatively regulatory regions. If these methods collectively identify the majority of functional noncoding polymorphisms, genotyping only these SNPs in an association mapping framework would reduce genotyping effort for noncoding regions 4-fold.
A major goal of modern biological research is to understand the relationship between genotype and phenotype. The search for genetic variation contributing to differences among individuals is exemplified by association studies that aim to identify the segregating genetic polymorphisms that confer risk of common polygenic, or complex, diseases in humans (Carlson et al., 2004). Association mapping involves genotyping a dense set of SNPs in a large population of individuals and asking whether there is evidence of an association between the genotype at each SNP and the phenotype. A significant association suggests that the genotyped SNP either is itself responsible for conferring disease risk or is strongly correlated (i.e., in linkage disequilibrium) with the causal site. We refer to SNPs contributing to phenotypic variation as functional SNPs (fSNPs).
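The per-SNP test described above can be illustrated with a minimal sketch. It assumes a case/control design and a Pearson chi-square test on allele counts with a Bonferroni correction; neither choice is taken from this study, and the function names, SNP labels, and counts below are hypothetical.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is a (minor_allele_count, major_allele_count) pair.
    Returns (chi2, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    # A table with an empty row or column carries no information.
    if n == 0 or 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def scan(snps, alpha=0.05):
    """Test every SNP; return names passing a Bonferroni-corrected threshold."""
    threshold = alpha / len(snps)
    hits = []
    for name, (cases, controls) in snps.items():
        _, p_value = allelic_chi2(cases, controls)
        if p_value < threshold:
            hits.append(name)
    return hits


# Hypothetical allele counts: (minor, major) in cases, then in controls.
example_snps = {
    "snp_a": ((60, 140), (30, 170)),
    "snp_b": ((45, 155), (44, 156)),
}
significant = scan(example_snps)
```

Because the corrected threshold shrinks as more SNPs are tested, restricting the scan to a smaller candidate set reduces the multiple-testing burden as well as the genotyping effort.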
The human genome harbors 4.6–7.1 million common SNPs [minor allele frequency above 5%; Kruglyak and Nickerson (2001) and Stephens et al. (2001)], with the vast majority presumed to be nonfunctional. Unfortunately, it is not yet cost-effective to exhaustively test every SNP for an association with a disease phenotype. Despite a great deal of academic and private research, genotyping technology remains unable to efficiently genotype millions of SNPs in thousands of individuals at reasonable cost (Syvänen, 2001). Thus, some intelligent way of reducing the genotyping effort is needed.
One such approach, the HapMap project (International HapMap Consortium, 2003), seeks to exploit linkage disequilibrium (LD) across the genome by choosing a subset of SNPs to genotype that captures the majority of haplotype information. This approach is favored for humans, given the recent suggestion that the genome exhibits a block-like LD structure (Daly et al., 2001; Gabriel et al., 2002; Patil et al., 2001). Under the HapMap plan, between 200,000 and 1 million SNPs need to be genotyped to achieve complete genome coverage (Carlson et al., 2003; Gabriel et al., 2002; Goldstein et al., 2003; Patil et al., 2001). However, this plan is critically dependent on the degree to which available SNPs capture human haplotype diversity, which is hotly debated (Carlson et al., 2003; Reich et al., 2003), and on the reliability of the block definitions