which a genotype is associated with a geographic region rather than a phenotype. This is especially problematic for phenotypes that themselves vary by geographic region, such as flowering time or photoperiod sensitivity. Many important crops (e.g., barley, rice, and soybean) are derived from wild populations with extensive geographic structure (Lin et al., 2001; Morrell et al., 2003; Kuroda et al., 2006). This structure is often reflected in the domesticate as well, especially in cases involving multiple independent domestications (Londo et al., 2006; Morrell and Clegg, 2007). Unfortunately, for many of the crops in Table 11.1 we have little information about the location of domestication or about population structure in wild populations. The conspicuous exceptions are rice, maize, barley, and wheat, whose domestication histories are becoming clearer (Matsuoka et al., 2002; Salamini et al., 2002; Londo et al., 2006; Morrell and Clegg, 2007). Studies of human diseases (Weiss and Clark, 2002) suggest that basic research on demographic history and population structure will be crucial to the success of LD mapping in plants.
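The confounding effect of geographic structure can be illustrated with a small constructed example. The sketch below is not drawn from any of the studies cited above: the two regions, the marker allele counts, and the flowering times are all hypothetical values chosen so that the marker has no effect on the phenotype within either region, yet differs in frequency between regions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical inbred lines: (region, marker allele, days to flowering).
# Within each region, both allele classes have the same mean flowering time,
# so the marker has no effect; only the allele frequency differs by region.
lines = (
    [("north", 1, d) for d in (58, 60, 62, 60, 58, 62, 60, 60)]
    + [("north", 0, d) for d in (59, 61)]
    + [("south", 1, d) for d in (79, 81)]
    + [("south", 0, d) for d in (78, 80, 82, 80, 78, 82, 80, 80)]
)

def genotype_phenotype_r(records):
    """Correlation between marker allele and phenotype for a set of lines."""
    genotypes = [rec[1] for rec in records]
    phenotypes = [rec[2] for rec in records]
    return pearson(genotypes, phenotypes)

# Ignoring structure: all lines analyzed together.
pooled_r = genotype_phenotype_r(lines)

# Accounting for structure: each region analyzed separately.
within_r = {
    region: genotype_phenotype_r([rec for rec in lines if rec[0] == region])
    for region in ("north", "south")
}

print(f"pooled r = {pooled_r:.2f}")
for region, r in within_r.items():
    print(f"{region} r = {r:.2f}")
```

In this constructed sample the pooled correlation is roughly -0.59, whereas the correlation within each region is zero: the apparent association arises entirely because the allele is common where flowering is early and rare where it is late.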
The final design challenge that we will consider here is marker (usually SNP) density. LD mapping studies are very powerful when the causative mutation is genotyped (Risch and Merikangas, 1996; Long and Langley, 1999). If the causative mutation is not genotyped, it is still possible to identify an association via markers that are in LD with the causative mutation. However, the extent of LD can vary dramatically among plant species (Flint-Garcia et al., 2003; Morrell et al., 2005), among genomic regions within plants (Gaut et al., 2007), and among population samples (Tenaillon et al., 2001; Ching et al., 2002; Caldwell et al., 2006). The distribution of LD is also affected by homologous gene conversion, which predominantly disrupts short-range LD patterns (Jeffreys and May, 2004; Padhukasahasram et al., 2004; Morrell et al., 2006). Study design, statistical analysis, and methods of controlling for biological challenges such as population structure are very active areas of research (Rosenberg and Nordborg, 2006; Yu and Buckler, 2006), but several large-scale plant LD mapping studies are currently underway despite little background information about the extent of LD and geographic structure in the populations being studied.
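How well a genotyped marker stands in for an ungenotyped causative mutation is commonly summarized by the squared correlation between the two sites, r^2 = D^2 / [p_A(1 - p_A) p_B(1 - p_B)], where D = p_AB - p_A p_B. The following is a minimal sketch of that calculation from phased haplotype counts; the function name and the two sets of counts are hypothetical and serve only to contrast a tightly linked marker with a loosely linked one.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic sites.

    Arguments are counts of the four two-site haplotypes, where A/a are the
    alleles at the marker and B/b the alleles at the causative site.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at the marker
    p_B = (n_AB + n_aB) / total   # frequency of allele B at the causative site
    d = p_AB - p_A * p_B          # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom

# Hypothetical samples of 100 haplotypes each.
tight = ld_r_squared(45, 5, 5, 45)    # marker close to the causative site
loose = ld_r_squared(30, 20, 20, 30)  # LD largely decayed

print(f"tight marker: r^2 = {tight:.2f}")
print(f"loose marker: r^2 = {loose:.2f}")
```

With these counts r^2 is 0.64 for the first marker and 0.04 for the second. Under a commonly used rule of thumb, the sample size needed to detect an association through a marker grows approximately in proportion to 1/r^2, so in regions or species where LD decays rapidly a much denser marker set is needed to keep r^2 with any causative site high.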
The difficulties inherent in LD mapping are reflected in the literature. In a genome-wide association study, Aranzana et al. (2005) confirmed several Arabidopsis QTLs for flowering time and pathogen resistance but also noted a high rate of false-positive associations. Workers using large wild-caught populations of Drosophila have been unable to verify associations identified in lab populations, suggesting that some results may not be replicable regardless of sample size, the number of SNPs genotyped, or the care taken in study design (Macdonald and Long, 2004; Macdonald et al., 2005). Furthermore, failures to identify an association between a candidate gene and a phenotype of interest are likely underreported.