13 Gene Genealogies and Population Variation in Plants

BARBARA A. SCHAAL AND KENNETH M. OLSEN

Early in the development of plant evolutionary biology, genetic drift, fluctuations in population size, and isolation were identified as critical processes that affect the course of evolution in plant species. Attempts to assess these processes in natural populations became possible only with the development of neutral genetic markers in the 1960s. More recently, the application of historically ordered neutral molecular variation (within the conceptual framework of coalescent theory) has allowed a reevaluation of these microevolutionary processes. Gene genealogies trace the evolutionary relationships among haplotypes (alleles) within populations. Processes such as selection, fluctuation in population size, and population substructuring affect the geographical and genealogical relationships among these alleles. Therefore, examination of these genealogical data can provide insights into the evolutionary history of a species. For example, studies of Arabidopsis thaliana have suggested that this species underwent rapid expansion, with populations showing little genetic differentiation. The new discipline of phylogeography examines the distribution of allele genealogies in an explicit geographical context. Phylogeographic studies of plants have documented the recolonization of European tree species from refugia subsequent to Pleistocene glaciation, and such studies have been instructive in understanding the origin and domestication of the crop cassava. Currently, several technical limitations hinder the widespread application of a genealogical approach to plant evolutionary studies. However, as these technical issues are solved, a genealogical approach holds great promise for understanding these previously elusive processes in plant evolution.

Department of Biology, Washington University, St. Louis, MO 63130

This paper was presented at the National Academy of Sciences colloquium "Variation and Evolution in Plants and Microorganisms: Toward a New Synthesis 50 Years After Stebbins," held January 27–29, 2000, at the Arnold and Mabel Beckman Center in Irvine, CA.

In the following succinct statements, G. L. Stebbins presents what would become the framework for the study of plant evolutionary mechanisms for the next 50 years (Stebbins, 1950).

Individual variation, in the form of mutation and gene recombination, exists in all populations; . . . the molding of this raw material . . . into variation on the level of populations by means of natural selection, fluctuation in population size, random fixation and isolation is sufficient to account for all of the differences, both adaptive and non-adaptive, which exist between related races and species . . . The problem of the evolutionist is . . . evaluating on the basis of all available evidence the role which each of these known forces has played in any particular evolutionary line . . .

A central thesis of Stebbins' seminal book, Variation and Evolution in Plants (Stebbins, 1950), is the notion that to understand evolution we must examine its action at the level of populations within species. This reasoning may seem obvious to contemporary readers, but at the time of Stebbins' writing, the importance of population-level processes for evolution was far from apparent. Stebbins' elucidation of this connection is one of his most enduring contributions to plant evolutionary biology.

Fifty years ago, the study of plant evolution was necessarily concerned with the phenotype, much of which is subject to selection.
Morphology, karyotypes, and fitness components are central traits for understanding evolution and adaptation, but they limit which evolutionary processes can be studied. In his book, Stebbins discusses such events as fluctuations in population size, random fixation (genetic drift), and isolation as all affecting the process of evolution (see passage quoted above; Stebbins, 1950). However, the study of these mechanisms requires markers that are not under selection.

In the years after the publication of Stebbins' book, among the first major technical advances in evolutionary biology were the development of protein electrophoresis and the identification of allozyme variation in natural populations. Many allozymes are selectively neutral, and thus, for
the first time, evolutionary biologists could attempt to assess the amount of neutral genetic variation within species as well as its spatial distribution. Plant species were found to vary widely, both in levels of genetic variation and in the apportionment of this variation within and among populations. These observations spurred researchers to examine the mechanisms underlying the process of genetic differentiation in plants. One of the most common approaches for doing this analysis has been to look for correlations between the life history characteristics of a species (e.g., mechanisms of pollen and seed dispersal, system of mating, and generation length) and patterns of population genetic differentiation (Hamrick and Godt, 1989).

Neutral allelic variation from allozymes also can be used to estimate levels of gene flow among populations. The population genetic theory developed by Wright, Fisher, and Malecot (among others) established that, for a group of populations at equilibrium, the level of genetic differentiation is roughly inversely proportional to the level of interpopulation gene flow per generation. This relationship is expressed by Wright's (1951) familiar equation for estimating gene flow under an island model: $F_{ST} \approx 1/(4N_e m + 1)$, where $F_{ST}$ is the standardized variance in allele frequencies among populations, $N_e$ is the effective population size, and $m$ is the migration rate.

The use of allozymes has led to more than 30 years of insight into how plant populations evolve. However, inferring population structure solely from allele frequencies has its limitations. Allozyme alleles (or their DNA analogs, restriction fragment length polymorphisms, amplified fragment length polymorphisms, and microsatellites) are unordered, meaning that the genealogical pattern of relationships among alleles cannot be easily inferred.
As a result, these data cannot be used in directly assessing genetic change over time but rather require indirect approaches based on models that often assume equilibrium conditions. For example, Wright's equation (above) for quantifying gene flow under the island model assumes that populations have reached an equilibrium between gene flow and random genetic drift. This equilibrium perspective can be biologically misleading, particularly for species in which recent history is a major determinant of population structure. We speculate that very few plant species have reached, or will ever reach, a gene flow–drift equilibrium. Many plants, both temperate and tropical, have altered their range subsequent to glaciations in the last 20,000 years, a recent event on an evolutionary time scale. Likewise, many plant species have a metapopulation structure with subpopulations continually being colonized, dispersing migrants, and going extinct. Such metapopulations may reach a system-wide equilibrium in which probabilities of extinction and recolonization are constant given enough time, but such a situation is unlikely considering the relatively short time frame of global climatic changes in the past.
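Wright's island-model relationship, F_ST ≈ 1/(4·Ne·m + 1), can be rearranged to give the number of migrants per generation, Ne·m, implied by an observed F_ST. The following is only a minimal sketch of that arithmetic; the function names are hypothetical, and the result is meaningful only under the equilibrium assumptions questioned in this section.

```python
def fst_from_nm(nm: float) -> float:
    """Expected F_ST at gene flow-drift equilibrium under the island model.

    nm is the product Ne * m (effective number of migrants per generation).
    """
    if nm < 0:
        raise ValueError("Ne * m must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst: float) -> float:
    """Invert F_ST = 1 / (4 * Ne * m + 1) to recover Ne * m.

    Valid only for 0 < F_ST <= 1 and only if the populations are at equilibrium.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Illustrative values only; these are not estimates from any study cited here.
    for fst in (0.05, 0.2, 0.5):
        print(f"F_ST = {fst:.2f} implies Ne*m of about {nm_from_fst(fst):.2f}")
```

Because the estimate depends entirely on the equilibrium assumption, a species that has recently expanded its range can show a low F_ST, and hence a high apparent Ne·m, without any ongoing gene flow.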
Equilibrium in plants is for the most part a theoretical construct with little relation to reality.

We would argue that the second major development since the publication of Stebbins' work has been the application of ordered, genealogical data to the study of population-level processes. This development, which has begun to reach its potential only in the last decade, was predicated on two major advances, one technical and the other conceptual. The technical advance has been the widespread availability of ordered genetic variation at the intraspecific level, typically in the form of DNA sequence variation or mapped restriction-site data. For such data, mutational differences among genetic variants indicate the patterns of relationship among variants. These data therefore provide the raw information needed to reconstruct genealogical relationships among alleles (i.e., gene trees).

The conceptual advance has been the application of coalescent theory to the study of microevolutionary processes (for a recent review, see Fu and Li, 1999). For a population of constant size, new alleles are continually arising through mutation, and others are going extinct over successive generations (assuming neutrality). Therefore, the extant alleles of a gene in a population are all derived from (i.e., coalesce to) a single common ancestral allele that existed at some point in the past (Fig. 1). Coalescent theory provides a framework for studying the effects of population-level processes (e.g., population size fluctuations, selection, and gene flow) on the expected time to common ancestry of alleles within a gene tree. The application of gene genealogies to population genetics has allowed the study of population-level processes within a temporal, nonequilibrium framework. Thus, microevolution can be studied as a dynamic, historical process, changing over time within a species.
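The idea of an expected time to common ancestry can be made concrete with a small simulation. The sketch below assumes the standard neutral coalescent for a constant-size diploid population (a textbook model, not one specified in this chapter): while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k − 1)/(4·Ne) per generation, which gives an expected time to the most recent common ancestor of 4·Ne·(1 − 1/n) generations for n sampled alleles. The function names and parameter values are hypothetical.

```python
import random


def expected_tmrca(n: int, ne: float) -> float:
    """Expected generations back to the common ancestor of n sampled alleles.

    Assumes a neutral locus in a constant-size diploid population of
    effective size ne (standard coalescent approximation).
    """
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    return 4.0 * ne * (1.0 - 1.0 / n)


def simulate_tmrca(n: int, ne: float, rng: random.Random) -> float:
    """Draw one time to common ancestry by coalescing lineages pairwise."""
    time = 0.0
    lineages = n
    while lineages > 1:
        # Any of the k*(k-1)/2 pairs may coalesce; each does so at rate 1/(2*ne).
        rate = lineages * (lineages - 1) / (4.0 * ne)
        time += rng.expovariate(rate)
        lineages -= 1
    return time


if __name__ == "__main__":
    rng = random.Random(1)
    n, ne = 10, 1000.0  # illustrative values only
    draws = [simulate_tmrca(n, ne, rng) for _ in range(5000)]
    print("theory   :", expected_tmrca(n, ne))
    print("simulated:", sum(draws) / len(draws))
```

Under this model the expected time scales linearly with effective population size, so any process that changes Ne (expansion, bottlenecks, subdivision) should leave a signature in the depth and shape of the gene tree.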
At the foundation of all genealogical analyses is the gene tree, which represents the inferred genealogical relationships among alleles observed in a species. Most intraspecific gene trees are unrooted, because one often cannot determine the temporal polarity of mutations, even with an outgroup (Castelloe and Templeton, 1994). A common means of representing the inferred genealogy is with a "minimum spanning tree" (e.g., Smouse, 1998), for which the number of mutational changes among alleles is minimized (Fig. 1). If homoplasy (mutational convergence or reversal) is infrequent, then a single most parsimonious minimum spanning tree often can be inferred by using maximum parsimony search algorithms (Swofford, 1993). For data showing high levels of homoplasy, more complicated tree estimation algorithms may be required (e.g., Templeton et al., 1992). Extant alleles on a gene tree are often separated by more than one mutational step, and thus, the gene tree typically contains a number of inferred intermediate alleles; these unobserved alleles may be extinct, may have been missed during population sampling, or may never have existed at all (if mutations did not accumulate in single steps).
FIGURE 1. Hypothetical population (Top) showing geographical distribution of alleles; allele genealogy (Middle) indicating true history of allelic divergence over time; unrooted minimum spanning tree (Bottom) showing inferred genealogical relationships among alleles.
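As a rough illustration of the minimum spanning tree idea, the sketch below links a set of aligned haplotypes so that the total number of mutational differences along the tree is as small as possible. It is a distance-based simplification using Kruskal's algorithm on pairwise Hamming distances, not the maximum parsimony search or the Templeton et al. procedure cited above, and it does not infer unobserved intermediate haplotypes. The haplotype names and sequences are invented for the example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes: dict) -> list:
    """Return MST edges (name1, name2, mutational steps) via Kruskal's algorithm."""
    names = sorted(haplotypes)
    edges = sorted(
        (hamming(haplotypes[u], haplotypes[v]), u, v)
        for u, v in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent pointers to the representative of x's component.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for steps, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge joins two components, no cycle
            parent[root_u] = root_v
            tree.append((u, v, steps))
    return tree


if __name__ == "__main__":
    # Hypothetical haplotypes, for illustration only.
    sample = {"A": "AACGT", "B": "AACGA", "C": "TACGA", "D": "AAGGT"}
    for u, v, steps in minimum_spanning_tree(sample):
        print(f"{u} -- {v}: {steps} step(s)")
```

An edge of more than one step in the output corresponds to the situation described above, in which one or more intermediate alleles are inferred but not observed. When several pairs are equally distant, more than one minimum spanning tree exists and this sketch returns only one of them.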
Allele genealogies can inform us about the effects of microevolutionary forces on organismal and population lineages. However, as has become well established in the last decade, a gene tree is far from equivalent to the population lineages through which it is transmitted (see review in Avise, 2000). Therefore, caution must be used in drawing inferences about population-level processes from genealogical data. The hypothetical allele genealogy in Fig. 2 illustrates the potential incongruity that can exist between a gene tree and the populations in which it exists. After two populations have become isolated from each other (and barring subsequent gene flow), the populations will diverge genetically until eventually all of the alleles within each population are more closely related to each other than to those from the other population (Fig. 2). At this point, the alleles show reciprocal monophyly with respect to the two populations and accurately reflect the history of population divergence. Before reaching reciprocal monophyly, however, alleles are expected to be polyphyletic, then paraphyletic, with respect to the populations (Neigel and Avise, 1986). In these cases, genealogical relationships among alleles are not expected to correspond to population identity (Fig. 2). Thus, for recently diverged populations, inferences about the history of population divergence based on the gene tree may be misleading or erroneous. In some cases, ancestral allelic variation may actually persist in populations after population divergence. These shared ancestral polymorphisms can easily be misinterpreted as evidence of interpopulation gene flow.

FIGURE 2. Hypothetical allele genealogy in populations A and B that became isolated from each other at time t0. At time t1, genealogical relationships show paraphyly with respect to the two populations. At time t2, alleles show reciprocal monophyly and are congruent with the history of population divergence.

Below, we present several studies that exemplify the usefulness of gene genealogies for studying population-level processes. We begin with several examples from Arabidopsis thaliana, including the homeotic loci APETALA3 and PISTILLATA and the disease-resistance locus RPS2, all of which are subject to selection. Then, we illustrate the utility of genealogies for tracing the postglacial range expansion in a variety of plant species. Finally, the usefulness of a genealogical approach for documenting crop origin is shown for cassava, a staple crop of the tropics.

GENE GENEALOGY: AN EXAMPLE FROM ARABIDOPSIS

The model plant, A. thaliana, is being used increasingly often for evolutionary studies. Arabidopsis offers many advantages as a study system, including its small size, simple genome, and rapid generation time. Molecular biologists have elucidated the function of many genes in Arabidopsis; mechanisms of development have been detailed, and the sequence of its genome is nearly complete. All of this work provides fertile ground for evolutionary biologists. There are an increasing number of excellent studies that use this information.
For example, the role of homeotic genes in the development of floral structures has furthered understanding of tissue differentiation in plants; this work has provided the background for studies that investigate the evolutionary diversification of morphogenic pathways. Purugganan and Suddith (1999) have examined the molecular evolution of the homeotic loci APETALA3 and PISTILLATA, which affect petal and stamen development in Arabidopsis flowers. They have compared the gene genealogies of these sequences with variation at five other nuclear loci of Arabidopsis. Based on an excess of low-frequency nucleotide polymorphisms and elevated within-species replacement polymorphisms, the authors conclude that A. thaliana has undergone rapid expansion in population numbers and size. Likewise, patterns of variation in restriction fragment length polymorphisms of several nuclear loci and the construction of a multilocus haplotype network have indicated that A. thaliana populations exhibit little to no geographical structuring (Bergelson et al., 1998), a conclusion that is consistent with rapid population expansion.
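One widely used way to quantify an excess of low-frequency polymorphisms is Tajima's D, which compares the mean number of pairwise differences with the number of segregating sites; negative values indicate an excess of rare variants, as expected after population expansion. The sketch below is offered only as an illustration of that kind of summary. It is an assumption that this statistic is relevant here, since the text above does not say which tests Purugganan and Suddith used, and the sequences in the example are invented.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs: list) -> float:
    """Tajima's D for a list of aligned sequences of equal length (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Hypothetical alignment in which every polymorphism is a singleton.
    rare = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
    print("D with only rare variants:", round(tajimas_d(rare), 3))
```

A negative D is not specific to demographic expansion; selective sweeps can produce the same pattern, which is one reason to compare several loci, as in the study described above.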
In the above examples from Arabidopsis, population history has strongly affected patterns of genealogy and molecular evolution. Other loci within the Arabidopsis genome may reflect different evolutionary processes, in particular selection. Arabidopsis has served as a model system for unraveling disease-resistance response, a trait presumed under strong selection. In Arabidopsis, the RPS2 gene is involved in the recognition of the plant pathogen Pseudomonas syringae pv. tomato. RPS2 interacts with an avirulence gene, avrRpt2, of the pathogen to initiate the cascade of events that lead to disease resistance. Both the avirulence gene in the pathogen and the resistance gene in the host must be functional to elicit resistance. These genes interact in a specific "gene-for-gene" manner. The close relationship between avirulence genes and resistance genes as well as the obvious fitness consequences of resistance for a plant have led to speculation on the evolutionary dynamics of resistance genes.

RPS2 encodes a 909-amino acid gene product. The gene contains several motifs that suggest it is part of a signaling pathway, including a leucine zipper, leucine-rich repeats, a hydrophobic region, and a nucleotide-binding site. A gene genealogy for the RPS2 locus has been constructed to investigate the molecular evolution of the gene (Caicedo et al., 1999); 17 accessions of A. thaliana, representing a diversity of ecotypes, were sequenced for RPS2, and their resistance to Pseudomonas was determined. The resulting genealogy reveals an intriguing pattern (Fig. 3). Disease-resistance haplotypes (alleles) are clustered on the gene tree, indicating that resistance haplotypes are closely related. In contrast, a susceptible ecotype is 23 mutational steps from this cluster (the other two susceptible ecotypes represent a mutation to a stop codon and a strain created by mutagenesis). Silent mutations are distributed more often on the long branch of the tree, whereas nonsilent mutations occur more frequently on the short branches of the genealogy. Such genealogies have potential for inferring gene function as well as unraveling the dynamics of molecular evolution.

FIGURE 3. Gene tree for the RPS2 locus of A. thaliana. Open circles represent susceptible haplotypes; open squares are resistant haplotypes; and open diamonds are haplotypes intermediate in resistance. Closed circles are haplotypes not present in the sample but inferred from single-step mutations. The figure is modified from Caicedo et al. (1999).

PHYLOGEOGRAPHY

One of the most successful applications of genealogical methods in natural populations has been in the field of phylogeography. The conceptual approach of phylogeography was pioneered by John Avise and colleagues (Avise et al., 1987). Avise (2000) defines phylogeography as "a field of study concerned with the principles and processes governing the geographic distribution of genealogical lineages, especially those within and among closely related species." Phylogeographic studies draw inferences about the history of population divergence based on associations between the geographical distribution of the alleles and their genealogical relationships. Because these studies are not based on equilibrium assumptions of gene flow and genetic drift, they have proved insightful in studying historical changes in patterns of gene flow, isolation, and secondary contact among divergent populations. The vast majority of phylogeography studies have focused on animal systems, and most of these have relied on the rapidly evolving regions of the mitochondrial genome as a source of genetic variation. Phylogeographic studies in plants have lagged behind those in animals, primarily because of difficulties in finding the ordered, neutral intraspecific variation required for constructing gene trees (see Conclusions below).
POSTGLACIAL MIGRATION

Some of the most elegant studies of phylogeography in plants have examined the postglacial migration of species from Pleistocene refugia. A series of studies, using polymorphism in the chloroplast genome, on European trees, such as oaks (Quercus spp.), beech (Fagus sylvatica), and black alder (Alnus glutinosa), have shown similar patterns of variability; these species show a strong east–west cline in variation. Investigators interpret this cline as a result of postglacial migration from the same glacial refugia, leading to the concordance of variation patterns among species (Newton et al., 1999). Phylogeographic studies of eight oak species have demonstrated that recolonization of Europe subsequent to the last
glaciation was from several refugia in the peninsulas of Iberia, Italy, and the Balkans (Dumolin-Lapègue et al., 1997). In this case, each refugium was represented by a distinct haplotype lineage. Fine-scale phylogeographic analysis further indicated that chloroplast DNA polymorphisms are shared between several oak species and, in this case, are attributed to hybridization and introgression subsequent to the recolonization of Europe. Similarly, chloroplast DNA analysis has shown concordance between beech (Demesure et al., 1996) and black alder (King and Ferris, 1998) phylogeography. Both species are believed to have colonized Europe after glaciation from a refugium in the Carpathian Mountains. Moreover, the data indicate that an additional refugium for these species in Italy did not contribute to the recolonization of Europe. Similar concordance in phylogeographic patterns associated with postglacial spread is observed between plant species in the Pacific Northwest of North America. Soltis et al. (1997) have shown, via chloroplast DNA phylogenies, similar patterns in the structuring of variation among several different types of plants, including ferns, trees, and several members of the Saxifragaceae, suggesting that the present genetic structure of these species is strongly affected by their postglacial pattern of colonization.

PHYLOGEOGRAPHY AND PLANT DOMESTICATION: MANIHOT ESCULENTA

We have used a genealogical approach in examining two questions involving the species M. esculenta (Euphorbiaceae): the origin of the staple root crop cassava (M. esculenta subsp. esculenta) and the phylogeography of cassava's closest wild relative (M. esculenta subsp. flabellifolia). Cassava (manioc) is the sixth most important crop in the world (Mann, 1997).
It is the primary source of calories in sub-Saharan Africa and serves as the main carbohydrate source for over 500 million people in the tropics worldwide (Best and Henry, 1992; Cock, 1985). Cassava is mostly grown by subsistence farmers, and despite its global importance as a food crop, it has traditionally received less attention by researchers than have temperate cereal crops. One fundamental question that has remained unresolved concerns the crop's geographical and evolutionary origins.

Cassava was traditionally proposed to be a "compilospecies" derived from multiple hybridizing progenitor species in the genus Manihot (Jennings, 1995; Sauer, 1993). Manihot includes ≈98 species occurring in both northern South America (≈80 spp.) and in Mexico/Central America (≈17 spp.); sites of domestication were proposed from much of this vast geographical area.

Traditional phylogenetic approaches were only partially successful in determining cassava's origin. Species of Manihot show low levels of divergence in both morphological and molecular characters, probably reflecting a recent diversification of the genus (Rogers and Appan, 1973; Olsen and Schaal, 1999). A phylogeny of the genus based on DNA sequences in the nuclear ribosomal ITS region (B.A.S., unpublished data) is not highly resolved but does place cassava in a clade of South American species. This finding was consistent with the proposition, based on morphological characters (Allem, 1994), that cassava is derived from a single wild South American progenitor (referred to as M. esculenta subsp. flabellifolia under present taxonomy).

To test the hypothesis that cassava is derived from flabellifolia, we examined DNA sequence variation in 20 crop accessions, 27 populations of flabellifolia, and 6 populations of a closely related species, Manihot pruinosa, which has been proposed to hybridize with flabellifolia (Allem, 1992, 1999). Populations of flabellifolia occur in mesic transitional forest patches in the ecotone between the lowland rainforest of the Amazon basin and the seasonally dry cerrado (savanna–scrub) found to the south and east on the Brazilian Shield plateau (Fig. 4). Populations were sampled in two transects, one along the southern border of the Amazon and the other along the eastern border. M. pruinosa is a cerrado species that occurs within the eastern range of flabellifolia. We included this species to test whether it has contributed to the genetic diversity of cassava through hybridization with flabellifolia.

FIGURE 4. Locations of populations of M. esculenta subsp. flabellifolia (squares) and M. pruinosa (circles) sampled for the G3pdh phylogeography study. Shaded squares indicate populations containing one or more haplotypes found in domesticated cassava accessions. The figure is modified from Olsen and Schaal (1999).
The study was based on sequence variation within a portion of the low-copy nuclear gene G3pdh, which encodes glyceraldehyde 3-phosphate dehydrogenase (Olsen and Schaal, 1999). Using primers designed by Strand et al. (1997), we PCR amplified and sequenced a 962-bp region that spanned three exons, four introns, and parts of two flanking exons. From the 424 alleles (212 individuals) that were sequenced, we observed a total of 63 nucleotide polymorphisms, which characterized 28 different haplotypes. Maximum parsimony analysis yielded two negligibly different gene tree topologies, one of which is shown in Fig. 5.

FIGURE 5. G3pdh gene tree for M. esculenta and M. pruinosa. Letters correspond to haplotype designations in GenBank accession numbers (AF136119–AF136149). Shapes around letters indicate the taxon or taxa in which a haplotype was found, as indicated in the key. The figure is modified from Olsen and Schaal (1999).

Because the domestication of cassava is an extremely recent event evolutionarily speaking, one would not expect the divergence between cassava and flabellifolia to be reflected in the G3pdh gene tree. However, by looking at the haplotypes that are shared between cassava and the wild taxa and by examining the geographical locations of these haplotypes in the wild populations, we were able to draw several insights into the origin of the crop. First, we found that genetic variation in the crop is almost entirely a subset of that found in flabellifolia. Flabellifolia contains 24 haplotypes, of which 6 are found in cassava; cassava's haplotype diversity, therefore, represents 25% of that found in M. esculenta overall. Thus, the crop is most likely derived directly from flabellifolia, rather than from several hybridizing progenitor species as traditionally thought. In addition, we found that the cassava haplotypes occur in flabellifolia populations along the southern border of the Amazon basin and not along the eastern border. This finding points to the southern Amazonian region as the likely site of domestication of cassava. Interestingly, paleobotanical and other anthropological data indicate this region as a probable zone of domestication shared with peanut, two species of chili pepper, and jack bean (Piperno and Pearsall, 1998). Finally, we found that none of the cassava haplotypes occur in M. pruinosa, suggesting that this species is not a progenitor of the crop. All of these conclusions are corroborated by an analysis of this same study system with microsatellite markers (K.M.O. and B.A.S., unpublished work).

The phylogeographic aspect of the study has focused on historical patterns of population divergence in flabellifolia and between flabellifolia and pruinosa. The distribution of the rainforest–cerrado ecotone where these species occur is likely to have shifted during the climatic changes of the Pleistocene (Behling, 1998; Burnham and Graham, 1999).
Although there is not yet a consensus on the pattern or extent of habitat shifts, cooler/drier periods (associated with glaciations in temperate latitudes) are expected to have favored the expansion of cerrado and transitional forest; during warmer, humid periods (including the present), these habitats would be expected to be more restricted and fragmented as rainforest expanded. The repeated climate fluctuations of the Pleistocene are therefore predicted to have led to cycles of population fragmentation followed by range expansions and secondary contact in populations of flabellifolia and pruinosa. If these events have occurred, they should be reflected in the present phylogeographic structure of these taxa.

These hypotheses are being tested currently through a nested cladistic analysis (Templeton et al., 1995) of the G3pdh data set (K.M.O., unpublished data). Although the statistical analyses are not complete, some preliminary insights are possible by visual inspection of the G3pdh gene tree. One interesting finding is that three haplotypes are shared between flabellifolia and pruinosa, suggesting interspecific introgression and/or shared ancestral polymorphisms that predate the divergence of these species. Two of the shared haplotypes (E and J) are common in eastern flabellifolia populations, and each is found in a single pruinosa individual from a population in close proximity to flabellifolia populations. This pattern suggests introgression from flabellifolia into M. pruinosa. The position
of these haplotypes near the tips of the gene tree also favors this explanation over shared ancestral polymorphisms; tip haplotypes are likely to be younger than interior haplotypes (Castelloe and Templeton, 1994) and therefore would be less likely to represent ancestral variation. The third shared haplotype (M) is also a tip haplotype. However, this haplotype is common in M. pruinosa and is found in a single flabellifolia population approximately 1,000 km west of the current range of M. pruinosa. Although clearly not the result of contemporary gene flow, this pattern could possibly have arisen through hybridization in the recent past. Palynological data indicate that during the last glacial maximum (<18,000 years B.P.), cerrado vegetation expanded into areas along the southern border of the Amazon basin that are presently rainforest (reviewed in Burnham and Graham, 1999). Thus, hybridizing pruinosa populations could have existed in this region as recently as 11,000 years B.P.

Haplotypes on the G3pdh tree are not clustered by species (Fig. 5). Because flabellifolia and pruinosa are closely related taxa within a recently radiated genus, they would not necessarily be expected to have reached a pattern of reciprocal monophyly with respect to G3pdh haplotypes (Fig. 2). The phylogeographic structure within each species is also complex. However, although there is no simple concordance between the geographical distributions of haplotypes and their genealogical relationships, contingency analyses (Posada et al., 1999) reveal that nested clades within the gene tree are geographically structured. Thus, the phylogeographic structure reflects more than just the random sorting of ancestral polymorphisms among populations.
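The logic of a contingency analysis of clades against geography can be sketched as a permutation test: tabulate how often each clade is sampled at each location, compute a chi-square statistic, and compare it with the values obtained when locations are shuffled at random among the sampled alleles. The code below is a simplified illustration of that general idea under invented data; it is not the software or the exact procedure used in the analyses cited above, and the clade labels, location names, and function names are hypothetical.

```python
import random
from collections import Counter


def chi_square(pairs: list) -> float:
    """Chi-square statistic for a list of (clade, location) observations."""
    total = len(pairs)
    clade_counts = Counter(clade for clade, _ in pairs)
    site_counts = Counter(site for _, site in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for clade, n_clade in clade_counts.items():
        for site, n_site in site_counts.items():
            expected = n_clade * n_site / total
            stat += (observed[(clade, site)] - expected) ** 2 / expected
    return stat


def permutation_test(clades: list, sites: list, n_perm: int, rng: random.Random):
    """Return (observed statistic, permutation p-value) for clade-by-site association."""
    if len(clades) != len(sites):
        raise ValueError("clades and sites must have the same length")
    observed = chi_square(list(zip(clades, sites)))
    shuffled = list(sites)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association with geography
        if chi_square(list(zip(clades, shuffled))) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented sample: two clades, each confined to one region.
    clades = ["clade 1-1"] * 10 + ["clade 1-2"] * 10
    sites = ["south"] * 10 + ["east"] * 10
    stat, p = permutation_test(clades, sites, 999, random.Random(7))
    print(f"chi-square = {stat:.1f}, permutation p = {p:.3f}")
```

A small p-value indicates only that clades and locations are associated; deciding whether restricted gene flow, fragmentation, or range expansion produced the association requires the further inference steps of a full nested analysis.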
Detailed phylogeographic analysis (Templeton et al., 1995) and the analysis of DNA sequence data from two additional nuclear genes (K.M.O., unpublished data) will be useful in elucidating the historical processes that have led to the current phylogeographic structure in this study system.

CONCLUSIONS

Gene genealogies have led to several important insights into plant evolution and have the potential for far greater contributions. Many of the processes that affect the evolution of plant populations, such as selection, isolation, size fluctuations, and gene flow, are amenable to genealogical analysis. In particular, the use of genealogies within the framework of coalescent theory will allow us to understand in greater detail the role of historical fluctuations in population size, colonization, and range expansion. Although the large-scale metapopulation structure of many plants is clearly documented, there are relatively few studies of the genetic dynamics of this structure: colonization and establishment of subpopulations, gene migration, and extinction. The genetic aspects of such processes largely remain to be explored, and a historical genealogical approach will be particularly instructive.

The major impediment to wide application of gene genealogies for phylogeographic studies in plants is identifying DNA sequences with appropriate levels of ordered variation within chloroplast, mitochondrial, or nuclear genomes (Schaal et al., 1998). In many cases, the chloroplast spacer regions that have been informative for some species (see above) show little to no intraspecific variation in other plant species. Moreover, chloroplast restriction fragment length polymorphism genealogies based on length variation alone can be confounded by homoplasy. The nuclear genome remains problematic because of the difficulty in finding regions that have sufficient levels of neutral variation and that are not involved in intragenic recombination. Moreover, the effective population size of a nuclear gene is four times that of an organelle gene, because nuclear genes are diploid and biparentally inherited. The larger effective population size results in increased coalescent times, which in turn increases the likelihood of encountering ancestral polymorphisms. High-resolution nuclear markers such as random amplified polymorphic DNAs and amplified fragment length polymorphisms are historically unordered, and variants cannot be related easily in a genealogical manner. Because of the difficulty in finding genealogically informative markers, many plant studies have been phylogeographic only in the broad sense, meaning that they detect an association between patterns of genetic variation and geography. Such studies do not incorporate a genealogical perspective. The search for appropriate markers has turned to nuclear genes, which are increasingly the focus of genealogical studies.
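The fourfold difference in effective population size noted above, and its effect on coalescent times, can be illustrated with a minimal sketch. It assumes a neutral Wright-Fisher population of N individuals in which nuclear genes are diploid and transmitted by all individuals, whereas organelle genes are haploid and transmitted by only half of the individuals (one parental sex). Under that model, two lineages coalesce in the previous generation with probability 1/M, where M is the number of transmitted gene copies, so the expected pairwise coalescent time is M generations. The population size and the helper names gene_copies and mean_coalescence_time are hypothetical choices made for this example.

```python
import random


def gene_copies(n_individuals, ploidy, parental_fraction):
    """Gene copies transmitted per generation (assumed Wright-Fisher model)."""
    return n_individuals * ploidy * parental_fraction


def mean_coalescence_time(copies, replicates, rng):
    """Simulated mean number of generations until two lineages coalesce."""
    p = 1.0 / copies  # chance that both lineages pick the same parent copy
    total = 0
    for _ in range(replicates):
        generations = 1
        while rng.random() >= p:
            generations += 1
        total += generations
    return total / replicates


if __name__ == "__main__":
    n = 50  # hypothetical number of breeding individuals
    nuclear = gene_copies(n, ploidy=2, parental_fraction=1.0)
    organelle = gene_copies(n, ploidy=1, parental_fraction=0.5)
    rng = random.Random(7)
    print("copies (nuclear, organelle):", nuclear, organelle)
    print("simulated mean time, nuclear:",
          mean_coalescence_time(nuclear, 4000, rng))
    print("simulated mean time, organelle:",
          mean_coalescence_time(organelle, 4000, rng))
```

Because the expected time scales with the number of gene copies, nuclear lineages in this sketch take about four times as long to coalesce as organelle lineages, leaving more opportunity for ancestral polymorphisms to persist. In species where every individual can transmit organelles, the ratio under the same reasoning would be smaller.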
Nuclear genes often contain multiple introns, and many of the introns contain high levels of neutral variation. This approach has been applied successfully in several animal species: e.g., oysters (Hare and Avise, 1998), fish (Bagley and Gall, 1998), and birds (Degnan, 1993). Nuclear sequences of plants have been used to understand the genetic relationships of wild populations of A. thaliana (Bergelson et al., 1998) and the selection and evolution of homeotic genes (Purugganan and Suddith, 1999), as well as in the example from cassava above. Numerous studies of nuclear gene genealogies are currently under way and promise to provide new insights into the processes identified by Stebbins a half century ago as central for the evolution of plants.

This work was supported in part by a grant from the Explorer's Club, by National Science Foundation Doctoral Dissertation Improvement Grant DEB 9801213 to K.M.O., and by grants from the Rockefeller and Guggenheim Foundations to B.A.S.
REFERENCES

Allem, A. (1992) Manihot germplasm collecting priorities. In Report of the First Meeting of the International Network for Cassava Genetic Resources, eds. Roca, W. M. & Thro, A. M. (Cent. Int. Agric. Trop., Cali, Colombia), pp. 87–110.
Allem, A. (1994) The origin of Manihot esculenta Crantz (Euphorbiaceae). Genet. Res. Crop Evol. 41, 133–150.
Allem, A. (1999) The closest wild relatives of cassava (Manihot esculenta Crantz). Euphytica 107, 123–133.
Avise, J. (2000) Phylogeography: the History and Formation of Species (Harvard, Cambridge, MA).
Avise, J., Arnold, J., Ball, R., Bermingham, E., Lamb, T., Neigel, J. E., Reeb, C. A. & Saunders, N. C. (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu. Rev. Ecol. Syst. 18, 489–522.
Bagley, J. & Gall, G. (1998) Mitochondrial and nuclear DNA sequence variability among populations of rainbow trout (Oncorhynchus mykiss). Mol. Ecol. 7, 945–961.
Behling, H. (1998) Late Quaternary vegetational and climatic changes in Brazil. Rev. Palaeobot. Palynol. 99, 143–156.
Bergelson, J., Stahl, E., Dudek, S. & Kreitman, M. (1998) Genetic variation within and among populations of Arabidopsis thaliana. Genetics 148, 1289–1323.
Best, R. & Henry, G. (1992) Cassava: towards the year 2000. In Report of the First Meeting of the International Network for Cassava Genetic Resources, eds. Roca, W. M. & Thro, A. M. (Cent. Int. Agric. Trop., Cali, Colombia), pp. 3–11.
Burnham, R. & Graham, A. (1999) The history of neotropical vegetation: new developments and status. Ann. Mo. Bot. Gard. 86, 546–589.
Caicedo, A. L., Schaal, B. A. & Kunkel, B. N. (1999) Diversity and molecular evolution of the RPS2 resistance gene in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 96, 302–306.
Castelloe, J. & Templeton, A. R. (1994) Root probabilities for intraspecific gene trees under neutral coalescent theory. Mol. Phylo. Evol. 3, 102–113.
Cock, J. (1985) Cassava: new potential for a neglected crop (Westfield, London, UK).
Degnan, S. (1993) The perils of single gene trees - mitochondrial versus single-copy nuclear DNA variation in white-eyes (Aves: Zosteropidae). Mol. Ecol. 2, 219–225.
Demesure, B., Comps, B. & Petit, R. (1996) Chloroplast DNA phylogeography of the common beech (Fagus sylvatica L.) in Europe. Evolution 50, 2515–2520.
Dumolin-Lapègue, S., Demesure, B., Fineschi, S., Le Corre, V. & Petit, R. (1997) Phylogeographic structure of white oaks throughout the European continent. Genetics 146, 1475–1487.
Fu, Y. & Li, W.-H. (1999) Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor. Pop. Bio. 56, 1–10.
Hamrick, J. & Godt, M. (1989) Allozyme diversity in plant species. In Plant Population Genetics, Breeding, and Genetic Resources, eds. Brown, A., Clegg, M., Kahler, A. & Weir, B. (Sinauer, Sunderland, MA), pp. 43–63.
Hare, M. & Avise, J. (1998) Population structure in the American oyster as inferred by nuclear gene genealogies. Mol. Biol. Evol. 15, 119–128.
Jennings, D. (1995) Cassava: Manihot esculenta (Euphorbiaceae). In Evolution of Crop Plants, eds. Smartt, J. & Simmonds, N. (Wiley, New York), pp. 128–132.
King, R. & Ferris, C. (1998) Chloroplast DNA phylogeography of Alnus glutinosa (L.) Gaertn. Mol. Ecol. 7, 1151–1163.
Mann, C. (1997) Reseeding the green revolution. Science 277, 1038–1043.
Neigel, J. & Avise, J. (1986) Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation. In Evolutionary Processes and Theory, eds. Karlin, S. & Nevo, E. (Academic, New York), pp. 515–534.
Newton, A., Allnutt, T., Gilles, A., Lowe, A. & Ennos, R. (1999) Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends Ecol. Evol. 14, 140–145.
Olsen, K. M. & Schaal, B. A. (1999) Evidence on the origin of cassava: phylogeography of Manihot esculenta. Proc. Natl. Acad. Sci. USA 96, 5586–5591.
Piperno, D. & Pearsall, D. (1998) The Origins of Agriculture in the Lowland Neotropics (Academic, New York).
Posada, D., Crandall, K. & Templeton, A. (1999) Geodis, version 2.0 (Brigham Young Univ., Provo, UT).
Purugganan, M. & Suddith, J. (1999) Molecular population genetics of floral homeotic loci: departures from the equilibrium–neutral model at the APETALA3 and PISTILLATA genes of Arabidopsis thaliana. Genetics 151, 839–848.
Rogers, D. & Appan, S. (1973) Manihot and Manihotoides (Euphorbiaceae): a computer assisted study (Hafner, New York).
Sauer, J. (1993) Historical Geography of Crop Plants (CRC, Boca Raton, FL).
Schaal, B. A., Hayworth, D. A., Olsen, K. M., Rauscher, J. T. & Smith, W. A. (1998) Phylogeographic studies in plants: problems and prospects. Mol. Ecol. 7, 465–474.
Smouse, P. (1998) To tree or not to tree. Mol. Ecol. 7, 399–412.
Soltis, D., Gitzendanner, M., Strenge, D. & Soltis, P. (1997) Chloroplast DNA intraspecific phylogeography of plants from the Pacific Northwest of North America. Plant Syst. Evol. 206, 353–373.
Stebbins, G. L. (1950) Variation and Evolution in Plants (Columbia, New York).
Strand, A., Leebens-Mack, J. & Milligan, B. (1997) Nuclear DNA-based markers for plant evolutionary biology. Mol. Ecol. 6, 113–118.
Swofford, D. (1993) PAUP: Phylogenetic Inference Using Parsimony, version 3.1, Ill. Nat. Hist. Survey (Champaign, IL).
Templeton, A., Crandall, K. & Sing, C. (1992) A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease and DNA sequence data. III. Cladogram estimation. Genetics 132, 619–633.
Templeton, A., Routman, E. & Phillips, C. (1995) Separating population structure from population history: A cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum. Genetics 140, 767–782.
Wright, S. (1951) The genetical structure of populations. Ann. Eugen. 15, 322–354.