Managing Global Genetic Resources: Agricultural Crop Issues and Policies

4
The Science of Collecting Genetic Resources

This chapter examines the scientific basis for collecting plant genetic resources. It addresses how to ensure that collections capture the maximum amount of the genetic diversity in a plant population, given limited resources. How many samples must be collected? How many seeds should a sample contain? What is the likelihood of capturing a very rare gene in a collection?

TYPES OF COLLECTIONS

Several kinds of germplasm collections have evolved over the years in response to particular needs: base collections, back-up collections, active collections, and breeders' or working collections. These divisions are somewhat artificial, because some collections fulfill more than one purpose; for example, a number of active collections were formerly breeders' collections. The following discussion describes the variety of purposes that collections must serve.

Base Collections

Base collections provide for the long-term preservation of genetic variability through storage under optimal conditions. Base collection materials are not intended for distribution except to replace materials that have been lost from back-up or active collections. Base collections include the most comprehensive sample of the entire genetic



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement



Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 131
Managing Global Genetic Resources: Agricultural Crop Issues and Policies 4 The Science of Collecting Genetic Resources This chapter examines the scientific base for collecting plant genetic resources. It examines the questions of assuring that collections capture a maximum amount of the genetic diversity in a plant population, given limited resources. How many samples must be collected? How many seeds should a sample contain? What is the likelihood of capturing a very rare gene in a collection? TYPES OF COLLECTIONS Several kinds of germplasm collections have evolved over the years in response to particular needs: base collections, back-up collections, active collections, and breeders' or working collections. To a certain extent these divisions may be somewhat artificial, because some collections may fulfill more than one purpose. A number of active collections were formerly breeders' collections, for example. The following discussion is intended to describes the variety of purposes that collections must serve. Base Collections Base collections provide for the long-term preservation of genetic variability through storage under optimal conditions. Base collection materials are not intended for distribution except to replace materials that have been lost from back-up or active collections. Base collections include the most comprehensive sample of the entire genetic

OCR for page 131
variability of a species group. They are static in that they attempt to preserve the genetic variability that exists in accessions in its original, unchanged state. However, they are also dynamic in that additional accessions, such as newly collected materials, new cultivars produced by plant breeding, and populations of genetically enhanced materials, are added as they become available. Storage lives of many decades now appear to be possible, which reduces how often accessions must be regenerated and so holds the losses of genetic variability that occur during storage and regeneration within acceptable limits.

Base collections are typically held under conditions of low relative humidity and at subfreezing (-10° to -20°C) or cryogenic (-150° to -195°C) temperatures. Difficulties are encountered with seeds of some species that cannot withstand chilling or drying. Alternative methods of long-term storage, such as cryopreservation of in vitro cultures, are needed for such species (see Chapter 7).

A global network of germplasm collections of various crops has been initiated with the guidance and help of the Food and Agriculture Organization of the United Nations and the International Board for Plant Genetic Resources, which have designated agencies to hold the base and back-up collections for principal crop species. However, these agencies vary markedly in their capacity to fulfill their designated responsibilities.

Back-Up Collections

Back-up collections duplicate base collections, or portions of them, at another location. For example, the U.S. National Seed Storage Laboratory at Fort Collins, Colorado, holds duplicate back-up samples of portions of the maize collection from the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT; International Maize and Wheat Improvement Center) in Mexico and much of the rice collection from the International Rice Research Institute (IRRI) in the Philippines.
These back-up holdings are insurance against loss in the primary CIMMYT and IRRI collections.

Active Collections

Active and base collections often include the same materials. Active collections provide seeds or other propagules for distribution to plant breeders or other users. Consequently, attempts are made to maintain sufficient quantities of materials of each accession in active collections, particularly those in heavy demand, so that requests can be met quickly. Materials in active collections are usually managed
under shorter-term and more variable storage conditions than those in base collections. Growouts or multiplications to replenish seed supplies in active collections are therefore usually necessary at shorter intervals than they are for base collections, which puts the genetic integrity of accessions more at risk.

Seeds are counted, packaged, and sealed in foil-laminated moisture-proof bags for storage at the National Seed Storage Laboratory. Credit: U.S. Department of Agriculture, Agricultural Research Service.

Breeders' Collections

The breeders' or working collections include materials of frequent use in breeding programs and are usually short term in nature. Breeders know from experience that superior performance in their local region of adaptation has resulted from the long-term assembly of favorable combinations of alleles at many different genetic loci, and that attempts to introgress alleles from exotic sources into such adapted materials are almost always detrimental to performance, at least in the short term. Hence, breeders' collections are made up almost exclusively of advanced lines developed in their own programs, together with elite cultivars, advanced breeding lines, and genetically enhanced populations obtained from breeders in regions with similar ecological conditions.

Breeders widely recognize that reliance on adapted stocks may limit the potential for continued advances in performance. Modern breeders, however, rarely turn directly to exotic materials in active collections for potentially useful variability. Rather, they usually obtain exotic variability indirectly, from genetically enhanced populations or breeding stocks into which potentially useful alleles have been introgressed. Consequently, breeders' collections have come to include sharply increasing proportions of genetically enhanced stocks that carry potentially useful alleles in adapted genetic backgrounds.

SAMPLING STRATEGIES

Sampling plays a critical role in the conservation process.
However, identifying optimum sampling strategies requires appropriate measures of genetic diversity, especially measures of potentially useful genetic diversity. Marshall and Brown (1975:53) emphasized, "There are definite limits to the numbers of samples which can be handled effectively [in conservation programs that are] imposed by the financial and personnel resources available to carry out each stage in the process." For some species the major limiting factor is the inability to collect endangered materials before they are lost; for others, it is the difficulty of preserving materials after they have been placed in collections. For most species, however,
the major limiting factors are the inability to regenerate and evaluate the materials in collections. These limiting factors highlight the need to allocate resources judiciously so as to improve efficiency within each of the main elements of the global genetic resource system.

PARAMETERS OF GENETIC DIVERSITY

The most basic parameters of genetic diversity are the numbers and frequencies of alleles at Mendelian loci in the sampling unit under consideration (species, ecogeographical region, or population). Each Mendelian locus, however, is capable of mutating to many allelic states, and the potential number of different alleles, summed over all loci, is so large that it is impractical to think in terms of identifying, collecting, or conserving all of them. Appropriate sampling procedures must therefore be used to ensure that potentially useful alleles of the target species are adequately represented in samples of specified sizes taken from an appropriate number of collection sites.

Neutral Allele Model

The neutral theory of molecular evolution proposes that evolutionary change at the molecular level is caused by random drift of selectively equivalent mutant genes. These mutants are not subject to selection relative to one another because they neither affect the fitness of the carrier nor modify its morphological, physiological, or behavioral properties. Evolution consists of the gradual, random replacement of one neutral allele by another that is functionally equivalent to the first. The theory assumes that favorable mutations are so rare that they have little effect on the overall evolutionary rate of nucleotide and amino acid substitutions (Ayala and Kiger, 1980; Kimura, 1968). The neutral allele model is the simplest model (null hypothesis) for determining the expected number and frequency of alleles at a locus (Kimura and Crow, 1964).
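
The following is a rough numerical illustration of the neutral allele model. The sketch uses two standard infinite-alleles results that this chapter does not state. The first is the equilibrium effective number of alleles, 1 + 4Neμ, associated with Kimura and Crow (1964). The second is Ewens' expected number of distinct alleles in a sample of n gametes, the sum of θ/(θ + i) for i = 0, ..., n - 1, where θ = 4Neμ. It assumes a diploid Wright-Fisher population in which every neutral mutation creates a new allele. The function names, the mutation rate, and the population sizes are hypothetical and are chosen only to show the trend.

```python
"""Neutral (infinite-alleles) expectations for allele numbers: a sketch.

Assumes a diploid Wright-Fisher population of effective size ne in which
every neutral mutation creates a new allele; theta = 4 * ne * mu.
Function names and parameter values are illustrative, not from the text.
"""


def theta(ne: float, mu: float) -> float:
    """Scaled neutral mutation rate, 4 * Ne * mu, for a diploid population."""
    return 4.0 * ne * mu


def effective_number_of_alleles(ne: float, mu: float) -> float:
    """Equilibrium effective number of alleles, 1 + 4 * Ne * mu (Kimura-Crow)."""
    return 1.0 + theta(ne, mu)


def expected_alleles_in_sample(n_gametes: int, ne: float, mu: float) -> float:
    """Ewens' expected number of distinct alleles among n sampled gametes."""
    t = theta(ne, mu)
    if t == 0.0:  # no mutation: a nonempty sample holds a single allele
        return 1.0 if n_gametes > 0 else 0.0
    return sum(t / (t + i) for i in range(n_gametes))


if __name__ == "__main__":
    mu = 1e-6  # hypothetical neutral mutation rate per locus per generation
    for ne in (1_000, 100_000, 1_000_000):
        print(
            f"Ne = {ne:>9,}  theta = {theta(ne, mu):.3f}  "
            f"effective alleles = {effective_number_of_alleles(ne, mu):.2f}  "
            f"expected alleles in 100 gametes = "
            f"{expected_alleles_in_sample(100, ne, mu):.2f}"
        )
```

With these hypothetical values, the expected number of alleles in a sample of 100 gametes rises from about 1 (θ = .004) to about 2.9 (θ = .4) and about 13.5 (θ = 4).
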
In this model, the expected numbers and frequencies of alleles are a function of the effective population size and the rate at which mutation produces novel neutral alleles at a locus. When either or both of these are small, most neutral alleles are expected to be either very common or very rare within populations; as population size or mutation rate increases, a higher proportion of neutral alleles is expected to occur at intermediate frequencies. When effective population size is very large, numerous neutral alleles are expected to be present, but most of them will be
present at a very low frequency (p < .001). For biologically plausible values of population size and mutation rate, a population is unlikely to carry more than a single neutral allele per locus at a high frequency (p > .5), more than two neutral alleles at intermediate frequencies (.5 > p > .1), more than two or three neutral alleles at low frequencies (.1 > p > .01), or more than three neutral alleles at very low frequencies.

If evolution is driven by natural selection, however, allele frequencies will depart from expectations under the neutral model. Thus, if selection strongly favors one allele, the expected equilibrium within populations is that the favored allele will be more frequent than p = .5, that fewer alleles will be present at intermediate frequencies, and that there will be more rare alleles than under the neutral model. Balancing selection, in contrast, is expected to increase the number of alleles present at intermediate frequencies; if it is strong, it is expected to maintain large numbers of alleles in populations, each at a low frequency (that is, allelic distribution profiles are expected to be flatter, or platykurtic, than under the neutral model).

EMPIRICAL STUDIES OF GENETIC DIVERSITY: NUMBERS OF ALLELES AND ALLELIC PROFILES

The procedures used to determine the extent of genetic diversity within populations and species fall into two general classes, depending on the nature of the data: (1) those based on measurements of quantitative traits (traits whose phenotypes are distributed in a continuous metrical series) and (2) those based on counts of the alleles governing qualitative traits (traits whose phenotypes can be classified into discrete categories, so that the numbers and frequencies of alleles in each category can be enumerated by simple counting rules).
Quantitative Characters

Many studies of genetic diversity during the past half century were based on means, variances, and covariances estimated from measurements of polygenically controlled quantitative traits. However, the effects of any single locus on a quantitative trait cannot be disentangled from the joint effects of the several to many loci that affect these traits. Consequently, such studies provide, at best, only indirect and imprecise estimates of those measures of genetic diversity that are most informative in the context of sampling for purposes of
conservation (for example, numbers of loci, numbers and frequencies of alleles per locus, the proportion of polymorphic loci, and the degree of association among alleles of different loci). Thus, for the purpose of estimating amounts of genetic diversity, population geneticists and conservationists have increasingly turned to loci for which the effect of each allelic substitution is unique and unambiguously distinguishable from allelic substitutions at all other loci.

Qualitative or Discretely Inherited Characters

The earliest studies of discretely inherited characters were based on single Mendelian loci governing morphology, pigmentation, physiology, and other discretely inherited traits. Perhaps the most thoroughly studied system is the series of eye-color alleles in natural populations of Drosophila melanogaster and other Drosophila species. Detailed experiments involving the extraction and measurement of eye pigment showed that allelic variants, isoalleles, exist within the various eye-color classes. The normal wild phenotype (red eyes) includes a range of variants based on the amount of eye pigment; these variants are designated normal isoalleles. Phenotypes deficient in eye pigment (for example, white or apricot eyes) are mutant classes, designated mutant isoalleles (see Strickberger, 1976). The allelic profile of the red-eye/white-eye locus thus features a cluster of normal wild-type isoalleles, which can be distinguished from one another by special pigment measurement tests, and several additional clusters of mutant isoalleles. In segregating populations derived from hybrids between wild-type and mutant individuals, there was typically an excess of wild-type individuals and a deficiency of mutant ones.
It was also found that individuals with mutant eye-color phenotypes were unable to compete with individuals with wild-type phenotypes under laboratory conditions. The conclusion was that normal isoalleles produce gene products that are necessary for normal functioning of the organism, whereas mutant isoalleles produce only partly functional gene products. Furthermore, mutant isoalleles are usually infrequent or rare (frequency of .001 or less) in natural and experimental populations under competitive conditions. This is consistent with the expectation that the mutants are adaptively inferior. Although the majority of loci that govern traits that can be classified into discrete categories, such as color or morphology, appear to be nearly monomorphic for a wild-type allele or a cluster of wild-type isoalleles, some loci appear to have allelic profiles featuring one or two frequent alleles plus a number of infrequent or rare alleles. Also, a few loci are known that
feature large numbers of nearly equally frequent alleles; among the best authenticated of such loci are those that govern incompatibilities between haploid pollen and diploid styles in plants. Rare self-incompatibility alleles, especially novel, newly mutated ones, have a selective advantage over frequent alleles; consequently, it is not surprising that the equilibrium for such systems features many alleles per locus, each present at a low and more or less equal frequency.

In summary, studies of loci whose variants can be classified into discrete categories indicate that allelic profiles for most such loci in Drosophila and other species tend toward near monomorphism (p > .9) for a single wild-type allele, or a cluster of nearly indistinguishable wild-type isoalleles, accompanied by a number of infrequent to rare mutant alleles or clusters of mutant isoalleles. Furthermore, it is widely accepted that such allelic profiles represent mutation-selection equilibria that develop because one of the normal wild-type isoalleles has a selective advantage, greater or lesser, over all other alleles at the locus.

Enzyme Variants

In recent years studies of enzyme variants (isozymes) have added greatly to the amount of information on allelic frequency profiles for single loci. The use of isozymes for this purpose has several advantages: (1) enzyme specificity allows alleles to be identified unambiguously with single loci; (2) allelic expression is usually codominant, that is, all alleles present at a locus in an individual are expressed; and (3) several to many isozyme loci can be assayed in single individuals.
Thus, even though only one-third to one-quarter of mutations alter enzymes in ways that are detectable by protein electrophoresis, isozyme techniques allow a much higher proportion of genetic variability to be detected at the level of the individual locus than previously available methods do. The statistics most often reported in surveys of enzyme variability are the numbers and frequencies of the alleles observed at various loci. However, both of these measures depend heavily on sample size. Samples larger than N = 20 gametes are required to obtain reasonably precise measures of the number of alleles at a locus, because infrequent or even moderately frequent alleles may often go undetected, even when sample sizes per population are as large as N = 80 gametes. It should be emphasized that several alleles (usually about 3 to 10) occur at many isozyme loci and that estimates of the frequencies of each of these alleles at each locus
are much more sensitive to sample size than are estimates of the total number of alleles per locus. Caution should therefore be used when comparing the allelic diversity reported for different species, and even for different populations of the same species, because the kinds and numbers of enzyme systems examined, as well as sample sizes, vary widely among investigators, and each of these factors can strongly influence the numbers and frequencies of alleles reported.

FORMULATION OF SAMPLING STRATEGIES

Marshall and Brown (1975) defined four conceptual classes of alleles: (1) common, widespread; (2) rare, widespread; (3) common, localized; and (4) rare, localized. They postulated that the first and fourth classes are likely to be of little concern in formulating sampling strategies. Alleles of the first class (common, widespread) are almost certainly included even in very small samples collected from only a few populations: for example, the probability is greater than .999 that an allele present at a frequency of p > .5 will be included in a sample of only 10 gametes from a single population. In contrast, for alleles of the fourth class (rare, localized), inclusion will be unusual and serendipitous, even in very large samples taken from a large number of populations. (For example, to obtain, with a probability of .95, a single copy of a rare allele present at a frequency of .001 in only 1 of 100 populations of a species, a sample of 2,994 gametes from each of the 100 populations, about 300,000 gametes in all, is required.) It follows that formulating sampling strategies for the first and fourth classes of alleles does not lead to general guiding principles, other than the trivial principles of ignoring both classes or of collecting impractically large samples to detect the rare alleles.
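
The probabilities quoted above follow from a simple binomial argument. If gametes are drawn independently from a population in which an allele has frequency p, a sample of n gametes contains at least one copy with probability 1 - (1 - p)^n. The sketch below assumes that independence, which holds only approximately for structured or self-pollinating populations. It reproduces the class one and class four figures. The last lines apply the same formula to the isozyme sample sizes discussed earlier (N = 20 and N = 80 gametes), using a hypothetical allele frequency of .02. The function name is illustrative.

```python
"""Probability that a sample of gametes includes an allele: a sketch.

Assumes gametes are drawn independently from a large population in which
the allele has frequency p (a binomial model). Structured or
self-pollinating populations depart from this assumption.
"""


def detection_probability(n_gametes: int, p: float) -> float:
    """Chance that n independent gametes include at least one copy."""
    return 1.0 - (1.0 - p) ** n_gametes


if __name__ == "__main__":
    # Class one: a common allele (p = .5) in a sample of only 10 gametes.
    print(f"{detection_probability(10, 0.5):.5f}")  # 0.99902
    # Class four: p = .001 in 1 of 100 populations, 2,994 gametes taken from
    # each; only the sample from the single carrier population can succeed.
    print(f"{detection_probability(2994, 0.001):.3f}")  # 0.950
    print(f"total gametes collected: {2994 * 100:,}")  # 299,400
    # Isozyme surveys: a hypothetical allele at p = .02, N = 20 or 80 gametes.
    for n in (20, 80):
        print(f"N = {n}: {detection_probability(n, 0.02):.2f}")
```

Under this model, an allele at p = .02 is missed about two times in three with N = 20 gametes and about one time in five with N = 80. This is consistent with the caution above about undetected alleles.
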
However, Marshall and Brown (1975) considered the two remaining conceptual classes of alleles (rare, widespread and common, localized) to be relevant to the formulation of sampling strategies. Alleles of the second class (rare, widespread) behave as if the species (or population or target area) to be collected were a single, large, unstructured population. It follows that sampling of this class depends only on the total number of gametes in the combined sample from all of the populations in which collections are made. The appropriate strategy is therefore to draw a total sample large enough to ensure that the desired number of rare, widespread alleles is included. Table 4-1 gives the sample sizes required, at various probability levels,
to ensure the inclusion of an allele that is present, at various frequencies, within the target area. For alleles present at a frequency as low as p = .001, a total sample of 4,603 gametes (approximately 100 gametes from each of 50 populations) gives a probability of .99 that at least 1 copy of any such rare allele is included. Thus, the detection of widespread rare alleles seems attainable for many species.

TABLE 4-1 Numbers of Alleles That Must Be Sampled to Include, at Various Probability Levels, at Least One Copy of an Allele Present at Various Frequencies in the Area To Be Sampled

              Number of alleles sampled, by allele frequency in target area
Probability   .1      .05     .02     .01     .001
.99           43      88      228     458     4,603
.95           27      59      148     298     2,994
.9            22      45      114     229     2,301
.8            15      31      79      160     1,608
.5            7       13      34      69      691

Alleles of the third conceptual class (common, localized) occur in only one or a few habitats, where they reach a high frequency. They may be biologically specialized alleles that enhance adaptation only in certain habitats. These are often the alleles of most interest to breeders, because breeders are concerned with improving performance in the specialized habitats of their own ecogeographical regions. Widespread, common alleles have almost certainly been introduced into all habitats, whereas introduction of the localized common alleles of special habitats is likely to have been sporadic. Such considerations led Marshall and Brown (1975) to suggest that locally common alleles are, at least conceptually, the key class of alleles in formulating sampling strategies, whether for capturing the maximal amount of useful variation within populations (within practical limits of sample size) or for determining the distribution of environmentally influenced genetic variation within species.
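
Inverting the binomial expression 1 - (1 - p)^n gives the number of gametes needed to include at least one copy of an allele at frequency p with probability P: n = ln(1 - P) / ln(1 - p), rounded up. The sketch below rebuilds the layout of Table 4-1. Like the table, it assumes independent sampling of gametes, and its function names are illustrative. Because it rounds up to the smallest sufficient sample, most cells either match the printed table or exceed it by one gamete. For example, it gives 4,603 at p = .001 and P = .99, but 2,995 rather than 2,994 at P = .95. Three cells come out two gametes higher: p = .05 at P = .99, p = .1 at P = .95, and p = .001 at P = .5. The published values may therefore rest on a slightly different approximation.

```python
"""Gametes needed to capture an allele with a given probability: a sketch.

Assumes gametes are sampled independently (binomial model), as in the
reasoning behind Table 4-1. Names below are illustrative.
"""
import math

FREQUENCIES = (0.1, 0.05, 0.02, 0.01, 0.001)  # columns of Table 4-1
PROBABILITIES = (0.99, 0.95, 0.9, 0.8, 0.5)  # rows of Table 4-1


def detection_probability(n_gametes: int, p: float) -> float:
    """Chance that n independent gametes include at least one copy."""
    return 1.0 - (1.0 - p) ** n_gametes


def gametes_needed(p: float, prob: float) -> int:
    """Smallest n with detection_probability(n, p) >= prob."""
    n = math.ceil(math.log(1.0 - prob) / math.log(1.0 - p))
    # Guard against floating-point error in the closed-form estimate.
    while n > 0 and detection_probability(n - 1, p) >= prob:
        n -= 1
    while detection_probability(n, p) < prob:
        n += 1
    return n


if __name__ == "__main__":
    print("Probability" + "".join(f"{p:>8}" for p in FREQUENCIES))
    for prob in PROBABILITIES:
        cells = "".join(f"{gametes_needed(p, prob):>8,}" for p in FREQUENCIES)
        print(f"{prob:<11}" + cells)
```
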
Very large numbers of accessions have been accumulated in germplasm collections for many species and species groups, and these large numbers have often been taken to indicate that collections include all or nearly all potentially useful genetic variability. However, existing germplasm collections suffer to a greater or lesser extent from four important deficiencies. First, records made
at collection sites have often been inadequate in various respects (for example, descriptions of the ecological features of sites, the sizes of samples, and the manner in which samples were taken). Also, clerical and other errors made after collection have introduced inaccuracies into the records, with the result that the quality of a sample cannot be evaluated. Second, the numbers of individuals taken from individual sites have often been very small, so that potentially useful alleles may have been missed in the original sampling of many sites. Third, few or no individuals have been taken from some ecogeographical areas, especially the less accessible sites within areas; hence, alleles that are rare overall but locally frequent are particularly likely to have been missed. Fourth, the management of accessions within germplasm collections has often been inadequate, with the result that genetic variability within accessions has decayed.

EMPIRICAL STUDIES OF ALLELIC DIVERSITY AND ENVIRONMENTALLY INFLUENCED GENETIC DIFFERENTIATION

Two aspects of genetic variation are of major importance in formulating sampling strategies: (1) the numbers and frequencies of alleles at a number of representative loci in single populations, which are important in determining the number of individuals to sample within single populations, and (2) patterns of genetic and evolutionary differentiation among populations, which are important in determining the number and distribution of populations to be sampled. The allelic and genotypic variability within and among populations of Avena barbata, a heavily self-pollinating wild relative of cultivated oats, and the allelic variability within and among indigenous cultivated races of maize (Zea mays) are used as examples.

A Wild, Predominantly Self-Pollinating Species

Although samples of A. barbata taken from single sites (populations) have usually been small, and the sampling variances consequently large, there have been a few studies of wild populations in which large numbers of individuals from numerous single populations representing a diversity of ecological situations have been examined. A. barbata, the slender wild oat, ranks among the most thoroughly studied of wild species. Large samples have been examined from specific collection sites throughout the range of the species, including southwestern Asia and the eastern Mediterranean Basin, where A. barbata is endemic (Kahler et al., 1980); Spain, where it is naturalized (Garcia et al., 1989); and California, where the species
rapidly became a major component of the annual flora and a highly useful forage species after its introduction from Spain about two centuries ago (Clegg and Allard, 1972). In the study of Garcia et al. (1989), 4,011 individuals (approximately 100 from each of 42 sites in Spain) were assayed for 14 isozyme loci representing nine enzyme systems. Among the 14 loci, 2 were entirely invariant (only one allele was observed in the total sample of 8,022 gametes), and 3 other loci were invariant in 41 of the 42 populations sampled (each of these 3 loci was weakly polymorphic for a single infrequent allele in the same single population). All 5 of these loci were entirely invariant in California but weakly polymorphic for 1 to 3 or more infrequent (p < .01) or rare (p < .001) alleles in southwestern Asia. Allelic profiles were closely similar for 4 additional isozyme loci; these 4 loci differed from the 5 invariant or nearly invariant loci discussed above primarily in that many of the Spanish populations were weakly polymorphic for 1 to 3 or more infrequent or rare alleles at each locus, and the southwestern Asian and Californian populations also tended to be slightly more variable for rare or infrequent alleles. Thus, the gene pools of southwestern Asia, Spain, and California can be characterized as nearly identical in allelic composition for these 9 invariant or weakly polymorphic loci. Nearly all of the 42 Spanish populations were moderately to highly variable for the 5 remaining isozyme loci. In the majority of Spanish populations, allelic profiles for these 5 variable loci typically featured a single predominant allele (p > .7) plus 1 allele present at a low frequency (p = .01 to .10) and several infrequent (p < .01) or rare (p < .001) alleles.
Occasionally, allelic profiles within populations featured a single predominant allele at a very high frequency (p > .98) plus 1 or 2 infrequent or rare alleles. Alternatively, allelic profiles occasionally featured 2 or 3 alleles present at intermediate frequencies (p = .2 to .6) together with several infrequent or rare alleles. As with the 9 nearly invariant loci, the southwestern Asian populations were more variable than the Spanish populations for these 5 moderately to highly variable loci.

The empirical results presented above indicate that the alleles of A. barbata fall into three categories relevant to sampling. Alleles that are highly frequent (p > .6) in any single population or region are also nearly always highly frequent in all populations and all regions. These predominant alleles appear to be the wild-type alleles of classical genetics and the class one (common, widespread) alleles postulated by Marshall and Brown (1975). Alleles that are rare (p < .001) in any population or region are also nearly always rare in all populations and regions. These alleles
apparently contribute significantly to high levels of adaptation in very few, if any, environments; they evidently remain rare because selection removes them at the same rate at which recurrent mutation produces them. They appear to be the deleterious mutants of classical genetics, and thus they fit into either class two (rare, widespread) or class four (rare, localized) of Marshall and Brown (1975). Because of the practicalities of sampling, however, it is probably not worthwhile to attempt to distinguish between these two conceptual classes.

Finally, there are alleles whose frequencies vary widely from population to population. The pattern of distribution most commonly observed for these alleles was one in which a given allele was infrequent (p < .01) or rare (p < .001) in the majority of populations but present at low to moderate frequencies (p = .01 to .05) in an occasional population. In a few cases, however, a given allele was observed in many different populations but at frequencies that varied from p < .01 to p = .5. Alleles whose frequencies vary from population to population thus appear to fit into Marshall and Brown's common, localized category (class three), although the fit requires somewhat flexible definitions of what constitutes common and localized.

Among the 137 alleles that have been recorded for these 14 loci of A. barbata, about 10 percent (1 allele for each locus) were consistently highly frequent and omnipresent. This class of alleles (Marshall and Brown's class one) therefore contributed relatively little either to allelic variability within populations or to environmentally influenced genetic differentiation among them. The majority of the 137 alleles (about 65 percent) were consistently rare (p < .001) and were usually found in only 1 or a few populations.
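
The text explains the consistently rare alleles this way: selection removes them as fast as recurrent mutation produces them. That is the classical mutation-selection balance. The sketch below uses two textbook approximations that are not taken from this chapter. The first is an equilibrium frequency of about μ/s when selection acts on every copy of the allele, roughly the situation in a population of homozygous, self-fertilized lines. The second is about √(μ/s) for a fully recessive allele under random mating. The mutation rate, the selection coefficients, and the function name are hypothetical.

```python
"""Mutation-selection balance for a deleterious allele: a sketch.

Textbook approximations, assuming mu << s and a large population:
  q ~ mu / s          selection acts on every copy (e.g., homozygous lines)
  q ~ sqrt(mu / s)    fully recessive allele under random mating
The numbers used below are hypothetical, not data from the chapter.
"""
import math


def equilibrium_frequency(mu: float, s: float, recessive_random_mating: bool = False) -> float:
    """Approximate equilibrium frequency of a deleterious allele.

    mu -- mutation rate to the allele per gene copy per generation
    s  -- fitness reduction of individuals exposed to selection
    """
    if recessive_random_mating:
        return math.sqrt(mu / s)
    return mu / s


if __name__ == "__main__":
    mu = 1e-5  # hypothetical mutation rate
    for s in (0.01, 0.1, 0.5):
        exposed = equilibrium_frequency(mu, s)
        recessive = equilibrium_frequency(mu, s, recessive_random_mating=True)
        print(f"s = {s:<4}  exposed: q = {exposed:.1e}  recessive: q = {recessive:.1e}")
```

With μ = 10^-5, selection coefficients of .01 or more hold an exposed allele at a frequency of about .001 or less, the range the text classes as rare.
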
Because of their infrequency, these rare alleles (classes two and four of Marshall and Brown) contributed little to allelic variability within or among ecogeographic populations of A. barbata. About 35 (26 percent) of the 137 alleles were found in most but not all populations and regions, often reaching frequencies between .01 and .1 and occasionally as high as .5. Such variations in allelic frequency were often correlated with readily observable and measurable environmental factors (rainfall, temperature, slope, exposure, soil type). Alleles whose frequencies vary widely from population to population (class three of Marshall and Brown) therefore appear to be responsible for most of the allelic variation that exists within and among populations of A. barbata.

The gene pools of different populations of A. barbata within a region, and those of different regions, are closely similar in allelic content when they are compared locus by locus. In contrast, the arrays of multilocus genotypes found in different regions and
different populations, including closely neighboring populations occupying different habitats, often differ strikingly. Similarly, arrays of multilocus genotypes found in colonial gene pools are often very different from those found in ancestral gene pools; for example, multilocus allelic complexes adapted to contrasting extremely hot and arid versus cool and moist habitats in California have not been found in Spain or southwestern Asia (Allard, 1988; Garcia et al., 1989).

Isozyme Variation Within and Among Cultivated Races of Maize, an Outbreeding Species

Mexico is the homeland of teosinte, the only close relative of maize (Wilkes, 1967); Mexico is also the cradle of maize domestication (Mangelsdorf, 1974). Patterns of isozyme variation have been studied in races of maize from all parts of Mexico, as well as from South and North America and the West Indies. Consequently, Mexican maize provides a convenient standard against which to compare patterns of isozyme variability within and among races of maize from various geographical areas, and within and among other cultivated outcrossing species.

Doebley et al. (1985) examined 12 plants (24 gametes) from each of 93 collections of maize from Mexico and 1 collection from Guatemala, for a total sample of 1,128 individuals (2,256 gametes). The 94 collections represented 34 races (Hernandez and Alanis, 1970; Wellhausen et al., 1952). Each plant was analyzed for 13 enzyme systems; 163 alleles representing 23 distinguishable loci were observed (an average of 7.09 alleles per locus). Table 4-2 gives the distribution of the 163 alleles in frequency classes based on the total sample, together with the number of races in which alleles of each frequency class were observed.
A single highly frequent allele (p > .6) was observed for 20 of the 23 loci (p > .9 for 10 loci); the frequencies of the most common alleles at the 3 remaining loci were .432, .402, and .255. All alleles present at a frequency of p > .6 were ubiquitous, occurring in all 34 races (Table 4-2). Among the 163 alleles, 20 (12 percent) fell into this category, and they can reasonably be placed in the common, widespread class (class one) of Marshall and Brown (1975). In contrast, 93 alleles (57 percent), present in the total sample at a frequency of p < .01, were nearly always confined to a single race. Thus, nearly all rare alleles appear to fit into class two or class four of Marshall and Brown (1975). The remaining 50 alleles (31 percent), present in the total sample at frequencies of .01 to .5, occurred in many but not all races (Table 4-2). Other studies (Bretting et al., 1987, 1990; Goodman and Stuber, 1983) show
that the frequencies of such alleles tend to vary over a wide range from race to race. Thus, alleles that vary in frequency from race to race appear to fit into Marshall and Brown's class three (common, localized). Doebley et al. (1985) reported that 88 percent of the total allelic variability resides within races and 12 percent resides among the 34 Mexican races of maize; they attributed a large part of this variability to differences in the frequencies of alleles that are neither ubiquitous nor rare. Principal component and cluster analyses showed that variation among races was continuous and that there were no well-defined race complexes. Weakly differentiated groups were apparent (high-elevation races; northern and northwestern races; southern and western low-elevation dent and flour maize races), but in general, races of maize were not sharply differentiated. Overall, the allelic variability observed within races of maize (a cultivated outbreeding species) is thus remarkably similar to that observed within populations of A. barbata, a wild inbreeding species.

TABLE 4-2 Numbers of Alleles Observed in Various Frequency Classes and the Numbers of Maize Races in Which Individual Alleles Were Observed

Frequency     Number of Alleles     Number of Races in Which Each Allele Was Observed
Class (p)     in Frequency Class    Mean      Range
>.6           20                    34        34–34
.3–.5         6                     33        31–34
.2–.3         2                     29        29–30
.1–.2         8                     25        9–31
.05–.1        9                     20        10–23
.01–.05       25                    9         2–19
.001–.01      77                    2         2–7
<.001         16                    1         1–1

NOTE: A sample of 2,256 gametes (1,128 individuals) from 34 Mexican races of maize was used.
SOURCE: Data derived from Doebley, J. F., M. M. Goodman, and C. W. Stuber. 1985. Isozyme variation in races of maize from Mexico. Am. J. Bot. 72(5):629–639.

SAMPLE SIZES FOR EACH COLLECTION SITE

The first step in determining the optimum partition of resources within and among sites in the region(s) from which samples are to be
collected is to determine, at some acceptable level of probability, the optimum number of seeds or other propagules needed to capture potentially useful alleles at single collection sites. If the samples taken from single sites are too small to capture the potentially useful alleles present there, resources are wasted: the time and effort required to travel to a site, record ecological and other relevant information about it, send the collected materials through customs, and so on are largely lost. However, it is also wasteful if the collector takes samples from single sites that are larger than needed to capture the alleles and to provide for long-term storage in germplasm banks and distribution of accessions to users.

Marshall and Brown (1975:62–63) argued that "the great majority of common alleles or allelic combinations (whether widespread or local) presumably represent adaptive variants maintained in populations by some form of balancing selection (Dobzhansky, 1970).… Consequently, common variants are likely to be of far greater interest to plant breeders than rare variants.… [Hence] the aim of plant exploration can be defined as the collection of at least one copy of each variant occurring in the target populations with frequency greater than .05."

If only 2 alleles, a1 and a2, are present at a locus at a collection site, at frequencies p1(a1) = .95 and p2(a2) = .05, a sample of 59 gametes will provide at least 1 copy of each of the 2 alleles with 95 percent certainty (Equation 4-1). If more than 2 alleles per locus are present at frequencies of .05 or more, the sample sizes required to achieve this goal are not drastically larger; for example, for a locus with an allelic profile of .8, .05, .05, .05, a random sample of 80 gametes will capture at least 1 copy of each of the 4 alleles with a probability of .98.
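Equation 4-1 is not reproduced in this excerpt; the Python sketch below assumes it is the standard binomial result, also used in the box on capturing low-frequency alleles, that a random sample of N gametes contains at least one copy of an allele at frequency p with probability 1 - (1 - p)^N. The function names are hypothetical. The multi-allele case is handled by inclusion-exclusion, which is exact under multinomial sampling.

```python
import math
from itertools import combinations


def capture_probability(p, n_gametes):
    """Chance that n_gametes drawn at random include >= 1 copy of an allele at frequency p."""
    return 1.0 - (1.0 - p) ** n_gametes


def min_sample_size(p, certainty=0.95):
    """Smallest number of gametes N with capture_probability(p, N) >= certainty."""
    n = math.ceil(math.log(1.0 - certainty) / math.log(1.0 - p))
    # Correct for floating-point rounding in either direction at the boundary.
    while capture_probability(p, n) < certainty:
        n += 1
    while n > 1 and capture_probability(p, n - 1) >= certainty:
        n -= 1
    return n


def capture_all_probability(freqs, n_gametes):
    """Chance that a sample contains >= 1 copy of every allele listed in freqs.

    Inclusion-exclusion: the chance of missing every allele in a subset S
    is (1 - sum(S)) ** n_gametes.
    """
    total = 0.0
    for k in range(len(freqs) + 1):
        for subset in combinations(freqs, k):
            miss_all = max(0.0, 1.0 - sum(subset)) ** n_gametes
            total += (-1) ** k * miss_all
    return total


print(min_sample_size(0.05))                                         # 59
print(min_sample_size(0.01))                                         # 299
print(round(capture_probability(0.05, 80), 3))                       # 0.983
print(round(capture_all_probability([0.8, 0.05, 0.05, 0.05], 80), 3))  # 0.951
```

Under this model, .98 is the chance that 80 gametes capture any one of the .05 alleles; the chance of capturing all three jointly is closer to .95. For p = .01 the same function returns 299 gametes; the 298 quoted in the following paragraph gives a probability just under .95 (about .94996).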
Considerations such as these led Marshall and Brown (1975:73) to the following conclusion: "In most circumstances, sample size should not exceed 50 plants per population and in no circumstance is it desirable to collect more than 100 plants per population." There are, however, two important additional practical considerations that the collector must take into account: the need to capture lower-frequency alleles, and the practical limits on the size of the sample that can be taken.

Capturing Lower-Frequency Alleles

Experimental results indicate that many localized alleles are present at local frequencies no greater than p = .05, and these variants may be of interest to collectors. Capturing, with a probability of .95, an allele present at a
frequency of p = .01 at a single site requires a sample of N = 298 gametes, about 5 times as large as the sample needed for an allele present at a frequency of p = .05. A sample of 100 individuals (200 gametes), the maximum recommended by Marshall and Brown (1975), may thus often fail to capture some of the potentially useful alleles at a single site. The usual procedure, however, is to collect more than a single sample per location, which greatly increases the probability of capturing rare but locally important alleles. A single gametic sample of size 160 has only an 80 percent probability of capturing an allele with a frequency of .01, but 2 such samples have a 96 percent probability of capturing it.

There are strong advantages to collecting more samples per location rather than more individuals per sample. Perhaps the most important is that any individual sample may be drawn from a population that, through an accident such as local drift (the "bottleneck" effect), does not possess the allele(s) of interest. It is also important that the alleles sampled represent the population; hence, one should sample distinct individuals rather than, for example, two peas from the same pod or 300 sorghum seeds from a single head. Alternatively, increasing sample sizes to 200 or more individuals (400 or more gametes) substantially increases the probability of capturing potentially useful alleles, if present, usually without a significant increase in the resources expended at the site. This is because most of the time, effort, and expense involved in collecting goes into traveling from location to location, inspecting and choosing the sites to be sampled at each location, and recording the ecological features of the sites, rather than into actually collecting seeds or other propagules.
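The single- and multiple-sample figures in the preceding paragraphs follow from the binomial capture probability, as the Python sketch below shows (function names are hypothetical). Note that under purely random sampling from one uniform population, 2 samples of 160 gametes are equivalent to 1 sample of 320. The extra value of multiple samples appears when alleles are patchily distributed, so the sketch adds `patchy_site`, a toy model that is an assumption of this sketch rather than anything in the text: a site made of equal patches, with the allele confined to one of them.

```python
def capture_probability(p, n_gametes):
    """Chance that n_gametes drawn at random include >= 1 copy of an allele at frequency p."""
    return 1.0 - (1.0 - p) ** n_gametes


def capture_in_k_samples(p, n_gametes, k):
    """Chance that at least one of k independent samples of n_gametes contains the allele."""
    single = capture_probability(p, n_gametes)
    return 1.0 - (1.0 - single) ** k


def patchy_site(q, patches, total_gametes, spread):
    """Toy model (hypothetical): equal patches, allele at frequency q in only one patch.

    spread=False -> all gametes come from a single, randomly chosen patch.
    spread=True  -> the gametes are split evenly, one subsample per patch.
    """
    if spread:
        return capture_probability(q, total_gametes // patches)
    # Only a 1-in-patches chance of sampling the patch that carries the allele.
    return capture_probability(q, total_gametes) / patches


print(round(capture_probability(0.01, 160), 2))            # 0.8
print(round(capture_in_k_samples(0.01, 160, 2), 2))        # 0.96
print(round(patchy_site(0.05, 4, 400, spread=False), 3))   # 0.25
print(round(patchy_site(0.05, 4, 400, spread=True), 3))    # 0.994
```

With q = .05, four patches, and 400 gametes, dividing the sample among the patches captures the allele with probability above .99, whereas a single sample from one randomly chosen patch captures it only about a quarter of the time.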
However, if 400 gametes are to be sampled at a small village, four samples of 100 gametes each are more likely to capture alleles of local importance than a single sample of 400, unless the alleles are distributed very uniformly throughout the village.

Size of the Sample

Additional practical considerations concern the size of the sample that (1) can be collected from plants that produce very large seeds or that must be propagated vegetatively, or (2) can be moved, given the difficulty of transporting bulky materials and keeping them alive in transit. Ideally, the sample from each population unit should be large enough to provide not only for conservation but also for distribution to users without the need for immediate
multiplication, if field seed quality is acceptable. Ex situ multiplication is difficult for many species, nearly always costly in resources and time, and nearly always hazardous to genetic integrity. Consequently, if feasible, enough seeds or other propagules should be collected from each site to provide for both long-term storage and immediate use. If this is not feasible, the sample from each site should, if at all possible, be large enough that the first multiplication (regeneration) will meet both goals. Except under severe conditions, the sample should always be adequate to provide enough regenerated seeds for long-term storage.

Identifying appropriate populations to sample is usually relatively straightforward for annual, cultivated crop species such as cereal grains, pulses, and oilseed crops. Farmers harvest seeds in bulk, usually by field, and set aside part to be sown as the next crop. Seed numbers are very large, and the mixing during harvesting and cleaning virtually guarantees that any large sample from a given seed source will include all potentially useful genetic variability. A field or a farm seeded from a common source is the unit of sampling. It is often possible to purchase enough seeds from such a field to satisfy all needs for conservation or short-term distribution to users, if collection can be timed with harvest. Market samples should be avoided because their origins may be obscure, they may be mixtures from different regions, or, in rare cases, they may represent imported seeds.

How many seeds or other propagules should collectors take under such ideal conditions? Assume that a sample from each collection site is to be distributed to a base collection, a back-up collection, and five active collections, and that 50 further samples are needed for immediate distribution to users. Assume further that each sample must contain 50 seeds, the minimum suggested by Marshall and Brown (1975) on the basis of sampling considerations alone. The original sample must then contain no fewer than 2,850 seeds (57 samples of 50 seeds each). However, some seeds will be lost during transport or passage through quarantine, and perhaps half of all individuals distributed for long-term storage will produce few or no progeny. Hence, samples distributed for these various purposes ideally should be at least twice as large as indicated by sampling considerations alone. At harvest, samples of several thousand seeds or other propagules are not difficult to obtain from a single field. Sufficiently large samples may, however, be out of reach when there are few individuals at the collection site, when seeds or other propagules are bulky and difficult to transport, or when the reproductive unit is actively growing somatic tissue. In such cases, the collector should attempt to collect the equivalent of no fewer than 60 gametes (30 diploid individuals) to satisfy minimal sampling requirements and should arrange for the first multiplication to be large, stress-free, and carried out promptly.

Natural populations of wild relatives are often distributed in more or less distinct patches over a much wider range of habitats than their domesticated descendants. For both wild relatives and cultivated plants, several ecological and population factors must be taken into account in deciding where samples should be collected. Ecological factors include slope, soil type and drainage, and differences in the flora and fauna of the various habitats. The most important population factors are the mating system and the mobility of the species (especially the mobility of pollen and seeds); these factors often have very large effects on within-population organization as well as on genetic differentiation. The goal of capturing useful alleles is most likely to be met by taking stratified samples from diverse sites. Stratified sampling is recommended even if the target area appears to be ecologically homogeneous.

CAPTURING LOW-FREQUENCY ALLELES

The relationship between the size and number of collection samples and the probability of recovering low-frequency alleles is illustrated by the following. Suppose that at one diploid locus there are 2 alleles, a1 and a2, that occur in a population at frequencies of p1 = .90 and p2 = .10, and that a sample of 10 zygotes or seeds (N = 20 gametes) is assayed. Even with a sample this small, it is virtually certain (probability > .9999) that at least 1 copy of allele a1 will be included; that is, allele a1 will be detected in essentially every sample of N = 20 gametes. The probability is also high (.88) that the less frequent allele, a2, will be detected; that is, on average roughly 9 of every 10 samples of N = 20 gametes will contain a copy of a2.

If, however, alleles a1 and a2 are present in the population at frequencies p1(a1) = .95 and p2(a2) = .05, the probability of detecting a1 increases slightly, but the probability of detecting a2 decreases to .64; that is, only about two-thirds of samples of N = 20 gametes are expected to include a copy of a2. If p1 and p2 are .99 and .01, or .999 and .001, the probabilities that samples of N = 20 gametes will include at least 1 copy of a2 fall to .18 and .02, respectively; that is, only about 2 in 10 and 2 in 100 such samples are expected to contain a2.

The table below gives the probabilities that the less frequent allele, a2, will be detected in samples of 20, 40, 80, or 160 gametes when it is present in a population at frequencies of .10, .05, .01, and .001. The use of multiple or larger samples greatly increases the probability of capturing an allele. The probability that a sample of N gametes includes at least 1 copy of an allele present at frequency p is $1 - (1 - p)^{N}$; likewise, if one sample captures the allele with probability P, then k independent samples capture it with probability $1 - (1 - P)^{k}$. For example, an allele with a frequency of .01 has a probability of .18 of being found in 1 sample of size 20. The probability of finding the same allele in a collection of 8 samples of size 20 is $1 - (1 - .18)^{8} = .80$, the same as the probability of finding it in 1 sample of size 160. Reaching a probability of .96 of capturing an allele with a frequency of .01 would require 16 samples of size 20 or 2 samples of size 160.

Probabilities That the Less Frequent Allele (a2) Present at a Diploid Locus at Various Frequencies Will Be Detected in Samples of N = 20, 40, 80, or 160 Gametes per Population or Site

                 Probability for a Sample Size (N) of
Frequency        20       40       80       160
.10              .88      .99      >.99     >.99
.05              .64      .87      .98      >.99
.01              .18      .33      .55      .80
.001             .02      .04      .08      .15
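Every cell of the table above can be recomputed from 1 - (1 - p)^N. The short Python sketch below (names hypothetical) does so, printing values that round to 1.00 as >.99 to match the table's convention; it also confirms that 8 samples of 20 gametes and 1 sample of 160 give the same probability for an allele at frequency .01, since (1 - p)^(8 × 20) = (1 - p)^160.

```python
FREQUENCIES = [0.10, 0.05, 0.01, 0.001]
SAMPLE_SIZES = [20, 40, 80, 160]


def detection_probability(p, n_gametes):
    """1 - (1 - p)**N: chance that N gametes include at least one copy of the allele."""
    return 1.0 - (1.0 - p) ** n_gametes


def format_cell(prob):
    """Two decimals without the leading zero; values rounding to 1.00 print as '>.99'."""
    if round(prob, 2) >= 1.0:
        return ">.99"
    return f"{prob:.2f}".lstrip("0")


def build_table():
    """One row per allele frequency, one column per sample size."""
    return [[format_cell(detection_probability(p, n)) for n in SAMPLE_SIZES]
            for p in FREQUENCIES]


print("Frequency  " + "  ".join(f"{n:>5}" for n in SAMPLE_SIZES))
for p, row in zip(FREQUENCIES, build_table()):
    print(f"{p:<9}  " + "  ".join(f"{cell:>5}" for cell in row))

# Eight samples of 20 gametes versus one sample of 160 (allele frequency .01).
eight_small = 1.0 - (1.0 - detection_probability(0.01, 20)) ** 8
one_large = detection_probability(0.01, 160)
print(round(eight_small, 4) == round(one_large, 4))
```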
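The seed-count arithmetic in the main text (one base, one back-up, and five active collections, plus 50 distribution samples, each of 50 seeds) can be written out in Python as follows. The `Allocation` class, `seeds_needed`, and the `safety_factor` parameter are illustrative names, not part of the text; a factor of 2 reflects the advice that samples should be at least twice as large as sampling considerations alone suggest.

```python
import math
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical breakdown of the samples to be drawn from one collection site."""
    base: int = 1
    back_up: int = 1
    active: int = 5
    distribution: int = 50

    def total_samples(self) -> int:
        return self.base + self.back_up + self.active + self.distribution


def seeds_needed(allocation: Allocation, seeds_per_sample: int = 50,
                 safety_factor: float = 1.0) -> int:
    """Seeds to collect: samples x seeds per sample, scaled up for expected losses."""
    return math.ceil(allocation.total_samples() * seeds_per_sample * safety_factor)


plan = Allocation()
print(plan.total_samples())                 # 57
print(seeds_needed(plan))                   # 2850
print(seeds_needed(plan, safety_factor=2))  # 5700
```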
It may be difficult to obtain adequately large samples, especially from wild populations, when (a) few individuals exist, (b) many individuals produce very few seeds or other propagules, or (c) the collecting season is short (for example, because of seed shattering) and varies in timing from year to year. Limited reproductive capacity is most likely to be encountered in harsher habitats or in years with limited rainfall. Under such conditions, repeated visits will be necessary to obtain adequate samples.

NUMBER OF SITES TO SAMPLE

Allelic and genotypic diversity is not distributed evenly over the range of most species. The collection sites should represent as many distinctive environments as possible within the collector's resources. The total number of sites from which samples can be obtained depends on factors such as the size of the region to be sampled, the quality of transportation, the terrain, the length of the collecting season in different habitats in the region, the time required for collection
at each site, and the amount of cooperation available from local collaborators. Careful advance planning is important to ensure optimal coverage at the proper time.

The mating system of the target species can also guide the choice of the optimum number of stratified sites from which samples should be obtained. Allelic and genotypic variability tends to be more uniformly distributed in outbreeders than in inbreeders. In either case, the number of distinct sites sampled should be the maximum possible, provided that the samples taken at individual sites are large enough to capture potentially useful local genetic variability.

RECOMMENDATIONS

The scientific bases for the efficient collection, preservation, and distribution of plant genetic resources are all well understood. Particular attention is directed to the following recommendations concerning the size of samples to be collected, strategies for sampling genetic diversity from different ecogeographical regions, and the need for back-up storage of collections.

All major germplasm collections must be protected against catastrophic loss by back-up storage at other institutions. Duplication of accessions at one or more institutions provides security against loss. Cooperating institutions do not need to test or regenerate seeds in back-up collections; rather, primary collection managers should ensure that the samples sent to back-up collections are viable. More effort is needed to back up collections.

Strategies for sampling the genetic diversity of crop species should be based on an understanding of the ecogenetic structure of species and populations. Materials as they now exist in germplasm collections, including the best-managed collections, are rarely adequate for studying the population genetics of species.
Information on the distribution of allelic and genotypic variation within and among populations is not available for most domesticated plants or animals or their wild relatives. Therefore, in formulating sampling strategies at present, there is no alternative but to take advantage of the information available for the species that have been studied most thoroughly and to extrapolate from it to species about which little is known.

The size of collected samples ideally should always be large enough to minimize or eliminate the need for regeneration before storage. Repeated regenerations result in loss of alleles and sometimes of entire samples and should be avoided whenever possible. However,
field-collected seeds may not meet minimum standards of viability for base collections. In such cases it is essential to minimize the genetic shifts or losses that can accompany regeneration (see Chapter 5).

The size of a collected sample should, where practical, be adequate for deposition in both base and back-up collections. Duplicate samples should be supplied to host country institutions, base collections, and back-up collections. Clear responsibility for back-up collections needs to be assigned, along with adequate passport data to enable recovery of the entire collection in case of natural or civil disaster at the base site.

When collecting seed, samples should be chosen from as wide an ecogeographic and ethnological base as possible. Given a fixed number of seeds to be collected, it is better to collect from more sites than to collect more seeds at fewer sites. Before collecting, ecogeographic and sociological or ethnological planning should identify the range of locations most desirable to sample.
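The last recommendation, favoring more sites over more seeds per site, can be illustrated with a deliberately simple toy model, which is an assumption of this sketch and not an analysis from the text: an allele of local frequency q occurs at only one of the candidate sites, the collector does not know which, and each seed is treated as a diploid individual contributing 2 gametes. The per-site floor of 30 seeds follows the text's minimum of 60 gametes (30 diploid individuals); the function names are hypothetical.

```python
def max_sites(budget_seeds, candidate_sites, min_seeds_per_site=30):
    """Most sites the budget covers while giving each at least the per-site minimum."""
    return min(candidate_sites, budget_seeds // min_seeds_per_site)


def p_capture_local_allele(budget_seeds, sites_visited, candidate_sites, q):
    """Toy model: an allele at local frequency q occurs at exactly one unknown candidate site.

    The seed budget is split evenly over the visited sites; each seed is treated
    as one diploid individual contributing 2 gametes.
    """
    if not 0 < sites_visited <= candidate_sites:
        raise ValueError("sites_visited must be between 1 and candidate_sites")
    gametes_per_site = 2 * (budget_seeds // sites_visited)
    chance_site_visited = sites_visited / candidate_sites
    return chance_site_visited * (1.0 - (1.0 - q) ** gametes_per_site)


budget, candidates, q = 3000, 100, 0.05
for sites in (10, 50, max_sites(budget, candidates)):
    prob = p_capture_local_allele(budget, sites, candidates, q)
    print(f"{sites:>3} sites x {budget // sites} seeds: P(capture) = {prob:.2f}")
```

With 3,000 seeds, 100 candidate sites, and q = .05, sampling 10 sites at 300 seeds each gives a capture probability of about .10, whereas sampling all 100 sites at 30 seeds each gives about .95.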