
4
Human Adaptations to Diet, Subsistence, and Ecoregion Are Due to Subtle Shifts in Allele Frequency

ANGELA M. HANCOCK,* DAVID B. WITONSKY,* EDVARD EHLER,*† GORKA ALKORTA-ARANBURU,* CYNTHIA BEALL,‡ AMHA GEBREMEDHIN,§ REM SUKERNIK,|| GERD UTERMANN,# JONATHAN PRITCHARD,*** GRAHAM COOP,*†† AND ANNA DI RIENZO*‡‡

Human populations use a variety of subsistence strategies to exploit an exceptionally broad range of ecoregions and dietary components. These aspects of human environments have changed dramatically during human evolution, giving rise to new selective pressures. To understand the genetic basis of human adaptations, we combine population genetics data with ecological information to detect variants that increased in frequency in response to new selective pressures. Our approach detects SNPs that show concordant differences in allele frequencies across populations with respect to specific aspects of the environment. Genic and especially nonsynonymous SNPs are overrepresented among those most strongly correlated with environmental variables. This provides genome-wide evidence for selection due to changes in ecoregion, diet, and subsistence.

*Department of Human Genetics, University of Chicago, Chicago, IL 60637;
†Department of Anthropology and Human Genetics and Department of Biology and Environmental Studies, Charles University, Prague, 128 00 Czech Republic;
‡Department of Anthropology, Case Western Reserve University, Cleveland, OH 44106;
§Department of Internal Medicine, Addis Ababa University, Addis Ababa, Ethiopia;
||Laboratory of Human Molecular Genetics, Department of Molecular and Cellular Biology, Institute of Chemical Biology and Fundamental Medicine, Russian Academy of Sciences, Novosibirsk, 630090 Russia;
#Institute for Medical Biology and Human Genetics, Medical University of Innsbruck, 6020 Innsbruck, Austria;
**Howard Hughes Medical Institute, University of Chicago, Chicago, IL 60637; and
††Section for Evolution and Ecology and Center for Population Biology, University of California, Davis, CA 95616.
‡‡To whom correspondence should be addressed. E-mail: dirienzo@uchicago.edu.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





We find particularly strong signals associated with polar ecoregions, with foraging, and with a diet rich in roots and tubers. Interestingly, several of the strongest signals overlap with those implicated in energy metabolism phenotypes from genome-wide association studies, including SNPs influencing glucose levels and susceptibility to type 2 diabetes. Furthermore, several pathways, including those of starch and sucrose metabolism, are enriched for strong signals of adaptations to a diet rich in roots and tubers, whereas signals associated with polar ecoregions are overrepresented in genes associated with energy metabolism pathways.

Modern humans evolved in Africa approximately 100–200 kya (White et al., 2003), and since then human populations have expanded and diversified to occupy an exceptionally broad range of habitats and to use a variety of subsistence modes. There is wide physiologic and morphologic variation among populations, some of which was undoubtedly shaped by genetic adaptations to local environments. However, identifying the polymorphisms underlying adaptive phenotypes is challenging because current patterns of human genetic variation result not only from selective but also from demographic processes.

Previous studies examined evidence of positive selection by scanning genome-wide SNP data using approaches that are generally agnostic to the underlying selective pressures. These studies detected outliers on the basis of differentiation of allele frequencies between broadly defined populations (Barreiro et al., 2008; Coop et al., 2009), extended regions of haplotype homozygosity (Voight et al., 2006; Wang et al., 2006; Pickrell et al., 2009), frequency spectrum-based statistics (Carlson et al., 2005; Williamson et al., 2007), or some combination of these methods (Sabeti et al., 2007; Jakobsson et al., 2008).
These approaches are well suited to detect cases in which selection quickly drove an advantageous allele to high frequency, thereby generating extreme deviations from genome-wide patterns of variation. However, selection acting on polygenic traits may lead to subtle shifts in allele frequency at many loci, with each allele making a small contribution to the phenotype [see Pritchard et al. (2010) for a discussion]. Recent genome-wide association studies (GWAS) support this view in that most traits are associated with many variants with small effects and involve a large number of different loci (Manolio et al., 2009). Given that most phenotypic variation is polygenic, adaptations due to small changes in allele frequencies are likely to be widespread.

Detection of beneficial alleles that evolved under a polygenic selection model may be achieved by an approach that simultaneously considers the spatial distributions of the allele frequencies and the underlying selective pressures. Such an approach was used in the past to identify
several paradigmatic examples of human adaptations. For instance, the similarity between the distributions of endemic malaria and those of the thalassemias and sickle cell anemia led to the hypothesis that disease carriers were at a selective advantage where falciparum malaria was common (Haldane, 1949; Allison, 1954). More recent studies of candidate genes support roles for selection on energy metabolism (Hancock et al., 2008), sodium homeostasis (Thompson et al., 2004; Young et al., 2005), and the ability to digest lactose from milk (Bersaglieri et al., 2004; Tishkoff et al., 2007b) and starch from plants (Perry et al., 2007). Taken together, these examples advance a model whereby exposures to new or intensified selective pressures resulted in physiologic specializations.

Here, we develop and apply an approach that uses information about underlying selective pressures while also controlling for the important effect of population structure in shaping the spatial distribution of beneficial alleles. Our approach allows us to detect subtle but concordant changes in allele frequencies across populations that live in the same geographic region but that differ in terms of ecoregion, main dietary component, or mode of subsistence.

RESULTS

We used genotype data for 61 human populations, including the 52 populations in the Human Genome Diversity Project Panel (Li et al., 2008), 4 HapMap Phase III populations (Luhya, Maasai, Tuscans, and Gujarati) (www.hapmap.org), and 5 additional populations (Vasekela !Kung sampled in South Africa, lowland Amhara from Ethiopia, Naukan Yup’ik and Maritime Chukchee from Siberia, and Australian Aborigines). For each of these populations, we gathered environmental data for four ecoregion variables (Fig. S1, available online at www.pnas.org/cgi/content/full/0914625107/DCSupplemental) and seven subsistence variables (comprising four subsistence strategies and three main dietary component variables; Fig. S2, available online at www.pnas.org/cgi/content/full/0914625107/DCSupplemental).

For each SNP and each environmental variable, we contrasted allele frequencies between the two sets of populations defined by that variable using a Bayesian linear model method that controls for the covariance of allele frequencies between populations due to population history and accounts for differences in sample sizes among populations. The statistic resulting from this method is a Bayes factor (BF), which is a measure of the support for a model in which a SNP allele frequency distribution is dependent on an environmental variable in addition to population structure, relative to a model in which the allele frequency distribution is dependent on population structure alone. For subsequent analyses, we use a transformed rank
statistic based on the location of each SNP in the overall distribution of BFs. Because we rank each SNP relative to SNPs within the same allele frequency range and from the same ascertainment panel, this transformed rank statistic allows us to make comparisons across SNP sets. To conduct analyses for the two types of variables (ecoregion and subsistence) as a whole, we also calculated for each SNP a minimum rank statistic across all of the variables within each category, which results in a summary statistic for ecoregion and subsistence, respectively.

Genic and Nonsynonymous SNPs Are Enriched for Signals of Adaptations to Ecoregion and Subsistence

As with any genome-wide scan for selection, there will be SNPs that fall in the extreme tail of the distribution of the test statistic. Therefore, we asked whether two classes of SNPs that are enriched for functional variation [i.e., genic and nonsynonymous (NS) SNPs] are more common in the lower tail of the minimum rank distribution relative to SNPs that are likely to be evolving neutrally (i.e., nongenic SNPs). As shown in Table 4.1, the ratios of the proportions of both genic and NS SNPs to the proportion of nongenic SNPs are significantly greater than 1 across at least two tail cutoffs of the BF distribution (1% and 0.5%) for both variable categories. Importantly, the enrichment of genic and NS SNPs becomes progressively greater in the more extreme parts of the tail. Furthermore, consistent with the fact that a larger fraction of NS SNPs compared with genic SNPs have functional effects, there is a greater enrichment of NS SNPs compared with genic SNPs in the more extreme tail of the distribution. These patterns suggest that the tail of the BF distribution contains true targets of positive selection.

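
The transformed rank statistic and the per-category minimum rank described above can be illustrated with a minimal Python sketch. It assumes that the rank is the fraction of SNPs in the same bin (same ascertainment panel and allele frequency range) whose BF is at least as large, so that small ranks mark the strongest signals; the exact transformation is given in Materials and Methods and may differ. The function names, bin labels, and BF values are hypothetical.

```python
from collections import defaultdict


def transformed_ranks(bayes_factors, bins):
    """Rank each SNP's BF within its bin (hypothetical helper).

    bayes_factors: one BF per SNP.
    bins: one hashable label per SNP, e.g. (panel, frequency range).
    Returns ranks in (0, 1]; small values mean unusually large BFs.
    Ties are broken by input order, which is an arbitrary choice.
    """
    members_by_bin = defaultdict(list)
    for idx, label in enumerate(bins):
        members_by_bin[label].append(idx)

    ranks = [0.0] * len(bayes_factors)
    for members in members_by_bin.values():
        # Order this bin's SNPs from largest to smallest BF.
        ordered = sorted(members, key=lambda i: bayes_factors[i], reverse=True)
        for position, idx in enumerate(ordered, start=1):
            ranks[idx] = position / len(ordered)
    return ranks


def minimum_rank(ranks_by_variable):
    """Per-SNP minimum rank across all variables of one category."""
    return [min(per_snp) for per_snp in zip(*ranks_by_variable.values())]


# Toy example: six SNPs, two bins, two ecoregion variables (made-up BFs).
bins = ["lo", "lo", "lo", "hi", "hi", "hi"]
ecoregion_ranks = {
    "polar": transformed_ranks([10.0, 0.5, 2.0, 8.0, 1.0, 0.2], bins),
    "dry": transformed_ranks([0.1, 3.0, 9.0, 1.0, 7.0, 0.3], bins),
}
print([round(r, 3) for r in minimum_rank(ecoregion_ranks)])
# [0.333, 0.667, 0.333, 0.333, 0.333, 1.0]
```

Ranking within bins, rather than across all SNPs, is what makes the statistic comparable between SNP sets with different frequency spectra or ascertainment schemes.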

TABLE 4.1 Proportions of Genic and NS SNPs Relative to the Proportion of Nongenic SNPs in the Tail of the Minimum Rank Distribution

                     Genic:Nongenic (tail cutoff)    NS:Nongenic (tail cutoff)
Variable Category    0.05     0.01     0.005         0.05     0.01     0.005
Ecoregion            1.06a    1.17a    1.19a         1.20a    1.58a    1.58b
Subsistence          1.04c    1.11a    1.11b         1.12     1.60a    1.87a

a Support from >99% of bootstrap replicates.
b Support from >97.5% of bootstrap replicates.
c Support from >95% of bootstrap replicates.

Given that we observed evidence of selection for ecoregion and subsistence overall, we next asked which individual variables may be driving these signals. To this end, we examined the lower tails of the rank statistic distributions for each individual variable to determine which ones showed the strongest enrichment of genic and NS SNPs. Several ecoregion variables exhibited a significant excess of genic and NS SNPs with low rank statistics, with the strongest signals observed for the polar domain (Table 4.2). Fewer individual subsistence variables had strong signals, but two variables are worth noting: the foraging subsistence pattern and roots and tubers as the main dietary component. Fig. 4.1 (Figs. S3-S5, available online at www.pnas.org/cgi/content/full/0914625107/DCSupplemental) illustrates the importance of controlling for population structure to expose these signals, many of which are due to subtle, but consistent, allele frequency shifts across geographic regions. These shifts are detectable even in the face of a large effect of population structure in shaping the geographic distributions of allele frequencies.

TABLE 4.2 Proportions of Genic and NS SNPs Relative to the Proportion of Nongenic SNPs in the Tails of the Individual Variable Distributions

Columns: Variable Category; Variable; Genic:Nongenic at tail cutoffs 0.05, 0.01, 0.005; NS:Nongenic at tail cutoffs 0.05, 0.01, 0.005.

1.06a 1.12b 1.14b 1.18a Ecoregion Dry 1.02 1.33 1.05c 1.10a 1.19a 1.19a 1.54a 1.78a Polar 1.06a 1.11a 1.11c 1.15a Humid temperate 1.14 1.17 Humid tropical 1.01 1.05 1.08 1.06 1.28 1.25 1.32b 1.41b Subsistence Agriculture 1.01 1.03 1.04 1.03 1.25a 1.46a Foraging 1.03 1.04 1.04 1.25 Horticulture 1.00 0.99 1.00 1.13 1.00 0.89 1.13b 1.34c Pastoralism 1.01 1.05 1.05 1.33 1.37b Main dietary component Cereals 1.04 1.06 1.10 1.04 1.12 Fats, meat, and milk 1.03 1.09 1.07 1.13 1.14 1.29 1.06a 1.11a 1.13a Roots and tubers 1.08 1.02 1.05

a Support from >99% of bootstrap replicates.
b Support from >97.5% of bootstrap replicates.
c Support from >95% of bootstrap replicates.
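
The enrichment ratios and bootstrap support values reported in Tables 4.1 and 4.2 can be illustrated as follows. This is a sketch under stated assumptions, not the authors' procedure: it resamples individual SNPs with replacement, whereas the resampling unit actually used is not described in this section, and all function names and input values are hypothetical.

```python
import random


def tail_fraction(ranks, cutoff):
    """Fraction of SNPs whose rank statistic falls in the lower tail."""
    return sum(r <= cutoff for r in ranks) / len(ranks)


def enrichment_ratio(class_ranks, nongenic_ranks, cutoff):
    """Tail proportion of a SNP class divided by that of nongenic SNPs."""
    denominator = tail_fraction(nongenic_ranks, cutoff)
    if denominator == 0:
        raise ValueError("no nongenic SNPs in the tail at this cutoff")
    return tail_fraction(class_ranks, cutoff) / denominator


def bootstrap_support(class_ranks, nongenic_ranks, cutoff,
                      replicates=1000, seed=0):
    """Share of bootstrap replicates in which the ratio exceeds 1.

    Assumption: SNPs are resampled independently, which ignores linkage.
    Replicates with an empty nongenic tail are skipped.
    """
    rng = random.Random(seed)
    usable = 0
    above_one = 0
    for _ in range(replicates):
        class_sample = rng.choices(class_ranks, k=len(class_ranks))
        nongenic_sample = rng.choices(nongenic_ranks, k=len(nongenic_ranks))
        denominator = tail_fraction(nongenic_sample, cutoff)
        if denominator == 0:
            continue
        usable += 1
        if tail_fraction(class_sample, cutoff) / denominator > 1:
            above_one += 1
    return above_one / usable if usable else float("nan")


# Toy data: 12% of genic and 5% of nongenic SNPs fall below the cutoff.
genic = [0.01] * 12 + [0.5] * 88
nongenic = [0.01] * 10 + [0.5] * 190
print(round(enrichment_ratio(genic, nongenic, cutoff=0.05), 2))  # 2.4
print(bootstrap_support(genic, nongenic, cutoff=0.05))
```

A support value above 0.99 from such a procedure would correspond to footnote a in the tables, under the assumptions stated above.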

FIGURE 4.1 Transformed allele frequency plotted against population for the variables that showed the strongest enrichment of signal for genic and NS SNPs. Patterns of variation in allele frequencies are shown for (A) the main dietary component roots and tubers, (B) the subsistence strategy foraging, and for (C) polar and (D) dry ecoregions. SNPs were polarized according to the relative difference between the two categories in the first region where both were present; then, transformed allele frequencies were computed by subtracting the mean allele frequency across populations. SNPs with rank < 10^−4 are included in the plots. Vertical lines separate populations into one of seven major geographic regions (from left to right: sub-Saharan Africa, Europe, Middle East, West Asia, East Asia, Oceania, and the Americas). Dark gray points denote populations that are members of the dichotomous category, and all other populations are denoted by light gray points. Lines are drawn through the mean for the set of populations in a given region that are part of the category of interest, and gray shading denotes the central 50% interval.

Two NS SNPs have extremely high BFs (the highest in their respective frequency bins; Materials and Methods) and provide particularly convincing signals of adaptations to dietary specializations. A SNP (rs162036) that is strongly correlated with a diet containing mainly the folate-poor roots
and tubers lies within the methionine synthase reductase (MTRR) gene, which activates the folate metabolism enzyme methionine synthase and is implicated in spina bifida (Shaw et al., 2009). Perhaps the most interesting signal comes from a SNP (rs4751995) in pancreatic lipase-related protein 2 (PLRP2) that results in premature truncation of the protein and is strongly correlated with the use of cereals as the main dietary component (Fig. 4.2). Several lines of evidence support an important role for this protein in a plant-based diet. First, unlike other pancreatic lipases, PLRP2 hydrolyzes galactolipids, the main triglyceride component in plants (Andersson et al., 1996; Sias et al., 2004). Second, a comparative analysis found that the PLRP2 protein is found in nonruminant herbivore and omnivore pancreases but not in the pancreases of carnivores or ruminants (De Caro et al., 2008). Our results show that the truncated protein is more common in populations that rely primarily on cereals, consistent with the hypothesis that this variant results in a more active enzyme (Lowe, 2002; Berton et al., 2007) and represents an adaptation to a specialized diet.

FIGURE 4.2 Average frequencies for PLRP2 W358X (rs4751995) across populations in each major geographic region. Left bars, populations that specialize on cereals; right bars, populations that do not specialize on cereals.

Previous analyses have used broad-scale population differentiation, measured by FST, to identify loci that show extreme allele frequency differences between populations and, hence, are candidate targets of natural selection. The approach used here is in some ways similar to an FST-based approach, but it differs in several significant regards (see Discussion). To assess the importance of these differences, we compared our results with those from a simple FST-based analysis. To this end, we calculated global FST for each SNP and compared these values with the minimum transformed rank statistics for ecoregion and subsistence. The correlations were extremely low (−0.024 and −0.034 for ecoregion and subsistence,
respectively). Further, the amount of overlap in the tails of the distributions (5%, 1%, and 0.5%) was slightly lower than that expected by chance for two independent distributions, suggesting that the environmental contrast approach used here differs from, and is therefore complementary to, a broad-scale FST approach.

Clarifying the Biological Relevance of the Strongest Signals

To identify the pathways that were targeted by selection, we asked whether there is an enrichment of signal for particular canonical pathways. Here, we focused on the individual variables with the strongest enrichment of genic relative to nongenic SNPs: roots and tubers as the main dietary component and polar ecoregion. Because we found that proportionally more genic than nongenic SNPs have strong correlations with environmental variables, an enrichment of signals for SNPs in a particular gene set relative to nongenic SNPs could simply reflect this global genic enrichment. Therefore, in this analysis, we examined the tail of the rank statistic distribution and asked whether the proportion of SNPs from genes implicated in a given canonical pathway was greater than the proportion of genic SNPs from all other genes.

The two strongest pathway signals for roots and tubers are for starch and sucrose metabolism and folate biosynthesis (Table 4.3). In light of the fact that roots and tubers are mainly composed of starch and are poor in folates, it is plausible that variation in these pathways is advantageous in populations that rely heavily on these food sources. Among the genes with strong signals in this group, there are several involved in the degradation and synthesis of glycogen (GAA and GBE1). A gene coding for the cytosolic β-glucosidase (GBA3) contains several SNPs strongly correlated with roots and tubers as the main dietary component. This liver enzyme hydrolyzes β-D-glucoside and β-D-galactoside, and it may be involved in the detoxification of plant glycosides, such as those contained in roots and tubers (de Graaf et al., 2001).

Several of the pathways with strong signals with polar ecoregion are involved in metabolism (e.g., pyruvate metabolism and glycolysis and gluconeogenesis) (Table 4.3). Among the genes in the pyruvate pathway, we observed particularly strong signals in the gene coding for mitochondrial malic enzyme 3 (ME3), which catalyzes the oxidative decarboxylation of malate to pyruvate. Interestingly, the gene coding for another mitochondrial malic enzyme (ME2) also contains two SNPs strongly correlated with polar ecoregion. These results suggest a link between cold tolerance and energy metabolism and point to specific variants that are likely to influence cold tolerance. Further, our findings are consistent with a previous study that found strong correlations between variants in genes implicated in energy metabolism and winter temperature
(Hancock et al., 2008) and with studies that show evidence for adaptation in mitochondrial DNA (Ruiz-Pesini et al., 2004; Balloux et al., 2009).

TABLE 4.3 Canonical Pathways Enriched in the 1% and 5% Tails of the Minimum Rank Distribution

                                                                         SNPs in Pathway:Other Genic SNPs (tail cutoff)
Variable Category    Variable            Description                         0.05      0.01      0.005
Ecoregion            Polar domain        Glycolysis and gluconeogenesis      5.91a     4.86a     2.38a
                                         Bile acid biosynthesis              7.04a     5.53a     2.61a
                                         Pyruvate metabolism                 6.92a     5.10a     2.72a
                                         3-Chloroacrylic acid degradation    17.42a    12.94a    4.22a
                                         Arginine and proline metabolism     3.42a     3.39a     1.86a
Subsistence          Roots and tubers    Starch and sucrose metabolism       2.72a     2.21a     1.61a
                                         Folate biosynthesis                 4.62a     3.65a     2.41a

a Support from >99% of bootstrap replicates.

Results of genome-wide association studies with diseases and other complex traits offer an opportunity to connect signals of selection with SNPs influencing specific traits and diseases. To this end, we identified a subset of SNPs with extremely strong correlations with environmental variables that were also strongly associated with traits from 106 GWAS (Table 4.4). We find that several SNPs strongly correlated with subsistence and main dietary component variables are associated with energy metabolism–related phenotypes [high-density lipoprotein cholesterol, electrocardiographic traits and QT interval (El-Gamal et al., 1995), fasting plasma glucose, and type 2 diabetes]. These signals include a SNP in the type 2 diabetes gene KCNQ1, where we find that the risk allele is at higher frequency in populations where cereals are the main dietary component.

TABLE 4.4 SNPs with the Strongest Signals of Selection Among Those Associated with Phenotypic Traits in GWAS

                                          Information About Most Significant Environmental Variable    Disease/Trait Association                     Genetic Region
Nearby Genes                  SNP         Type                    Variable          Rank Statistic      Trait                        Trait P Value    Chr  Position
FADS2, FADS3                  rs174570    Ecoregion               Humid tropical    2.00 × 10^−5        LDL                          4.00 × 10^−13    11   61353788
                                                                                                        Total cholesterol            2.00 × 10^−10
                                                                                                        HDL cholesterol              4.00 × 10^−6
TNXB, CREBL1 (MHC Class III)  rs2269426   Subsistence             Fat, meat, milk   2.44 × 10^−5        Plasma eosinophil count      3.00 × 10^−6     6    32184477
MADD, FOLH1                   rs7395662                           Foragers          5.92 × 10^−5        HDL cholesterol              6.00 × 10^−11    11   48475469
RPL21                         rs10507380                          Pastoral          4.07 × 10^−4        Electrocardiographic traits  8.00 × 10^−6     13   26777526
MYC, BC042052                 rs9642880                           Pastoral          4.57 × 10^−4        Urinary bladder cancer       9.00 × 10^−12    8    128787250
KCNJ2                         rs17779747  Main dietary component  Roots and tubers  1.11 × 10^−4        QT interval                  6.00 × 10^−12    17   66006587
ZMAT4                         rs2722425                           Roots and tubers  2.20 × 10^−4        Fasting plasma glucose       2.00 × 10^−8     8    40603396
KCNQ1                         rs2237892                           Cereals           1.49 × 10^−4        Type 2 diabetes              1.70 × 10^−42    11   2796327

NOTES: Table contains SNPs with an environmental rank less than 5 × 10^−4 and a GWAS P value of less than 1 × 10^−5. Chr, chromosome; LDL, low-density lipoprotein; HDL, high-density lipoprotein.

DISCUSSION

This genome-wide scan identified targets of adaptations to diet, mode of subsistence, and ecoregion. The environmental variables in our analysis were chosen to capture the striking diversity among populations in
ecoregion, diet, and subsistence. Much of this variation is related to major transitions that occurred during human evolutionary history, including the dispersal out of sub-Saharan Africa to regions with different climates and the adoption of more specialized (often less diverse) diets (i.e., farming and animal husbandry vs. foraging). Our results aim to clarify the genetics underlying the adaptive responses to these transitions.

Most human phenotypes, including adaptive traits like height and body proportions, are quantitative and highly polygenic (Manolio et al., 2009), and most human variation is shared across populations. Therefore, the same adaptive allele may often be independently selected in different geographic areas that share the same environment. The environmental aspects considered in this analysis changed dramatically over human evolutionary time. As a result, selection on standing (rather than new) alleles, which afford a faster adaptive response to environmental change (Hermisson and Pennings, 2005), may have played a prominent role in adaptation to new environments. This proposal is supported by expectations of selection models for quantitative traits (Falconer and MacKay, 1996), specifically that selection will generate small allele frequency shifts at many loci until the population reaches a new optimum (Pritchard et al., 2010). Whereas approaches that detect selection under a hard sweep model aim to identify loci at which a new allele was driven quickly to high frequency in the population (Pritchard et al., 2010), our approach is well suited to detect small shifts in the frequencies of beneficial alleles that have a broad geographic distribution [see Hancock et al. (2010) for a more detailed discussion]. For quantitative traits, the method we use may be particularly appropriate for understanding recent human adaptations. In this sense, our results fill an important gap and are useful for reconstructing the genetic architecture of human adaptations.

Some of our most interesting signals seem to be adaptations to dietary specializations. Although cultural adaptations certainly played an important role in our ability to diversify, there is strong evidence that genetic adaptations have been crucial as well. A previous genome-wide analysis of sequence divergence between species found evidence for ancient adaptations in the promoters of nutrition-related genes along the human lineage (Haygood et al., 2007). Examples of more recent genetic adaptations that were integral for dietary specializations include variants near the lactase gene, which confer the ability for adults to digest fresh milk in agropastoral populations, and an increase in the number of amylase gene copies in horticultural and agricultural populations (Bersaglieri et al., 2004; Perry et al., 2007; Tishkoff et al., 2007b; Enattah et al., 2008). Our results indicate that genetic adaptations to dietary specializations in human populations may be widespread. In particular, we find signals of adaptations in populations that heavily depend on roots and
tubers, which are staple foods in places where cereals and other types of crops do not grow well (e.g., in regions with nutrient-poor soils and with frequent droughts). Given that roots and tubers are rich in carbohydrates, it is particularly compelling that the most significant gene set for populations that depend on this food source is the starch and sucrose metabolism pathway. Further, roots and tubers are low in folic acid, a vitamin with an important role in newborn survival and health; accordingly, we find a strong signal for genes implicated in folic acid biosynthesis in populations that specialize on this food source. Additional signals with diet include those observed in populations that specialize on cereals, with SNPs implicated in type 2 diabetes (Table 4.4) and in the hydrolysis of plant lipids.

Foraging, or hunting and gathering, is the mode of subsistence that characterized human populations from their emergence in Africa until the transition to horticulture, animal farming, and intensive agriculture that began roughly 10,000 years ago (Smith, 1995). With this transition, many aspects of human ecology dramatically changed, from diet and lifestyle to population densities and pathogen loads. Given that our hominin ancestors were foragers, the signal we observe in the contrast between forager and nonforager populations is likely to reflect adaptations to the less diverse, more specialized diets in horticulture, animal farming, and agriculture (Larsen, 2003). Our findings are consistent with the results of an analysis of the NAT2 drug metabolizing enzyme gene, which found a significant difference in the frequency of slow acetylator mutations between forager and nonforager (i.e., pastoral and agricultural) populations. These findings were interpreted as the result of the diminished dietary availability of folates consequent to the subsistence and nutritional shift (Luca et al., 2008).

Ecoregion classifications include information about climatic factors, vegetation, geomorphology, and soil characteristics (Bailey and Hogg, 1986). Therefore, they provide an integrated view of many facets of human environments. Interestingly, the strongest signal was observed for the polar domain classification and, to a lesser extent, for the dry and humid temperate domains. Although polar habitats presented diverse challenges to human survival, including cold temperature, low UV radiation, and limited resources, our gene set enrichment analyses suggest that the signals of selection in the polar domain tend to be due to alleles that conferred adaptations to cold stress. In fact, many of the gene sets significantly enriched for signals with the polar domain are directly relevant to energy metabolism and temperature homeostasis. Adaptations in these genes were probably critical in the establishment of stable human populations in the northernmost latitudes of Europe and Asia. Likewise, signals associated with the dry and humid temperate domains may reflect relatively ancient adaptations that occurred during the dispersal of anatomically

modern human populations. The lack of a significant excess of signals associated with the humid tropical domain may be due to a combination of factors, including the fact that humans reentered the humid tropics outside Africa too recently to generate detectable new adaptations.

In some ways our approach is similar to previous analyses based on FST, but there are two important differences. First, we compare populations on the basis of environmental variables rather than their geographic origin, thus providing greater power to detect allele frequency differences that track the underlying selective pressure. Second, unlike other analyses of spatial patterns of variation, we use a test statistic (the BF) that detects a signal relative to a null model that captures aspects of human population structure. Taken together, these two features of our approach allow us to detect novel loci where SNPs show subtle, but consistent, patterns across populations. As a result, our findings differ substantially from the results of previous analyses based on broad-scale population differentiation. The overlap between the tail of the global FST distribution and the tails of the minimum rank distributions for subsistence and for ecoregion is, in each case, slightly less than expected by chance.

A possible caveat to the results presented here is that they are due solely to background selection, whereby the elimination of strongly deleterious alleles continually arising in genic regions reduces the effective population size of these regions compared to the less constrained nongenic regions. As a result, genic regions may be expected to experience higher rates of genetic drift and to exhibit greater differentiation between subdivided populations compared with neutrally evolving loci (Charlesworth et al., 1997; Hu and He, 2005).
Therefore, purifying rather than positive selection could potentially account for the excess of genic SNPs strongly correlated with environmental variables. Although we cannot formally rule out this possibility, we note that two features of our data suggest that background selection does not entirely account for the observed enrichment. One is that the enrichment of genic and NS SNPs becomes more pronounced in the more extreme lower tails of the BF distribution, as expected if at least some of the SNPs were indeed targets of positive selection. The other feature is that the enrichment of NS SNPs is quantitatively greater than the enrichment of genic SNPs; because a larger fraction of NS SNPs affect gene function compared with genic SNPs, this is the pattern expected if at least some of the NS SNPs increased in frequency because of a selective advantage.

Our results extend and are complementary to results of previous scans for natural selection in humans. By conducting multiple contrasts between populations that differ with respect to ecoregion or subsistence to identify genetic variants that show concordant changes in allele frequencies across populations, we find a set of adaptive SNPs that differs from those found in previous analyses that were agnostic to the underlying selective pressure. Further, because the SNPs we identify tend to have a global distribution and to show subtle, but consistent, differences in allele frequencies across populations, the loci we identify are likely to represent cases of selection on standing variation. As a result, the findings presented here represent an important step toward clarifying the genetic basis of human adaptations.

MATERIALS AND METHODS

Environmental Variables

Ecoregion data were obtained for each population on the basis of coordinates where samples were collected, except for the Vasakela !Kung and the Gujarati, who had recently relocated. For these populations, we used coordinates of their most recent homeland. The individuals who were sampled from the !Kung population were known to have recently relocated to Schmidtsdrift, South Africa, from the Angola/Namibia border, so we used coordinates that reflected their location before this migration. Each population was classified into one of four ecoregion domains, which are defined according to a combination of ecologically important aspects of climate. Therefore, the ecoregion variables are closely related to climate, but they may be a more informative representation of climatic variation. The ecoregion domains comprise polar, humid temperate, humid tropical, and dry. We classified each population on the basis of the coordinates of the population using Bailey's Ecoregion Map (Bailey and Hogg, 1986).

When available, data from Murdock (1967) were used to classify populations according to their main mode of subsistence and dietary specialization. In cases in which Murdock did not have information about a population, we obtained information from the Encyclopedia of World Cultures (Levinson, 1991–1996).
We classified each population into one of four subsistence categories (foraging, horticultural, agricultural, or pastoral) and into one of three categories based on the main dietary component (cereals; roots and tubers; or fat, meat, or milk). Each population was classified into subsistence and main dietary component categories by two independent researchers, and the small number of discrepancies that were found were resolved by further research. For the five populations that were genotyped by our group, individuals who oversaw collection gave input for classification.

Detecting Signals Between SNPs and Dichotomous Environmental Variables

To assess evidence for selection related to each dichotomous environmental variable, we contrasted the allele frequencies for each SNP across populations that differ with respect to the environmental variable. More specifically, we used a Bayesian linear model method that controls for population history by incorporating a covariance matrix of populations and accounts for differences in sample size among populations. This method yields a BF that is a measure of the weight of the evidence for a model in which an environmental variable has an effect on the distribution of the variant relative to a model in which the environmental variable has no effect on the distribution of the variant. On the basis of these BFs, for each SNP and each environmental variable, we calculated a transformed rank statistic that was scaled to be between 0 and 1 (with 0 and 1 corresponding to the highest and lowest BF, respectively); this transformed rank statistic is sometimes referred to as an empirical P value. Calculating this transformed rank statistic allowed us to control for some aspects of SNP ascertainment and differences in allele frequencies across SNPs. The Illumina 650Y platform used for genotyping is made up of three panels of tagging SNPs that were ascertained in different ways (Eberle et al., 2007). To calculate the transformed rank statistic for each SNP for a given variable, we found the rank of the SNP relative to all other SNPs in the same ascertainment panel and within the same allele frequency bin, where there were 10 allele frequency bins, based on the global derived allele frequency.
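The within-panel, within-bin ranking described above can be illustrated with a minimal sketch. This is not the authors' code: the column names (`panel`, `daf`, `bf`), the use of equal-width frequency bins, and the rank/n scaling (which gives values in (0, 1] rather than exactly reaching 0) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def transformed_rank(df, n_bins=10):
    """Rank each SNP's BF among SNPs in the same panel and frequency bin.

    Expects hypothetical columns: 'panel' (ascertainment panel label),
    'daf' (global derived allele frequency), 'bf' (Bayes factor).
    Small output values correspond to large BFs.
    """
    out = df.copy()
    # Assumption: 10 equal-width bins on [0, 1]; the source does not
    # say how the bin boundaries were chosen.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    out["freq_bin"] = pd.cut(out["daf"], bins=edges,
                             include_lowest=True, labels=False)
    grouped = out.groupby(["panel", "freq_bin"])["bf"]
    # Rank 1 is the highest BF in the group; dividing by the group
    # size scales the statistic into (0, 1].
    ranks = grouped.rank(ascending=False, method="average")
    sizes = grouped.transform("count")
    out["rank_stat"] = ranks / sizes
    return out
```

Because each SNP is compared only with SNPs of similar ascertainment and frequency, a given rank statistic means the same thing across panels and bins, which is the property the text relies on when taking the tail of the distribution.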
To summarize the evidence for selection for each SNP for the two categories of variables (subsistence and ecoregion), we calculated a minimum rank statistic by finding the minimum of the transformed rank statistics across all subsistence and ecoregion variables, respectively. Using these minimum rank statistics, we could ask questions about the evidence of selection for subsistence and for ecoregion overall.

Assessing the Evidence for an Excess of Functional SNPs in the Tail of the Distribution

To determine whether the lower tail of the rank statistic distribution contains an excess of SNPs enriched for function, compared with that expected by chance, we calculated the proportions of genic and NS SNPs relative to the proportion of nongenic SNPs in the tail. Rather than arbitrarily choosing a single tail cutoff, we examined the enrichment at three tail cutoffs (5%, 1%, and 0.5%). To assess significance for an observed excess, we used a bootstrap resampling technique to obtain confidence intervals on the estimated excess. Because positive selection can result

in increased linkage disequilibrium near a selected variant, we bootstrap resampled across 500-kb segments of the genome. For each of 1,000 bootstrap replicates, we calculated the proportion of genic and NS SNPs relative to the proportion of nongenic SNPs in the tail of the distribution. We consider an excess significant for a given tail cutoff if at least 95% of the bootstrap replicates support an excess of SNPs enriched for function.

Comparison of Results from Environmental Contrasts and FST

We calculated global FST values (Weir and Cockerham, 1984) for the complete set of 61 populations. Then, for each SNP, we calculated a transformed rank statistic as we had done for the environmental variable contrasts. Next, we calculated Spearman correlation coefficients between FST values and the minimum transformed rank statistic from the environmental contrast analyses. In addition, we assessed the amount of overlap in the tails of the distributions for FST and environmental contrasts relative to chance.

Canonical Pathway Analysis

To determine whether there was an enrichment of signal for a particular canonical pathway, we used a method similar to that used to test for an excess of genic and NS SNPs relative to nongenic SNPs in the tails of the test statistic distribution. Here, we compared the proportion of SNPs from a given pathway with the proportion of all other genic SNPs in the tail of the minimum rank distribution and of the transformed rank distributions for the individual variables with the strongest genic enrichment. To assess significance for the findings and to ensure that the results are not driven by one or a few genomic regions, we applied the same bootstrap approach described above. The lists of genes included in each of the 438 canonical pathways were obtained from the Molecular Signatures Database (Subramanian et al., 2005).
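The block bootstrap used for the genic and NS enrichment tests, and reused for the pathway analysis, can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation. It assumes the enrichment is the fraction of functional SNPs falling in the tail divided by the fraction of comparison SNPs falling in the tail, that tail membership is fixed before resampling, and that each SNP has already been assigned to a genomic segment (for example, a 500-kb window index); the function and argument names are hypothetical.

```python
import numpy as np

def block_bootstrap_support(in_tail, is_functional, block_id,
                            n_rep=1000, seed=0):
    """Resample genomic blocks with replacement and recompute enrichment.

    in_tail: bool array, True if the SNP lies in the chosen lower tail.
    is_functional: bool array, True for genic (or NS, or pathway) SNPs.
    block_id: array of segment labels, one per SNP.
    Returns the per-replicate enrichment ratios and the fraction of
    replicates with a ratio above 1 (an excess of functional SNPs).
    """
    in_tail = np.asarray(in_tail, dtype=bool)
    is_functional = np.asarray(is_functional, dtype=bool)
    _, inv = np.unique(np.asarray(block_id), return_inverse=True)
    n_blocks = int(inv.max()) + 1

    def per_block(mask):
        # Count SNPs satisfying `mask` within each block.
        return np.bincount(inv, weights=mask.astype(float),
                           minlength=n_blocks)

    f_tail = per_block(in_tail & is_functional)
    f_all = per_block(is_functional)
    c_tail = per_block(in_tail & ~is_functional)
    c_all = per_block(~is_functional)

    rng = np.random.default_rng(seed)
    ratios = np.empty(n_rep)
    # A replicate with no comparison SNPs in the tail gives inf (or nan
    # if the numerator is also zero); nan never counts as an excess.
    with np.errstate(divide="ignore", invalid="ignore"):
        for i in range(n_rep):
            pick = rng.integers(0, n_blocks, size=n_blocks)
            p_func = f_tail[pick].sum() / f_all[pick].sum()
            p_comp = c_tail[pick].sum() / c_all[pick].sum()
            ratios[i] = p_func / p_comp
    return ratios, float(np.mean(ratios > 1.0))
```

Under the criterion stated in the text, an excess would be called significant at a given cutoff when the returned support is at least 0.95. Resampling whole segments rather than individual SNPs keeps linked SNPs together, so a single region with many correlated signals cannot inflate the apparent support.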
Comparison with GWAS Results

We downloaded the Catalog of Published Genome-Wide Association Studies (Hindorff et al., 2009) on July 14, 2009, which includes information about SNPs with reported associations with P < 1 × 10^−5. We filtered this database for SNPs found on the Illumina HumanHap650Y platform; there were entries for 800 unique autosomal SNPs implicated in 61 traits. From among these SNPs, we identified a set of SNPs with extremely low rank statistics (<5 × 10^−4) for each of the subsistence and ecoregion variables. Given that most GWAS are performed in populations of European ancestry, we binned the SNPs in the Illumina panel on the basis of the allele frequency in Europeans rather than the global allele frequency to calculate the transformed rank statistics.

ACKNOWLEDGMENTS

We thank members of the Di Rienzo laboratory, John Novembre, and Molly Przeworski for helpful discussions during the course of this project; and Molly Przeworski for thoughtful comments on the manuscript. This work was supported by National Institutes of Health (NIH) Grants DK56670 and GM79558 and an International Collaborative Grant from the Wenner-Gren Foundation (to A.D.R.). A.M.H. was supported in part by American Heart Association Graduate Fellowship 0710189Z and by NIH Genetics and Regulation Training Grant GM07197. G.C. was supported in part by a Sloan Research Fellowship. J.K.P. acknowledges support from the Howard Hughes Medical Institute.
