
8
Genome-wide Patterns of Population Structure and Admixture Among Hispanic/Latino Populations

KATARZYNA BRYC,* CHRISTOPHER VELEZ,† TATIANA KARAFET,‡ ANDRES MORENO-ESTRADA,*§ ANDY REYNOLDS,* ADAM AUTON,*|| MICHAEL HAMMER,‡ CARLOS D. BUSTAMANTE,*§** AND HARRY OSTRER†**

Hispanic/Latino populations possess a complex genetic structure that reflects recent admixture among and potentially ancient substructure within Native American, European, and West African source populations. Here, we quantify genome-wide patterns of SNP and haplotype variation among 100 individuals with ancestry from Ecuador, Colombia, Puerto Rico, and the Dominican Republic genotyped on the Illumina 610-Quad arrays and 112 Mexicans genotyped on the Affymetrix 500K platform. Intersecting these data with previously collected high-density SNP data from 4,305 individuals, we use principal component analysis and clustering methods FRAPPE and STRUCTURE to investigate genome-wide patterns of African, European, and Native American population structure within and among Hispanic/Latino populations. Comparing autosomal, X and Y chromosome, and mtDNA variation, we find evidence of a significant sex bias in admixture proportions consistent with disproportionate contribution of European male and Native American female ancestry to present-day populations. We also find that patterns of linkage disequilibria in admixed Hispanic/Latino populations are largely affected by the admixture dynamics of the populations, with faster decay of LD in populations of higher African ancestry. Finally, using the locus-specific ancestry inference method LAMP, we reconstruct fine-scale chromosomal patterns of admixture. We document moderate power to differentiate among potential subcontinental source populations within the Native American, European, and African segments of the admixed Hispanic/Latino genomes. Our results suggest future genome-wide association scans in Hispanic/Latino populations may require correction for local genomic ancestry at a subcontinental scale when associating differences in the genome with disease risk, progression, and drug efficacy, as well as for admixture mapping.

*Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14850;
†Human Genetics Program, Department of Pediatrics, New York University School of Medicine, New York, NY 10016;
‡Arizona Research Laboratories Division of Biotechnology and Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721; and
§Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305.
||Present address: Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, UK.
**To whom correspondence may be addressed; e-mail: cdbustam@stanford.edu or harry.ostrer@nyumc.org.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

The term “Hispanic/Latinos” refers to the ethnically diverse inhabitants of Latin America and to people of Latin American descent throughout the world. Present-day Hispanic/Latino populations exhibit complex population structure, with significant genetic contributions from Native American and European populations (primarily involving local indigenous populations and migrants from the Iberian peninsula and southern Europe) as well as West Africans brought to the Americas through the trans-Atlantic slave trade (Sans, 2000; S. Wang et al., 2008). These complex historical events have affected patterns of genetic and genomic variation within and among present-day Hispanic/Latino populations in a heterogeneous fashion, resulting in rich and varied ancestry within and among populations as well as marked differences in the contribution of European, Native American, and African ancestry to autosomal, X chromosome, and uniparentally inherited genomes.
Many key demographic variables differed among colonial Latin American populations, including the population size of the local pre-Columbian Native American population, the extent and rate at which European settlers displaced native populations, whether or not slavery was introduced in a given region, and, if so, the size and timing of introduction of the African slave populations. There were also strong differences in ancestry among social classes in colonial (and postcolonial) populations, with European ancestry often correlating with higher social standing. As a consequence, present-day Hispanic/Latino populations exhibit very large variation in ancestry proportions (as estimated from genetic data) not only across geographic regions (Sans, 2000; S. Wang et al., 2008), but also within countries themselves (Seldin et al., 2007; Silva-Zolezzi et al., 2009). In addition, the process of admixture was apparently sex-biased and preferentially occurred between European males and Amerindian and/or African females; this process has been shown to be remarkably consistent among countries and populations including Argentina (Dipierri et al., 1998), Ecuador (González-Andrade et al., 2007), Mexico (Green et al., 2000), Cuba (Mendizabal et al., 2008), Brazil (Marrero et al., 2007), Uruguay (Sans et al., 2002), Colombia (Carvajal-Carmona et al., 2003), and Costa Rica (Carvajal-Carmona et al., 2003).

The rich diversity of variation in ancestry among Hispanic/Latino populations, coupled with consistent differences among populations in the incidence of chronic heritable diseases, suggests that Hispanic/Latino populations may be very well suited for admixture mapping (Smith et al., 2001; González Burchard et al., 2005). For example, differences in relative European ancestry proportions correlate with higher susceptibility in Puerto Ricans to asthma as compared with Mexicans (Salari et al., 2005). Data have also shown an increased risk of breast cancer in Latinas with greater European ancestry (Fejerman et al., 2008) and an interplay between African ancestry and cardiovascular disease and hypertension in Puerto Ricans from Boston (Lai et al., 2009). Hispanic/Latinos are also likely to play an increasingly important role in multi- and transethnic genetic studies of complex disease. Genome-wide scans have identified candidate markers for onset of type 2 diabetes in Mexican-Americans from Texas (Hayes et al., 2007) as well as a region on chromosome 5 associated with asthma in Puerto Ricans (Choudhry et al., 2008). Quantifying the relative contributions of ancestry, environment (including socioeconomic status), and ancestry by environment interaction to disease outcome in diverse Hispanic/Latino populations will also be critical to applying a genomic perspective to the practice of medicine in the United States and in Latin America.
For example, whereas European ancestry was associated with increased asthma susceptibility in Puerto Ricans (Salari et al., 2005), it was also shown that the effect was moderated by socioeconomic status (Choudhry et al., 2006). This suggests that quantifying fine-scale patterns of genomic diversity among diverse U.S. and non-U.S. Hispanic/Latinos may be critical to the efficient and effective design of medical and population genomic studies. A fine-scale population genomics perspective may also provide a powerful means for understanding the roles of ancestry, genetics, and environmental covariates on disease onset and severity (González Burchard et al., 2005).

Here, we introduce a larger, high-density SNP and haplotype dataset to investigate historical population genetics questions (such as variation in sex-biased ancestry and genome-wide admixture proportions within and among Latino populations) as well as provide a genomic resource for the study of population substructure within putative European, African, and Native American source populations. Our dataset includes three Latino populations that are underrepresented in whole-genome analyses, namely, Dominicans, Colombians, and Ecuadorians, as well as Mexicans and Puerto Ricans, the two largest Hispanic/Latino ethnic groups in the United States. This allows comparison of patterns of population structure and ancestry across multiple U.S. Hispanic/Latino populations. Our dense SNP marker panel is formed by the intersection of two of the most commonly used genotyping platforms, allowing for the inclusion of dozens of Native American, African, and European populations for ancestry inference. Our work expands on high-density population-wide genotype data from the International HapMap Project (HapMap) (International HapMap Consortium, 2005; Frazer et al., 2007), the Human Genome Diversity Panel (HGDP) (Rosenberg et al., 2002), and the Population Reference Sample (POPRES) (Nelson et al., 2008) that have representation of Mexicans but not other Hispanic/Latino groups either from the Caribbean or from South America, with a resulting gap for analyzing admixture in those populations. This project, therefore, represents an important step toward comprehensive panels for U.S.-based studies that can more accurately reflect the diversity within various Hispanic/Latino populations.

RESULTS

Population Structure

We applied the clustering algorithm FRAPPE to investigate genetic structure among Hispanic/Latino individuals using a merged data set with over 5,000 individuals with European, African, and Native American ancestry genotyped across 73,901 SNPs common to the Affymetrix 500K array and the Illumina 610-Quad panel (Materials and Methods). FRAPPE implements a maximum likelihood method to infer the genetic ancestry of each individual, where the individuals are assumed to have originated from K ancestral clusters (Tang et al., 2005). The plots for K = 3 and K = 7 are shown in Fig. 8.1 and for all other values of K in Fig. S1 (available online at www.pnas.org/cgi/content/full/0914618107/DCSupplemental). At K = 3, we observed clustering largely by Native American, African, and European ancestry, with the Hispanic/Latino populations showing genetic similarity with all of these populations. However, significant population differences exist, with the Dominicans and Puerto Ricans showing the highest levels of African ancestry (41.8% and 23.6% African, SDs 16% and 12%), whereas Mexicans and Ecuadorians show the lowest levels of African ancestry (5.6% and 7.3% African, SDs 2% and 5%) and the highest Native American ancestries (50.1% and 38.8% Native American, SDs 13% and 10%). We also found extensive variation in European, Native American, and African ancestry among individuals within each population. A clear example could be observed in the Mexican sample, in which ancestry proportions ranged from predominantly Native American to predominantly European (with generally low levels of African ancestry). Similar results were found in Colombians and Ecuadorians, whereas Dominicans and Puerto Ricans showed the greatest variation in the African ancestry (Fig. 8.1). Interestingly, at K = 7, we were able to capture signals of continental substructure such as a southwest to northeast gradient in Europe and a Native American component that is absent in the two Amazonian indigenous populations (Karitiana and Surui) but that substantially contributes to all other studied Latino populations. We also note that several of the individuals from the Maya and Quechua Native American samples (and to a lesser extent Nahua and Pima) from the Human Genome Diversity Panel (CEPH-HGDP) show moderate levels of European admixture, consistent with previous studies of these populations (Jakobsson et al., 2008). Interestingly, this is not the case for the Aymara and Quechua samples genotyped by Mao et al. (2007).

FIGURE 8.1 FRAPPE clustering illustrating the admixed ancestry of Hispanic/Latinos shown for K = 3 and K = 7. Individuals are shown as vertical bars shaded in proportion to their estimated ancestry within each cluster. Native American populations are listed in order geographically, from north to south.

We also undertook principal component analysis (PCA) of the autosomal genotype data from Hispanic/Latino and putative ancestral populations using the smartpca program from the software package EIGENSTRAT (Fig. 8.2A) (Patterson et al., 2006a). The first two principal components of the PCA strongly support the notion that the three ancestral populations contributing to the Hispanic/Latino genomic diversity correspond exactly to Native American, European, and African ancestry. The Hispanic/Latino populations showed different profiles of ancestry, as exemplified by the fitting of ellipses to the covariance matrix of each population’s first two PCs (Fig. 8.2C). Subsequent PCs showed substructure within Africa, Native Americans, and Europeans (Fig. S2, available online at www.pnas.org/cgi/content/full/0914618107/DCSupplemental). PCA on the X chromosome markers (Fig. 8.2B) showed a similar pattern, although because there are only 1,500 markers, this PCA had greater variance, which is illustrated in the fitted ellipses as well (Fig. 8.2D).

FIGURE 8.2 Principal component analysis results of the Hispanic/Latino individuals with Europeans, Africans, and Native Americans. PC1 vs. PC2 scatterplots based on autosomal markers (Upper Left) and based on X chromosome markers (Upper Right). Ellipses are fitted to the PCA results on the autosomes (Lower Left) and to results from the X chromosome markers (Lower Right).

We also ran the Bayesian clustering algorithm STRUCTURE in “assignment mode” (Falush et al., 2003), and used a training set of Europeans, Africans, and Native Americans to estimate ancestral allele frequencies and assess admixture proportions within and among the Hispanic/Latino populations. Using STRUCTURE analysis of the autosomes (Fig. 8.3, Upper) and the X chromosome (Fig. 8.3, Lower), we found that, again, Puerto Ricans and Dominicans showed the greatest proportion of African ancestry whereas Colombians, Ecuadorians, and Mexicans showed extensive variation in European and Native American ancestry among individuals.

FIGURE 8.3 Genome-wide and locus-specific ancestry estimates for Mexicans, Ecuadorians, Colombians, Puerto Ricans, and Dominicans. Shown for K = 3, clustering of the Hispanic/Latino individuals on the autosomes (Top) and on the X chromosome (Bottom). Individuals are shown as vertical bars shaded in proportion to their estimated ancestry within each cluster. Local ancestry at each locus is shown for each individual on chromosome 1 (Middle). The X chromosome shows greater Native American ancestry and greater variability in African ancestry, with reduced European ancestry.

We calculated LD decay curves for all populations with at least 10 individuals, choosing subsets of 10 individuals, and averaging more than 100 random subsets of the data. Patterns of decay of LD were consistent with previously published results (Jakobsson et al., 2008), with Native American populations showing the highest levels of LD and African populations the lowest (Fig. 8.4A). Interestingly, the Hispanic/Latino populations demonstrated rates of decay of LD that correlated strongly with the amount of Native American, European, and African ancestry (Fig. 8.4B). Specifically, the populations with the most Native American ancestry, Mexican and Ecuadorian, exhibited higher levels of linkage disequilibrium among SNP markers, whereas the populations with the highest proportions of African ancestry, the Dominican and Puerto Rican samples, had the lowest levels of LD.

FIGURE 8.4 Linkage disequilibrium, genotype r² estimated by PLINK, by population as a function of physical distance (kb). (Left, panel A) Ancestral populations: Native American, European, and African. (Right, panel B) Admixed Hispanic/Latino populations. Scale is the same.

Locus-Specific Ancestry

To reconstruct local genomic ancestry at a fine scale, we used the ancestry deconvolution algorithm LAMP (Sankararaman et al., 2008), allowing for a three-way admixture, and focused on the four Hispanic/Latino populations genotyped on the Illumina 610-Quad platform: Dominicans, Colombians, Puerto Ricans, and Ecuadorians (Materials and Methods).
Because this same SNP panel had also been genotyped across the HGDP samples (1,043 individuals from 53 populations), the merged dataset containing more than 500,000 markers provided a unique resource for investigating the extent of subcontinental ancestry among diverse Hispanic/Latino populations. We found that individual average ancestries are in agreement with FRAPPE and STRUCTURE results in which Ecuadorians have the highest Native American proportions, followed by Colombians (showing greater European contribution), and with Puerto Ricans and Dominicans showing the highest African ancestry, especially Dominicans, who show very low contribution from Native Americans (Fig. 8.1). We also used the PCA-based methods of Bryc et al. (2010) to infer ancestry at each locus for the samples genotyped on the Affymetrix 500K, which included more than 100 Mexican samples genotyped by the POPRES project (Nelson et al., 2008) and diverse Native American populations genotyped by Mao et al. (2007). The local admixture tracks for each individual are in large agreement with the genome-wide average ancestry proportions (Fig. 8.3, Middle).

To investigate the genetic relationships among admixed Hispanic/Latino populations and putative ancestral groups, we compared patterns of population divergence among the inferred segments of European, African, and Native American ancestry and corresponding putative source populations using Wright’s FST measure. Specifically, we used LAMP to reconstruct, for each individual in our dataset, segments of European, African, and Native American ancestry across both the maximal SNP dataset for all of the admixed and putative source population individuals (i.e., either the 610K Illumina for Puerto Rican, Ecuadorian, Colombian, and Dominican or 500K for Mexicans from Guadalajara) as well as ~70k SNPs common to both platforms. To calculate FST at a given SNP for a given pair of populations, we included only individuals with unambiguous ancestry assignment (i.e., individuals with two European-, two Native American–, or two African-origin chromosomes).
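
The per-SNP FST calculation described above can be illustrated with a minimal sketch. The chapter does not state which FST estimator was used, so the block below assumes the textbook definition FST = (HT - HS) / HT for one biallelic SNP and two equally weighted populations. The function names and the 0/1/2 genotype coding are hypothetical choices for illustration, and the genotype lists are assumed to be already restricted to individuals with unambiguous ancestry at the SNP.

```python
from typing import Sequence


def allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele from diploid genotypes coded 0, 1, or 2."""
    if len(genotypes) == 0:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2.0 * len(genotypes))


def wright_fst(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for one biallelic SNP, two equally weighted populations.

    Assumption: this is the textbook heterozygosity-based definition, not
    necessarily the estimator used in the chapter.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


def fst_percent(genos_a: Sequence[int], genos_b: Sequence[int]) -> float:
    """FST between two genotype samples, expressed as a percentage."""
    return 100.0 * wright_fst(allele_frequency(genos_a), allele_frequency(genos_b))


if __name__ == "__main__":
    # Hypothetical genotypes at one SNP for two small samples.
    print(fst_percent([0, 0, 1, 0], [2, 2, 1, 2]))
```

A genome-wide value would combine many SNPs, for example by averaging numerators and denominators across loci; that step is left out of this sketch.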
One potential confounder for this analysis is that sample sizes differ substantially among subpopulations within major continental regions (e.g., in the Native American set, we have sample sizes that range from n = 7 for Colombian indigenous Americans in HGDP to n = 29 for Nahua from Mexico in the Mao et al. dataset). To minimize the potential bias of differences in sample size, we randomly selected n = 7 individuals from all potential subpopulations and recomputed Wright’s FST. As seen in Table 8.1, and consistent with historical records, African segments of the Hispanic/Latino populations are more closely related to the Bantu-speaking populations of West Africa than to other populations. Specifically, we found that the Colombians and Ecuadorians are most closely related to the Kenyan Bantu populations, whereas the Puerto Ricans and Dominicans are closest to the Yoruba from Nigeria. Likewise, European segments show the lowest FST values when compared with Southwest European populations (individuals from Spain and Portugal), as well as French and Italian individuals. Native American segments of the Hispanic/Latino individuals show the least genetic differentiation with Mesoamerican (e.g., Maya and Nahua), Chibchan (e.g., Colombian), and Andean (e.g., Quechua) populations. The closest relationship is clearly observed between Mexicans from Guadalajara and Nahua indigenous individuals.

TABLE 8.1 Ancestry-Specific FST Distances Between Hispanic/Latino Populations and Different Putative Source Populations

African Segments of the Genome (%)
Population  Bantu Kenya  Bantu S. Africa  Biaka Pygmy  Mandenka  Mbuti Pygmy  YRI
COL         3.191        3.375            6.520        3.677     11.217       3.263
DOM         1.564        1.476            4.657        1.419     8.877        0.913
ECU         6.098        6.883            10.143       6.400     14.702       6.481
PRI         2.500        2.543            5.761        2.384     10.216       2.176

European Segments of the Genome (%)
Population  Adygei  Basque  European ESE  Europe C  Europe NNE  Europe NW  Europe S  Europe SE
COL         1.836   1.351   1.389         0.978     1.253       1.240      1.033     1.020
DOM         1.560   1.128   1.071         0.691     0.919       0.940      0.705     0.775
ECU         1.669   1.456   1.225         1.012     1.212       1.100      1.005     1.005
PRI         1.811   1.530   1.392         1.062     1.345       1.251      1.107     1.181
Mexico      1.014   0.784   0.559         0.335     0.438       0.442      0.193     0.307

European Segments of the Genome (%), continued
Population  Europe SW  Europe W  French  Italian  Orcadian  Russian  Sardinian  Tuscan
COL         0.863      1.080     0.880   0.885    1.410     1.648    1.550      1.050
DOM         0.537      0.730     0.613   0.610    1.093     1.413    1.270      0.825
ECU         0.838      1.104     0.799   0.845    1.417     1.369    1.607      0.925
PRI         0.916      1.155     0.940   0.879    1.508     1.820    1.566      1.041
Mexico      0.122      0.265     0.270   0.271    0.793     0.882    0.852      0.336

Native American Segments of the Genome (%)
Population  Aymara  Colombian  Karitiana  Maya   Nahua  Pima    Quechua  Surui
COL         4.005   5.296      9.099      4.724  3.614  8.562   3.432    13.803
DOM         5.142   5.868      9.060      4.262  3.601  9.310   3.147    13.736
ECU         4.244   5.799      9.178      5.446  4.147  9.193   3.079    13.765
PRI         5.872   6.618      10.120     6.624  4.795  10.578  5.169    15.093
Mexico      2.397   4.185      8.197      1.417  0.572  5.112   2.086    11.061

NOTE: Results based on ~70k overlapping SNPs between Affymetrix and Illumina platforms and equalizing population sample sizes down to seven individuals per population.

Sex Bias in Ancestry Contributions

We used the STRUCTURE ancestry estimates on the autosomes and X chromosome to estimate Native American, European, and African ancestry proportions of each Hispanic/Latino individual. We then compared the estimates of ancestry for each population on the autosomes vs. on the X chromosome [Fig. 8.5 and Figs. S3 and S4 (available online at www.pnas.org/cgi/content/full/0914618107/DCSupplemental)]. Whereas the Native American ancestry was significantly higher on the X chromosome than on the autosomes (including those populations with reduced Native American ancestry, i.e., Puerto Ricans and Dominicans), the autosomal vs. X-chromosome difference was more attenuated with regard to African ancestry. This reduced deviation is present even in those Hispanic/Latino populations analyzed whose non-European ancestry was principally Native American in origin (i.e., Mexicans and Ecuadorians). Furthermore, greater Native American ancestry on the X chromosome in Puerto Ricans did not necessarily imply greater Amerindian ancestry on the autosomes. This finding is similar to those observed by analyzing fine-scale genome pattern of population structure and admixture among African Americans, West Africans, and Europeans (Lind et al., 2007).

FIGURE 8.5 Boxplots comparing autosomal vs. X-chromosome ancestry proportions by population, shown for European ancestry (Left), Native American ancestry (Center), and African ancestry (Right). Filled boxes correspond to autosomal ancestry estimates; open boxes show X-chromosome ancestry estimates. Shown are the median (solid line), first and third quartiles (box), and whiskers extending to the minimum/maximum values or to the smallest value within 1.5 times the IQR from the first quartile. For each paired comparison of X chromosomes and autosomes, median Native American ancestries are consistently higher on the X chromosome in all Hispanic/Latino populations sampled, and European ancestries are lower across all populations.

Finally, we used SNP and microsatellite genotyping to identify the canonical Y chromosome and mtDNA haplotypes for each of the Hispanic/Latino individuals that we genotyped. Details of the loci and classifications are found in Tables S1 and S2 (available online at www.pnas.org/cgi/content/full/0914618107/DCSupplemental). We found an excess of European Y chromosome haplotypes and a higher proportion of Native American and African mtDNA haplotypes, consistent with previous studies (Fig. 8.6). In addition, we found several non-European Y chromosomal haplotypes with most likely origins from North Africa and the Middle East. We observed that African-derived haplotypes were the predominant origin of mtDNA in Dominicans (17 of 27 individuals), matching the greater African vs. Native American origins of this population on the autosomes and X chromosomes. However, in Puerto Ricans we did not find evidence of a high African female contribution. The predominant Y chromosomal origins in the Puerto Ricans sampled were European and African; but, in contrast, 20 of 27 Puerto Rican individuals had mitochondrial haplotypes of Native American origin, suggesting a strong female Native American and male European and African sex bias contribution. Overall, in all of the Hispanic/Latino populations that we analyzed, we found evidence of greater European ancestry on the Y chromosome and higher Native American ancestry on the mtDNA and X chromosome, consistent with previous findings (Dipierri et al., 1998; Green et al., 2000; Sans et al., 2002; Carvajal-Carmona et al., 2003; González-Andrade et al., 2007; Marrero et al., 2007; Mendizabal et al., 2008).

FIGURE 8.6 Comparison of mtDNA and Y chromosome haplotypes. Each individual is represented by a point within the triangle that represents the autosomal ancestry proportions. The most probable continental location for each individual’s haplotype is designated by the shade of the point. The Y chromosome contains a disproportionate number of European haplotypes, whereas the mtDNA has a high proportion of Native American, slightly more African haplotypes, and fewer European haplotypes, consistent with a sex bias toward a greater European male and Native American/African female ancestry in the Hispanic/Latinos.

DISCUSSION

Our work has important implications for understanding the population genetic history of Latin America as well as ancestry of U.S.-based Hispanic/Latino populations. As has been previously documented, we found large variation in the proportions of European, African, and Native American ancestry among Mexicans, Puerto Ricans, Dominicans, Ecuadorians, and Colombians, but also within each of these groups. These trends are a consequence of variation in rates of migration from ancestral European and African source populations as well as population density of Native Americans in pre-Columbian times (Sans, 2000). We found that Dominicans and Puerto Ricans in our study showed the highest levels of African ancestry, consistent with historical records. European settlers to island nations in the Caribbean basin largely displaced Native American populations by the early to mid-16th century and concurrently imported large African slave populations for large-scale colonial agricultural production (largely of sugar). In contrast, Colombia has wider geographic differences ranging from Caribbean coasts to Andean valleys and mountains, which could explain the enrichment of African ancestry in some individuals and not in others, likely representing the differences in origin within Colombia.
Finally, Mexico and Ecuador are two continental countries that had high densities of Native Americans during pre-Columbian times; as expected, the individuals from these two countries show the highest degree of Native American ancestry. Our findings show that the involuntary migration of Africans through the slave trade left a clear trace in Hispanic/Latino populations proximal to these routes.

From the FST analysis, we found that the high-density genotype data that we have collected are quite informative regarding the personal genetic ancestry of admixed Hispanic/Latino individuals. Specifically, we found that individuals differ dramatically within and among populations and that we can reliably identify subpopulations within major geographic regions (i.e., Europe, Africa, and the Americas) that exhibit lower pairwise
FST (and, therefore, higher genetic similarity) to the inferred European, African, and Native American segments for the 212 individuals studied. We found, for example, that Nahua showed the lowest FST in Mexicans, consistent with the observation that the Nahua are one of the largest Native American populations in this region and are likely to have contributed to the genomes of admixed individuals in Mexico (as opposed, for instance, to the Mexican Pima, who fall outside the Mesoamerican cultural region and show considerably higher levels of differentiation). We also found that the lowest FST for the African regions of the Dominican and Puerto Rican genomes is with the Yoruba, a Niger-Congo-speaking West African population that has been shown to be genetically similar to the African segments of African Americans sampled in the United States (Bryc et al., 2010). Although we have limited Native American populations and Hispanic/Latino sample sizes, the differences in FST with different subcontinental populations suggest that there exists a reasonably strong signal of which present-day populations are most closely related to the ancestral populations that contributed ancestry to each of the Hispanic/Latino populations.

When comparing inferred continental ancestry of the X and Y chromosomes and mitochondrial vs. the autosomal genome, we observed an enrichment of European Y-chromosome vs. autosomal genetic material, and a greater percentage of both Native American and African ancestry on the X chromosomes and mtDNA compared with the autosomes for the Hispanic/Latino individuals in this study. This suggests a predominance of European males and Native American/African females in the ancestral genetic pool of Latinos, consistent with previous studies.
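The logic of comparing X-chromosome and autosomal ancestry can be illustrated with a deliberately simplified model, which is not the analysis performed in this study. Assuming a single admixture event with equal numbers of female and male founders, females transmit two-thirds of the X chromosomes in the population and half of the autosomes, so a source population contributing a fraction f of the female founders and m of the male founders is expected to account for (f + m)/2 of autosomal ancestry and (2f + m)/3 of X-chromosome ancestry. The function names and the numbers below are hypothetical and chosen only for illustration.

```python
def expected_ancestry(f, m):
    """Expected ancestry from one source population under a single-pulse
    admixture model with equal numbers of female and male founders.

    f, m: fraction of female and male founders drawn from that source
    (hypothetical inputs; this model is an assumption, not the paper's method).
    """
    autosomal = (f + m) / 2        # autosomes: half via mothers, half via fathers
    x_chrom = (2 * f + m) / 3      # X: two-thirds via mothers, one-third via fathers
    return autosomal, x_chrom


def infer_contributions(autosomal, x_chrom):
    """Invert the two equations above to recover (f, m)."""
    f = 3 * x_chrom - 2 * autosomal
    m = 4 * autosomal - 3 * x_chrom
    return f, m


# Hypothetical example: a source population supplying 60% of female
# founders but only 20% of male founders.
auto, x = expected_ancestry(0.6, 0.2)
print(auto, x)                     # X ancestry exceeds autosomal ancestry
print(infer_contributions(auto, x))
```

Under this sketch, an excess of a given ancestry on the X chromosome relative to the autosomes implies f > m for that source population, and a smaller excess implies a more balanced contribution. Real populations experienced continuous, multi-generation admixture, so such point estimates should be read as qualitative only.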
A particularly interesting observation from our work on sex-biased admixture is that the pattern exists not only within populations but among Hispanic/Latino populations as well. In all populations studied, there is an enrichment of Native American ancestry both on the X chromosome and mtDNA compared with the autosomes. This would suggest a greater female Native American contribution to the genome of Latinos. A different result was obtained in relation to African ancestry. We found a smaller difference between mean African ancestry on the X chromosome and the autosomes, compared with the difference in Native American ancestry. Furthermore, unlike in Native American ancestry, we found an overwhelming representation of Native American mtDNA haplogroups in Puerto Ricans, even though non-European ancestry on the autosomes was largely African. It is important to note that this observation does not necessarily undermine the model of sex-biased admixture among European males and African females in the founding of Hispanic/Latino populations, especially when one considers the predominance of European Y chromosomes in all groups studied. However, it suggests that admixture between European males and Amerindian/African females has been a complex process in the formation of the various Hispanic/Latino populations. Specifically, a reduced X vs. autosome mean African ancestry compared with Native American ancestry suggests a more balanced gender contribution in the Hispanic/Latino genome by individuals of African ancestry. In the case of Puerto Ricans, the only way that one can reconcile greater African ancestry on the X chromosome vs. what would be expected from mitochondrial data would be through transmission of X chromosomes independent of mitochondrial transmission, which is plausible biologically only via males. Caution, however, should be exercised before considering such conclusions as concrete; unlike X chromosomes, which can recombine and thus represent haplotypes derived from thousands of individuals, mitochondrial DNA represents just a sole distant ancestor among these thousands. Thus, a larger mtDNA sample would be necessary compared with X chromosomes to have similar confidence that a cohort would accurately reflect the presumed diversity of ancestry in the population as a whole.

The Y chromosomal results also demonstrate the insufficiency of the paradigm of European males and Native American/African females to capture the complexity within the Latin American populations. For example, we find Y chromosomal haplotypes in Hispanic/Latinos with presumed origins in the Middle East and northern Africa. Given that historical documentation suggests that most of the non-African and non–Native American contribution to admixed Hispanic/Latino populations is from southwest Europe, this suggests that the contemporary populations inherited these Y chromosomes from Europeans who, in turn, were descended from Middle Eastern or North African men.
Several historical events could have led to the acquisition by Europeans of non-European haplotypes, perhaps during the period of the Roman Empire, when the Mediterranean Sea behaved as a conduit (not a physical barrier) between Europe, the Middle East, and North Africa, or by Sephardic Jews or Moorish Muslims during the European Middle Ages/Islamic Golden Age. Alternatively, the presence of non-European Y chromosomal haplotypes originating from the Middle East and North Africa could represent the result of Iberian Jews and Muslims (themselves admixed) fleeing the peninsula for New World territories in response to discriminatory policies that strongly pressured both communities at the termination of the Reconquista. Essentially, the diversity of haplotypes in the Y chromosomes in Latinos reflects not only population dynamics from the 15th century onward, but also the historical trends of population movement occurring across the Atlantic during centuries prior.

The marked genetic heterogeneity of Latino populations shown in this study, as previously suggested by other surveys of genetic ancestry (Mao et al., 2007; Price et al., 2007; S. Wang et al., 2008), has important
implications for the identification of disease-associated variants that differ markedly in frequency among parental populations. In their study of 13 Mestizo populations from Latin America, for example, S. Wang et al. (2008) suggested that admixture mapping in Hispanic/Latino populations may be feasible within a two-population admixture framework, since the mean African ancestry in Mestizo populations is typically low (<10%). Although this is true for Hispanic/Latino populations with origins in the continental landmass of the Americas (such as the populations studied by S. Wang et al.), our results show that this may not apply to Latino populations with origins in the Caribbean, as their African ancestry proportion is considerably higher and is highly variable among individuals, suggesting an extensive three-way admixture and representing additional challenges for admixture mapping. Likewise, we find subtle but reproducible differences in subcontinental ancestry among Hispanic/Latino individuals, suggesting that even a three-way admixture model may not be sufficient to accurately model the dynamic population genetic history of these populations.

Another observation with important implications for designing association studies is the large variation in individual admixture estimates within certain Latino populations (e.g., Mexicans, Colombians, and Ecuadorians). One could expect such an outcome when collecting samples from U.S.-based Latino communities, which in turn may come from different locations within their countries of origin (e.g., Colombians and Ecuadorians). However, within the Mexican sample, which was collected in a single sampling location (i.e., Guadalajara, Mexico), we also observed large variation in European vs. Native American admixture proportions.
Our findings are in agreement with previous studies on genetic ancestry from Mexico City (Martinez-Marignac et al., 2007; S. Wang et al., 2008), supporting the idea that such urban agglomerations, in which a large number of epidemiological studies are likely to take place, continue to host a wide range of genetic variability among individuals who may self-identify as individuals from the same population. Therefore, particular attention should be paid to carefully matching representative cases and controls, as well as to carefully controlling for ancestry, when performing association studies using Hispanic/Latino populations.

We hope that our dense genome-wide admixture analysis has allowed greater insight into the population dynamics of multiple Hispanic/Latino populations and that it will provide a resource for designing next-generation epidemiological studies in these communities, opening the possibility of better understanding the genetic makeup of this growing segment of the U.S. population.

MATERIALS AND METHODS

Datasets

We genotyped 100 individuals with ancestry from Puerto Rico, the Dominican Republic, Ecuador, and Colombia on Illumina 610K arrays. We extracted 400 European, 365 African American, and 112 Mexican samples from the GlaxoSmithKline POPRES project, which is a resource of nearly 6,000 control individuals from North America, Europe, and Asia genotyped on the Affymetrix GeneChip 500K Array Set (Nelson et al., 2008). We randomly sampled 15 individuals from each European country where possible, or the maximum number of individuals available otherwise, to select the POPRES European individuals to be included in our study. Further description of sampling locations, genotyping, and data quality control is available elsewhere (Nelson et al., 2008). We include 165 and 167 individuals from the HapMap project from the CEU and YRI populations, thinned to the same SNP set (Frazer et al., 2007). We also include all European, Native American, and African individuals from the HGDP genotyped on Illumina 610K arrays (Jakobsson et al., 2008). Finally, we include all Native American populations from the Mao et al. (2007) study genotyped on Affymetrix 500K arrays.

For each dataset, we used annotation information to determine the strand on which the data were given and to map all Affymetrix and Illumina marker IDs to corresponding dbSNP reference IDs [rsIDs]. SNPs without valid rsIDs were excluded from analysis. Each dataset was then converted to the forward strand to facilitate merging of the data. Data from the various platforms were merged using the PLINK toolset, version 1.06 (Purcell et al., 2007). Likewise, nonmissing genotype calls that showed disagreement between datasets were omitted. Demographic data for all individuals included in this study are available on GenBank. All samples were approved by institutional review board protocols from their respective studies.
Data Quality Control

The HapMap II release 23, HGDP, Mao et al., and POPRES samples were genotyped and called according to their respective quality control procedures (Frazer et al., 2007; Mao et al., 2007; Jakobsson et al., 2008; Nelson et al., 2008). Our final merged dataset contains 73,901 SNPs with genotype missingness of <0.1 and <0.05 individual missingness across 5,104 individuals.
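As a rough illustration of the missingness thresholds quoted above, the following sketch filters a small genotype matrix. The data layout (rows as individuals, columns as SNPs, `None` for a missing call), the function name, and the order of the two filters (SNPs first, then individuals) are all assumptions made for the example; the text does not specify how the filters were applied.

```python
def filter_missingness(genotypes, snp_max=0.1, ind_max=0.05):
    """Return (kept_individuals, kept_snps) as lists of indices.

    genotypes: list of rows, one per individual; each row holds one call
    per SNP, with None marking a missing call (hypothetical layout).
    A SNP is kept if its missing fraction is < snp_max; an individual is
    kept if its missing fraction over the kept SNPs is < ind_max.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0]) if n_ind else 0

    # Step 1: per-SNP missingness across all individuals.
    kept_snps = [
        j for j in range(n_snp)
        if sum(row[j] is None for row in genotypes) / n_ind < snp_max
    ]
    if not kept_snps:
        return [], []

    # Step 2: per-individual missingness across the retained SNPs.
    kept_inds = [
        i for i, row in enumerate(genotypes)
        if sum(row[j] is None for j in kept_snps) / len(kept_snps) < ind_max
    ]
    return kept_inds, kept_snps


# Toy example with looser thresholds so that a 4 x 3 matrix is informative.
toy = [
    [0, 1, None],
    [1, None, None],
    [2, 0, None],
    [0, 1, 2],
]
print(filter_missingness(toy, snp_max=0.5, ind_max=0.5))
```

With the default arguments the function applies the 0.1 and 0.05 thresholds reported for the merged dataset.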

Population Structure

We used the software FRAPPE, which implements an expectation-maximization algorithm for estimating individual membership in clusters (Tang et al., 2005). This algorithm is more computationally efficient than MCMC methods, allowing it to analyze many more markers than, for example, STRUCTURE (Falush et al., 2003; Tang et al., 2005). After thinning markers to have r² < 0.5 in 50-SNP windows, shifting and recalculating every 5 SNPs, we ran FRAPPE on all 64,935 remaining markers for 5,000 iterations. We also assessed admixture proportions for the Hispanic/Latino individuals using STRUCTURE on a reduced dataset of 5,440 markers after thinning for MAF > 0.2 and with a minimum separation of 400 kb between markers. We used the F model with USEPOPINFO = 1 to update allele frequencies using only the ancestral individuals, with 5,000 burn-in and 5,000 iterations (Falush et al., 2003). We also used all 1,518 SNPs on the X chromosome for the same analysis of the X chromosome ancestry.

Principal component analysis was conducted using a dataset thinned to have r² < 0.8 in 50-SNP windows, leaving 69,212 SNPs for analysis using the package smartpca from the software EIGENSTRAT. Ellipses were fitted following the means and 1 SD of the variance–covariance matrix of the PC1 and PC2 scores of each population.

For local ancestry estimation, we used the software LAMP in LAMPANC mode, providing allele frequencies for the HGDP West Africans, Europeans, and Native Americans as ancestral populations (Sankararaman et al., 2008). A total of 552,025 SNPs were included in the analysis, and configuration parameters were set as follows: mixture proportions (alpha) = 0.2, 0.4, 0.4; number of generations since admixture (g) = 20; recombination rate (r) = 1e−8; fraction of overlap between adjacent windows (offset) = 0.2; and r² threshold (ldcutoff) = 0.1.
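The window-based marker thinning mentioned above (r² below a threshold within 50-SNP windows, shifting by 5 SNPs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the study: r² is taken as the squared Pearson correlation of genotype counts, missing calls are not handled, and when a pair exceeds the threshold the later SNP is dropped, which may differ from the rule used by the software the authors employed. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:          # monomorphic SNP: correlation undefined
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_thin(snps, window=50, step=5, threshold=0.5):
    """Return indices of SNPs retained after sliding-window LD thinning.

    snps: list of genotype vectors, one per SNP, in chromosomal order.
    Simplifying assumption: for each pair above the threshold, the SNP
    appearing later in the window is removed.
    """
    keep = [True] * len(snps)
    for start in range(0, len(snps), step):
        stop = min(start + window, len(snps))
        idx = [i for i in range(start, stop) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:           # dropped earlier in this window
                continue
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(snps[i], snps[j]) > threshold:
                    keep[j] = False
    return [i for i, kept in enumerate(keep) if kept]


# Toy example: SNP 1 duplicates SNP 0, SNP 2 is only weakly correlated.
toy = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [1, 1, 0, 2, 1, 1],
]
print(ld_thin(toy, window=3, step=1, threshold=0.5))
```

Calling `ld_thin(snps)` with the defaults mirrors the 50-SNP window, 5-SNP shift, and r² < 0.5 setting used before the FRAPPE run; `threshold=0.8` would correspond to the setting described for the principal component analysis.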
Local ancestry estimation for the Mexican individuals was performed using the two-way PCA-based method described in Bryc et al. (2010) for both the full Illumina 610K and the Affymetrix 500K datasets, in 10-SNP windows. Only Native Americans with <0.01 European ancestry (as estimated from FRAPPE results) were used as the ancestral Native American individuals within their respective datasets.

FST was calculated between Native American, European, and African regions of the Hispanic/Latino individuals and the respective continental populations using a C++ implementation of Weir and Cockerham's (1984) FST weighted equations as previously published. To eliminate bias in estimation of FST due to European ancestry shown in some of the Native Americans, we also removed regions showing European ancestry within any of the Native Americans showing >0.01 European ancestry, using the same local ancestry estimation procedure as described for the Mexican individuals. Furthermore, to avoid any potentially confounding effect of sample size, we used a random sample of 7 (the minimum sample size of
the Native American populations) individuals per non-Hispanic/Latino population to calculate pairwise FST. MAF was set at a threshold >0.1 in the populations compared by FST calculations.

ACKNOWLEDGMENTS

We thank Mariano Rey for support of the project; Peter Gregersen, Carole Oddoux, and Annette Lee for technical assistance; and Marc Pybus for valuable programming support during part of the analyses. This work was supported by the National Institutes of Health (Grant 1R01GM83606) as part of the National Institute of General Medical Sciences research funding programs. Genotype data from 100 Hispanic/Latinos have been deposited in the Gene Expression Omnibus (GEO) series record GSE21248.
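For readers who wish to reproduce the pairwise FST calculation described under Population Structure in Materials and Methods, the following is a minimal sketch of the Weir and Cockerham (1984) estimator for biallelic loci, combined across loci as a ratio of summed variance components (the "weighted" form). It is written in Python for illustration and is not the C++ implementation used in this study; the function names and the input layout (genotypes coded as 0, 1, or 2 copies of one allele, one list per population per locus) are assumptions, and missing data and the MAF filter are not handled.

```python
def wc_components(pops):
    """Weir-Cockerham variance components (a, b, c) at one biallelic locus.

    pops: one list of genotypes per population, each genotype coded as the
    number of copies (0, 1, 2) of a chosen allele (hypothetical layout).
    """
    r = len(pops)
    n = [len(g) for g in pops]
    n_bar = sum(n) / r
    n_c = (r * n_bar - sum(x * x for x in n) / (r * n_bar)) / (r - 1)

    p = [sum(g) / (2 * len(g)) for g in pops]          # allele frequencies
    h = [sum(1 for x in g if x == 1) / len(g) for g in pops]  # heterozygosity

    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / (r * n_bar)
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / (r * n_bar)

    core = p_bar * (1 - p_bar) - (r - 1) / r * s2
    a = n_bar / n_c * (s2 - (core - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (core - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def weighted_fst(loci):
    """Multilocus FST as sum(a) / sum(a + b + c) over loci.

    loci: list of per-locus `pops` structures as accepted by wc_components.
    Assumes at least one polymorphic locus, otherwise the denominator is zero.
    """
    num = den = 0.0
    for pops in loci:
        a, b, c = wc_components(pops)
        num += a
        den += a + b + c
    return num / den


# Toy examples: a fixed difference, and two identical samples.
fixed = [[[2] * 10, [0] * 10]]
same = [[[0, 1, 1, 2], [0, 1, 1, 2]]]
print(weighted_fst(fixed), weighted_fst(same))
```

Because the estimator corrects for sampling, two samples with identical allele frequencies can give slightly negative values; such values are usually interpreted as no detectable differentiation.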