"8 Genome-wide Patterns of Population Structure and Admixture Among Hispanic/Latino Populations--Katarzyna Bryc, Christopher Velez, Tatiana Karafet, Andres Moreno-Estrada, Andy Reynolds, Adam Auton, Michael Hammer, Carlos D. Bustamante, and Harry Ostrer." In the Light of Evolution IV: The Human Condition. Washington, DC: The National Academies Press, 2010.
We used the software FRAPPE, which implements an expectation-maximization algorithm for estimating individual membership in clusters (Tang et al., 2005). This algorithm is more computationally efficient than MCMC-based methods, allowing it to analyze many more markers than, for example, STRUCTURE (Falush et al., 2003; Tang et al., 2005). After thinning markers to have r² < 0.5 in 50-SNP windows, shifting and recalculating every 5 SNPs, we ran FRAPPE on all 64,935 remaining markers for 5,000 iterations. We also assessed admixture proportions for the Hispanic/Latino individuals using STRUCTURE on a reduced dataset of 5,440 markers, obtained by keeping only markers with MAF > 0.2 and a minimum separation of 400 kb between markers. We used the F model with USEPOPINFO = 1 to update allele frequencies using only the ancestral individuals, with 5,000 burn-in and 5,000 iterations (Falush et al., 2003). We ran the same analysis on all 1,518 SNPs on the X chromosome to estimate X chromosome ancestry. Principal component analysis was conducted on a dataset thinned to have r² < 0.8 in 50-SNP windows, leaving 69,212 SNPs for analysis with the package smartpca from the software eigenstrat. Ellipses were fitted to each population using the means and 1 SD of the variance–covariance matrix of its PC1 and PC2 scores.
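The ellipse-fitting step can be illustrated with a short sketch. The text does not give the exact construction, so the code below assumes a common one: the ellipse is centered on the population's mean PC1/PC2 score, and its axes follow the eigenvectors of the sample variance–covariance matrix, with half-lengths equal to the square roots of the eigenvalues (1 SD along each principal axis). The function name `sd_ellipse` and its parameters are hypothetical and are not taken from the original analysis.

```python
import numpy as np


def sd_ellipse(scores, n_sd=1.0, n_points=100):
    """Return (center, boundary) of an n_sd ellipse for an (n, 2) array of PC scores.

    Assumed construction, not necessarily the authors' own: the axes are the
    eigenvectors of the sample covariance matrix, scaled by n_sd * sqrt(eigenvalue).
    """
    scores = np.asarray(scores, dtype=float)
    center = scores.mean(axis=0)                 # mean PC1 and PC2 score
    cov = np.cov(scores, rowvar=False)           # 2x2 variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix: real eigenpairs
    radii = n_sd * np.sqrt(np.clip(eigvals, 0.0, None))

    # Scale a unit circle by the radii, rotate it onto the principal axes,
    # then shift it to the population mean.
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    unit_circle = np.stack([np.cos(theta), np.sin(theta)])  # shape (2, n_points)
    boundary = center[:, None] + eigvecs @ (radii[:, None] * unit_circle)
    return center, boundary.T                    # boundary has shape (n_points, 2)
```

Calling `sd_ellipse` once per population on that population's PC1 and PC2 scores gives one outline per group, which can then be drawn over the scatter plot. Under this construction every boundary point lies at a Mahalanobis distance of `n_sd` from the population mean.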
For local ancestry estimation, we used the software LAMP in LAMPANC mode, providing allele frequencies for the HGDP West Africans, Europeans, and Native Americans as ancestral populations (Sankararaman et al., 2008). A total of 552,025 SNPs were included in the analysis, and configuration parameters were set as follows: mixture proportions (alpha) = 0.2, 0.4, 0.4; number of generations since admixture (g) = 20; recombination rate (r) = 1e-8; fraction of overlap between adjacent windows (offset) = 0.2; and r² threshold (ldcutoff) = 0.1. Local ancestry estimation for the Mexican individuals was performed using the two-way PCA-based method described in Bryc et al. (2010) for both the full Illumina 610K and the Affymetrix 500K datasets, in 10-SNP windows. Only Native Americans with <0.01 European ancestry (as estimated from FRAPPE results) were used as the ancestral Native American individuals within their respective datasets. FST was calculated between Native American, European, and African regions of the Hispanic/Latino individuals and the respective continental populations using a C++ implementation of Weir and Cockerham’s (1984) weighted FST equations, as previously published. To eliminate bias in the estimation of FST due to European ancestry present in some of the Native Americans, we also removed regions showing European ancestry within any Native American individual with >0.01 European ancestry, using the same local ancestry estimation procedure as described for the Mexican individuals. Furthermore, to avoid any potentially confounding effect of sample size, we used a random sample of 7 (the minimum sample size of