Scholars have studied language relationships within a cladistic framework since at least the early 19th century (Atkinson and Gray, 2005), and given the parallels between linguistic and genetic change over time, it is reasonable to use linguistic affiliations as a way of grouping individuals for genetic study. Several studies have demonstrated a correlation between linguistic and genetic variation, including cases in Europe (Cavalli-Sforza and Feldman, 1981; Piazza et al., 1995), Asia (Karafet et al., 2001), the Pacific (Merriwether et al., 1999; Robledo et al., 2003; Scheinfeldt et al., 2006; Friedlaender et al., 2008), and the Americas (Smith et al., 2000; Malhi et al., 2001; Eshleman et al., 2004; Wang et al., 2007). The main difficulty in these studies lies in the interpretation of linguistic similarities among populations. Although a shared language implies some degree of contact among peoples, language can be transmitted horizontally with little or no genetic exchange. Likewise, there can be genetic exchange with little or no linguistic exchange. Therefore, the degree of correlation between genetic and linguistic variation depends on the populations being studied.
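To make concrete what a correlation between genetic and linguistic variation can mean operationally, the sketch below compares two pairwise distance matrices with a Mantel-type permutation test. This is an illustration only: the text does not state which statistical method the cited studies used, the function names `pearson` and `mantel` are hypothetical, and the two toy matrices are invented values for four unnamed populations, not data from any study.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Correlate two symmetric distance matrices over the same populations.

    Returns the observed correlation and a one-sided permutation p-value,
    obtained by relabelling the populations of d2 at random.
    """
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # each population pair once
    x = [d1[i][j] for i, j in pairs]
    observed = pearson(x, [d2[i][j] for i, j in pairs])

    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of d2 together
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= observed:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


# Invented toy distances among four hypothetical populations.
genetic = [
    [0.00, 0.10, 0.40, 0.50],
    [0.10, 0.00, 0.35, 0.45],
    [0.40, 0.35, 0.00, 0.20],
    [0.50, 0.45, 0.20, 0.00],
]
linguistic = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

r, p = mantel(genetic, linguistic)
print(f"matrix correlation r = {r:.3f}, permutation p = {p:.3f}")
```

Whole populations are permuted, rather than individual matrix entries, because the pairwise distances within a matrix are not independent of one another. With only four populations there are just 24 possible relabellings, so the p-value here is purely illustrative.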
Studies of genetic variation within Africa, as mentioned above, have found more genetic variation in Africans than in non-Africans, because the “out of Africa” bottleneck significantly reduced genetic variation in non-Africans; however, most genetic studies of African populations are limited by the number of population samples included. More recent work has improved the understanding of genetic variation in Africa with a survey of genome-wide genetic variation in geographically and ethnically diverse African samples (Tishkoff et al., 2009). Tishkoff et al. (2009) analyzed 1,327 genome-wide autosomal microsatellite and insertion/deletion polymorphisms in 121 African population samples and a comparative sample of 1,394 non-Africans. The authors studied population structure and relationships using the program STRUCTURE (Pritchard et al., 2000), alongside phylogenetic analyses. STRUCTURE uses a model-based Bayesian clustering approach to identify genetic subpopulations and to assign individuals probabilistically to these subpopulations on the basis of their genotypes, while simultaneously estimating ancestral population allele frequencies. It places individuals into K clusters, where K is chosen in advance and is varied across independent runs, and individuals can have membership in multiple clusters (Pritchard et al., 2000). Tishkoff et al. (2009) inferred 14 ancestral population clusters globally as well as within Africa and found that the African samples cluster geographically as well as linguistically and ethnically (Table 5.1). In addition to the STRUCTURE analysis, the authors constructed a