within the Niger-Kordofanian and Nilo-Saharan language families after correction for geographic distances. To further explore the relationship between genetic and linguistic variation in Africa, we used the published genome-wide dataset of Tishkoff et al. (2009), which includes 103 population samples (n ≥ 10) speaking languages from all four African language families. We first performed Mantel tests to determine the extent to which genetic and linguistic distances are correlated within language families. Because the linguistic relationships among the Khoesan speakers are not clearly understood, we could not construct the linguistic distance matrix that a Mantel test requires for that family, so only three families were tested. Not surprisingly, all three tests showed that linguistic and genetic distances were significantly correlated (100,000 permutations): Niger-Kordofanian, r = 0.32, P = 9.99 × 10⁻⁶; Nilo-Saharan, r = 0.29, P = 9.99 × 10⁻⁶; and Afroasiatic, r = 0.27, P = 9.99 × 10⁻⁶. In all three tests the correlation coefficient exceeds 0.25.
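A Mantel test correlates the off-diagonal entries of two distance matrices and judges significance by repeatedly relabelling the populations in one matrix. The Python sketch below illustrates a simple (non-partial) Mantel test under stated assumptions: the function names and the `permutations` and `seed` parameters are hypothetical, Pearson's r is assumed as the test statistic, and P is computed one-sided as (number of permuted r ≥ observed r, plus 1) / (permutations + 1). The text does not state which software or P-value convention was used, and the correction for geographic distance mentioned above would require a partial Mantel test, which is not shown.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square, symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, linguistic, permutations=10_000, seed=0):
    """Simple Mantel test (hypothetical helper, not the authors' code).

    Returns (r, p): Pearson r between the off-diagonal entries of the two
    distance matrices, and a one-sided permutation P value for r > 0.
    """
    n = len(genetic)
    rng = random.Random(seed)
    x = upper_triangle(genetic)
    r_obs = pearson_r(x, upper_triangle(linguistic))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Relabel the populations of one matrix: rows and columns are permuted
        # together, so each permuted matrix is still a valid distance matrix.
        rng.shuffle(order)
        y = [linguistic[order[i]][order[j]]
             for i in range(n) for j in range(i + 1, n)]
        if pearson_r(x, y) >= r_obs:
            hits += 1
    # Adding 1 to numerator and denominator counts the observed labelling
    # as one of the arrangements and keeps P strictly above 0.
    p = (hits + 1) / (permutations + 1)
    return r_obs, p
```

Under this convention the smallest attainable P for 100,000 permutations is 1/100,001, about 1 × 10⁻⁵, the same order of magnitude as the values reported above. The pure-Python loop is written for clarity rather than speed; a 103-population matrix with 100,000 permutations would be slow in this form.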
Because we and others (Tishkoff et al., 2009) have established a significant correlation between linguistic affiliation and genetic variation within three of the African language families, we wanted to explore the degree to which samples plotted by genetic distance cluster by language family. We used multidimensional scaling (MDS) to construct a two-dimensional plot of a pairwise genetic distance matrix computed from the 103 population samples described above (Tishkoff et al., 2009). Consistent with the mtDNA and NRY studies discussed above (Wood et al., 2005; Hassan et al., 2008), our genome-wide analysis of microsatellite data shows that populations generally cluster by both geographic region and linguistic classification. Fig. 5.3 shows that populations generally separate by linguistic affiliation along dimension 1. Dimension 2 separates the SAK speakers from all other Africans, including the eastern Khoesan speakers (the Hadza and Sandawe), who cluster closely with other eastern Africans. Another notable pattern in the MDS plot, consistent with previous work, is the clustering of the Afroasiatic Chadic speakers with the Nilo-Saharan speakers, which may reflect a past language shift (Tishkoff et al., 2009).
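Classical (Torgerson) MDS is one common way to turn a pairwise distance matrix into a two-dimensional plot: square the distances, double-center the result, and use the leading eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. The text does not say which MDS variant was used, so classical MDS is an assumption here. The sketch below is pure Python, uses a cyclic Jacobi eigenvalue routine in place of a numerical library, and its function names are hypothetical.

```python
import math


def jacobi_eigen(a, tol=1e-24, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)  # invariant under rotations
    if scale == 0.0:
        return [0.0] * n, v
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p] = c * vkp - s * vkq
                    v[k][q] = s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(dist, dims=2):
    """Classical MDS coordinates (one row per population) from a distance matrix."""
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in d2]  # equal column means (symmetric)
    grand = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J, where J is the centering matrix.
    b = [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand)
          for j in range(n)] for i in range(n)]
    values, vectors = jacobi_eigen(b)
    top = sorted(range(n), key=lambda k: values[k], reverse=True)[:dims]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return [[vectors[i][k] * math.sqrt(max(values[k], 0.0)) for k in top]
            for i in range(n)]
```

Coordinates from classical MDS are unique only up to rotation, reflection, and translation, so the direction of dimension 1 or dimension 2 in a plot is arbitrary; the relative positions of the populations carry the information.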
Because the distribution of language families in Africa roughly follows geography (Fig. 5.1), we also performed MDS within geographic regions that include at least three language families. In central Africa (Fig. 5.4), the samples cluster by language family, with a few notable exceptions. For example, the Fulani, nomadic pastoralists who speak a Niger-Kordofanian language and reside across central and western Africa, do not cluster with other Niger-Kordofanian-speaking populations. Moreover, the Fulani are distinguished from other African samples at K = 14 in the STRUCTURE analysis of Tishkoff et al. (2009). Morphological analyses of the Fulani have been interpreted to suggest a Middle Eastern