Hudson et al. (13) revealed no significant departures from equilibrium neutral models. There were, however, several surprising and interesting aspects of the sequence variation that suggested the recent action of natural selection. First, among the 22 Fast alleles sequenced, 9 were identical in sequence across the entire 1,410-bp region examined. This sequence was designated the Fast A haplotype. Second, all 19 Slow alleles were identical to each other and differed from the Fast A haplotype by a single nucleotide, the nucleotide that accounts for the amino acid difference between the Fast and Slow forms of the enzyme. This site will be referred to as the Fast/Slow site. Thus, the Fast/Slow polymorphism is clearly not an old balanced polymorphism; in fact, the Slow allele is obviously very recently derived from the Fast A haplotype. In summary, about half of a random sample of sequences at this locus would consist of sequences that are nearly identical to each other, and the other half of the sample would be much more heterogeneous, differing from each other at roughly 20 of 1,410 sites. This pattern of variation was demonstrated to be highly incompatible with an equilibrium neutral model (11).
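The contrast drawn above, a block of identical sequences alongside a heterogeneous remainder, can be made concrete by counting pairwise differences in an alignment. The following is a minimal sketch only: the function names and the toy alignment are hypothetical and are not the Sod data. The only figures taken from the text are the roughly 20 differing sites and the 1,410-bp region length.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (NOT the Sod data): three identical copies of
# one haplotype plus two divergent sequences.
toy_sample = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACCTACGAAC",
    "TCGTAGGTAC",
]

# Pairs of sequences with no differences at all (the "identical" class).
identical_pairs = sum(
    count_differences(a, b) == 0 for a, b in combinations(toy_sample, 2)
)

# Per-site divergence implied by the figures quoted in the text:
# roughly 20 differing sites in a 1,410-bp region.
per_site_divergence = 20 / 1410

print(identical_pairs, mean_pairwise_differences(toy_sample))
print(round(per_site_divergence, 4))
```

In a sample shaped like the one described, a large share of pairs would show zero or one difference while the remaining pairs would differ at many sites, which is the bimodal pattern the text contrasts with neutral expectations.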
Our working hypothesis is that a rare variant (perhaps a new mutation) has recently and rapidly increased in frequency to around 50%. As it increased in frequency, the haplotype in which it was embedded was pulled up in frequency at the same time. Although selection on the Fast/Slow site might have driven the Slow allele to its present frequency, such selection by itself cannot account for the observed high frequency of the Fast A haplotype. Thus, selection on some other site would appear to be involved. It should be noted that the putative polymorphic site upon which selection acts is not necessarily in the region sequenced, but must be tightly linked to it.
Such a selective event, whereby a rare variant is driven to intermediate frequency, could potentially affect a large region of DNA. (We will refer to such an event as a partial selective sweep.) Calculations of Kaplan et al. (14) suggest that a selection coefficient equal to 0.01 can sweep away variation at sites up to 10,000 bp from the site of selection (assuming rates of recombination that are typically observed in D. melanogaster). Fig. 2 shows the patterns of variation to be expected before, immediately after, and some period after such a partial selective sweep. Immediately after the rise in frequency of the previously rare variant, all chromosomes bearing the selected variant will be identical across essentially the whole region that was swept along with the selected variant. As time progresses, the “selected haplotype” (the haplotype in which the selected variant arose) will slowly be broken up by recombination events. Eventually, after much time has passed and further mutations accumulate, if the variation at the selected site is maintained by balancing selection, a peak of linked variation should emerge. This pattern would emerge at a point in time much later than that shown in Fig. 2.
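To give a rough sense of why a selection coefficient of 0.01 can affect sites thousands of base pairs away, the sketch below uses a simple deterministic approximation. It is not the calculation of Kaplan et al. (14). It assumes logistic growth of the selected variant (dp/dt = s p (1 - p)) and treats a linked site as escaping the sweep whenever a recombination event moves it onto a chromosome lacking the selected variant. The population size and the per-bp recombination rate are assumed values chosen for illustration only, and the function names are hypothetical.

```python
import math


def generations_to_reach(p_start, p_end, s):
    """Generations for logistic growth (dp/dt = s*p*(1-p)) from p_start to p_end."""
    logit_start = math.log(p_start / (1 - p_start))
    logit_end = math.log(p_end / (1 - p_end))
    return (logit_end - logit_start) / s


def fraction_still_linked(distance_bp, s, p_start, p_end, rec_per_bp):
    """Approximate probability that a site stays on the selected haplotype.

    A lineage carrying the selected variant recombines onto a non-selected
    background at rate r * (1 - p(t)) per generation. Under logistic growth,
    the integral of (1 - p) dt from p_start to p_end equals ln(p_end/p_start)/s.
    """
    r = rec_per_bp * distance_bp
    exposure = math.log(p_end / p_start) / s
    return math.exp(-r * exposure)


# Assumed, illustrative parameter values (not taken from the text):
ASSUMED_POP_SIZE = 1_000_000      # diploid population size N
ASSUMED_REC_PER_BP = 1e-8         # recombination rate per bp per generation

# Values taken from the text: s = 0.01, rise to about 50%, distance of 10,000 bp.
s = 0.01
p_start = 1 / (2 * ASSUMED_POP_SIZE)   # a single new mutation
p_end = 0.5

sweep_generations = generations_to_reach(p_start, p_end, s)
linked_at_10kb = fraction_still_linked(10_000, s, p_start, p_end, ASSUMED_REC_PER_BP)

print(round(sweep_generations), round(linked_at_10kb, 3))
```

Under these assumed values, most chromosomes carrying the selected variant would still carry the original flanking sequence 10 kb away at the end of the rise in frequency. The result is sensitive to the assumed recombination rate and starting frequency, so it should be read as an order-of-magnitude illustration only.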
From the Sod data of Hudson et al. (11), we know that the region partially swept of variation is bigger than 1,410 bp. In addition, it appears that some recombination has occurred between the sequences since the putative partial selective sweep.
To further investigate this putative selective history, additional lines were sequenced at the Sod locus and at three tightly linked regions. We were particularly interested in assessing the size of the region that had been swept along with the selected site and the amount of recombination and mutation that has occurred since the partial selective sweep. With this additional information, inferences can be made about the strength of selection and the time since the partial sweep occurred. Details of this survey will appear elsewhere, but we will summarize the preliminary results here.
In this study, 15 lines of D. melanogaster from El Rio vineyard (Lockeford, San Joaquin County, California) and the Canton S strain of D. melanogaster were sequenced at the Sod locus and three neighboring regions. [The Sod locus of six of these lines, designated here 112, 565F, 581F, 255S, 510S, and 438S, was also sequenced in the earlier study (11).] The three neighboring regions, denoted 2021, 6Kbr3r, and 1819, are located approximately 12.7 kb upstream of Sod, 3.7 kb downstream of Sod, and 19.2 kb downstream of Sod, respectively. Fig. 3 shows the locations and sizes of each of these regions.
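The layout of the surveyed regions can be summarized with a little arithmetic. The sketch below uses only the approximate offsets quoted above. The sign convention (negative for upstream, positive for downstream) and the variable names are assumptions made for illustration, and the lengths of the regions themselves, which are shown in Fig. 3 but not stated in the text, are ignored.

```python
# Approximate positions relative to Sod, in kb, as quoted in the text.
# Assumed convention: negative = upstream, positive = downstream.
# Region lengths are not given in the text and are ignored here.
region_offsets_kb = {"2021": -12.7, "Sod": 0.0, "6Kbr3r": 3.7, "1819": 19.2}

# Regions ordered from most upstream to most downstream.
ordered = sorted(region_offsets_kb, key=region_offsets_kb.get)

# Total distance between the outermost regions.
span_kb = round(
    max(region_offsets_kb.values()) - min(region_offsets_kb.values()), 1
)

# Distance between each pair of neighboring regions.
gaps_kb = {
    (left, right): round(region_offsets_kb[right] - region_offsets_kb[left], 1)
    for left, right in zip(ordered, ordered[1:])
}

print(ordered)
print(span_kb)
print(gaps_kb)
```

On these figures the four regions span roughly 32 kb, with the largest gap between 6Kbr3r and 1819.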
The polymorphic sites in this sample are indicated in Fig. 4. It is important to note that the Sod locus shows a pattern of variation similar to that observed in the earlier study. The four Slow allele sequences are not, however, identical in this sample, but consist of two sequences identical to the Slow alleles found earlier and two other sequences that each differ from the other Slow allele sequences at a single site. Of 12 Fast alleles, 5 are the Fast A sequence, 2 more differ by a single site from Fast A, 2 others differ by 3 sites from Fast A, and the 3 others differ considerably from Fast A. Two Fast alleles in