gene sequences are known as haplotypes. Given the frequencies of the different haplotypes in the parental generation, the counts of the different haplotypes in the offspring generation are multinomially distributed. This random change in haplotype frequencies from one generation to the next is referred to as genetic drift. Regardless of how much variation existed in the founding population, drift will eventually bring the population under this model to a state in which every individual carries the same sequence; that is, the population becomes monomorphic.
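The multinomial resampling step described above can be illustrated with a short simulation. This is only a minimal sketch, assuming a haploid population of constant size; the function name `wright_fisher_drift`, the haplotype labels, and the generation cap are hypothetical choices made for illustration and are not taken from the text.

```python
import random
from collections import Counter


def wright_fisher_drift(counts, rng, max_generations=100_000):
    """Resample haplotype counts each generation until one haplotype is fixed.

    counts: dict mapping haplotype label -> number of copies among the founders.
    Returns (fixed_haplotype, generations_elapsed), or (None, max_generations)
    if fixation is not reached within the cap.
    """
    counts = {h: c for h, c in counts.items() if c > 0}
    n = sum(counts.values())  # assumption: population size stays constant
    for generation in range(max_generations):
        if len(counts) == 1:
            return next(iter(counts)), generation
        haplotypes = list(counts)
        weights = [counts[h] for h in haplotypes]
        # Each of the n offspring independently copies a parental haplotype with
        # probability equal to its parental frequency, so the offspring counts
        # are a multinomial draw.
        offspring = rng.choices(haplotypes, weights=weights, k=n)
        counts = dict(Counter(offspring))
    return None, max_generations


rng = random.Random(1)
fixed, generations = wright_fisher_drift({"A": 10, "B": 10, "C": 10}, rng)
print(fixed, generations)
```

Running the sketch with different seeds changes which haplotype is fixed and how long fixation takes, but with no mutation the population always ends up monomorphic.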
Next, mutation is introduced into the model. Suppose that the replication process that generates an offspring copy of the gene from its parent has some error rate, so that each offspring differs from its parent at a Poisson-distributed number of sites in the gene sequence. If this model is run for many generations, the pattern of genetic variation asymptotically approaches a stationary distribution resulting from a stochastic balance between mutation, which generates variation in the population, and genetic drift, which tends to eliminate variation. Many properties of this stationary distribution, and of samples drawn from it, are known. In the model as it has been defined here, all individuals are in some sense equivalent. For example, all individuals have the same distribution of offspring number, with expectation equal to 1. All the genetic variation is therefore said to be selectively neutral, and the model is referred to as a neutral model. In generalizations of this model, some sequence variants may have a systematic tendency to produce more offspring than others, and the frequency of such variants will tend to increase. These are models of evolution with natural selection.
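The neutral model with mutation can be sketched in the same way. The code below is an illustrative assumption-laden example, not a method from the text: it assumes a haploid population of constant size, sequences over the four-letter alphabet ACGT, and a mutated site that changes to one of the other three bases with equal probability. The helper names (`poisson`, `mutate`, `next_generation`) and the parameter values are hypothetical.

```python
import math
import random

BASES = "ACGT"  # assumption: nucleotide alphabet


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def mutate(seq, rng, u):
    """Return a copy of seq that differs from it at a Poisson(u) number of sites."""
    k = min(poisson(rng, u), len(seq))  # cannot change more sites than exist
    if k == 0:
        return seq
    bases = list(seq)
    for site in rng.sample(range(len(seq)), k):
        bases[site] = rng.choice([b for b in BASES if b != bases[site]])
    return "".join(bases)


def next_generation(population, rng, u):
    """Each offspring copies a uniformly chosen parent, then mutates.

    Every parent has the same offspring-number distribution with mean 1,
    so all variation is selectively neutral.
    """
    return [mutate(rng.choice(population), rng, u) for _ in range(len(population))]


rng = random.Random(7)
population = ["A" * 50] * 100  # monomorphic founding population
for _ in range(500):
    population = next_generation(population, rng, u=0.01)
num_haplotypes = len(set(population))
print(num_haplotypes)
```

Starting from a monomorphic population, mutation reintroduces variation while drift removes it, so the number of distinct haplotypes fluctuates from generation to generation rather than settling at one.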
The Wright-Fisher neutral model is a particular case of a more general class of neutral models in which all parents are equivalent; these are referred to as “exchangeable models.” In these models, the distribution of offspring number need not be Poisson. In the limit of large populations and a low mutation rate, the models’ stationary properties depend on a single compound parameter, Nu/v, where u is the mutation rate and v is the variance of offspring number. Despite the simplicity of this model, in which there is no selection, no geographic structure, no variation in population size, and no recombination, the probabilities of sample configurations of sequences under this model are difficult to calculate. Strobeck (1983) first described recursions for these probabilities for the case where only two or three haplotypes are present in a sample. The difficulty of obtaining sample configuration probabilities led to the use of summaries of the data, with an inevitable loss of information. Only in the last 10 years have full likelihood approaches been developed. Griffiths and Tavaré (1994a, 1995) were the first to find a practical method to esti-