terns of genetic divergence between species. Population geneticists have asked: What are the important forces that determine the amount and nature of genetic variation in populations, the spatial distribution of this variation, the distribution of variation across the genome, and the evolutionary changes that occur over short and long timescales? The process that has shaped this variation within and between species is intricate, involving a complex genome and an environment that varies in both space and time. Stochasticity is certainly an important aspect of the process. The rapidly growing database of DNA polymorphism and divergence studies from a variety of organisms, including humans and other primates, provides an exciting opportunity to learn about the evolutionary history of populations and the evolutionary processes that have resulted in the patterns of variation that we observe in extant populations. The difficulty is that even very simple models of this process lead to challenging mathematical problems.
Some examples of current approaches and the mathematical challenges facing us are described here. To keep the discussion concrete, a very specific population genetic model of sequence evolution will be described. The particular model, the Wright-Fisher model, has a long and rich history, but it is not necessarily the most realistic or tractable for every purpose, and it is only one of many models that might have been considered here.
The Wright-Fisher model assumes discrete generations (as opposed to a model with distinct age classes and overlapping generations, which would be more realistic for some populations, including humans). The focus is on a particular segment of the genome, referred to as a gene, and it is first assumed that no recombination or mutation occurs. To begin, it is assumed as well that population size (N) is constant and that there is no spatial structure. A haploid model is also assumed, which means that each individual carries just one copy of the gene. (Humans are in fact diploid, which means that each individual carries two copies of each gene, a maternal and a paternal copy.)
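These assumptions (discrete generations, haploid individuals, constant population size N, no mutation, recombination, or spatial structure) are simple enough to state as a short simulation. The Python sketch below is illustrative only and is not part of the original presentation: the function names `wright_fisher_generation` and `offspring_counts` are hypothetical, gene sequences are stood in for by arbitrary labels, and the resampling step assumes the standard Wright-Fisher rule that each offspring copies a parent drawn uniformly at random, with replacement.

```python
import random
from collections import Counter


def wright_fisher_generation(parents, rng):
    """Produce one offspring generation of the same size N.

    Each offspring copies, without error, the sequence of a parent
    drawn uniformly at random (with replacement) from `parents`.
    """
    n = len(parents)
    return [parents[rng.randrange(n)] for _ in range(n)]


def offspring_counts(n, rng):
    """Return how many offspring each of n parents leaves in one generation."""
    # Each of the n offspring independently picks a parent index in 0..n-1.
    chosen = Counter(rng.randrange(n) for _ in range(n))
    return [chosen.get(i, 0) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs are comparable

    # Hypothetical starting population: two distinct sequences, labeled A and B.
    population = ["A"] * 5 + ["B"] * 5
    for generation in range(1, 4):
        population = wright_fisher_generation(population, rng)
        print(generation, Counter(population))

    # Offspring numbers always sum to N, so their average is exactly 1.
    counts = offspring_counts(1000, rng)
    print(sum(counts) / len(counts))
```

Because the population size is held fixed, the offspring counts of the parents are not independent of one another: they must sum to N, which is why the average offspring number is exactly 1 in every generation.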
In the Wright-Fisher model, successive generations are produced as follows. Each of the N individuals of the offspring generation is produced by replicating, without error, the gene sequence of a randomly drawn individual of the parental generation. Each offspring individual is assumed to be generated independently from the parental population in this manner. The number of offspring of any particular individual of the parental generation is thus a binomial random variable, being the result of N independent Bernoulli trials, each with probability of success equal to 1/N, and hence with mean 1. In large populations, the number of offspring of any individual approximately follows a Poisson distribution with mean 1. If it is supposed that the parents do not all have identical gene sequences, then their distinct