FIGURE 3. The effects of sampling bias on phylogenetic reconstruction. The tree on the left shows a hypothetical population of 16 isolates that each differ from their ancestor by one unique mutation. The four trees to the right show the original tree overlaid with the tree that would result from sampling only half of the total population. The tree constructed from the sampled sequences is shown in black, with the terminal branches as thicker lines. Clumped sampling reduces the total genetic variation sampled. The mutations not captured in the sample would have been assigned only to internal branches, as shown by the symbol X. As a result, the proportion of mutations assigned to the internal and terminal branches changes with sampling dispersion, but not at the same rate (shown in the line plot at the bottom). Without knowledge of where a sample lies on such a continuum, there is no way to derive the expected proportion of mutations that should be assigned to the terminal and internal branches of a phylogenetic tree.
mutations assigned to the terminal branches of the eight-isolate trees was greatly influenced by the degree to which the sampled isolates were dispersed or clustered. The dispersed sample shows a 27.3% excess of mutations assigned to the terminal branches, whereas the clumped sample shows a 13% deficit.
The magnitude of the excess or deficit depends not only on the degree of dispersion, but also on the proportion of the total population sampled.
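The dependence on dispersion can be illustrated with a small toy calculation. The sketch below is not the procedure used to produce Figure 3; it assumes a perfectly balanced binary tree of 16 isolates with one mutation per branch, ignores mutations shared by every sampled isolate, and takes as its baseline a tree of k tips with one mutation on each of its 2k − 2 branches. The function names and the two sampling schemes are hypothetical choices made for illustration.

```python
from fractions import Fraction

N_LEAVES = 16  # hypothetical population size, as in the figure


def branch_counts(sample):
    """Return (terminal, internal) mutation counts for the tree induced by `sample`.

    Assumption: a balanced binary tree stored heap-style (root = 1, children
    of node v are 2v and 2v + 1, leaves are N_LEAVES .. 2*N_LEAVES - 1), with
    exactly one unique mutation on the branch above every non-root node.
    """
    k = len(sample)
    below = [0] * (2 * N_LEAVES)  # below[v] = number of sampled leaves under v
    for leaf in sample:
        v = N_LEAVES + leaf
        while v >= 1:  # walk from the leaf up to the root
            below[v] += 1
            v //= 2

    terminal = internal = 0
    for v in range(2, 2 * N_LEAVES):  # each non-root node owns one branch
        if below[v] == 1:
            terminal += 1  # mutation seen in exactly one sampled isolate
        elif 1 < below[v] < k:
            internal += 1  # mutation shared by some, but not all, isolates
        # below[v] == 0: mutation not captured by the sample
        # below[v] == k: shared by every sampled isolate, so not counted
    return terminal, internal


def terminal_excess(sample):
    """Relative excess (+) or deficit (-) of terminal-branch mutations.

    Assumed baseline: a rooted binary tree of k tips with one mutation on
    each of its 2k - 2 branches, so k / (2k - 2) of mutations are terminal.
    """
    terminal, internal = branch_counts(sample)
    k = len(sample)
    observed = Fraction(terminal, terminal + internal)
    expected = Fraction(k, 2 * k - 2)
    return float(observed / expected - 1)


if __name__ == "__main__":
    schemes = {
        "dispersed (every second isolate)": list(range(0, N_LEAVES, 2)),
        "two clumps of four": [0, 1, 2, 3, 8, 9, 10, 11],
    }
    for name, sample in schemes.items():
        print(f"{name}: {branch_counts(sample)} {terminal_excess(sample):+.1%}")
```

Under these assumptions the dispersed scheme assigns 16 mutations to terminal branches and 6 to internal branches (an excess of about 27%), while the two-clump scheme assigns 8 to each (a deficit of 12.5%). Passing samples of other sizes to `terminal_excess` shows how the result also shifts with the proportion of the population sampled.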