In the example in Fig. 3, 50% of the total population was sampled. If we were to increase the size of the total population from 16 to 64 and again sample only eight isolates, we would be sampling 12.5% of the total population. This is close to the percentage of isolates (7%) that we sequenced based on results from HI tests. Sampling eight dispersed isolates out of 64 results in a 47.5% excess of mutations on the terminal branches, a much greater excess than the 27.3% shown in Fig. 3. We do not know the actual distribution of genetic variation present in nature during the time span included in our study, and therefore do not know exactly how we sampled that variation. Even so, the magnitude of excess mutations assigned to the terminal branches of the tree in Fig. 1 is consistent with our sampling bias: we have sampled only a fraction of circulating viral strains and have done so in a consciously dispersed manner.
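The sampling fractions quoted above follow from simple division, and a minimal sketch can make the comparison explicit. The function name `sampling_fraction` is introduced here for illustration only and is not part of the original analysis. The excess-mutation percentages (27.3% and 47.5%) are taken directly from the text and are not recomputed by this code.

```python
def sampling_fraction(n_sampled: int, population_size: int) -> float:
    """Return the percentage of a population covered by a sample."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return 100.0 * n_sampled / population_size


# Values stated in the text: eight isolates sampled from 16 or from 64.
fig3_fraction = sampling_fraction(8, 16)
enlarged_fraction = sampling_fraction(8, 64)

# Excess of terminal-branch mutations reported in the text for each case.
# These figures are copied from the text, not derived here.
reported_excess = {
    fig3_fraction: 27.3,
    enlarged_fraction: 47.5,
}

for fraction, excess in reported_excess.items():
    print(f"sampled {fraction:.1f}% of population: "
          f"reported terminal-branch excess {excess:.1f}%")
```

The sketch only illustrates the direction of the effect described in the text: the smaller the sampled fraction, the larger the reported excess on terminal branches.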
We found evidence suggesting that approximately 59 nonsilent substitutions assigned to the terminal branches of the HA tree in Fig. 1 were caused by HM mutations occurring in the set of 22 codons known to undergo HM mutation in chicken eggs in the laboratory. We cannot identify which particular 59 substitutions were HM; we know only that they are among the 105 nonsilent substitutions assigned to branches attaching sequences of egg-cultured isolates to the tree. We found no evidence to suggest that HM mutations are occurring at the other 307 codons in the HA1. The majority of the excess mutations that were assigned to terminal branches of the HA tree are most likely the result of sampling bias. Detailed antigenic and genetic analysis of viruses collected during influenza surveillance is purposefully biased toward sequencing antigenically dissimilar strains in an effort to identify new antigenic and genetic variants that may signal the need to update the vaccine. Thus, viral isolates that are antigenically very similar to the predominant antigenic variant circulating during a particular influenza season are sequenced less often than are antigenically variant strains.
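The counts given above can be combined into a few proportions that put the HM estimate in context. The following is a minimal sketch using only numbers stated in the text. The variable names are illustrative, and the total of 329 codons is an assumption obtained by adding the 22 HM codons to the "other 307 codons" mentioned above.

```python
# Counts stated in the text.
HM_CODONS = 22                    # codons known to undergo HM mutation in eggs
OTHER_CODONS = 307                # remaining HA1 codons, no evidence of HM mutation
HM_SUBSTITUTIONS = 59             # approximate number of HM nonsilent substitutions
EGG_BRANCH_SUBSTITUTIONS = 105    # nonsilent substitutions on egg-isolate branches

# Assumption: these two groups together make up the HA1 codons considered.
total_codons = HM_CODONS + OTHER_CODONS

# Share of the considered codons that belong to the HM set.
hm_codon_share = 100.0 * HM_CODONS / total_codons

# Share of egg-isolate terminal-branch substitutions attributed to HM mutation.
hm_substitution_share = 100.0 * HM_SUBSTITUTIONS / EGG_BRANCH_SUBSTITUTIONS

print(f"total codons considered: {total_codons}")
print(f"HM codons as share of total: {hm_codon_share:.1f}%")
print(f"HM share of egg-branch substitutions: {hm_substitution_share:.1f}%")
```

Under these assumptions, a small fraction of codons accounts for more than half of the nonsilent substitutions on branches leading to egg-cultured isolates.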
The 59 apparent HM mutations represent 7.9% of the 745 nonsilent substitutions that occurred over the time period sampled. Thus, there is good reason for concern about HM mutations if one wants to draw inferences about evolution from this or any similarly affected data set. Culture in live cells is necessary for the propagation not only of viruses but also of many bacteria, such as the obligately intracellular rickettsial and chlamydial bacteria. Laboratories involved in influenza surveillance have long been attuned to the presence of HM mutations. However, people obtaining influenza sequences from public databases might not suspect that the sequences could contain laboratory artifacts. In our previous