analyses of these data (Bush et al., 1999a, b) we dealt with this problem by removing all mutations assigned to terminal branches (70% of the total) from our analyses. Our results indicate that HM mutations are confined to the 22 HM codons; thus, we could take a less drastic approach in the future. For instance, we could assign missing data codes to the HM codons when sequences are obtained from egg-cultured isolates.
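The masking approach suggested above can be illustrated with a minimal sketch. It assumes aligned, in-frame nucleotide sequences, 1-based codon numbering, and "?" as the missing data code. The function name `mask_codons` and the codon positions in the usage example are hypothetical placeholders; they are not the 22 HM codons identified in this study.

```python
def mask_codons(seq, codon_positions, missing="?"):
    """Return seq with each listed codon replaced by missing-data symbols.

    seq             -- in-frame nucleotide sequence (string)
    codon_positions -- 1-based codon numbers to mask (hypothetical input)
    missing         -- single-character missing-data code (assumed "?")
    """
    bases = list(seq)
    for pos in codon_positions:
        start = (pos - 1) * 3          # 0-based index of the codon's first base
        if pos < 1 or start + 3 > len(bases):
            raise ValueError(f"codon {pos} lies outside the sequence")
        bases[start:start + 3] = [missing] * 3
    return "".join(bases)


# Usage: mask codons 2 and 4 of a toy sequence from an egg-cultured isolate.
masked = mask_codons("ATGGCCAAATTT", [2, 4])
print(masked)
```

In practice the mask would be applied only to sequences flagged as egg-cultured, leaving sequences from other passage histories unchanged.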
We have shown that the excess of mutations that remains on the terminal branches after accounting for HM mutations is of a magnitude consistent with expectations given our sampling protocol. Despite our bias toward dispersed sampling, examination of Fig. 1 shows that our data set does contain a number of closely related isolates. To gauge how sensitive the calculation of percent excess mutations on the terminal branches is to the degree to which we sampled in a dispersed as opposed to a clumped manner, we removed the 10 most genetically divergent isolates (2.8% of the 357 in our original data set) and constructed a new tree (not shown). The excess of replacements on the terminal branches was reduced from 40% to 32%. Removing the 38 most genetically divergent isolates (10.6%) reduced the excess to 28%. Thus, the presence of even small numbers of genetically divergent isolates accounts for much of the excess of mutations assigned to the terminal branches of the HA tree.
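The subsampling step can be sketched as follows. This is only an illustration under stated assumptions: divergence is measured here as the summed pairwise Hamming distance between aligned sequences, which may differ from the criterion actually used to rank isolates, and the function names and toy sequences are hypothetical. Tree reconstruction and the calculation of the excess itself are not shown.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def drop_most_divergent(seqs, k):
    """Remove the k isolates with the largest summed pairwise distance.

    seqs -- dict mapping isolate name to aligned sequence
    Returns (kept, removed): the reduced dict and the list of removed names.
    """
    total = {name: 0 for name in seqs}
    for a, b in combinations(seqs, 2):
        d = hamming(seqs[a], seqs[b])
        total[a] += d
        total[b] += d
    ranked = sorted(seqs, key=lambda name: total[name], reverse=True)
    removed = ranked[:k]
    kept = {name: s for name, s in seqs.items() if name not in removed}
    return kept, removed


# Toy usage with made-up sequences (not data from this study).
toy = {"a": "AAAA", "b": "AAAT", "c": "TTTT"}
kept, removed = drop_most_divergent(toy, 1)
print(removed)

# Fractions of the data set removed in the two experiments described above.
print(round(100 * 10 / 357, 1), round(100 * 38 / 357, 1))
```

The reduced set of sequences would then be used to build a new tree, from which the excess on the terminal branches is recalculated.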
Unlike the excess mutations on terminal branches caused by sampling bias, the excess caused by HM change could cause problems in studies of how influenza viruses evolve as they replicate in their human hosts. For instance, in our previous work identifying codons under positive selection, we examined the ratio of nonsilent to silent substitutions (Bush et al., 1999a). If an isolate was sequenced shortly after a new amino acid replacement became fixed in a laboratory culture, the sequence might fail to show silent substitutions that had also occurred during passage but were lost in the selective sweep. After fixation of the HM nonsilent substitution, silent substitutions would once again begin to accumulate. Because we do not know the exact circumstances under which HM mutations occurred in our data set relative to the time at which particular isolates were sequenced, we cannot make any predictions about the relative frequencies of nonsilent and silent substitutions in the HM codons as compared with the non-HM codons. However, we can examine the frequencies of nonsilent and silent substitutions in the two codon sets to learn more about how the HM codons differ from the non-HM codons. The HM codons showed significantly greater numbers of nonsilent substitutions than expected (Table 6). As shown in Table 1, eight of the 22 HM codons are among those we previously identified as being under positive selection to change the amino acid they encode. One interpretation of this result is that some of the HM codons are under selection to change the amino acid they encode to adapt to