frequency in the laboratory because of genetic drift, for at least 22 HA1 codons an increase in frequency is thought to reflect a response to selective pressure for growth in embryonated chicken eggs (Bush et al., 1999a). Such an HM mutation will most likely appear on a phylogenetic tree as an additional mutation on a terminal branch, that is, the branch attaching the sequence from a viral isolate to the tree. Phylogenetic reconstruction is based on similarity at all 329 codons, whereas an HM mutation alters only one of them. Thus, the sequence of an isolate containing an HM mutation would in most cases still be most similar to the sequence from that isolate's closest relative. The effect of the HM mutation on the phylogenetic tree would therefore be to lengthen the terminal branch joining the sequence from the egg-cultured isolate to the tree rather than to change the point at which that branch is attached (Fig. 2).
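The argument that a single HM change lengthens a terminal branch without moving it can be sketched with a simple codon-mismatch count. This is only an illustration under stated assumptions: the placeholder codons, the positions that differ, and the helper names (`codon_distance`, `with_changes`) are hypothetical, and a pairwise mismatch count stands in for the full phylogenetic reconstruction.

```python
N_CODONS = 329  # length of the HA1 domain, as stated in the text


def codon_distance(seq_a, seq_b):
    """Number of codon positions at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def with_changes(seq, positions, codon="GGG"):
    """Return a copy of seq with a placeholder codon at each given position."""
    out = list(seq)
    for pos in positions:
        out[pos] = codon
    return out


# Hypothetical sequences: every codon is a placeholder, not real HA1 data.
isolate = ["AAA"] * N_CODONS
close_relative = with_changes(isolate, [10, 50])                 # 2 differences
distant_relative = with_changes(isolate, [5, 20, 80, 120, 200, 300])  # 6 differences

# The same isolate after egg passage, carrying one hypothetical HM change.
egg_isolate = with_changes(isolate, [150], codon="CCC")

for name, seq in [("original", isolate), ("egg-cultured", egg_isolate)]:
    d_close = codon_distance(seq, close_relative)
    d_far = codon_distance(seq, distant_relative)
    print(name, "close:", d_close, "distant:", d_far)
```

In this toy case the egg-cultured sequence is one codon farther from both relatives, so its closest relative is unchanged and only the distance to it grows.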
The 22 suspected HM codons (Table 1) make up only 6.7% of the 329 codons in the HA1 domain, yet they account for 36.0% of the amino acid replacements across the HA tree in Fig. 1. Future study may show that codons outside this set of 22 also undergo HM mutation. There is thus great potential for error in inference if one assumes that HM mutations reflect evolution of influenza viruses within the human host. Here we test for the presence of HM mutations in our data set by comparing the distribution of mutations in the HM and non-HM codons between terminal branches attaching sequences from egg-cultured isolates and those attaching sequences from cell-cultured isolates to the tree.
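The comparison described above amounts to a 2 x 2 table (egg vs. cell passage, HM vs. non-HM codons). The sketch below first checks the 6.7% figure from the counts given in the text and then applies a one-sided Fisher exact test to such a table. The passage does not say which test statistic was actually used, so Fisher's test is an assumption made for illustration, and the mutation counts in the table are hypothetical placeholders rather than data from the study.

```python
from math import comb

HA1_CODONS = 329  # from the text
HM_CODONS = 22    # from the text

hm_fraction = HM_CODONS / HA1_CODONS
print(f"HM codons as a share of HA1: {hm_fraction:.1%}")


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a or more
    counts in the top-left cell (here: HM mutations on egg branches).
    """
    row1 = a + b          # all mutations on egg-cultured terminal branches
    col1 = a + c          # all mutations in HM codons
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


# HYPOTHETICAL counts of mutations on terminal branches (not from the paper).
egg_hm, egg_other = 12, 20     # egg-cultured isolates
cell_hm, cell_other = 3, 25    # cell-cultured isolates

p_value = fisher_one_sided(egg_hm, egg_other, cell_hm, cell_other)
print(f"one-sided P = {p_value:.4f}")
```

A small P value in such a table would indicate that mutations in the HM codons are concentrated on branches leading to egg-cultured isolates, which is the pattern expected if those mutations arose during passage.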
The second hypothesis to explain why we observed excess mutations assigned to the terminal branches of the HA tree is sampling bias. Our sequencing efforts are largely a contribution toward the World Health Organization influenza surveillance program. A priority in influenza surveillance is the identification of antigenically novel isolates against which neither previous infection with epidemic strains nor prior immunization would protect. The first level of screening for antigenic variants is the hemagglutination inhibition (HI) test, in which viral isolates are tested against postinfection ferret antiserum containing antibodies against HA from currently circulating strains of human influenza. We preferentially sequence the HA1 of isolates that appear, on the basis of the HI test, to be antigenically different from known circulating strains.
Fig. 2 illustrates how a bias against sequencing closely related viruses affects phylogenetic reconstruction. In this hypothetical example, the tree on the left depicts the total population, and each branch represents a single unique mutation. The tree on the right was constructed from a subset of eight relatively distantly related isolates. One of the 22 mutations used to construct the right-hand tree (on branch 4) reflects an HM change. Of the remaining 21 mutations, 15 are assigned to the eight terminal branches, and the remaining six mutations are assigned to the six internal branches.
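The mechanism behind this shift can be sketched in a few lines: when only distantly related tips are kept, mutations that sat on internal branches of the full tree collapse onto the terminal branches of the sampled tree. The example below is a minimal sketch under stated assumptions. The balanced 16-tip tree, the choice of sampled tips, and the function name `branch_tally` are hypothetical and do not reproduce the tree in Fig. 2; as in the figure, each branch of the full tree carries exactly one mutation.

```python
from collections import Counter


def branch_tally(parent, sampled_tips):
    """Count mutations on terminal and internal branches of the sampled tree.

    parent maps each non-root node to its parent; every edge of the full
    tree carries one mutation. An edge with exactly one sampled tip below
    it becomes part of that tip's terminal branch. An edge shared by two or
    more sampled tips, but not by all of them, is internal.
    """
    tips_below = Counter()
    for tip in sampled_tips:
        node = tip
        while node in parent:        # walk up to the root
            tips_below[node] += 1    # the edge above `node`
            node = parent[node]
    n = len(sampled_tips)
    terminal = sum(1 for count in tips_below.values() if count == 1)
    internal = sum(1 for count in tips_below.values() if 1 < count < n)
    return terminal, internal


# HYPOTHETICAL full population: balanced binary tree, root = 1,
# children of node v are 2v and 2v + 1, tips are nodes 16..31.
parent = {node: node // 2 for node in range(2, 32)}
all_tips = list(range(16, 32))
distant_tips = [16, 20, 24, 28]  # one tip from each quarter of the tree

full = branch_tally(parent, all_tips)
sampled = branch_tally(parent, distant_tips)
print("full tree    (terminal, internal):", full)     # (16, 14)
print("sampled tree (terminal, internal):", sampled)  # (12, 2)
```

In this toy tree the share of mutations on terminal branches rises from 16 of 30 in the full population to 12 of 14 after biased sampling, the same qualitative effect as the 15 of 21 described for Fig. 2.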