identify new variants that may signal the need to update the vaccine. This bias produces an excess of mutations assigned to terminal branches simply because an isolate with no close relatives is by definition attached to the tree by a relatively long branch. Simulations show that the magnitude of excess mutations we observed in the hemagglutinin tree is consistent with expectations based on our sampling protocol. Sampling bias does not affect inferences about evolution drawn from phylogenetic analyses. However, if possible, the excess caused by host-mediated mutations should be removed from studies of the evolution of influenza viruses as they replicate in their human hosts.
It is well known that some pathogenic microbes undergo adaptation in response to laboratory culture. Host-mediated (HM) mutations have been particularly well studied in the influenza A virus (Robertson, 1993), but the phenomenon has also been documented in many other viruses, including HIV, Japanese encephalitis virus, hepatitis A, and Sendai virus (Graff et al., 1994; Sawyer et al., 1994; Cao et al., 1995; Itoh et al., 1997). Molecular evolution studies using such sequences thus risk drawing inferences about the adaptation of the pathogen to its natural host from data containing laboratory artifacts. Additional problems may result from analysis of data sets that do not represent random samples of natural pathogen populations, or for which the sampling design is unknown. Here we determine the extent to which HM mutations and a known sampling bias affect studies of influenza A evolution.
Recent phylogenetic reconstruction of the evolution of human influenza A hemagglutinin (HA) of the H3 subtype revealed a 40% excess of amino acid replacements assigned to the terminal branches of the tree (Bush et al., 1999a). This excess of coding changes was calculated by using expectations based on the relative numbers of internal and terminal branches of the tree in Fig. 1. The observation was made in the course of identifying codons at which mutation appeared to have been adaptive in evading the human immune system. Because we used phylogenetic trees to model HA evolution (Bush et al., 1999b), it was critical for our analyses that the excess mutations not be caused by processes other than the ongoing evolution of the virus during replication in the human host. We proposed a number of hypotheses to explain the excess, but did not explore them in detail. Instead we simply deleted all mutations assigned to terminal branches from our analyses. In this paper we test two hypotheses that help to explain our observation.
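The arithmetic behind an excess of this kind can be illustrated with a minimal sketch. It assumes one plausible reading of the calculation: that replacements are expected to fall on terminal branches in proportion to the share of branches that are terminal. The function name `terminal_excess` and all counts below are hypothetical and are not taken from the tree in Fig. 1; the published calculation may differ in detail.

```python
def terminal_excess(n_terminal, n_internal, mut_terminal, mut_internal):
    """Fractional excess of terminal-branch mutations.

    Assumption (not confirmed by the source): the expected number of
    terminal-branch mutations is the total mutation count multiplied by
    the fraction of branches that are terminal.
    """
    total_branches = n_terminal + n_internal
    total_mutations = mut_terminal + mut_internal
    if n_terminal <= 0 or total_mutations <= 0:
        raise ValueError("need at least one terminal branch and one mutation")

    # Expected terminal mutations under the branch-count proportion.
    expected_terminal = total_mutations * n_terminal / total_branches

    # Excess expressed relative to the expectation (0.4 means 40% more).
    return (mut_terminal - expected_terminal) / expected_terminal


# Hypothetical counts chosen only to illustrate the arithmetic.
excess = terminal_excess(n_terminal=100, n_internal=100,
                         mut_terminal=70, mut_internal=30)
print(f"{excess:.0%}")  # prints "40%"
```

With equal numbers of terminal and internal branches, 50 of the 100 hypothetical replacements are expected on terminal branches, so observing 70 corresponds to a 40% excess. An expectation weighted by branch length rather than branch count would be an alternative design and would give a different baseline.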
The first hypothesis is that the excess consists of mutations that were either not present or were at low frequency in the viral sample when isolated from its human host. Although such mutations may increase in