Although complete genomic sequences are available for a few variola virus isolates, the overall scope of such information remains limited. As noted in Chapter 5, the complete genome of variola major virus Bangladesh-1975 (GenBank L22579) has been sequenced from clones with about sixfold redundancy. The genome of variola major virus India-1967 (GenBank X69198), except for a small region at each DNA terminus, and the genome of variola minor alastrim virus Brazil-1966 (EMBL Y167080) have also been sequenced, with about twofold redundancy. The samples in the CDC and VECTOR repositories do not, however, represent a complete archive of characterized strains from the different outbreaks in recent history.*
Although the sequences of the above strains are not identical, they are nearly so. Direct sequence comparison of the Bangladesh-1975 and India-1967 strains shows that the viruses are 99.2 percent identical across the entire genome (see Figure 9-1) [6, 36, 38]. While in one sense this finding argues for relatively little variability, that conclusion should be tempered by the following considerations. Most of the differences are clustered in the terminal regions of the viral genome. Those regions contain genes that frequently are not essential for viral replication, yet typically are associated with pathogenesis, interact with the immune system, and affect virulence and host range. While only 18 of 200 proteins in the entire genome differ significantly between the Bangladesh and India strains, 7 of 30 open reading frames at the left terminus and 8 of 22 open reading frames near the right terminus show variation between the two viruses. It must be remembered that a very minor change (a single base addition or deletion, or a change in a single amino acid encoded by a gene) can have profound effects on the corresponding proteins that determine variations in virulence. Moreover, the available sequence data have been derived from plaque-purified isolates whose DNA was cloned into plasmids, and there are sparse or no data on heterogeneity within individual isolates, on the effect of cloning in bacteria, or on heterogeneity in strains other than those discussed above.
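The contrast between the genome-wide figure and the terminal regions is easier to see as proportions. The short Python sketch below uses only the counts quoted above; the names `orf_counts` and `variable_fraction` are illustrative, and it assumes the whole-genome and terminal counts can be compared on the same footing, which the text does not state explicitly.

```python
# Counts quoted in the text for the Bangladesh-1975 vs. India-1967 comparison,
# stored as (number differing, total examined). The region labels are illustrative.
orf_counts = {
    "entire genome": (18, 200),
    "left terminus": (7, 30),
    "right terminus": (8, 22),
}


def variable_fraction(differing, total):
    """Return the fraction of proteins or open reading frames that differ."""
    return differing / total


# Fraction of variable proteins/ORFs in each region.
fractions = {
    region: variable_fraction(differing, total)
    for region, (differing, total) in orf_counts.items()
}

for region, fraction in fractions.items():
    print(f"{region}: {fraction:.1%}")
```

On these counts, roughly 9 percent of proteins differ genome-wide, compared with about 23 percent of open reading frames at the left terminus and about 36 percent near the right terminus.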
The issue of heterogeneity can be addressed using different strategies, such as sequencing multiple plaque-purified clones from the same isolate, or compiling a complete catalogue of sequences from the left and right terminal regions of the genomes of strains with quite different clinical histories or epidemiological descriptions. Limited studies have shown that long-distance PCR followed by RFLP analysis of specific amplified fragments occasionally does not produce the restriction pattern predicted from the published sequence obtained from cloned DNA fragments [26, 39–41]. Specifically, within one PCR-amplified fragment where, say, four restriction sites for a given enzyme would have been predicted, only three are found, with corresponding changes in fragment sizes. This discrepancy may be the result of poor long-distance PCR copy fidelity, unappreciated heterogeneity within the