Sequence-Based Classification of Select Agents: A Brighter Line

Summary

In 2006, the National Science Advisory Board for Biosecurity (NSABB) released a report, Addressing Biosecurity Concerns Related to the Synthesis of Select Agents (NSABB 2006), which considered the effects of synthetic biology and DNA synthesis technology on biosecurity and the current Select Agent Regulations. The principal concerns that it addressed were that:

- DNA synthesis technology is rapidly diminishing barriers to acquisition of pathogens, because an increasing variety of organisms may be instantiated by whole genome synthesis, rather than by transfer of samples of existing organism stocks or cultures;
- Natural variation and intentional genetic modification blur the boundaries around any discrete list based on taxonomic names; and
- Synthetic biology may enable the accidental or deliberate creation of entirely novel pathogens unrelated to current ones.

One of the NSABB recommendations proposed that

a group of experts from the scientific community be assembled to determine if an alternative framework based on predicted features and properties encoded by nucleic acids, such as virulence or pathogenicity, can be developed and utilized in lieu of the current finite list of specific agents and taxonomic definitions. (NSABB 2006)

Thus, the present study was initiated with the title “Scientific Milestones for the Development of a Gene Sequence-Based Classification System for Oversight of Select Agents” on the basis of this recommendation. The committee was specifically charged with identifying:
the scientific advances that would be necessary to permit serious consideration of developing and implementing an oversight system for Select Agents that is based on predicted features and properties encoded by nucleic acids rather than a relatively static list of specific agents and taxonomic definitions. (Appendix A)

It is implicit in the charge that a “predictive oversight” system is not now feasible. It is also implicit that “gene sequence-based classification” is synonymous with “predict[ing] features and properties encoded by nucleic acids.” However, it soon became clear that the committee was confronted by two quite different tasks, one of which is feasible and one of which is not. It is possible to classify a new sequence as belonging within a group of known sequences; it is not feasible to predict the function(s) that sequence encodes. Thus, it is essential to distinguish sequence-based classification from sequence-based prediction of biological function.

A sequence-based prediction system for oversight of Select Agents is not possible now and will not be possible in the usefully near future.

Select Agent is not a biological term; rather, it is a regulatory designation. Some properties historically considered in assigning an organism to the Select Agent list are not biological properties and therefore can never be determined from the organism’s genome sequence.

High-level biological phenotypes, such as pathogenicity, transmissibility, and environmental stability, cannot plausibly be predicted with the degree of certainty required for regulatory purposes, either now or in the foreseeable future. Reliable prediction of the hazardous properties of pathogens from their genome sequence alone will require an extraordinarily detailed understanding of host, pathogen, and environment interactions integrated at the systems, organism, population, and ecosystem levels. It is a prediction problem of the greatest complexity.

Biology is not binary. Microorganisms are not either “potential weapons of mass destruction” or “of no concern.” No single characteristic makes a microorganism a pathogen, and no clear-cut boundaries separate a pathogen from a non-pathogen. Pathogenic microorganisms are not defined by taxonomy; it is common for a given microbial species to have both pathogenic and non-pathogenic representatives. An agent has multiple biological attributes, and the degree to which these are expressed falls along a spectrum for each biological characteristic;1 consequently, agents present varying degrees of risk.

1 For example, one microorganism may be highly virulent, but poorly transmissible from person to person, whereas another agent may spread easily, but produce only mild illness.
For the foreseeable future, the only reliable predictor of the hazard posed by a biological agent will be actual experience with that agent.

Synthetic genomics and the natural complexity of biology increasingly present challenges to biosecurity and biosafety that need to be addressed well before prediction of biological function will be feasible.

There is a need to provide increased clarity (for investigators, biohobbyists, synthesis companies, and law enforcement officials) about which DNA sequences are subject to the Select Agent Regulations and which are not. Currently, the boundaries around the taxonomic names of Select Agents on the list are unclear. How similar should two sequences be for them to be given the same name? It is also unclear how much (which parts) of an agent must be present for it to be considered a Select Agent. When should a sequence be regarded as a non-functional “genomic fragment” as opposed to a “complete” agent subject to the Select Agent Regulations?

It might also be desirable to provide information and oversight for “sequences of concern” that are not themselves Select Agents, but potentially could be used to produce a threat:

- To make it harder for people with nefarious intent to develop pathogens or toxins as weapons or as tools for bioterror without detection.
- To avoid the accidental, inadvertent, or ill-advised production of hazardous constructs by well-meaning investigators.

A gene sequence-based classification system for Select Agents and a yellow flag biosafety system for “sequences of concern” could be developed with current technologies.2

A classification system could provide much needed clarification regarding application of the Select Agent Regulations.
For the purposes of regulation, a discrete taxonomic list of Select Agents, augmented by sequence-based classification to better circumscribe taxonomic distinctions blurred by natural and synthetic variation and modification, is a reasonable strategy to maintain for the foreseeable future. Sequence-based classification is strictly operational: a set of tools for drawing decision boundaries around known sequences that do or do not belong to a desired classification. Those tools are used now for robust and automatic classification of gene sequences into usefully annotated sequence families. An operational definition of a complete Select Agent would not predict whether a sequence encodes a functional pathogen.

2 As noted throughout this report, the classification and “yellow flag” system are presented as proposals for consideration; they should not be read as recommendations.

Sequence-based classification strategies would more sharply define the Select Agent Regulations to deal with issues raised by DNA synthesis and natural variation, and would thus establish a “brighter line”: an unambiguous procedure for deciding when a genome sequence is assigned one of the taxonomic names on the Select Agent list.

The problem of classifying a sequence as a complete Select Agent genome (subject to the Select Agent Regulations) has two dimensions: (a) content3: how much sequence (how many parts) must be present to distinguish a potentially complete “infectious form” of an agent from a non-covered “genomic fragment” or “non-infectious component”; and (b) distance: how similar must the sequences of each of those parts be to an actual Select Agent sequence for the same Select Agent taxonomic name to be assigned to the synthetic organism. For each Select Agent, given a minimal parts list (content) and a profile-based classification system for each part (distance), the classification system could be tested, benchmarked, and challenged against known genome sequences. Once developed, the system could be updated to reflect the state of the art of biology and computation and to be correctly harmonized with the Select Agent list.

A “yellow flag” biosafety system could provide a means of guidance and oversight for “sequences of concern.” The yellow flag system would function as an extension of biosafety; however, because it is not regulatory, it could also provide information relevant to biosecurity in a more dynamic and timely fashion than the Select Agent Regulations.
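The two-dimensional test described above, content (is every required part present?) and distance (is each part close enough to a reference?), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parts list, the placeholder strings (which are not real biological sequences), the 0.8 thresholds, and the use of Python's generic string similarity in place of the profile-based classifiers the committee has in mind.

```python
from difflib import SequenceMatcher

# Hypothetical parts list for an imaginary agent:
# part name -> (reference string, minimum similarity).
# The strings are made-up placeholders, not real sequences.
PARTS = {
    "part_a": ("ACGTACGTACGTACGT", 0.8),
    "part_b": ("TTGACCATGGCATTGA", 0.8),
}


def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for a profile-based score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify(candidate_parts, parts=PARTS):
    """Return (is_complete, per_part_report).

    Content: every required part must be present.
    Distance: each present part must score at or above its threshold.
    """
    report = {}
    for name, (reference, threshold) in parts.items():
        seq = candidate_parts.get(name)
        if seq is None:
            report[name] = "missing"
        elif similarity(seq, reference) >= threshold:
            report[name] = "match"
        else:
            report[name] = "too distant"
    is_complete = all(status == "match" for status in report.values())
    return is_complete, report


# A candidate carrying only one of the two required parts fails the content test.
print(classify({"part_a": "ACGTACGTACGTACGT"}))
```

The sketch is purely operational in the sense used above: it assigns a name when both tests pass and makes no claim about whether the candidate encodes anything functional.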
The best way to deal with the unquantifiable threat of novel synthetic pathogens is through enhancements to the laboratory and clinical biosafety measures already established for dealing with the threat of emerging natural pathogens. The yellow flag system would comprise four main elements: a centralized biosafety sequence database, annotation of the sequences as empirical evidence of the function of the genes encoded by the sequences is acquired, a process for review and assessment of the evidence to determine the disposition of sequences of concern, and a yellow flag for sequences that are deemed “of concern.”

3 As discussed in Chapter 3, content could be defined by a single gene, such as in the case of a regulated toxin.

The sequence-based classification presented by the committee is technologically feasible and may improve the current system; however, such a system does have limitations and potential adverse consequences. Therefore, we do not specifically recommend that it be implemented. Rather, we make two recommendations:

The sequence space around each discrete taxonomic name on the Select Agent list needs to be clearly defined, so that Select Agent status can be unambiguously determined from a genome sequence (for example, by a DNA synthesis company). The sequence space should be broad enough to include the plausible modifications and chimeras that experts reasonably believe probably also act as Select Agents, without encompassing existing non-Select Agents.

A sequence-based classification system could address this problem, and should be considered and weighed against the cost and complexity of implementing this technological augmentation to the current Select Agent Regulations.

The committee identified specific milestones or focus areas that would aid in developing and implementing a sequence-based classification system and could yield information to improve prediction of function from sequence and to enhance understanding of infectious disease. Near-term milestones include:

A sequence database with a Select Agent focus. A necessary precondition of a classification system is to have a number of representative sequences that belong to each desired classification, and a number of the most closely related sequences that do not belong. A comprehensive sequence database would thoroughly cover naturally occurring genetic variation based on geographic distribution, ecological or laboratory adaptations, and those associated with clinical severity or attenuation.
The database would include not only Select Agent sequences, but also a representative set of near-neighbors for each Select Agent.

An expanding sequence database of all biology. There are massive gaps in our knowledge of the genetic characteristics of much of the biological world. Such a sequence database could be used to help to identify “sequences of concern” that may be appropriate to monitor in the yellow flag system, in the interests of biosafety or biosecurity.

Stratification of the Select Agent list. Several recent advisory panels have recommended stratification or reduction of the Select Agent list, to “focus the highest scrutiny on those agents that are indeed of greatest concern,” and we are in agreement with that recommendation. Prioritizing the Select Agent list on the basis of risk would make any sequence-based approach to oversight more feasible.

Long-term areas of research include:

- Protein structure and function
- Gene expression and regulation
- Pathogenic mechanisms
- Animal models of disease
- Data and information management for systems biology
- Synthetic biology
- Metagenomics and phylogenomics, including the human microbiome

The near-term milestones and long-term research aim either to expand the general frontiers of biological knowledge or to apply existing knowledge to the Select Agent Regulations. Our committee was deeply uncomfortable with research programs that would seek to expand knowledge solely for the purpose of improving the Select Agent Regulations.

Developing the ability to predict Select Agent pathogenicity from genome sequence raises serious dual-use concerns, because prediction and design go hand in hand. Accurate computational prediction of Select Agent characteristics from genome sequences enables computational design and optimization of bioweapon genome sequences.

Predicting phenotype from genotype and improving public health by increasing our understanding of pathogenicity are two major goals of biology. It does not seem wise to make special plans for an effort in predicting the characteristics of Select Agents, in advance of other important frontiers of biological knowledge. It is more prudent to base the Select Agent Regulations on the current state of biological knowledge, as an applied problem, not a basic research problem.
Predictive successes in the general biology research community should be passively monitored. Once biology in general approaches the goal of determining pathogenicity from sequence, then it would be appropriate to consider putting in place a predictive oversight system to identify Select Agent properties from a novel genome sequence. That time may not come for decades, and may be more than a century away. In the meantime, the technology and knowledge base for sequence-based classification exist now.

Even a classification system can present dual-use issues, because implementing the system usefully requires that the information be shared. Listing the “parts” of a Select Agent and identifying other “sequences of concern” entirely on the basis of their potential to be dangerous when incorporated into a synthetic construct disseminates knowledge that could theoretically facilitate the design of a synthetic pathogen by a “bad actor.” However, inasmuch as the knowledge would be based on the current published state of the art (and on pathogen sequences that are already widely available in GenBank), any additional dual-use concerns are not nearly as grave.

The Select Agent Regulations strive to balance a need for regulating access to the most dangerous pathogens with the need to minimize the regulatory burden on basic biological research aimed at monitoring, understanding, treating, and preventing disease. If the Select Agent Regulations are too burdensome, they may diminish long-term safety.

Our report stops short of recommending the implementation of any specific sequence-based system for defining Select Agents; it was not in our charge, and we were not properly constituted to estimate the costs, benefits, or risks associated with any specific implementation program. We do find that the sequence-based classification system and yellow flag system are technologically feasible, but we have not carefully examined their cost or their effects on basic research or national security. We have made no argument that the positive aspects of using such systems to clarify a sequence-based definition of the discrete taxonomic names on the Select Agent list would outweigh any negative aspects of adding layers of complexity in the regulatory framework.

Our principal finding is that sequence-based prediction of Select Agent properties is not feasible, now or in the foreseeable future; any dedicated research effort solely for this purpose is likely to have only negative consequences.
When the committee’s report was in the final stages of completion, on July 2, 2010, the White House issued a new Executive Order, “Optimizing the Security of Biological Select Agents and Toxins in the United States.” Although the committee did not have time to consider fully the implications of this Executive Order, it notes that several issues are particularly relevant to this report; these are briefly discussed in Box 1.2.