National Academies Press: OpenBook
Suggested Citation:"Appendix A." Institute of Medicine. 2013. The Science and Applications of Microbial Genomics: Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/18261.

Appendix A



Bruce Budowle,1,2 Sarah E. Schmedes,1,2 and Randall S. Murch1,3

The Challenge

Eliminating the threat of terrorist or criminal attacks with microorganisms or toxin weapons is a continual challenge for biodefense and biosecurity programs. The task is difficult for several reasons: (1) the relative ease of access to a variety of effective source materials (Srivatsan et al., 2008) and options for the delivery of a bioweapon, (2) the minute quantities of materials that can be transferred and yet still be effective, (3) the difficulties in detection and analysis of microbiological evidence, and (4) the lack of well-defined approaches regarding credible inferences that can be made from microbial forensic evidence given extant data. At the onset of an event, it may be difficult to distinguish between a deliberate attack and a naturally occurring outbreak of an infectious disease (Morse and Budowle, 2006; Morse and Khan, 2005). Even if evidence strongly supports the hypothesis of a deliberate attack, it may still be very difficult to attribute the attack with certainty to those responsible (i.e., attribution). Attempts to resolve the crime will require advanced methods for characterizing microbial agents, as well as a combination of traditional investigation and intelligence gathering activities.

1 Institute of Applied Genetics, University of North Texas Health Science Center, Fort Worth, TX.

2 Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX.

3 Virginia Tech, National Capital Region, Arlington, VA.

The Approach

In response to the need to determine the nature of the threat and the source of the weapon and to identify those who perpetrated the crime, the scientific community rose to the occasion beginning in 1996 and developed the field of microbial forensics. Microbial forensics is the scientific discipline dedicated to analyzing evidence from a bioterrorism act, biocrime, hoax, or inadvertent microorganism/toxin release for attribution purposes (Budowle et al., 2003, 2005a; Köser et al., 2012; Morse and Budowle, 2006). Another goal can be to support analysis of potential bioweapons capabilities for counter-proliferation, treaty verification, and/or interdiction. A forensics investigation initially will attempt to determine the identity of the causal agent and/or source of the bioweapon in much the same manner as in an epidemiological investigation. The epidemiological concerns are identification and characterization of specific disease-causing pathogens or their toxins, their modes of transmission, and any manipulations that may have been performed intentionally to increase their effects against human, animal, or plant targets (Morse and Budowle, 2006; Morse and Khan, 2005). A microbial forensics investigation proceeds further in that evidence is characterized to assist in determining the specific source of the sample, as individualizing as possible, and the methods, means, processes, and locations involved to determine the identity of the perpetrator(s) of the attack or to determine that an act is in preparation. A systems analysis may be able to determine the processes used to generate the weapon or how it was delivered, which also can help inform the investigation and attribution decision. The ultimate goal is attribution—to identify the perpetrator(s) or to reduce the potential perpetrator population to as few individuals as possible so investigative and intelligence methods can be effectively and efficiently applied to “build the case” (Figure A1-1).

Forensic Targets

Microbial forensic evidence may include the microbe, toxin, nucleic acids, protein signatures, inadvertent microbial contaminants, stabilizers, additives, dispersal devices, and indications of the methods used in a preparation. In addition, traditional types of forensic evidence may be informative and should be part of the toolbox of potential analyses of evidence from an act of bioterrorism or biocrime. Traditional evidence includes fingerprints, body fluids and tissues, hair, fibers, documents, photos, digital evidence, videos, firearms, glass, metals, plastics, paint, powders, explosives, tool marks, and soil. Other types of relevant evidence must be considered to exploit avenues to better achieve attribution, including proteins and chemical signatures. These types of signatures can only be obtained from crimes where the weaponized material or delivery device is found; they have little use in covert attacks where the biological agent is derived from the victims. Many of these methods are based on sound technologies and are complementary. They can be combined to identify signatures of sample growth, processing, and chronometry (Morse and Budowle, 2006). Matching of sample properties can help to establish the relatedness of disparate incidents. Furthermore, mismatches might have exclusionary power or signify a more complex causal relationship between the events under investigation. The results of these analyses can provide information on how, when, and/or where microorganisms were grown and weaponized. While the goal of a microbial forensic analysis is to characterize a sample such that it can be traced to a unique source or at least eliminate other sources, it is unlikely that microbial forensic evidence alone is currently adequate to meet this goal.

FIGURE A1-1 The microbial forensics attribution continuum.

Emerging Science and Technology

To enhance attribution capabilities with microbial evidence, considerable attention is being invested in molecular genetics, genomics, and bioinformatics. These fields are essential to microbial species/strain identification, fine genome variation, virulence determination, pathogenicity characterization, possible genetic engineering, and attaining source attribution to the highest degree possible. The various tools that have been, or are being, developed in these areas will help to narrow the potential sources from which the pathogen used in an attack may have originated. Indeed, sequencing of an entire genome has been demonstrated as feasible in epidemiological investigations, such as the recent studies of outbreaks of E. coli O104:H4 in Germany and cholera in Haiti (Brzuszkiewicz et al., 2011; Chin et al., 2011; Grad et al., 2012; Hasan et al., 2012; Hendriksen et al., 2011; Mellmann et al., 2011; Rasko et al., 2011; Rohde et al., 2011). In addition, metagenomics studies may become foundational in describing diversity and endemicity. Endemicity becomes important when the relationship between microbes or their genetic residues in samples collected from a site of interest and microbes in the environmental background needs to be defined. While the inferential capacity of microbial forensic genetics has yet to reach its full power, the new generations of sequencing technology and the concomitant developments in bioinformatics capabilities to handle and extract the explosion of data offer the potential to enhance microbial forensic investigations. Indeed, the science and technology supporting microbial forensics are advancing at an extraordinary rate. For example, in 2002 in response to the anthrax letter attack, whole genomes of a few isolates were sequenced using shotgun sequencing by TIGR (Budowle et al., 2005b; NRC, 2009; Ravel et al., 2009; Read et al., 2002, 2003). That analysis, seemingly nominal by today’s capabilities, cost approximately $250,000 for one genome, took several weeks, and could characterize only a few samples. Today, such enterprises cost a fraction of that amount (and costs continue to drop dramatically), are becoming more automatable, and provide gigabases and terabytes of data in a matter of days (Bentley et al., 2008; Holt et al., 2008; Loman et al., 2012; MacLean et al., 2009; Margulies et al., 2005).

Given the enhanced capabilities of nucleic acid sequencing of microbes, the microbial forensics community will embrace these molecular tools. Although developments are needed, one can envision identification of microbes at the species, strain, and isolate levels being transformed using next- (or better termed “current-”) generation sequencing (CGS). Fine genome detail could become available for routine microbial forensic use. Because CGS provides whole genome characterization capabilities with high depths of coverage (100s to 1,000s fold and beyond), the technology will serve a critical role for research, such as genetic diversity and endemicity studies via metagenomics, and will become a rapid diagnostic tool, initially when viable and culturable microbes are available. Indeed, whole genome sequencing will reduce the need for a priori design of assays directed at defined species. The technology should apply at some resolution level to any genome without knowledge of the target. In addition, whole genome sequencing offers the capability to evaluate a sample for indications of genetic engineering.
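As a rough illustration of what a depth of coverage in the hundreds or thousands implies, the sketch below uses the common approximation that mean depth equals the total number of sequenced bases divided by the genome length. The function names and the example figures (a 5-megabase genome, 200-base reads) are assumptions chosen for illustration; they are not taken from any particular platform or case.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Approximate mean depth of coverage: total sequenced bases / genome length."""
    if genome_size <= 0 or read_length <= 0:
        raise ValueError("genome_size and read_length must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean depth."""
    if genome_size <= 0 or read_length <= 0:
        raise ValueError("genome_size and read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers only: a hypothetical 5 Mb bacterial genome, 200-base reads.
    genome = 5_000_000
    print(mean_coverage(num_reads=1_000_000, read_length=200, genome_size=genome))
    print(reads_needed(target_depth=1000, read_length=200, genome_size=genome))
```

This simple ratio ignores uneven coverage, duplicate reads, and reads that fail to map, so real runs would need more reads than the estimate suggests.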

Current Realities

However, not all microbial forensic evidence will present itself in a manner where copious quantities of target are available. Some samples will be highly degraded and/or contaminated. Thus, there will be challenges to extract the most information possible from limited materials and non-viable organisms. To meet these challenges, improved sample collection and extraction methods will be needed, nucleic acid repair methods and target amplification strategies (such as whole genome amplification and selective target capture) will be sought, and sequencing chemistries will be enhanced. Because of the throughput, CGS technologies can analyze multiple samples and not even begin to exploit the full throughput of the systems (Brzuszkiewicz et al., 2011; Cummings et al., 2010; Eisen, 2007; Hasan et al., 2012; Holt et al., 2008; Howden et al., 2011; Loman et al., 2012; MacLean et al., 2009; Relman, 2011; Rohde et al., 2011). However, the technology still is evolving and currently does not offer the sensitivity of detection to analyze low-quantity and low-quality DNA samples without some amplification approach prior to sequencing. Nonetheless, CGS is sufficiently mature to be considered useful for microbial forensic applications. Alternative technologies, such as mass spectrometry analyses of nucleic acids and real-time PCR, will continue to be used because they offer rapid detection (at species and strain levels) at substantially lower costs (Jacob et al., 2012; Kenefic et al., 2008; Sampath et al., 2005, 2009; U’ren et al., 2005; Vogler et al., 2008).

There are a number of CGS instruments and different chemistries. They include the MiSeq® System and HiSeq™ Sequencing Systems (Illumina, Inc., San Diego, CA), Ion Personal Genome Machine™ (PGM™) Sequencer, Ion Proton™ Sequencer and SOLiD® Systems (Life Technologies, Foster City, CA), and the 454 Genome Sequencer FLX and GS Junior Systems (Roche Diagnostics Corporation, Indianapolis, IN) (Bentley et al., 2008; Cummings et al., 2010; Loman et al., 2012; Margulies et al., 2005). In addition, single molecule detection platforms, such as those from Pacific Biosciences (Chin et al., 2011; Eid et al., 2009) and possibly Oxford Nanopore (Branton et al., 2008), are on the horizon. Each system offers some advantages and limitations for sequencing that will need to be defined with considerations of library preparation, read length, and accuracy. The evaluations should be based on the needs of application-oriented laboratories and not necessarily those of a research laboratory. Initially, microbial forensics instruments will be maintained in controlled laboratory environments.

Library preparation is one of the critical limiting factors for transferring CGS technology from a research environment to that of an operational laboratory. Currently, only a few samples can be prepared at any given time. Thus, while the sequencing throughput of the platforms is high, a sufficient number of samples cannot be readily prepared in an appropriate amount of time to meet the full capacity of the system. Library preparation needs to be simplified. Haloplex (Agilent, Santa Clara, CA) is an example of a library preparation process that potentially can reduce the preparation work required. This library preparation approach is a single-tube target amplification methodology that enables a large number of library samples to be prepared manually. The general process is: (1) restriction digest and denature the sample; (2) hybridize probes to targeted ends of the digested fragments; (3) circularize and ligate the molecules; and (4) introduce bar codes and amplify the targets by polymerase chain reaction (PCR). Eventually, with automation, the process might accommodate the number of samples that may be encountered by high-throughput operational laboratories. As many as 96 bar codes are available, which fits well with the 96-well format and reduces the preparation time from 2 weeks or several days to 6 hours. However, currently Haloplex is not available for use with non-human nucleic acids. One constraint is that the Haloplex system employs restriction digestion of the DNA. The restriction enzymes can potentially cleave a target site of interest (either a single nucleotide polymorphism [SNP] site or within a repeat motif) and render the marker untypable. Unfortunately, the enzymes used in Haloplex are proprietary, and one cannot readily scan for the restriction sites that would be incompatible with the designated targets (although palindromes can be sought for potential sites that may be obliterated). Another strategy for simplifying library preparation and decreasing sample input is that of the Nextera XT DNA Sample Preparation Kit. Strategies such as the Haloplex system and the Nextera XT DNA kit hold promise for simplifying and possibly automating library preparation.
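The idea of seeking palindromes as potential restriction sites can be sketched as follows. Many restriction enzymes recognize short sequences that read the same as their reverse complement, so a scan for such sequences overlapping a marker position flags sites that a digest might destroy. This is a minimal sketch under stated assumptions: the function names, the 4 to 8 base window, and the example sequence are hypothetical, and the scan says nothing about which enzymes a proprietary kit actually uses.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromic_sites(seq: str, min_len: int = 4, max_len: int = 8):
    """List (start, site) for even-length reverse-complement palindromes.

    Only even lengths are checked because an odd-length sequence of
    unambiguous bases cannot equal its own reverse complement.
    """
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            site = seq[start:start + length]
            if set(site) <= set("ACGT") and site == reverse_complement(site):
                hits.append((start, site))
    return hits


def sites_overlapping(seq: str, target_pos: int, min_len: int = 4, max_len: int = 8):
    """Palindromic sites that cover a 0-based marker position (e.g., a SNP)."""
    return [
        (start, site)
        for start, site in palindromic_sites(seq, min_len, max_len)
        if start <= target_pos < start + len(site)
    ]


if __name__ == "__main__":
    # Hypothetical flanking sequence with a marker at index 4.
    for start, site in sites_overlapping("TTGAATTCAA", target_pos=4):
        print(start, site)
```

A hit only indicates a candidate site; whether a given enzyme cuts there depends on its actual recognition sequence, which the scan cannot know.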

Another factor to consider with CGS technology is sequencing read length and accuracy. Current read lengths for the most widely used CGS instruments typically do not exceed 200 bases, and when they do, the quality of base calling decreases substantially along the length of a read. Longer reads with higher accuracy are necessary. Advances in technology for some platform systems suggest that reads up to 400 bases will be feasible in 2012.

Another consideration of platform selection is for situations where rapid responses are required (such as in military operations, some pandemics, and bioterrorism acts). Initially, platforms will be placed in laboratories with controlled environments. One can envision the technology being taken to the field for immediate response and exigent circumstances. Robustness of the instrumentation, supply lines of reagents, and service support will be part of the decision process for the instrumentation/chemistry of choice. Fortunately, the technology and supporting interpretation tools continue to evolve and likely will become more robust.

Seeking More Power and Depth

For design and selection of systems and diagnostics, different diagnostic-based strategies can be considered. They can be based on the sample type, the sample matrix, the amount of work, or the question that one is attempting to address. The latter may be the best suited for conceiving workflow systems. The different scenarios should be considered where nucleic acid analyses may be applied, because these will help guide the needs for the microbial forensic community. They likely are (1) identification of species/strain (i.e., similar to epidemiological needs), (2) attribution, (3) genetic engineering, (4) sample-to-sample comparisons, and (5) metagenomics for endemicity (or a modified metagenomics for sample characterization) (Figure A1-2).

FIGURE A1-2 A general overview of the work and information flow from sample to analysis to information developed based on use of second-generation sequencing technology.

Sample identification generally would be direct characterization to identify the agent for immediate determination of potential threat and probable cause to investigate further. The process of attribution would drill down to the finest resolution possible and make comparisons to other reference samples, databases, or repositories to reduce the possible sources from which the sample originated or to a recent common ancestor. Genetic engineering could be detected by whole genome sequencing.

Metagenomics studies have been performed on several platforms, and they will likely provide some foundational data on diversity and endemicity (Eisen, 2007; Relman, 2011; Tringe et al., 2005). The value could be searching various niches for select agents. Suppose that in every sample tested certain select agents are identified. Then there can be two consequences: one is that it may be more difficult to elucidate natural outbreaks versus intentional releases (although strain resolution may reduce the uncertainty); the second could be that such high resolution may be less informative at some threshold depth of coverage.

Most metagenomic work to date has exploited a small, single sequence target (16S rRNA) at a very high depth of coverage (Rusch et al., 2007; Venter et al., 2004). These studies often cannot provide resolution beyond family to genus levels. Clearly such broad range definition will not enable individualization or identify select agents. The anthrax investigation could have benefited from a modified metagenomics characterization. The putative common source of the material (RMR1029) was composed of a population of very similar cells. The colony morphological variants found in the evidence from the 2001 anthrax letter attacks were minority components and, because of sample preparation and stochastic effects, the minor variants potentially could be difficult to detect with the PCR-based assays that were developed for the investigation. Because of the high depth of coverage with CGS, the population of low-level variants may be more readily detected, especially if an amplification enrichment step was included that focused only on the known variant sites that defined the morphology types. Such high depth of coverage would substantially reduce the false-negative rate and improve confidence in the potential relationship of the most similar samples to focus investigative leads (Cummings et al., 2010). Indeed, the depth of coverage could be in the millions. While the approach is exquisitely sensitive, platform- and chemistry-specific errors may confound interpretation, and thus thresholds of reliability may necessarily be invoked.
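The effect of depth on detecting a minority variant can be illustrated with a simple binomial model: if a variant is present at some fraction of the population and reads are sampled independently, the chance of seeing at least a threshold number of supporting reads grows with depth. The sketch below is an assumption-laden illustration, not the method used in the investigation; the function name, the 1 percent variant fraction, and the five-read threshold are hypothetical, and the model ignores amplification bias and platform-specific error.

```python
from math import comb


def prob_at_least(min_reads: int, depth: int, variant_fraction: float) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, variant_fraction).

    Computed as 1 minus the lower tail, so it is intended for small
    thresholds (min_reads much smaller than depth).
    """
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must be between 0 and 1")
    if depth < 0 or min_reads < 0:
        raise ValueError("depth and min_reads must be non-negative")
    lower_tail = sum(
        comb(depth, i)
        * variant_fraction ** i
        * (1.0 - variant_fraction) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - lower_tail


if __name__ == "__main__":
    # Hypothetical scenario: variant at 1 percent, call requires >= 5 supporting reads.
    for depth in (100, 1_000, 10_000):
        print(depth, prob_at_least(5, depth, 0.01))
```

The same function can be turned around to reason about reliability thresholds: substituting a per-base error rate for the variant fraction gives the chance that errors alone produce the required number of supporting reads at a site.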

One could envision extending this population depth analysis, which in essence is a simplified metagenomic analysis, and exploiting the concept of using a multi-locus sequence typing (MLST) approach to provide a species-level identification capability (Maiden et al., 1998; Spratt, 1999). A few loci (perhaps from the seven typically used in MLST up to 15) could be selected as a standard (e.g., for bacteria). If there is a combination of sufficiently stable sites and evolutionarily rapid sites, the loci could indicate species- to strain-level presence in mixed and metagenomic samples. Using the core seven loci used for MLST could allow some questions to be addressed regarding time and place of isolation, host or niche, serotype, and some clinical or drug resistance profiles. This will not be a trivial process because the sites will not be physically linked. However, one could determine, if the complete set or a reasonable subset of targets is in a sample, whether there is confidence that a particular species or set of species is present. In theory this approach could be extended to strain levels. There certainly is enough throughput to consider this capability. The potential already has been established with electrospray ionization mass spectrometry of targeted genes for rapid bacterial species identification (and even for viruses such as influenza). There are sufficient bacterial genomes that have been sequenced to test our hypothesis, and work is under way.
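The notion of judging species presence from "the complete set or a reasonable subset of targets" can be sketched as a simple scoring step. The sketch assumes a reference panel that maps each species to a set of diagnostic locus/allele identifiers and a set of identifiers detected in a mixed sample; the panel contents, identifier format, function names, and the 70 percent cutoff are all hypothetical placeholders, and no statistical confidence is attached to the result.

```python
def panel_support(detected: set[str], panel: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of each species' panel markers that were detected in the sample."""
    return {
        species: len(markers & detected) / len(markers)
        for species, markers in panel.items()
        if markers  # skip species with an empty panel
    }


def candidate_species(
    detected: set[str], panel: dict[str, set[str]], min_fraction: float = 0.7
) -> list[str]:
    """Species whose detected marker fraction meets the cutoff, best supported first."""
    support = panel_support(detected, panel)
    hits = [species for species, frac in support.items() if frac >= min_fraction]
    return sorted(hits, key=lambda species: -support[species])


if __name__ == "__main__":
    # Hypothetical panel and sample; identifiers are placeholders, not real alleles.
    panel = {
        "species_A": {"a1", "a2", "a3", "a4"},
        "species_B": {"b1", "b2"},
    }
    detected = {"a1", "a2", "a3", "b1"}
    print(panel_support(detected, panel))
    print(candidate_species(detected, panel))
```

Because the loci are not physically linked in a mixed sample, a high fraction shows only that the markers co-occur in the sample, not that they come from one organism; a working system would need an empirically validated cutoff.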

Inferences about the significance of genetic evidence may not reach the ultimate goal of attribution. The most confounding constraint on reaching the full power of attribution is scant data on diversity and endemicity. The vast diversity of the microbial world is unknown and will not be defined substantially with current approaches in the area where a biocrime or bioterrorist attack has occurred. This limitation is not unique to the microbial forensics community; it plagues epidemiologists as well. Another limitation is that evidentiary samples will likely have an unknown history. Lack of knowledge of how a sample was manipulated (e.g., number of passages, exposure to mutagenic agents, length of storage) will complicate providing inferences about the significance or strength of sequencing results, especially because the distance between samples will be determined by the degree of similarity or dissimilarity. Indeed, even defining what is a “match” or “similar” may not be straightforward. Keim (personal communication) has stressed this uncertainty and proffered a new term, “member,” for the microbial forensic lexicon, based on phylogenetics, to describe the relationship of a sample to some reference samples. Regardless of the terminology used, some data will be needed to define the uncertainty of a “membership” or “association.” In 2006, the need for reconciliation between microbial genomics and systematics was described; microbial forensics and epidemiology were seen to offer useful, practical venues to frame the gaps and priorities (Buckley and Roberts, 2006). This challenge remains.

Some assessment of the strength or significance of an analytical result and subsequent comparison also is needed (Budowle et al., 2008; Chakraborty and Budowle, 2011). Of course, because of scant supporting data, such an endeavor will be challenging. Qualitative and/or quantitative statements of the significance of the finding will need to be developed. As an example, consider a forensic analysis of whole genome sequence data that compares two or more sequences, such as an evidence sample profile with that of a reference sample that may be considered a possible direct link or have a common ancestor. The evolutionary rates of the variants will need to be known. But perhaps as consequential, sequencing error and other factors could inflate the dissimilarity between samples and add a degree of “uncertainty.” Thus, efforts in defining and quantifying the error rates associated with each CGS platform and chemistry are critically important.
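To make the point about error inflating dissimilarity concrete, the sketch below counts differences between two aligned sequences and estimates how many differences residual error alone could contribute. It assumes the two sequences are already aligned and of equal length, and that each consensus carries an independent per-base error rate; the function names and the rates used are illustrative assumptions, and the error estimate is an upper-bound approximation that treats any erroneous base as a difference.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions in two aligned sequences, ignoring N and gap symbols."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def expected_error_differences(genome_length: int, per_base_error: float) -> float:
    """Approximate upper bound on differences caused by error alone.

    A site is counted if at least one of the two consensus calls is wrong,
    assuming independent errors at the same per-base rate in both sequences.
    """
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    return genome_length * (1.0 - (1.0 - per_base_error) ** 2)


if __name__ == "__main__":
    print(snp_distance("ACGTACGT", "ACGTACGA"))
    # Hypothetical: 5 Mb genome, residual consensus error of one per million bases.
    print(expected_error_differences(5_000_000, 1e-6))
```

If the observed distance between an evidence sample and a reference is of the same order as the error-only estimate, the difference cannot by itself support or exclude a close relationship.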

Beyond comparison of samples for identification purposes are inferences by whole genome sequencing of phenotypic (i.e., functional) properties of a microbe. For example, even with a whole genome sequence, the ability to determine whether a microbe phenotypically displays antimicrobial resistance or susceptibility is still limited. Bacteria may contain multiple pathways, and how the different genes interact is far from being completely understood (Eisen, 2007; Köser et al., 2012; Relman, 2011). Substantial research will be needed before genotype can be used reliably to predict phenotype.

Making Sense of Data

The ever-increasing amount of microbial genomic sequence data presents a variety of challenges related to the handling and storage of data and the development of bioinformatics methods that can accommodate such large numbers of whole genomes. Being able to analyze the vast amounts of data in a timely fashion is a key challenge to leveraging the power of these newer sequencing platforms. Software, hardware, and IT support may be the greatest barrier to use of CGS technology. It is unlikely that dedicated bioinformaticists will reside in every microbial forensics laboratory. Data cannot simply be sent to web-based clouds for analysis because the results may be classified. Instead, some standardization and standard operating data analysis and interpretation approaches will be needed. Pipeline and interpretation software will need to be evaluated for reliability and seamless diagnostic flow without bioinformatics expert intervention. The output of results must be intelligible to the microbial forensic analyst as well. Ideally, the software would be a comprehensive tool (or set of tools) covering everything from microbe detection to determination of engineering.

The government should rely heavily on industry and well-established genome centers. The commercial competitive environment is driving down costs and improving informatics pipelines without the need for extensive investment. Leveraging these efforts will help meet the needs of microbial forensics more expeditiously than going it alone. The centers (to include the national laboratories) are evaluating platforms and chemistries and are generating data at unprecedented levels. They are providing solutions to massive data handling, including storage, curation of reference data, annotation, and data analysis.

Collections and databases are needed to house the microbial genomic data and, when possible, the accompanying meta-data. No standards yet exist for building databases to meet the needs of the microbial forensic community. Requirements for storage and retrieval of raw sequence data in microbial forensics cases and supporting inferential data must be developed. Given the high throughput and anticipated speed of analyses, it is conceivable that meaningful databases can be developed “on the fly” that better reflect the diversity where the crime was committed (from the preparation laboratory to the crime scene).

The power of the microbial forensics techniques, tools, software, and databases that are used needs to be understood, and their limitations even more so. To achieve this goal methods need to be validated, and validation should be a requisite of any forensic repertoire. Indeed, the forensic sciences in general are facing well-deserved criticism for not necessarily having sound foundations and for overstating the strength of the evidence (NRC, 2009). Attempts to attribute any attack to a person(s) or group should rely on accurate and credible results. The interpretation of such results might seriously impact the course or focus of an investigation, thus affecting the liberties of individuals or even being used as a justification for a government’s military response to an attack or threat of an attack. Therefore, the methods for collection, extraction, and analysis of microbial evidence that could generate key results need to be as scientifically robust as possible, so that the methods are high performing and the results are defensible to decision makers; to the legal, international government, law enforcement, and scientific communities; and under scrutiny by the media.

Validation Is Essential

Validation is frequently used to connote confidence in a test or process, but it may be better thought of as defining the limitation of a method, process, or assay (Budowle et al., 2003, 2006, 2008). It still is common for the term validation to be used vaguely or to remain undefined when applied to process performance evaluation. The degree of validation varies from nominal to rigorous. The consequences of such varied requirements can be catastrophic if methods used in microbial forensic investigations are poorly constructed, under-developed, or generate results that are difficult to interpret. The validation process needs to be defined as to what is expected to be achieved by a validation study.

Validation determines the limits of a test. It does not mean that a test must be 100 percent accurate or have no cross-reactivity, false-positive results, or false-negative results to be considered useful. It is often thought of as a process applied to the analytical portion of a system. This concept is only partly correct. The limits that the methods can provide must be demonstrated and documented for all steps of the process to include sample collection, preservation, extraction, analytical characterization, and data interpretation. Furthermore, it is recognized that as new technologies and capabilities are developed to address the needs of the microbial forensics community, key principles and performance parameters including accuracy, precision, bias, reliability, sensitivity, and robustness will need to be determined. Robust quality assurance and data control systems are required to achieve confidence in results by diverse users of the information. It is imperative that both technical and interpretation limitations (and thus accuracy and error) be defined. Additionally, a key resource for microbial forensic research, validation, and analysis is access to well-defined and curated microbial collections and data sets that are as comprehensive as is possible to the task. This effort includes the structure, content, and quality of the data sets. While some collections have been started for use in research, or created for case-specific use, no comprehensive repository exists to support microbial forensics, and standards are not codified for meta-data and data curation.
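Several of the performance parameters named above are commonly summarized from a validation study's tally of true and false results. The sketch below shows one conventional way to compute them; the class name, field names, and example counts are assumptions for illustration, and the figures do not describe any real assay. It covers only the analytical tally, not the collection, preservation, or interpretation steps that validation must also address.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns NaN when no samples fall in the denominator."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ValidationCounts:
    """Tally of assay calls against known truth in a validation panel."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int

    def sensitivity(self) -> float:
        """Fraction of true targets that the assay detected."""
        return _ratio(self.true_pos, self.true_pos + self.false_neg)

    def specificity(self) -> float:
        """Fraction of non-targets that the assay correctly called negative."""
        return _ratio(self.true_neg, self.true_neg + self.false_pos)

    def false_positive_rate(self) -> float:
        """Fraction of non-targets that the assay wrongly called positive."""
        return _ratio(self.false_pos, self.false_pos + self.true_neg)

    def false_negative_rate(self) -> float:
        """Fraction of true targets that the assay missed."""
        return _ratio(self.false_neg, self.false_neg + self.true_pos)

    def accuracy(self) -> float:
        """Fraction of all calls that were correct."""
        total = self.true_pos + self.false_pos + self.true_neg + self.false_neg
        return _ratio(self.true_pos + self.true_neg, total)


if __name__ == "__main__":
    # Hypothetical panel: 100 known positives and 100 known negatives.
    counts = ValidationCounts(true_pos=90, false_pos=5, true_neg=95, false_neg=10)
    print(counts.sensitivity(), counts.specificity(), counts.accuracy())
```

Reporting these figures alongside the panel composition is consistent with the view that validation defines a method's limits: nonzero false-positive and false-negative rates do not disqualify an assay, provided they are measured and documented.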

The implications of highly technical data, epidemiological data, traditional evidence data, and investigative or intelligence information are complex and need to be appreciated for their strengths and limitations. Because scientific data can affect the decision-making process for retaliation, preemptive actions, and/or courtroom deliberations, it is imperative that those directly involved in microbial forensics or those who may use the results for investigative lead value or more direct associations be properly educated (or at least properly apprised) of the implications of such data. To meet this necessary goal, education and training are critical to disseminate the principles, development, and applications of the evolving field of microbial forensics. Educational strategies and programs need to be constructed and training programs developed on the varied scientific foundations that support microbial forensics.

If validation processes are not defined and followed, and proper training or communication is not provided, then a false sense of confidence may be attached to a poor method or process or to a result of limited significance. There are myriad methods, processes, targets, platforms, and applications. Yet some basic requirements transcend individual differences in methods, and these can be reinforced by contextual description (Table A1-1).


Validation needs to be codified. Efforts are under way and should be applied equally across the user space.


Microbial forensics should embrace and validate newly developed and emerging molecular biology technologies and phylogenetics approaches, and pursue potential forensic information and comparative sources, such as might be achieved through metagenomics. Genetic analyses of microorganisms often are a powerful tool for differentiating species, isolates, and strains. Similar to human DNA forensic identification, DNA sequences of microorganisms can be used to identify and differentiate between isolates and strains of a single microbial species; however, nucleic acid–based identification is not as resolving with respect to source attribution in microbial forensics as with human DNA forensic analysis. The basic constituents of nucleic acids essentially are the same for bacteria and humans; however, unlike humans, bacteria, viruses, and fungi multiply rapidly in a clonal fashion and can readily share or exchange genetic material between and among species. These differences and uncertainties due to scant supporting data must be taken into consideration during analysis, interpretation, and reporting related to the findings derived from microbial genetic evidence. For the foreseeable future the ability of microbial forensics to establish that a sample collected from either a crime scene or a person of interest can be attributed to a known source to a high degree of scientific certainty will be limited. Therefore, the methods must be reliable and robust, and the uncertainty associated with any interpretation should be properly conveyed.

TABLE A1-1 Validation Criteria List

• Sensitivity
• Specificity
• Reproducibility
• Precision
• Accuracy
• Resolution
• Reliability
• Robustness
• Specified samples
• Purity
• Input values
• Quantitation
• Dynamic range
• Limit of detection
• Controls
• Window of performance for operational steps of assay
• Critical equipment calibration
• Critical reagents
• Databases

NOTE: It is difficult to prescribe the criteria for validation of the variety of methods that may be considered. The list is provided for consideration and is not meant to be exhaustive.
SOURCE: Derived from Budowle et al. (2008).
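To illustrate how two of the validation criteria named in Table A1-1, sensitivity and specificity, might be reported together with their uncertainty, the following minimal sketch computes both from a validation study and attaches a Wilson score interval to each. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from Budowle et al. (2008) or from any actual validation study.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarize_validation(tp, fp, tn, fn):
    """Return point estimates and intervals for sensitivity and specificity.

    tp/fn: known-positive samples called positive/negative by the assay.
    tn/fp: known-negative samples called negative/positive by the assay.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical validation counts, for illustration only.
result = summarize_validation(tp=48, fp=3, tn=97, fn=2)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.3f} (interval {low:.3f} to {high:.3f})")
```

Reporting an interval alongside each point estimate is one way to convey that a test validated on a limited number of samples has limits that are themselves uncertain.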

Microbial forensics experts and those who contribute in closely related fields need to work together to advance the science, to validate methods to scientific and legal standards, and to transition interpretation of results and conclusions from such analyses into something that can be used by the criminal justice system, the policy community, and other stakeholders. It is incumbent upon the microbial forensics community to make every effort to interpret and communicate objectively and effectively the advantages and limitations of both microbial forensics and traditional forensic science analyses. Consumers of microbial forensic information who incorporate this evidence into decision making should be provided accurate, reliable, credible, and defensible results, interpretations, and context.


Bentley, D. R., S. Balasubramanian, H. P. Swerdlow, G. P. Smith, J. Milton, C. G. Brown, K. P. Hall, D. J. Evers, C. L. Barnes, H. R. Bignell, J. M. Boutell, J. Bryant, R. J. Carter, R. K. Cheetham, A. J. Cox, D. J. Ellis, M. R. Flatbush, N. A. Gormley, S. J. Humphray, L. J. Irving, M. S. Karbelashvili, S. M. Kirk, H. Li, X. Liu, K. S. Maisinger, L. J. Murray, B. Obradovic, T. Ost, M. L. Parkinson, M. R. Pratt, I. M. J. Rasolonjatovo, M. T. Reed, R. Rigatti, C. Rodighiero, M. T. Ross, A. Sabot, S. V. Sankar, A. Scally, G. P. Schroth, M. E. Smith, V. P. Smith, A. Spiridou, P. E. Torrance, S. S. Tzonev, E. H. Vermaas, K. Walter, X. Wu, L. Zhang, M. D. Alam, C. Anastasi, I. C. Aniebo, D. M. D. Bailey, I. R. Bancarz, S. Banerjee, S. G. Barbour, P. A. Baybayan, V. A. Benoit, K. F. Benson, C. Bevis, P. J. Black, A. Boodhun, J. S. Brennan, J. A. Bridgham, R. C. Brown, A. A. Brown, D. H. Buermann, A. A. Bundu, J. C. Burrows, N. P. Carter, N. Castillo, M. C. E. Catenazzi, S. Chang, R. N. Cooley, N. R. Crake, O. O. Dada, K. D. Diakoumakos, B. Dominguez-Fernandez, D. J. Earnshaw, U. C. Egbujor, D. W. Elmore, S. S. Etchin, M. R. Ewan, M. Fedurco, L. J. Fraser, K. V. Fuentes Fajardo, W. S. Furey, D. George, K. J. Gietzen, C. P. Goddard, G. S. Golda, P. A. Granieri, D. E. Green, D. L. Gustafson, N. F. Hansen, K. Harnish, C. D. Haudenschild, N. I. Heyer, M. M. Hims, J. T. Ho, A. M. Horgan, K. Hoschler, S. Hurwitz, D. V. Ivanov, M. Q. Johnson, T. James, T. A. Huw Jones, G.-D. Kang, T. H. Kerelska, A. D. Kersey, I. Khrebtukova, A. P. Kindwall, Z. Kingsbury, P. I. Kokko-Gonzales, A. Kumar, M. A. Laurent, C. T. Lawley, S. E. Lee, X. Lee, A. K. Liao, J. A. Loch, M. Lok, S. Luo, R. M. Mammen, J. W. Martin, P. G. McCauley, P. McNitt, P. Mehta, K. W. Moon, J. W. Mullens, T. Newington, Z. Ning, B. L. Ng, S. M. Novo, M. J. O’Neill, M. A. Osborne, A. Osnowski, O. Ostadan, L. L. Paraschos, L. Pickering, A. C. Pike, A. C. Pike, D. C. Pinkard, D. P. Pliskin, J. Podhasky, V. J. Quijano, C. Raczy, V. H. Rae, S. R. Rawlings, A. Chiva Rodriguez, P. M. Roe, J. Rogers, M. C. Rogert Bacigalupo, N. Romanov, A. Romieu, R. K. Roth, N. J. Rourke, S. T. Ruediger, E. Rusman, R. M. Sanches-Kuiper, M. R. Schenker, J. M. Seoane, R. J. Shaw, M. K. Shiver, S. W. Short, N. L. Sizto, J. P. Sluis, M. A. Smith, J. Ernest Sohna Sohna, E. J. Spence, K. Stevens, N. Sutton, L. Szajkowski, C. L. Tregidgo, G. Turcatti, S. Vandevondele, Y. Verhovsky, S. M. Virk, S. Wakelin, G. C. Walcott, J. Wang, G. J. Worsley, J. Yan, L. Yau, M. Zuerlein, J. Rogers, J. C. Mullikin, M. E. Hurles, N. J. McCooke, J. S. West, F. L. Oaks, P. L. Lundberg, D. Klenerman, R. Durbin, and A. J. Smith. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53-59.


Branton, D., D. W. Deamer, A. Marziali, H. Bayley, S. A. Benner, T. Butler, M. Di Ventra, S. Garaj, A. Hibbs, X. Huang, S. B. Jovanovich, P. S. Krstic, S. Lindsay, X. S. Ling, C. H. Mastrangelo, A. Meller, J. S. Oliver, Y. V. Pershin, J. M. Ramsey, R. Riehn, G. V. Soni, V. Tabard-Cossa, M. Wanunu, M. Wiggin, and J. A. Schloss. 2008. The potential and challenges of nanopore sequencing. Nature Biotechnology 26:1146-1153.

Brzuszkiewicz, E., A. Thurmer, J. Schuldes, A. Leimbach, H. Liesegang, F. D. Meyer, J. Boelter, H. Petersen, G. Gottschalk, and R. Daniel. 2011. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: Entero-aggregative-haemorrhagic Escherichia coli (EAHEC). Archives of Microbiology 193:883-891.

Buckley, M., and R. J. Roberts. 2006. Reconciling microbial systematics and genomics. Report of a Colloquium of the American Academy of Microbiology, Washington, DC: ASM Press.

Budowle, B., S. E. Schutzer, A. Einseln, L. C. Kelley, A. C. Walsh, J. A. Smith, B. L. Marrone, J. Robertson, and J. Campos. 2003. Building microbial forensics as a response to bio-terrorism. Science 301:1852-1853.

Budowle, B., S. E. Schutzer, M. S. Ascher, R. M. Atlas, J. P. Burans, R. Chakraborty, J. J. Dunn, C. M. Fraser, D. R. Franz, T. J. Leighton, S. A. Morse, R. S. Murch, J. Ravel, D. L. Rock, T. R. Slezak, S. P. Velsko, A. C. Walsh, and R. A. Walters. 2005a. Toward a system of microbial forensics: From sample collection to interpretation of evidence. Applied and Environmental Microbiology 71:2209-2213.

Budowle, B., M. D. Johnson, C. M. Fraser, T. J. Leighton, R. S. Murch, and R. Chakraborty. 2005b. Genetic analysis and attribution of microbial forensics evidence. Critical Reviews in Microbiology 31(4):233-254.

Budowle, B., S. E. Schutzer, J. P. Burans, D. J. Beecher, T. A. Cebula, R. Chakraborty, W. T. Cobb, J. Fletcher, M. L. Hale, R. B. Harris, M. A. Heitkamp, F. P. Keller, C. Kuske, J. E. LeClerc, B. L. Marrone, T. S. McKenna, S. A. Morse, L. L. Rodriguez, N. B. Valentine, and J. Yadev. 2006. Quality sample collection, handling, and preservation for an effective microbial forensics program. Applied and Environmental Microbiology 72(10):6431-6438.

Budowle, B., S. E. Schutzer, S. A. Morse, K. F. Martinez, R. Chakraborty, B. L. Marrone, S. L. Messenger, R. S. Murch, P. J. Jackson, P. Williamson, R. Harmon, and S. P. Velsko. 2008. Criteria for validation of methods in microbial forensics. Applied and Environmental Microbiology 74:5599-5607.

Chakraborty, R., and B. Budowle. 2011. Population genetic considerations in statistical interpretation of microbial forensic data in comparison with the human DNA forensic standard. In: Microbial Forensics, 2nd ed., edited by: B. Budowle, S. E. Schutzer, R. Breeze, P. S. Keim, and S. A. Morse. Amsterdam: Academic Press. Pp. 561-580.

Chin, C. S., J. Sorenson, J. B. Harris, W. P. Robins, R. C. Charles, R. R. Jean-Charles, J. Bullard, D. R. Webster, A. Kasarskis, P. Peluso, E. E. Paxinos, Y. Yamaichi, S. B. Calderwood, J. J. Mekalanos, E. E. Schadt, and M. K. Waldor. 2011. The origin of the Haitian cholera outbreak strain. New England Journal of Medicine 364:33-42.

Cummings, C. A., C. A. Bormann-Chung, R. Fang, M. Barker, P. Brzoska, P. C. Williamson, J. Beaudry, M. Matthews, J. Schupp, D. M. Wagner, D. Birdsell, A. J. Vogler, M. R. Furtado, P. Keim, and B. Budowle. 2010. Accurate, rapid, and high-throughput detection of strain-specific polymorphisms in Bacillus anthracis and Yersinia pestis by next-generation sequencing. BMC Investigative Genetics 1:5.

Eid, J., A. Fehr, J. Gray, K. Luong, J. Lyle, G. Otto, P. Peluso, D. Rank, P. Baybayan, B. Bettman, A. Bibillo, K. Bjornson, B. Chaudhuri, F. Christians, R. Cicero, S. Clark, R. Dalal, A. deWinter, J. Dixon, M. Foquet, A. Gaertner, P. Hardenbol, C. Heiner, K. Hester, D. Holden, G. Kearns, X. Kong, R. Kuse, Y. Lacroix, S. Lin, P. Lundquist, C. Ma, P. Marks, M. Maxham, D. Murphy, I. Park, T. Pham, M. Phillips, J. Roy, R. Sebra, G. Shen, J. Sorenson, A. Tomaney, K. Travers, M. Trulson, J. Vieceli, J. Wegener, D. Wu, A. Yang, D. Zaccarin, P. Zhao, F. Zhong, J. Korlach, and S. Turner. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323:133-138.


Eisen, J. A. 2007. Environmental shotgun sequencing: Its potential and challenges for studying the hidden world of microbes. PLoS Biology 5(3): e82.

Grad, Y. H., M. Lipsitch, M. Feldgarden, H. M. Arachchi, G. C. Cerqueira, M. Fitzgerald, P. Godfrey, B. J. Haas, C. I. Murphy, C. Russ, S. Sykes, B. J. Walker, J. R. Wortman, S. Young, Q. Zeng, A. Abouelleil, J. Bochicchio, S. Chauvin, T. DeSmet, S. Gujja, C. McCowan, A. Montmayeur, S. Steelman, J. Frimodt-Møller, A. M. Petersen, C. Struve, K. A. Krogfelt, E. Bingen, F-X. Weill, E. S. Lander, C. Nusbaum, B. W. Birren, D. T. Hung, and W. P. Hanage. 2012. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proceedings of the National Academy of Sciences USA 109:3065-3070.

Hasan, N. A., S. Y. Choi, M. Eppinger, P. W. Clark, A. Chen, M. Alam, B. J. Haley, E. Taviani, E. Hine, Q. Su, L. J. Tallon, J. B. Prosper, K. Furth, M. M. Hoq, H. Li, C. M. Fraser-Liggett, A. Cravioto, A. Huq, J. Ravel, T. A. Cebula, and R. R. Colwell. 2012. Genomic diversity of 2010 Haitian cholera outbreak strains. Proceedings of the National Academy of Sciences USA 109(29):E2010-E2017.

Hendriksen, R. S., L. B. Price, J. M. Schupp, J. D. Gillece, R. S. Kaas, D. M. Engelthaler, V. Bortolaia, T. Pearson, A. E. Waters, B. P. Upadhyay, S. D. Shrestha, S. Adhikari, G. Shakya, P. S. Keim, and F. M. Aarestrup. 2011. Population genetics of Vibrio cholerae from Nepal in 2010: Evidence on the origin of the Haitian outbreak. MBio 2(4):e00157-11.

Holt, K. E., J. Parkhill, C. J. Mazzoni, P. Roumagnac, F-X. Weill, I. Goodhead, R. Rance, S. Baker, D. J. Maskell, J. Wain, C. Dolecek, M. Achtman, and G. Dougan. 2008. High-throughput sequencing provides insights into genome variation and evolution in Salmonella typhi. Nature Genetics 40:987-993.

Howden, B. P., C. R. E. McEvoy, D. L. Allen, K. Chua, W. Gao, P. F. Harrison, J. Bell, G. Coombs, V. Bennett-Wood, J. L. Porter, R. Robins-Browne, J. K. Davies, T. Seemann, and T. P. Stinear. 2011. Evolution of multidrug resistance during Staphylococcus aureus infection involves mutation of the essential two component regulator WalKR. PLoS Pathogens 7(11):e1002359.

Jacob, D., U. Sauer, R. Housley, C. Washington, K. Sannes-Lowery, D. J. Ecker, R. Sampath, and R. Grunow. 2012. Rapid and high-throughput detection of highly pathogenic bacteria by Ibis PLEX-ID technology. PLoS One 7(6):e39928.

Kenefic, L. J., J. Beaudry, C. Trim, R. Daly, R. Parmar, S. Zanecki, L. Huynh, M. N. Van Ert, D. M. Wagner, T. Graham, and P. Keim. 2008. High resolution genotyping of Bacillus anthracis outbreak strains using four highly mutable single nucleotide repeat markers. Letters in Applied Microbiology 46:600-603.

Köser, C. U., M. J. Ellington, E. J. Cartwright, S. H. Gillespie, N. M. Brown, M. Farrington, M. T. G. Holden, G. Dougan, S. D. Bentley, J. Parkhill, and S. J. Peacock. 2012. Routine use of microbial whole genome sequencing in diagnostic and public health microbiology. PLoS Pathogens 8(8):e1002824.

Loman, N. J., R. V. Misra, T. J. Dallman, C. Constantinidou, S. E. Gharbia, J. Wain, and M. J. Pallen. 2012. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology 30(5):434-439.

MacLean, D., J. D. Jones, and D. J. Studholme. 2009. Application of “next-generation” sequencing technologies to microbial genetics. Nature Reviews Microbiology 7(4):287-296.

Maiden, M. C., J. A. Bygraves, E. Feil, G. Morelli, J. E. Russell, R. Urwin, Q. Zhang, J. Zhou, K. Zurth, D. A. Caugant, I. M. Feavers, M. Achtman, and B. G. Spratt. 1998. Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences USA 95:3140-3145.


Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y-J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. I. Alenquer, T. P. Jarvie, K. B. Jirage, J-B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380.

Mellmann, A., D. Harmsen, C. A. Cummings, E. B. Zentz, S. R. Leopold, A. Rico, K. Prior, R. Szczepanowski, Y. Ji, W. Zhang, S. F. McLaughlin, J. K. Henkhaus, B. Leopold, M. Bielaszewska, R. Prager, P. M. Brzoska, R. L. Moore, S. Guenther, J. M. Rothberg, and H. Karch. 2011. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next-generation sequencing technology. PLoS One 6(7):e22751.

Morse, S. A., and B. Budowle. 2006. Microbial forensics: Application to bioterrorism preparedness and response. Infectious Disease Clinics of North America 20:455-473.

Morse, S. A., and A. S. Khan. 2005. Epidemiologic investigation for public health, biodefense, and forensic microbiology. In: Microbial Forensics, edited by R. Breeze, B. Budowle, and S. Schutzer. Amsterdam: Academic Press. Pp. 157-171.

NRC (National Research Council). 2009. Strengthening forensic science in the United States: A path forward. Washington, DC: The National Academies Press.

Rasko, D. A., P. L. Worsham, T. G. Abshire, S. T. Stanley, J. D. Bannan, M. R. Wilson, R. J. Langham, R. S. Decker, L. Jiang, T. D. Read, A. M. Phillippy, S. L. Salzberg, M. Pop, M. N. Van Ert, L. J. Kenefic, P. S. Keim, C. M. Fraser-Liggett, and J. Ravel. 2011. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proceedings of the National Academy of Sciences USA 108(12):5027-5032.

Ravel, J., L. Jiang, S. T. Stanley, M. R. Wilson, R. S. Decker, T. D. Read, P. Worsham, P. S. Keim, S. L. Salzberg, C. M. Liggett, and D. A. Rasko. 2009. The complete genome sequence of Bacillus anthracis Ames “Ancestor.” Journal of Bacteriology 191:445-446.

Read, T. D., S. L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. D. Busch, K. L. Smith, J. M. Schupp, D. Solomon, P. Keim, and C. M. Fraser. 2002. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis. Science 296:2028-2033.

Read, T. D., S. N. Peterson, N. Tourasse, L. W. Baillie, I. T. Paulsen, K. E. Nelson, H. Tettelin, D. E. Fouts, J. A. Eisen, S. R. Gill, E. K. Holtzapple, O. A. Okstad, E. Helgason, J. Rilstone, M. Wu, J. F. Kolonay, M. J. Beanman, R. J. Dodson, L. M. Brinkac, M. Gwinn, R. T. DeBoy, R. Madpu, S. C. Daugherty, A. S. Durkin, D. H. Haft, W. C. Nelson, J. D. Peterson, M. Pop, H. M. Khouri, D. Radune, J. L. Benton, Y. Mahamoud, L. Jiang, I. R. Hance, J. F. Wiedman, K. J. Berry, R. D. Plaut, A. M. Wolf, K. L. Watkins, W. C. Nierman, A. Hazen, R. Cline, C. Redmond, J. E. Thwaite, O. White, S. L. Salzberg, B. Thomason, A. M. Friedlander, T. M. Koehler, P. C. Hanna, A. B. Kolstø, and C. M. Fraser. 2003. The genome sequence of Bacillus anthracis Ames and comparison to closely related bacteria. Nature 423:81-86.

Relman, D. A. 2011. Microbial genomics and infectious diseases. New England Journal of Medicine 365:347-357.

Rohde, H., J. Qin, Y. Cui, D. Li, N. J. Loman, M. Hentschke, W. Chen, F. Pu, Y. Peng, J. Li, F. Xi, S. Li, Y. Li, Z. Zhang, X. Yang, M. Zhao, P. Wang, Y. Guan, Z. Cen, X. Zhao, M. Christner, R. Kobbe, S. Loos, J. Oh, L. Yang, A. Danchin, G. F. Gao, Y. Song, Y. Li, H. Yang, J. Wang, J. Xu, M. J. Pallen, J. Wang, M. Aepfelbacher, and R. Yang. 2011. E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium 2011. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. New England Journal of Medicine 365(8):718-724.


Rusch, D. B., A. L. Halpern, G. Sutton, K. B. Heidelberg, S. Williamson, S. Yooseph, D. Wu, J. A. Eisen, J. M. Hoffman, K. Remington, K. Beeson, B. Tran, H. Smith, H. Baden-Tillson, C. Stewart, J. Thorpe, J. Freeman, C. Andrews-Pfannkoch, J. E. Venter, K. Li, S. Kravitz, J. F. Heidelberg, T. Utterback, Y-H. Rogers, L. I. Falcón, V. Souza, G. Bonilla-Rosso, L. E. Eguiarte, D. M. Karl, S. Sathyendranath, T. Platt, E. Bermingham, V. Gallardo, G. Tamayo-Castillo, M. R. Ferrari, R. L. Strausberg, K. Nealson, R. Friedman, M. Frazier, and J. C. Venter. 2007. The Sorcerer II global ocean sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biology 5:e77.

Sampath, R., N. Mulholland, L. B. Blyn, M. W. Eshoo, T. A. Hall, C. Massire, H. M. Levene, J. C. Hannis, P. M. Harrell, B. Neuman, M. J. Buchmeier, Y. Jiang, R. Ranken, J. J. Drader, V. Samant, R. H. Griffey, J. A. McNeil, S. T. Crooke, and D. J. Ecker. 2005. Rapid identification of emerging pathogens: Coronavirus. Emerging Infectious Diseases 11:373-379.

Sampath, R., N. Mulholland, L. B. Blyn, C. Massire, C. A. Whitehouse, N. Waybright, C. Harter, J. Bogan, M. S. Miranda, D. Smith, C. Baldwin, M. Wolcott, D. Norwood, R. Kreft, M. Frinder, R. Lovari, I. Yasuda, H. Matthews, D. Toleno, R. Housley, D. Duncan, F. Li, R. Warren, M. W. Eshoo, T. A. Hall, S. A. Hofstadler, and D. J. Ecker. 2009. Comprehensive biothreat cluster identification by PCR/electrospray-ionization mass spectrometry. Nature Reviews Microbiology 7(4):287-296.

Spratt, B. G. 1999. Multilocus sequence typing: Molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the Internet. Current Opinion in Microbiology 2:312-316.

Srivatsan, A., Y. Han, J. Peng, A. K. Tehranchi, R. Gibbs, J. D. Wang, and R. Chen. 2008. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies. PLoS Genetics 4(8):e1000139.

Tringe, S. G., C. von Mering, A. Kobayashi, A. A. Salamov, K. Chen, H. W. Chang, M. Podar, J. M. Short, E. J. Mathur, J. C. Detter, P. Bork, P. Hugenholtz, and E. M. Rubin. 2005. Comparative metagenomics of microbial communities. Science 308:554-557.

U’Ren, J. M., M. N. Van Ert, J. M. Schupp, W. R. Easterday, T. S. Simonson, R. T. Okinaka, T. Pearson, and P. Keim. 2005. Use of a real-time PCR TaqMan assay for rapid identification and differentiation of Burkholderia pseudomallei and Burkholderia mallei. Journal of Clinical Microbiology 43:5771-5774.

Venter, J. C., K. Remington, J. F. Heidelberg, A. L. Halpern, D. Rusch, J. A. Eisen, D. Wu, I. Paulsen, K. E. Nelson, W. Nelson, D. E. Fouts, S. Levy, A. H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R. Parsons, H. Baden-Tillson, C. Pfannkoch, Y-H. Rogers, and H. O. Smith. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74.

Vogler, A. J., E. M. Driebe, J. Lee, R. K. Auerbach, C. J. Allender, M. Stanley, K. Kubota, G. L. Andersen, L. Radnedge, P. L. Worsham, P. Keim, and D. M. Wagner. 2008. Assays for the rapid and specific identification of North American Yersinia pestis and the common laboratory strain CO92. BioTechniques 44:201-207.




Arturo Casadevall,5,* Ferric C. Fang,6 and Liise-anne Pirofski5

Although an existential threat from the microbial world might seem like science fiction, a catastrophic decline in amphibian populations with the extinction of dozens of species has been attributed to a chytrid fungus (Daszak et al., 1999; Pounds et al., 2006), and North American bats are being decimated by Geomyces destructans, a new fungal pathogen (Blehert et al., 2009). Hence, individual microbes can cause the extinction of a species. In the foregoing instances, neither fungus had a known relationship with the threatened species; there was neither selection pressure for pathogen attenuation nor effective host defense. Humans are also constantly confronted by new microbial threats as witnessed by the appearance of HIV, SARS coronavirus, and the latest influenza pandemic. While some microbial threats seem to be frequently emerging or re-emerging, others seem to wane or attenuate with time, as exemplified by the decline of rheumatic heart disease (Quinn, 1989), the evolution of syphilis from a fulminant to a chronic disease (Tognotti, 2009), and the disappearance of “English sweating sickness” (Beeson, 1980). A defining feature of infectious diseases is changeability, with change being a function of microbial, host, environmental, and societal changes that together translate into changes in the outcome of a host–microbe interaction. Given that species as varied as amphibians and bats can be threatened with extinction by microbes, the development of predictive tools for identifying microbial threats is both desirable and important.


4 Reprinted from PLoS Pathogens. Originally published as Casadevall A, Fang FC, Pirofski L-a (2011) Microbial Virulence as an Emergent Property: Consequences and Opportunities. PLoS Pathogens 7(7): e1002136. doi:10.1371/journal.ppat.1002136.

Editor: Glenn F. Rall, The Fox Chase Cancer Center, United States of America.

5 Department of Microbiology & Immunology and Medicine, Albert Einstein College of Medicine, Bronx, New York, United States of America.

6 Departments of Laboratory Medicine and Microbiology, University of Washington School of Medicine, Seattle, Washington, United States of America.

Published: July 21, 2011

Copyright: ©2011 Casadevall et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The authors are supported by NIH grants AI-45459 (LP), AI44374 (LP), AI39557 (FCF), AI44486 (FCF), AI77629 (FCF), AI91966 (FCF), HL059842 (AC), AI033774 (AC), AI033142 (AC), and AI052733 (AC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.



Virulence as an Emergent Property

To those familiar with the concept of emergence (Box A2-1), it probably comes as no surprise that microbial virulence is an emergent property. However, the traditional view of microbial pathogenesis has been reductionist (Fang and Casadevall, 2011), namely, assigning responsibility for virulence to either the microbe or the host. Such pathogen- and host-centric views, and in turn the scientific approaches fostered by these viewpoints, differ significantly in their historical underpinnings and philosophy (Biron and Casadevall, 2010). In fact, neither alone can account for how new infectious diseases arise. The conclusion that virulence is an emergent property is obvious when one considers that microbial virulence can only be expressed in a susceptible host (Casadevall and Pirofski, 2001). Consequently, the very same microbe can be virulent in one host but avirulent in another (Casadevall and Pirofski, 1999). Furthermore, host immunity can negate virulence, as evidenced by the effectiveness of immunization that renders a microbe as deadly as the variola virus completely avirulent in individuals inoculated with the vaccinia virus. Infection with a microbe can result in diametrically opposed outcomes, ranging from the death of a host to elimination of the microbe. Hence, virulence is inherently novel, unpredictable, and irreducible to first principles.

BOX A2-1
The Concept of Emergent Properties

Emergent properties are properties that cannot be entirely explained by their individual components (Ponge, 2005). An element of novelty is also considered to be an essential attribute of “emergent,” a term that contrasts with “resultant” with the latter denoting an outcome that is predicted from the combination of the two components, such that resultant properties are additive whereas emergent properties are non-additive (Ablowitz, 1939). Another facet of emergent properties is that they are irreducible to their constituent components. Most treatises on emergence have emphasized that emergent properties have two components: an outcome that is greater than the sum of the parts and some form of novelty (Ablowitz, 1939; Baylis, 1929; Hempel and Oppenheim, 2011). Although the concept of emergence dates back to antiquity when Aristotle stated that the “whole is not just the sum of its parts,” there is increasing interest in emergent properties as it becomes increasingly evident that reductionistic approaches cannot explain many phenomena in our world (Parrish et al., 2011). Examples of emergent properties in liquids are surface tension and viscosity, neither of which can be explained by analysis of individual molecules, as the properties pertain to the macroscopic world, and these phenomena have no corresponding analogs in the molecular realm. Biological systems have been described as characterized by emergent properties that exist at the edge of chaos, such that small fluctuations in their conditions can lead to sudden major changes (Mazzocchi, 2008). Similarly, self-organized movements of individuals, as in schools of fish, can result in a variety of forms that are thought to protect against predators (Parrish et al., 2011).

Critical to our understanding of virulence as a property that can only be expressed in a susceptible host is that both the microbe and the host bring their own emergent properties to their interaction. Host and microbial cells receive and process information by signaling cascades that manifest emergent properties (Bhalla and Iyengar, 1999); e.g., gene expression studies reveal heterogeneous or bi-stable expression in clonal cell populations with important implications for phenotypic variability and fitness (Dubnau and Losick, 2006; Veening et al., 2008). Other emergent properties that have been identified in microbial and cellular systems could influence pathogenesis. Intracellular parasitism is associated with genome reduction, a phenomenon that could confer emergent properties, given that deliberate genome reduction in E. coli has led to unexpected emergent properties, such as ease of electroporation and increased stability of cloned DNA and plasmids (Posfai et al., 2006).

On the host side, many aspects of the immune system have the potential to spawn emergent properties. The antigenic determinants of a microbe are defined by antibodies and processing by host cells, consequently existing only in the context of an immune system (Van Regenmortel, 2004). Microbial determinants can elicit host-damaging immune responses. Such deleterious responses exemplify a detrimental emergent property of the same host defense mechanisms that mediate antimicrobial effects. The outcome of a viral infection can depend on prior infection with related or unrelated viruses that express related antigens; hence, the infection history of a host affects the outcome of subsequent infections (Welsh et al., 2010).

For those accustomed to viewing host–microbe interactions from an evolutionary perspective (Dethlefsen et al., 2007), the emergent nature of virulence is also no surprise, for the evolution of life itself can be viewed as an emergent process (Corning, 2002). Even in relatively well-circumscribed systems such as Darwin’s finches on the Galápagos Islands, evolutionary trends over time became increasingly unpredictable as a consequence of environmental fluctuations (Grant and Grant, 2002).

Consequences of the Emergent Nature of Microbial Virulence

The fact that virulence is an emergent property of host, microbe, and their interaction has profound consequences for the field of microbial pathogenesis, for it implies that the outcome of host–microbe interaction is inherently unpredictable. Even with complete knowledge of microbes and hosts, the outcome of all possible interactions cannot be predicted for all microbes and all hosts. Lack of predictability should not be unduly discouraging. Even in systems in which emergent properties reveal novel functions, such as fluid surface tension and viscosity, recognition of these properties can be useful. For example, molecular structure
might not predict the hydrodynamics of a fluid, but the empirical acquisition of information can be exploited to optimize pipeline diameter and flow rates. Novelty is unpredictable but novel events can be interpreted and comprehended once they have occurred (Ablowitz, 1939). A pessimist might argue that living systems are significantly more complex than flowing liquids. However, such pessimism may be unwarranted. The appearance of new influenza virus strains every year is an emergent property resulting from high rates of viral mutation and host selection of variants (Lofgren et al., 2007). Hence, the time or place in which new pandemics will arise or the relative proportion of strains that will circulate each year cannot be predicted with certainty. Nevertheless, the likely appearance of new strains can be estimated from the history of population exposure to given strains and knowledge of recently circulating strains, and this information can be used to formulate the next year’s vaccine.

A Probabilistic Framework

Although the field of infectious diseases may never achieve the predictive certainty achieved in other branches of medicine, it may be possible to develop a probabilistic framework for the identification of microbial threats. Although all known pathogenic host–microbe interactions have unique aspects, and it is challenging to extrapolate from experiences with one microbe to another, a probabilistic framework can incorporate extant information and attempt to estimate risks. For example, the paucity of invasive fungal diseases in mammalian populations with intact immunity has been attributed to the combination of endothermy and adaptive immunity (Robert and Casadevall, 2009). This notion could be extrapolated to other environmental microbes, i.e., those that cannot survive at mammalian temperatures have a low probability of emerging as new human pathogens. On the other hand, the identification of known virulence determinants in new bacterial strains may raise concern. In this regard, the expression of anthrax toxin components in Bacillus cereus produces an anthrax-like disease that is not caused by Bacillus anthracis (Hoffmaster et al., 2004).

Given the experience of recent decades, we can predict with confidence that new infectious diseases are likely to continue to emerge and make some general predictions about the nature of the microbes that could constitute these threats. One possibility is that an emergent pathogen could come from elsewhere in the animal kingdom. A comprehensive survey revealed that three-fourths of emerging pathogens are zoonotic (Taylor et al., 2001). Crossing the species barrier can result in particularly severe pathology, as pathogen and host have not had the opportunity to co-evolve toward equilibrium. Another good bet is that an RNA virus could emerge as a pathogen. The high mutation rate and generally broad host range of RNA viruses may favor species jumps (Woolhouse et al., 2005), and many emergent human pathogens belong to this group, e.g., HIV, H5N1 influenza, SARS coronavirus, Nipah virus, and hemorrhagic fever viruses. On
the other hand, global warming could hasten the emergence of new mammalian pathogenic fungi through thermal adaptation (Garcia-Solache and Casadevall, 2010), given that the relative resistance of mammals to fungal diseases has been attributed to a combination of higher body temperatures and adaptive immunity (Bergman and Casadevall, 2010; Robert and Casadevall, 2009).

Despite abandoning hopes for certainty and determinism in predicting microbial pathogenic interactions, we can attempt to develop a probabilistic framework that endeavors to estimate the pathogenic potential of a microbe based on lessons from known host–microbe interactions. A variety of mathematical models based on game theory or quantitative genetics have been developed in attempts to understand the evolution of virulence (Boots et al., 2009; Day and Proulx, 2004). These have provided interesting new insights into host–pathogen interactions, including the tendency for evolutionary dynamics to produce oscillations and chaos rather than stable fitness-maximizing equilibria, the unpredictability that results when multiple games are played simultaneously, and the tendency for three-way co-evolution of virulence with host tolerance or resistance to select for greater virulence and variability (Carval and Ferriere, 2010; Hashimoto, 2006; Nowak and Sigmund, 2004).

Preparing for the Unpredictable

Emerging infections seem to be becoming more frequent, and it is not difficult to understand why. An interesting experimental system examining a viral pathogen of moth larvae demonstrated that host dispersal promotes the evolution of greater virulence (Boots and Mealor, 2007). When hosts remain local, this encourages more “prudent” behavior by pathogens, but host movement encourages more infections and greater disease severity (Buckling, 2007). Global travel in the modern world can rapidly spread pathogenic microbes, but what is less obvious is that travel may also enhance virulence. Other factors contributing to the emergence and re-emergence of new pathogens include changes in land use, human migration, poverty, urbanization, antibiotics, modern agricultural practices, and other human behaviors (Cleaveland et al., 2007; IOM, 1992). Microbial evolution and environmental change, anthropogenic or otherwise, will continue to drive this process. Another implication of the emergent nature of virulence is recognition of the hubris and futility of thinking that we can simply target resources to the human pathogens that we already know well. The discovery of HIV as the cause of AIDS (Barre-Sinoussi et al., 1983) was greatly facilitated by research on avian and murine retroviruses that had taken place decades before (Hsiung, 1987), at a time when the significance of retroviruses as agents of human disease was unknown.

We share the view that sentinel capabilities are more important than predictive models at the present time (Barre-Sinoussi et al., 1983; Hsiung, 1987), but are optimistic that it will be possible to develop general analytical tools that can be applied to provide probabilistic assessments of threats from future unspecified agents. Comparative analysis of microbes with differing pathogenic potential and their hosts could provide insight into those interactions that are most likely to result in virulence. Hence, the best preparation for the unexpected and unpredictable nature of microbial threats will be the combination of enhanced surveillance with a broad exploration of the natural world to ascertain the range of microbial diversity from which new threats are likely to emerge.


Ablowitz R (1939) The theory of emergence. Phil Sci 6: 1–16.

Barre-Sinoussi F, Chermann JC, Rey F, Nugeyre MT, Chamaret S, et al. (1983) Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science 220: 868–871.

Baylis CA (1929) The philosophic functions of emergence. Philos Rev 38: 372–384.

Beeson PB (1980) Some diseases that have disappeared. Am J Med 68: 806–811.

Bergman A, Casadevall A (2010) Mammalian endothermy optimally restricts fungi and metabolic costs. MBio 1: e00212–10.

Bhalla US, Iyengar R (1999) Emergent properties of networks of biological signaling pathways. Science 283: 381–387.

Biron CA, Casadevall A (2010) On immunologists and microbiologists: ground zero in the battle for interdisciplinary knowledge. MBio 1: e00280–10.

Blehert DS, Hicks AC, Behr M, Meteyer CU, Berlowski-Zier BM, et al. (2009) Bat white-nose syndrome: an emerging fungal pathogen? Science 323: 227.

Boots M, Best A, Miller MR, White A (2009) The role of ecological feedbacks in the evolution of host defence: what does theory tell us? Philos Trans R Soc Lond B Biol Sci 364: 27–36.

Boots M, Mealor M (2007) Local interactions select for lower pathogen infectivity. Science 315: 1284–1286.

Buckling A (2007) Epidemiology. Keep it local. Science 315: 1227–1228.

Carval D, Ferriere R (2010) A unified model for the coevolution of resistance, tolerance, and virulence. Evolution 64: 2988–3009.

Casadevall A, Pirofski L (1999) Host-pathogen interactions: redefining the basic concepts of virulence and pathogenicity. Infect Immun 67: 3703–3713.

Casadevall A, Pirofski L (2001) Host-pathogen interactions: the attributes of virulence. J Infect Dis 184: 337–344.

Cleaveland S, Haydon DT, Taylor L (2007) Overviews of pathogen emergence: which pathogens emerge, when and why? Curr Top Microbiol Immunol 315: 85–111.

Corning PA (2002) The re-emergence of ‘emergence’: a venerable concept in search for a theory. Complexity 7: 18–30.

Daszak P, Berger L, Cunningham AA, Hyatt AD, Green DE, et al. (1999) Emerging infectious diseases and amphibian population declines. Emerg Infect Dis 5: 735–748.

Day T, Proulx SR (2004) A general theory for the evolutionary dynamics of virulence. Am Nat 163: E40–E63.

Dethlefsen L, McFall-Ngai M, Relman DA (2007) An ecological and evolutionary perspective on human-microbe mutualism and disease. Nature 449: 811–818.

Dubnau D, Losick R (2006) Bistability in bacteria. Mol Microbiol 61: 564–572.

Fang FC, Casadevall A (2011) Reductionistic and holistic science. Infect Immun 79: 1401–1414.

Garcia-Solache MA, Casadevall A (2010) Global warming will bring new fungal diseases for mammals. MBio 1: e00061–10.

Grant PR, Grant BR (2002) Unpredictable evolution in a 30-year study of Darwin’s finches. Science 296: 707–711.

Hashimoto K (2006) Unpredictability induced by unfocused games in evolutionary game dynamics. J Theor Biol 241: 669–675.

Hempel CG, Oppenheim P (2011) Studies in the logic of explanation. Phil Sci 15: 135–175.

Hoffmaster AR, Ravel J, Rasko DA, Chapman GD, Chute MD, et al. (2004) Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling inhalation anthrax. Proc Natl Acad Sci U S A 101: 8449–8454.

Taylor LH, Latham SM, Woolhouse ME (2001) Risk factors for human disease emergence. Philos Trans R Soc Lond B Biol Sci 356: 983–989.

Hsiung GD (1987) Perspectives on retroviruses and the etiologic agent of AIDS. Yale J Biol Med 60: 505–514.

IOM (1992) Emerging infections: microbial threats to the United States. Washington (D.C.): Institute of Medicine.

Lofgren E, Fefferman NH, Naumov YN, Gorski J, Naumova EN (2007) Influenza seasonality: underlying causes and modeling theories. J Virol 81: 5429–5436.

Mazzocchi F (2008) Complexity in biology. Exceeding the limits of reductionism and determinism using complexity theory. EMBO Rep 9: 10–14.

Nowak MA, Sigmund K (2004) Evolutionary dynamics of biological games. Science 303: 793–799.

Parrish JK, Viscido SV, Grumbaum D (2011) Self organized fish schools: an example of emergent properties. Biol Bull 202: 296–305.

Ponge JF (2005) Emergent properties from organisms to ecosystems: towards a realistic approach. Biol Rev Camb Philos Soc 80: 403–411.

Posfai G, Plunkett G, III, Feher T, Frisch D, Keil GM, et al. (2006) Emergent properties of reduced-genome Escherichia coli. Science 312: 1044–1046.

Pounds JA, Bustamante MR, Coloma LA, Consuegra JA, Fogden MP, et al. (2006) Widespread amphibian extinctions from epidemic disease driven by global warming. Nature 439: 161–167.

Quinn RW (1989) Comprehensive review of morbidity and mortality trends for rheumatic fever, streptococcal disease, and scarlet fever: the decline of rheumatic fever. Rev Infect Dis 11: 928–953.

Robert VA, Casadevall A (2009) Vertebrate endothermy restricts most fungi as potential pathogens. J Infect Dis 200: 1623–1626.

Tognotti E (2009) The rise and fall of syphilis in Renaissance Europe. J Med Humanit 30: 99–113.

Van Regenmortel MH (2004) Reductionism and complexity in molecular biology. Scientists now have the tools to unravel biological complexity and overcome the limitations of reductionism. EMBO Rep 5: 1016–1020.

Veening JW, Smits WK, Kuipers OP (2008) Bistability, epigenetics, and bet-hedging in bacteria. Annu Rev Microbiol 62: 193–210.

Welsh RM, Che JW, Brehm MA, Selin LK (2010) Heterologous immunity between viruses. Immunol Rev 235: 244–266.

Woolhouse ME, Haydon DT, Antia R (2005) Emerging pathogens: the epidemiology and evolution of species jumps. Trends Ecol Evol 20: 238–244.


Jennifer L. Gardy7

Outbreak Investigation: A Brief Primer

In public health, we are often confronted with the task of “solving” an infectious disease outbreak—identifying all the cases, determining a source of the illness, and deploying an intervention to prevent further cases. A typical scenario unfolds as follows. A potential outbreak alert is issued when routine laboratory or population-based surveillance methods detect a statistically significant increase in case counts relative to historical norms for a particular disease, or when an astute clinician or public health official notes an unusual clustering of cases. This alert triggers an initial investigation combining descriptive epidemiology with laboratory work. Epidemiologists use interviews and questionnaires to review case data, such as travel history, food exposures, and attendance at social events, with the goal of revealing common behaviours across cases—eating the same food items, visiting the same locations, or shared contact with a particular individual.
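The surveillance trigger described above, an increase in case counts relative to historical norms, can be illustrated with a minimal sketch. It assumes a simple rule (flag a count that exceeds the historical mean by more than two standard deviations); the threshold, the function name, and the weekly counts are hypothetical and are not taken from any particular surveillance system.

```python
from statistics import mean, stdev


def exceeds_historical_norm(history, current, z_threshold=2.0):
    """Return True if `current` is above mean(history) + z_threshold * SD(history).

    `z_threshold` is an illustrative assumption, not a published standard.
    """
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation of past counts
    return current > baseline + z_threshold * spread


# Hypothetical weekly case counts for one disease in one region.
weekly_history = [4, 6, 5, 7, 5, 6, 4, 5]

alert_for_14 = exceeds_historical_norm(weekly_history, 14)
alert_for_7 = exceeds_historical_norm(weekly_history, 7)
print(alert_for_14, alert_for_7)
```

Real surveillance systems typically also account for seasonality and reporting delays, which this sketch ignores.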

At the same time, microbiologists carry out their own epidemiological investigation using genotyping techniques. Similar to the genetic fingerprinting methods used in paternity testing or in forensic crime scene analysis, these “molecular epidemiology” tools, including pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST), can quickly reveal whether a collection of bacterial specimens shares a common genetic fingerprint and likely represents a true outbreak, or whether the specimens display a range of genotypes and simply reflect an unusual excess of unrelated cases of that particular illness.

The results of the descriptive epidemiology and molecular epidemiology investigations are then compared, and a determination is made as to whether the cluster of cases is truly an outbreak meriting further investigation. If this is indeed the case, then a more robust field epidemiological investigation is typically undertaken. This includes enhanced case-finding using more detailed survey instruments as well as case-control studies in which behaviours of cases are compared to those of controls in order to quantify risk factors strongly associated with illness. Through these analyses, investigators are able to form and test a specific hypothesis regarding the source of the outbreak. Laboratory work is also critical at this stage: new cases are genotyped to determine whether they are part of the outbreak, while genotyping of isolates collected from food, water, and other non-human sources can confirm or rule out these entities as potential sources of the outbreak.

7 Senior Scientist, Molecular Epidemiology, British Columbia Centre for Disease Control.

Once a source of the outbreak has been confirmed, intervention measures can be put in place. For food- or water-borne outbreaks, these typically involve issuing a recall for the food item in question, eliminating access to the water source until it has been declared safe, and issuing extensive media alerts warning consumers of the risks associated with the entity in question. For outbreaks involving personal contact or attendance at a shared location, such as a specific hospital ward, active case-finding is used to find and treat all infected patients or potential carriers of an illness, while infection control approaches such as patient decolonization or enhanced cleaning are deployed to prevent further infections.

Unfortunately, not every outbreak can be neatly resolved. A number of factors greatly limit public health’s ability to investigate an outbreak from both the field epidemiology and molecular epidemiology perspectives (Figure A3-1). Field investigations are typically limited by resources (not having enough personnel, time, or money to be able to effect a complete investigation) and by patients’ inability to recall specific events that might be relevant to the investigation. Molecular epidemiology approaches are also limited in their utility. For some pathogens, such as Salmonella Enteritidis, unrelated isolates from multiple outbreaks may show identical genetic fingerprints. For others, such as Campylobacter jejuni, one outbreak may comprise multiple distinct genetic fingerprints due to frequent rearrangement of the pathogen’s genome. Genotyping typically requires the organism in question to be cultured, which may add several weeks to an investigation in the case of slow-growing organisms such as Mycobacterium tuberculosis, and the costs of many molecular epidemiology assays are not insignificant, meaning they are often not routinely deployed.

FIGURE A3-1 An example demonstrating how the limitations of field and molecular epidemiology complicate outbreak reconstructions. Panel A shows the “true” outbreak scenario: two different genotypes of pathogen are found in the hosts (white and grey circles), and arrows indicate person-to-person transmission events. Panel B shows what a reconstruction of that outbreak might look like using current tools. In this situation, the molecular epidemiology technique applied was not able to identify the different genotypes, and all isolates were grouped into a single cluster, as often happens in the case of clonal pathogens such as Salmonella Enteritidis. Patients’ inability to recall specific contacts means that several transmission events are missed.

One of the biggest limitations of current molecular epidemiology methods is the low level of resolution they provide. At best, such tools are only capable of determining whether or not an isolate belongs to an outbreak cluster. Further detail, such as the order of person-to-person transmission or the underlying pattern of spread (a superspreader versus ongoing chains of transmission), is beyond the scope of current laboratory methods.

An Illustrative Example

In May 2006, a case of pleural tuberculosis (TB) was diagnosed in an adult female in a medium-sized community in British Columbia, Canada. Although pleural TB is suggestive of recently acquired disease, inquiring after the case’s contacts did not suggest a potential source for her illness. Molecular analysis using a TB-specific technique called mycobacterial interspersed repetitive unit variable number tandem repeat (MIRU-VNTR) was performed. In MIRU-VNTR, 24 variable number tandem repeat loci around the M. tuberculosis genome are amplified using polymerase chain reaction (PCR), followed by capillary electrophoresis to enumerate the number of repeats present at each locus. The patient’s MIRU-VNTR genotype indicated she harboured the same strain of TB that had been circulating in her community for several years. She was assumed to represent one of the few cases of TB that the community regularly observed each year and was treated for her illness.
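Because a MIRU-VNTR genotype is a list of repeat counts at 24 loci, comparing two isolates amounts to comparing two lists position by position. The following sketch illustrates that comparison; the repeat counts are invented for illustration and do not correspond to any real strain, and the function name is hypothetical.

```python
def loci_differing(profile_a, profile_b):
    """Return the indices of loci at which two repeat-count profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same number of loci")
    return [i for i, (a, b) in enumerate(zip(profile_a, profile_b)) if a != b]


# Hypothetical 24-locus repeat counts (not real data).
community_strain = [2, 2, 3, 3, 4, 2, 5, 3, 2, 1, 4, 2,
                    3, 3, 2, 5, 1, 3, 2, 2, 4, 3, 2, 2]

new_case = list(community_strain)           # identical fingerprint
unrelated_isolate = list(community_strain)  # differs at two loci
unrelated_isolate[5] += 1
unrelated_isolate[17] -= 1

print(loci_differing(community_strain, new_case))           # no differing loci
print(loci_differing(community_strain, unrelated_isolate))  # two differing loci
```

An identical profile, as in the first comparison, says only that the isolates cannot be told apart at these 24 loci; it does not establish direct transmission.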

Some months later, a second case of TB was reported in the community, this time in an infant female with no epidemiological link to the May case. TB in children is considered to be a marker for recent community transmission, and when MIRU-VNTR revealed the infant to be infected with the same strain of TB as the earlier case, the local public health nurses began an intensive case-finding effort. Using an approach called reverse contact tracing, they identified individuals who had been in contact with the infant and screened them using a tuberculin skin test in an attempt to find out who had been the source of the child’s infection. This investigation led to the diagnosis of nine more cases of active TB in the community, and an outbreak was declared.

Extensive investigation soon followed. Each case was interviewed using a detailed questionnaire, and the resulting data—connections between individuals who reported social relationships with each other, links between people and the places they regularly spent time at, and links between people and specific behaviours associated with an increased risk of TB infection, such as smoking, alcohol use, or drug use—was visualized as a network. The network suggested a potential source for the outbreak—an individual who had been symptomatic
and undiagnosed for many months prior to detection of the first case, who had a number of risk factors, and who was a high-degree node in the network—they reported contact with many of the cases.
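The idea of a high-degree node can be made concrete with a small sketch: given a list of reported contacts, count how many distinct cases each individual is linked to. The case labels and contact pairs below are hypothetical and are not drawn from the outbreak data.

```python
from collections import defaultdict


def contact_degrees(contacts):
    """Map each case to the number of distinct cases it reported contact with."""
    neighbours = defaultdict(set)
    for a, b in contacts:
        neighbours[a].add(b)  # contact is treated as undirected
        neighbours[b].add(a)
    return {case: len(linked) for case, linked in neighbours.items()}


# Hypothetical reported contacts between cases.
reported_contacts = [
    ("case_A", "case_B"),
    ("case_A", "case_C"),
    ("case_A", "case_D"),
    ("case_B", "case_C"),
    ("case_D", "case_E"),
]

degrees = contact_degrees(reported_contacts)
highest_degree_case = max(degrees, key=degrees.get)
print(degrees)
print(highest_degree_case)
```

Degree alone does not identify a source; in the investigation described here it was considered alongside symptom history and risk factors.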

Although the investigation revealed the likely source case, who was immediately put on treatment, the outbreak eventually grew to include 41 individuals over the 2006–2008 period, with a handful of subsequent diagnoses from 2009 onward.

Attempting to reconstruct the path the organism took through the community proved to be impossible. Despite the rich epidemiological and clinical data available, the social network structure in the community was too dense to interpret: each individual case had an average of six contacts with other cases, and almost everyone in the community reported spending time at the same locations, including a series of hotel pubs and crack houses, meaning there were many potential sources for each person’s infection (Figure A3-2). All cases had identical 24-locus MIRU-VNTR patterns, so this low-resolution technique could not identify smaller subclusters within the outbreak.

While the outbreak eventually abated, our inability to reconstruct individual transmission events meant that an important learning opportunity was missed. We could not describe the underlying pattern of disease spread in the community, we could not compare it to other TB outbreaks to determine whether this organism behaves in a similar way across different outbreaks in different communities with different social network structures, and we could not use our experience to guide future TB outbreak investigations.

FIGURE A3-2 The dense social network in the outbreak community complicated outbreak reconstruction attempts. Circular nodes represent outbreak cases: grey nodes are individuals with smear-positive tuberculosis; white nodes are individuals with smear-negative disease. Solid lines connect two cases that reported a social contact with each other.

This uncertainty about how an outbreak unfolds is not unique to tuberculosis. For the majority of communicable diseases, our understanding of how they behave “in the wild” is limited. Unfortunately, this lack of understanding of pathogens’ natural transmission tendencies precludes developing any sort of proactive evidence-based interventions. We do not know whether there are “one-size-fits-all” interventions for a given pathogen or a family of diseases, or whether each outbreak is unique and will require a specifically tailored intervention.

The Rise of the Next Generation

We will return to the tuberculosis story in time; for now, we must climb into our microbial genomics time machine and rewind several years…. The complete genome has sometimes been described as “the ultimate genotype”—examining the total genetic content of an organism reveals the unique fingerprint that sets each of us apart from the other members of our species. Until recently, however, interrogating the complete genome of anything larger than a virus required a significant investment of time, money, and analytical resources. Sequencing of the first bacterial genome, Haemophilus influenzae, in 1995 took more than a year, cost nearly USD$1 million, and involved a large team of researchers running a not insignificant number of DNA sequencers. Subsequent microbial genomics efforts targeted individual bacteria selected to represent a range of common laboratory strains and interesting clinical isolates; experiments reporting the sequencing of more than one isolate were relatively rare and certainly outside the scope of most research groups’ technological abilities.

Tracking the number of bacterial genome projects recorded in the Genomes Online Database (GOLD) reveals that after approximately a decade of steady progress in microbial genomics, a sudden and dramatic upswing in the number of sequenced genomes began around 2006 (Figure A3-3). This sea change coincides with the commercial release of the so-called “next-generation” DNA-sequencing technologies. Previously, DNA sequencing was performed using the Sanger method, originally developed by Frederick Sanger in 1977, although subsequently modified to improve throughput. Next-generation sequencing methods, including the pyrosequencing platform commercialized by Roche, a reversible terminator platform marketed by Illumina, and the newer ion semiconductor-based approach available through Life Technologies, all take a fundamentally different approach to sequencing. They are based on the concept of “sequencing by synthesis,” in which DNA synthesis is essentially observed in real-time, with the sequencing instrument using one of the above technologies to extend a template base by base, and record which base was added at each step. Although the reads produced by these technologies are much shorter than those resulting from Sanger sequencing, the sheer magnitude of the parallel sequencing made possible by these approaches means that next-generation sequencing runs generate orders of magnitude more data in a single run than Sanger sequencers are capable of.

FIGURE A3-3 The number of bacterial genome projects recorded in the Genomes Online Database (GOLD) increased exponentially with the introduction of next-generation sequencing methods in the mid-2000s.

With these new platforms, the cost of sequencing a complete bacterial genome dropped dramatically. Now, many large genome centres operating optimized pipelines and high-volume sequencers are able to offer their clients full bacterial genome sequences for between USD$50-250 per genome. Run times can be as little as a few hours for certain platforms, meaning it is now possible to sequence tens or even hundreds of bacterial genomes within a week for only a few thousand dollars.

A New Tool: Genomic Epidemiology

Soon after the commercialization and adoption of next-generation sequencing technologies, a few astute clinicians and infectious disease researchers recognized the technology’s potential for transforming molecular epidemiology. In a wonderful example of convergent evolution, several independent research groups around the world embarked upon proof-of-concept projects in the new area of “genomic epidemiology,” with the first few papers in the field appearing in 2010 and 2011.

The basic premise behind genomic epidemiology is that the microevolutionary events occurring within a pathogen’s genome over the course of an outbreak can be used as markers of transmission. For example, consider an outbreak in which the first patient is colonized with a bacterium having the genome sequence AAAAA. Any individuals infected by that person would then be colonized with bacteria having the same genome sequence. As a result of the natural process of mutation, many of these second-generation organisms will accrue a small number of nucleotide changes (the number depends on the duration of infection
and the natural mutation rate of the pathogen in question). If we have three second-generation patients, one might accrue no mutations and continue to display the AAAAA genome sequence, one might show the sequence ACAAA, and one might contain the sequence AAAAG. The third generation of cases, those infected by these individuals, will then show genome sequences identical to or descended from these second-generation cases. By sequencing the genomes of all the outbreak organisms and identifying positions that vary over the course of the outbreak, one should, in theory, be able to infer the individual transmission events that gave rise to the outbreak (Figure A3-4).
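The toy AAAAA example can be turned into a minimal sketch of the inference. It assumes that mutations only accumulate (no reversions or independent recurrence), so an earlier case is a compatible source only if every mutation it carries is also present in the later case; among compatible cases, those sharing the most mutations are reported. The function names and the third-generation sequences are hypothetical additions for illustration.

```python
def mutations(reference, sequence):
    """Return the set of (position, base) differences from the reference."""
    return {(i, base)
            for i, (ref, base) in enumerate(zip(reference, sequence))
            if ref != base}


def closest_sources(reference, earlier_cases, sequence):
    """List earlier cases whose mutations are all carried by `sequence`,
    keeping only those that share the largest number of mutations."""
    target = mutations(reference, sequence)
    compatible = {}
    for name, seq in earlier_cases.items():
        carried = mutations(reference, seq)
        if carried <= target:  # subset test: nothing the later case lacks
            compatible[name] = carried
    if not compatible:
        return []
    most = max(len(carried) for carried in compatible.values())
    return sorted(name for name, carried in compatible.items()
                  if len(carried) == most)


index_sequence = "AAAAA"
earlier_cases = {
    "index": "AAAAA",
    "second_1": "AAAAA",
    "second_2": "ACAAA",
    "second_3": "AAAAG",
}

# A hypothetical third-generation case that inherited the C at position two.
print(closest_sources(index_sequence, earlier_cases, "ACAAT"))
# A hypothetical third-generation case with no mutations at all.
print(closest_sources(index_sequence, earlier_cases, "AAAAA"))
```

The second call returns more than one candidate, which is the ambiguity that arises when isolates transmit without accruing mutations and no contact data are available to break the tie.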

The words “in theory” are very important in this case. The first two studies to use genomics to identify person-to-person transmission events both revealed that the answers are not so readily forthcoming.

In the first study (Lewis et al., 2010), genome sequence was obtained for six multidrug-resistant Acinetobacter baumannii isolates from a hospital outbreak occurring over a seven-week period—four from military patients and two from


FIGURE A3-4 Using microevolutionary events to track person-to-person spread of a pathogen over a social network. As a pathogen spreads over a contact network, the accrual of mutations can be used to trace person-to-person transmission. When a mutation arises in one person, such as the C in position two of the second-generation female case in this example, it will be transmitted onwards to those individuals that case infects. Isolates may transmit without accruing mutations, as in the AAAAA sequence maintained here across three generations of illness. This can complicate reconstructions if the underlying contact data are unknown—if no contact information was available for this network, the genomic data would suggest that the third-generation AAAAA female case could have been infected by the first- or second-generation AAAAA cases.


civilian patients. The hope was that the study would reveal how the bacterium was transferred from the military patients—who were presumed to have been infected in the field prior to hospital admission—to the civilian patients. Three positions across the approximately 4-megabase pair genome were found to vary between isolates, with patients showing four genotypes at these positions: CAG, TAG, TAT, and TTG. One of the civilian patients shared a genotype with two of the military patients, suggesting that one of these military patients was the source of the civilian’s infection. Examination of this hypothesis in the context of the available epidemiological information revealed that one of the military patients was housed in the bed next to the civilian, making him or her the most likely source. The other civilian patient displayed a unique genotype, and the study authors were not able to infer the source of that individual’s infection.
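As a rough illustration of how closely related the four reported genotypes are, the sketch below computes pairwise differences across the three variable positions. Only the genotype strings come from the text; the helper name is hypothetical, and the patient-to-genotype assignments are not reproduced here because they are not fully listed above.

```python
from itertools import combinations

genotypes = ["CAG", "TAG", "TAT", "TTG"]  # the four genotypes reported in the text


def hamming(a, b):
    """Count positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


# Pairwise distances between all genotypes
distances = {(a, b): hamming(a, b) for a, b in combinations(genotypes, 2)}

# Genotypes that sit exactly one change away from every other genotype
central = [g for g in genotypes
           if all(hamming(g, h) == 1 for h in genotypes if h != g)]

for pair, d in distances.items():
    print(pair, d)
print("one step from all others:", central)
```

A genotype one change away from all the others is compatible with it being an intermediate in the outbreak, but distances this small cannot by themselves establish who infected whom.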

In the second study (Gardy et al., 2011), genomics was used to reconstruct the tuberculosis outbreak described earlier in this paper. Genome sequence was obtained for 36 M. tuberculosis isolates—32 from outbreak cases and 4 from patients diagnosed in the same community in the decade prior to the outbreak, all of which had the same 24-loci MIRU-VNTR fingerprint as the outbreak cases. More than 200 single nucleotide polymorphisms (SNPs) were found among the isolates, and the authors realized the nature of TB infection meant it would be impossible to trace the outbreak’s path SNP by SNP, as was done by Lewis et al. In TB, an individual may be infectious for a period of many months, during which time the colonizing organism is continuing to accrue mutations. If the individual transmits the disease to an individual on day 1 of his or her illness, to another individual on day 180, and to a third individual on day 270, and is diagnosed and his or her organism sampled at day 300, his or her isolate might show similarity to the isolate in patient 3, but could be very different from the isolate in patient 1, making it difficult to ascribe patient 1’s disease to the source case. The variable periods of latency associated with TB further complicate SNP-by-SNP reconstruction of transmission.

Instead, the authors used a phylogenetic tree of the data to demonstrate that two separate lineages of M. tuberculosis—labeled A and B—could be resolved within the single MIRU-VNTR genotype. Thus the genomic data acted as a sort of enhanced genotyping method, able to break one MIRU-VNTR cluster containing all the isolates into two distinct genome-based clusters, A and B. Although the original social network describing the relationships between all the outbreak cases was too complex to resolve, when it was broken down into two networks—one showing connections between A cases and one showing connections between B cases—the data became much more interpretable and several person-to-person transmission events could be identified. This revealed that several key individuals acting as superspreaders were associated with the majority of transmission events and that factors including delays in diagnosis, clinical presentation, and risk behaviours contributed toward these individuals’ role as sources of infection. Not every transmission event could be identified, however, and the genomic data also


suggested that some individuals in the network might have exhibited coinfection with both an A and a B strain.

Best Practices and Future Directions

The earliest genomic epidemiology studies suggested that when combined with epidemiological and clinical data, whole genome sequencing has the potential to inform reconstructions of communicable disease outbreaks. Since the publication of the first few studies, several other papers have described using whole genome sequencing to solve outbreaks of other organisms, including Clostridium difficile, methicillin-resistant Staphylococcus aureus, and Klebsiella pneumoniae. Projects are becoming larger and more ambitious, sequencing hundreds of isolates collected across large regions over many years, and the number of outbreak reconstructions available for an individual pathogen is growing as well.

As this emerging field continues to find its place in the realm of public health microbiology, it is important to note several “best practices” that must be considered when doing such a study.

1. Genomic data alone cannot reliably identify individual transmission events. The genomic data must be combined with epidemiological and clinical information if a plausible reconstruction of an outbreak is to be achieved.

2. The bioinformatics methods for identifying positions of variation across a series of isolates are not perfect. The results of any analysis must be carefully examined to ensure that errors in alignment or inappropriate scoring thresholds are not causing variants to be erroneously called or missed.

3. The data must be considered in terms of biological plausibility. An expected level of variation over an outbreak can be inferred from organisms’ mutation rates. If the observed variation is much less or much greater than the expected variation, then the analysis used to generate that data must be reexamined.

4. For others to evaluate a study’s accuracy and reproduce the results, the raw sequencing data for each isolate should be made freely available in a public repository. Manuscripts describing genomic epidemiology studies should include, as appendices, the analysis commands used to generate the data and a detailed description of how the data were filtered and processed after genome assembly and SNP calling.

5. There is a significant amount of interesting biology that can be mined from outbreak-derived genome sequences, particularly in the area of population genetics. To maximize the value of a sequencing data set, it is well worth identifying academic partners who can use a study data set for further analyses.
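The plausibility check described in point 3 might be sketched as follows. This is one possible approach under stated assumptions, not a procedure taken from the studies above: it assumes mutations accrue as a Poisson process at a constant rate, and the rate used in the example (0.5 SNPs per genome per year) is a placeholder chosen purely for illustration.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), with k >= 0."""
    term = exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def is_plausible(observed, rate_per_year, years, alpha=0.05):
    """True if the observed SNP count is not in either tail of the expected Poisson."""
    lam = rate_per_year * years
    lower_tail = poisson_cdf(observed, lam)  # P(X <= observed)
    upper_tail = 1.0 - poisson_cdf(observed - 1, lam) if observed > 0 else 1.0  # P(X >= observed)
    return min(lower_tail, upper_tail) >= alpha / 2


# Placeholder rate and time span, for illustration only
print(is_plausible(observed=1, rate_per_year=0.5, years=2))
print(is_plausible(observed=40, rate_per_year=0.5, years=2))
```

A count flagged as implausible does not show that the data are wrong; it only signals that the variant-calling analysis deserves a second look.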


As more and more genomic epidemiology projects are undertaken, the natural behaviour of pathogens in the wild will slowly be revealed. We will know much more about their spatial and temporal patterns of spread, whether superspreading is as common as early reports are indicating, and the factors that influence an individual’s tendency to spread disease. It will then be up to public health agencies to use this valuable information to develop evidence-based interventions (for example, directing case-finding efforts around contacts of potential superspreaders, or designing prevention programs targeted to specific high-risk communities or individuals). In our own work, we are sequencing 20 years’ worth of TB in a single Canadian province to identify province-wide transmission routes, community-level transmission events, and socioeconomic and clinical risk factors for acting as a source or sink community for disease. It is our hope that the resulting data will allow us to reshape our current TB prevention and control programs, enabling us to use our limited resources for maximum effect.

As sequencing technologies improve—generating longer reads at lower costs—and as bioinformatics methods become more reliable at identifying variation, we anticipate even more accurate and detailed outbreak reconstructions. The coming decade will be an exciting time for genomic epidemiology as it moves from proof of concept to a routine component of clinical practice.


Gardy, J. L., J. C. Johnston, S. J. Ho Sui, V. J. Cook, L. Shah, E. Brodkin, S. Rempel, R. Moore, Y. Zhao, R. Holt, R. Varhol, I. Birol, M. Lem, M. K. Sharma, K. Elwood, S. J. M. Jones, F. S. L. Brinkman, R. C. Brunham, and P. Tang. 2011. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. New England Journal of Medicine 364:730-739.

Lewis, T., N. J. Loman, L. Bingle, P. Jumaa, G. M. Weinstock, D. Mortiboy, and M. J. Pallen. 2010. High-throughput whole-genome sequencing to dissect the epidemiology of Acinetobacter baumannii isolates from a hospital outbreak. Journal of Hospital Infection 75(1):37-41.




Elodie Ghedin,9,* Edward C. Holmes,10,11
Jay V. DePasse,8 Lady Tatiana Pinilla,12 Adam Fitch,8
Marie-Eve Hamelin,13 Jesse Papenburg,12 Guy Boivin,11,*


A small proportion (1-1.5%) of 2009 pandemic A/H1N1 influenza viruses (A(H1N1)pdm09) are oseltamivir-resistant, due almost exclusively to a H275Y mutation in the neuraminidase protein. However, many individuals infected with resistant strains had not received antivirals. Whether drug-resistant viruses are initially present as minor variants in untreated subjects before they emerge as the dominant strain in a virus population is of great importance for predicting the speed at which resistance will arise. To address this issue, we employed ultra-deep sequencing of viral populations from serial nasopharyngeal specimens from an immunocompromised child and from two individuals in a household outbreak. We observed that the Y275


8 Reprinted by permission of Oxford University Press. Originally published as Ghedin, E, et al. (2012) Presence of oseltamivir-resistant pandemic A/H1N1 minor variants before drug therapy with subsequent selection and transmission. Journal of Infectious Diseases 206(10), 1504-1511. doi: 10.1093/infdis/jis571.

9 Department of Computational & Systems Biology, Center for Vaccine Research, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA.

10 Center for Infectious Disease Dynamics, Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA.

11 Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA.

12 Centre de Recherche du Centre Hospitalier Universitaire de Québec and Université Laval, Québec City, Québec, Canada.

13 McGill University Health Centre, Montréal, Québec, Canada.

Conflict of interest statement: E.G., E.C.H., J.V.D., T.P., A.F., M.-E.H., J.P.: No conflicts

G.B.: Research grant from GlaxoSmithKline

Funding statement: This work was supported in part by National Institute of Allergy and Infectious Diseases at the National Institutes of Health [grant number U54 GM088491]; National Institute of General Medical Sciences at the National Institutes of Health [grant number 2R01 GM080533-06 to E.C.H.] and the Canadian Institutes of Health Research [to G.B.].

* Corresponding authors: Dr. Elodie Ghedin, Center for Vaccine Research, University of Pittsburgh School of Medicine, 3501 5th Avenue, BST3 Room 9043b, Pittsburgh, PA 15261. Phone: (412) 383-5850. E-mail: Dr. Guy Boivin, Centre de Recherche en Infectiologie, CHUL, room RC-709, 2705 Laurier, Québec City, QC, Canada. E-mail:

Key words: Influenza; A(H1N1)pdm09; drug resistance; deep sequencing; oseltamivir.


mutation was present as a minor variant in infected hosts prior to onset of therapy. We also found evidence for the transmission of this drug-resistant variant alongside drug-sensitive viruses. These observations provide important information on the relative fitness of the Y275 mutation in the absence of oseltamivir.

The 2009 pandemic A/H1N1 influenza virus (A(H1N1)pdm09) emerged following reassortment between two swine viruses circulating in North America and Eurasia (Garten et al., 2009). Between 1 and 1.5% of A(H1N1)pdm09 strains analyzed to date have been found to be resistant to oseltamivir, a neuraminidase (NA) inhibitor that constitutes the current standard of care (Pizzorno et al., 2011a). Virtually all oseltamivir-resistant A(H1N1)pdm09 viruses contain an H275Y amino acid substitution in the viral NA gene (Pizzorno et al., 2011b). Among the drug-resistant strains recovered from immunocompetent patients, approximately one-third have been recovered from untreated individuals (WHO, 2011). Whether drug-resistant variants are initially present as minor variants in untreated subjects due to transmission from a host harboring a minority drug-resistant population, or whether they emerge following de novo replication, is of great importance for predicting the speed at which resistance will arise: the selection of resistant mutations will occur more rapidly if they are already present within hosts as pre-existing minor variants (Bonhoeffer and Nowak, 1997). In addition, the presence (or not) of the H275Y mutation in pre-treatment samples provides important information on the relative fitness of drug resistance mutations in the absence of oseltamivir.

To determine whether the H275Y mutation is present as a minor variant within hosts infected with influenza A virus, we performed ultra-deep sequencing of viral populations from nasopharyngeal specimens of two sets of individuals infected with A(H1N1)pdm09 viruses. First, we examined longitudinal samples collected from an immunocompromised child who remained infected for more than 6 weeks, during which time a drug-resistant strain came to dominate the virus population. Second, we analyzed the emergence of oseltamivir-resistant viruses in a household outbreak of A(H1N1)pdm09 infections in which the contact case developed influenza symptoms 24 hours after starting post-exposure oseltamivir prophylaxis (Baz et al., 2009).

Materials and Methods

Study 1: Immunocompromised Child

A 31-month-old boy weighing 13.4 kg, diagnosed three months earlier with medulloblastoma, was admitted on January 5, 2011, for consolidation chemotherapy in preparation for the first of 3 consecutive autologous bone marrow transplants (ABMT). On admission, the child presented with rhinorrhea and mild cough


but was afebrile. Members of his immediate family, including his older sister and his father, had cold-like symptoms 1-2 weeks prior; none of the family members, including the patient, had received the 2010-11 influenza vaccine, the monovalent A(H1N1)pdm09 vaccine or any antiviral drug. A nasopharyngeal aspirate (NPA) collected on admission was positive for the A(H1N1)pdm09 virus by real-time RT-PCR (Semret et al., 2009) and by viral culture on A549 and mink lung cells. Treatment with oseltamivir (30 mg, twice daily) was started on January 6. The following day, the patient developed fever (max. 39.2°C), coincident with dropping neutrophil counts. The child received his first ABMT on January 10. NPA specimens collected throughout admission remained positive for A(H1N1)pdm09 influenza virus by RT-PCR (Table A4-1). Oseltamivir therapy was continued during the hospitalisation and after discharge on January 22. The patient was readmitted from January 27 to February 14, 2011, for his second ABMT. An NPA specimen collected on January 28 was positive for A(H1N1)pdm09 by RT-PCR. Because of persistent viral excretion, oseltamivir was replaced by zanamivir (25 mg inhaled four times daily) on February 1 and continued until negative RT-PCR results on February 17. The patient received a third ABMT on February 18 and he recovered from his influenza infection without complications.

Study 2: Transmission in Household

A detailed description of the familial cluster of infections with A(H1N1)pdm09 virus has been reported elsewhere (Baz et al., 2009). Briefly, a 13-year-old asthmatic male developed infection with A(H1N1)pdm09 confirmed by RT-PCR testing of an NPA. The child was started on oseltamivir (60 mg twice daily for 5 days) and discharged home the same day. Concurrently with treatment of the index case, post-exposure oseltamivir prophylaxis (75 mg once daily for 10 days) was prescribed to the 59-year-old father with chronic obstructive pulmonary disease. Approximately 24 hours after beginning oseltamivir prophylaxis, the father developed influenza-like symptoms. On day 8 of oseltamivir prophylaxis, he consulted his general practitioner for persistent cough. An NPA collected at that time was positive by RT-PCR and by culture for A(H1N1)pdm09. The father had an uneventful clinical course, and an NPA sampled at the end of his illness was negative. The son’s A(H1N1)pdm09 isolate collected before oseltamivir therapy was susceptible to oseltamivir (50% inhibitory concentration or IC50: 0.27 nM), whereas the father’s A(H1N1)pdm09 isolate was highly resistant to oseltamivir (IC50 > 400 nM). The complete (consensus sequence) virus genome from the father (GenBank accession FN434454) differed by one amino acid substitution (H275Y) in the NA protein compared to the virus present in the son (GenBank FN434445).


TABLE A4-1 Virological testing of nasopharyngeal aspirates sampled from a young boy undergoing autologous bone marrow transplantation and infected with A(H1N1)pdm09 influenza. n.e. = not evaluated. CM1 and CM2 = Culture passages 1 and 2. S = Primary specimen (nasopharyngeal aspirates).

Sample | Antiviral therapy (43 days) | Multiplex real-time diagnostic PCR (pH1N1) | Discriminatory real-time RT-PCR: H275 copies/mL ± stdev (%) | Discriminatory real-time RT-PCR: Y275 copies/mL ± stdev (%) | Deep sequencing: H275 % | Deep sequencing: Y275 % | Deep sequencing: C.I. | Deep sequencing: coverage (# reads) | Phenotypic drug susceptibility
05-01-2011-S1 | None | Positive | 1.78×10^7 ± 0.71×10^7 (99.91) | 1.44×10^4 ± 0.29×10^4 (0.08) | 95.7 | 3.7 | 1.7 | 488 | n.e.
05-01-2011-CM2 | None | Positive | 3.15×10^10 ± 2.05×10^10 (99.99) | 3.16×10^5 ± 3.41×10^5 (0.001) | 96.9 | 2.7 | 0.7 | 1914 | Susceptible to oseltamivir, zanamivir, and peramivir
10-01-2011-S2 | Oseltamivir | Positive | 5.99×10^8 ± 3.15×10^8 (99.75) | 1.53×10^6 ± 1.26×10^6 (0.25) | 94.6 | 4.4 | 1.0 | 1552 | n.e.
17-01-2011-S3 | Oseltamivir | Positive | 4.21×10^5 ± 4.09×10^5 (3.13) | 1.30×10^7 ± 6.57×10^6 (96.87) | 2.3 | 97 | 0.8 | 1341 | n.e.
20-01-2011-S4 | Oseltamivir | Positive | 2.24×10^6 ± 1.56×10^6 (4.08) | 5.26×10^7 ± 3.67×10^7 (95.92) | 3.3 | 96 | 1.0 | 1170 | n.e.
20-01-2011-CM1 | Oseltamivir | Positive | 6.25×10^5 ± 5.73×10^5 (4.62) | 1.29×10^7 ± 0.83×10^7 (95.38) | 3.3 | 96.5 | 0.8 | 1838 | n.e.
20-01-2011-CM2 | Oseltamivir | n.e. | 0 | 9.40×10^9 ± 3.24×10^9 (100.00) | n.e. | n.e. | n.e. | n.e. | Resistant to peramivir and oseltamivir, susceptible to zanamivir
28-01-2011-S5 | Zanamivir | Positive | 2.85×10^4 ± 1.15×10^4 (16.54) | 1.44×10^5 ± 0.53×10^5 (83.46) | 9.9 | 90 | 2.4 | 131 | n.e.
08-02-2011-S6 | Zanamivir | Positive | 8.23×10^5 ± 4.98×10^5 (11.53) | 6.32×10^6 ± 3.18×10^6 (88.47) | 6.1 | 93.9 | 6.1* | 66 | n.e.
17-02-2011-S7 | Zanamivir | Negative | 0 | 0 | n.e. | n.e. | n.e. | n.e. | n.e.

*Not significant with 95% confidence.


Informed Consent

Written consent was obtained for report of the case described in Study 1. Samples used in Study 2 were obtained as part of an investigation of the Public Health Department of the Ministry of Health, Quebec, Canada.

Clinical Specimens and Viral Culture

In Study 1 (immunocompromised child), 7 NPAs were collected between January 5 and February 17, 2011, for RT-PCR testing (Table A4-1 and Figure A4-1). Viral isolates were also obtained by culture from NPAs sampled on January 5 and January 20. In Study 2 (household transmission), the NPA from the index case (son) was collected prior to oseltamivir treatment whereas the NPA from his father was obtained on day 8 of oseltamivir prophylaxis (Figure A4-1).

NA Inhibition Assay

The drug resistance phenotype to NA inhibitors was determined by NA inhibition assays (Potier et al., 1979). The IC50 values were determined from the dose response curve. A virus was considered resistant to a drug if its IC50 value was 10-fold greater than that of the wild-type (WT) virus (Mishin et al., 2005).
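The fold-change criterion can be illustrated with the IC50 values reported in the Results for the child's isolates. This is a sketch under assumptions: the function names are hypothetical, the threshold is read here as "at least 10-fold", and the January 5 isolate is used as the wild-type comparator, which may not be the reference the authors used.

```python
RESISTANCE_FOLD = 10.0  # threshold stated in the text, read here as "at least 10-fold"


def fold_change(ic50_test, ic50_wt):
    """Ratio of the test isolate's IC50 to the wild-type IC50."""
    return ic50_test / ic50_wt


def is_resistant(ic50_test, ic50_wt, threshold=RESISTANCE_FOLD):
    return fold_change(ic50_test, ic50_wt) >= threshold


# IC50 values (nM) reported in the Results; comparator choice is an assumption
jan05_isolate = {"oseltamivir": 0.77, "zanamivir": 0.15, "peramivir": 0.05}
jan20_isolate = {"oseltamivir": 556.75, "zanamivir": 0.22, "peramivir": 34.81}

for drug, wt_ic50 in jan05_isolate.items():
    fc = fold_change(jan20_isolate[drug], wt_ic50)
    print(f"{drug}: {fc:.1f}-fold, resistant={is_resistant(jan20_isolate[drug], wt_ic50)}")
```

Under these assumptions the classification agrees with the phenotype listed in Table A4-1 for the January 20 isolate: resistant to oseltamivir and peramivir, susceptible to zanamivir.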

RNA Extraction

Total RNA was extracted from 200 µL of thawed specimen or culture using the MagNA Pure instrument and the MagNA Pure LC total nucleic acid isolation kit (Roche Applied Science) according to the manufacturer’s instructions and stored at –80°C.

Discriminative Real-time PCR Assay

To discriminate between WT and H275Y oseltamivir-resistant strains of A(H1N1)pdm09, a modified version of a previously reported real-time RT-PCR method (van der Vries et al., 2010) was used to test samples. This technique requires a reverse (panN1-H275-sense 5′–cagtcgaaatgaatgcccctaa-3′) and a forward (panN1-H275-antisense 5′–tgcacacacatgtgatttcactag-3′) primer for both the WT and the H275Y viruses and two labelled allele-specific probes: panN1-275H-probe (5′–ttaTCActAtgAggaatga-6-FAM/BHQ-1) and panN1-275Y-probe (5′–ttaTTActAtgAggaatga-HEX/BHQ-1). In the aforementioned probe sequences, locked nucleic acid (LNA) nucleotides are denoted in upper case, DNA nucleotides are denoted in lower case, and the single nucleotide polymorphism (SNP) is underlined. The limits of detection for the assay are 50 copies for the H275Y target and between 10 and 50 copies for the WT target. RT-PCR conditions are available upon request. Data acquisition was performed in both FAM and HEX filters


during the annealing/extension step. Standard curves were constructed using 10-fold serial dilutions of pJET1.2-NA-Y275 and pJET1.2-NA-H275 plasmids.
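Because the underlining that marked the SNP does not survive in plain text, the discriminating position can be recovered by comparing the two probe sequences given above. The short sketch below does this; the function name is hypothetical, and the comparison ignores the upper/lower case used to distinguish LNA from DNA nucleotides.

```python
probe_h275 = "ttaTCActAtgAggaatga"  # panN1-275H-probe sequence from the text
probe_y275 = "ttaTTActAtgAggaatga"  # panN1-275Y-probe sequence from the text


def differing_positions(a, b):
    """Return (1-based position, base in a, base in b) wherever the probes differ."""
    if len(a) != len(b):
        raise ValueError("probes must have the same length")
    return [(i, x.upper(), y.upper())
            for i, (x, y) in enumerate(zip(a, b), start=1)
            if x.upper() != y.upper()]


print(differing_positions(probe_h275, probe_y275))
```

The probes differ at a single position, a C in the H275 probe and a T in the Y275 probe, which is consistent with a histidine (CAC) to tyrosine (TAC) codon change.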

Sequencing and Analysis

RNA isolated from two cultured isolates and seven primary specimens collected for Study 1 (Figure A4-1A), and two primary specimens for Study 2 (Figure A4-1B), was subjected to a multisegment RT-PCR (M-RT-PCR) step (Zhou et al., 2009) and random priming with barcoding using the SISPA (sequence independent single primer amplification) protocol (Djikeng et al., 2008). For each RNA sample, we performed two M-RT-PCR reactions using the One Step Superscript III RT kit (Invitrogen). Reactions were purified independently using the Qiagen MinElute kit and quantitated on a Nanodrop spectrophotometer; 100-200 ng of each purified M-RT-PCR reaction was used in two separate SISPA reactions with two different barcode tags for a total of 4 tagged reactions per original RNA sample. Products were then separated on a 1% agarose gel and fragments from 200–400 bp purified with the Qiagen MinElute kit. Pooled samples were sent for paired-end (PE) library preparation and 100-base sequencing on the Illumina HiSeq 2000 platform.

The barcoded amplification products were sequenced on one lane of the sequence run. Analyses were performed to reduce the distortion caused by SISPA amplification, account for both PCR and sequencing errors, and provide a “clean” comparison between the mapped reads of the experimental samples. The trimmed reads were mapped to A/Quebec/144147/2009(H1N1) (GenBank accession FN434457-FN434464) using the bowtie short-read aligner (Langmead et al., 2009).

The frequency of each codon observed in the set of mapped reads from each amplification replicate was tabulated across each of the 10 influenza genes. To account for sequence-specific errors (Minoche et al., 2011; Nakamura et al., 2011), the variant counts for the forward and reverse direction reads were calculated separately, and only those variants for which counts were within 50% of each other in both directions were retained. For these summaries, the unique reads from all amplification replicates were pooled and total coverage is reported for each codon site. The proportion of codons expected to differ from the consensus due to background mutation and technical error was estimated from a separate cell culture of the PR8 strain that was otherwise processed in exactly the same manner as the specimens in this study. This proportion, found to be 0.00392, lies well outside of the 95% confidence interval for any variant codon in our study that is (a) represented by more than 4 sequence reads, and (b) found in at least 2% of all sequence reads mapped to that position. The lower limit of the 95% confidence interval determined by computing the inverse of the appropriate cumulative Beta distribution is 0.00813.
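One way to read the confidence-interval step is as an exact (Clopper-Pearson) binomial bound, whose lower limit equals a quantile of a Beta distribution. The sketch below takes that reading as an assumption; it is not the authors' code, the function names are hypothetical, and the example counts (5 variant reads out of 250) are invented to sit exactly at the two stated thresholds. It does not attempt to reproduce the reported value of 0.00813, because the coverage and interval settings behind that figure are not given.

```python
from math import comb

BACKGROUND_ERROR = 0.00392  # background proportion reported in the text


def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(x))


def clopper_pearson_lower(x, n, alpha=0.05):
    """Lower limit of the exact two-sided (1 - alpha) interval for x successes in n trials.

    Equivalent to the alpha/2 quantile of Beta(x, n - x + 1); found here by bisection.
    """
    if x == 0:
        return 0.0
    lo, hi = 0.0, x / n
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_upper_tail(x, n, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def passes_read_thresholds(x, n, min_reads=5, min_fraction=0.02):
    """Criteria (a) and (b) from the text: more than 4 reads and at least 2% of reads."""
    return x >= min_reads and x / n >= min_fraction


x, n = 5, 250  # hypothetical counts at the threshold
lower = clopper_pearson_lower(x, n)
print(passes_read_thresholds(x, n), lower > BACKGROUND_ERROR)
```

Under this reading, a variant that just meets both thresholds still has a lower confidence limit above the background error proportion, which is the property the paragraph relies on.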



Presence of Drug-Resistant Viruses Before Drug Treatment in an Immunocompromised Child (Study 1)

The results of the NA gene H275Y discriminatory real-time RT-PCR assay performed on the seven primary specimens and the two viral isolates (January 5 and 20) are presented in Table A4-1. In the first NPA collected on January 5 (day 1), prior to antiviral therapy (initiated January 6), 99.9% of the viral population was WT at NA position 275 by our discriminatory assay. Nevertheless, a very small sub-population of H275Y mutant was also detectable (0.08%). The corresponding viral isolate (05-01-2011-CM2 in Table A4-1) contained 99.9% WT virus and was susceptible to oseltamivir (IC50=0.77 nM ± 0.02), zanamivir (IC50=0.15 nM ± 0.02), and peramivir (IC50=0.05 nM ± 0.01). Notably, the H275Y mutation could not be detected by conventional RT-PCR and Sanger sequencing in the original sample. A second NPA collected on January 10 (day 6) also demonstrated a predominance of the WT population (99.8%). However, the proportion of the H275Y mutant detected in NPAs collected on January 17, 20, and 28 increased to 96.9, 95.9, and 83.5%, respectively, during continuous oseltamivir treatment. Furthermore, the second passage on Madin Darby canine kidney (MDCK) cells of the January 20 viral isolate (20-01-2011-CM2 in Table A4-1) resulted in a 100% H275Y mutant population compared with 95.4% from the primary culture recovered from A549 and mink lung cells. This viral isolate exhibited an IC50 value of 556.75 nM ± 61.32 for oseltamivir, 0.22 nM ± 0.01 for zanamivir, and 34.81 nM ± 5.77 for peramivir, which indicates a resistance phenotype to oseltamivir and peramivir. Antiviral therapy was changed to zanamivir on February 1. In the February 8 sample, 88.5% of the viral population was H275Y mutant virus, whereas the last NPA collected on February 17 was negative for A(H1N1)pdm09 by RT-PCR.
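The percentages quoted here follow from the copy numbers in Table A4-1. As a small worked check, assuming the Y275 percentage is simply Y275 copies divided by total (H275 + Y275) copies, the sketch below recomputes it for three primary specimens. The function name is hypothetical; small discrepancies from the table are expected because the published copy numbers are rounded.

```python
def y275_percent(h275_copies, y275_copies):
    """Percentage of Y275 among all NA position-275 copies; None if nothing was detected."""
    total = h275_copies + y275_copies
    if total == 0:
        return None
    return 100.0 * y275_copies / total


# (H275 copies/mL, Y275 copies/mL) from Table A4-1
samples = {
    "05-01-2011-S1": (1.78e7, 1.44e4),
    "10-01-2011-S2": (5.99e8, 1.53e6),
    "17-01-2011-S3": (4.21e5, 1.30e7),
}

for name, (h, y) in samples.items():
    print(f"{name}: {y275_percent(h, y):.2f}% Y275")
```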

A number of the primary specimens (January 5, 10, 17, 20, 28, and February 8; corresponding to samples 1-6 in Figure A4-1A) for which M-RT-PCR product could be generated, as well as the viral isolates, were subjected to deep sequencing to better evaluate the genetic diversity of the viral population, including the presence of drug-resistant mutants. Based on the average depth of coverage across each of the virus segments, we highlighted codons represented by at least 2% of the sequence reads covering each position (Table S1). This percentage is conservative enough that, even in low-coverage areas, it excludes potential sequence and PCR errors.

The positions on the NA and NS1 proteins that display evidence for the presence of minor variants at a frequency of 2% or above in more than one of the samples are shown in Figure A4-2. Similar patterns are observed for all other proteins (Table S1). Over time, the ratios of the minor variants to the dominant codon remain relatively stable, except for NA position 275 where a shift of H to Y is apparent on 17 January 2011. The ratios are similar to the ones observed in


the real-time discriminatory RT-PCR assays for each of the samples tested (Table A4-1), although values across both assays are not identical. No other position on the NA protein appears to co-vary with the 275Y variant. The same pattern is observed in the culture isolates (05-01-2011-CM2 and 20-01-2011-CM1 in Table S2). However, position 153 in NS1 displays a similar switch, although involving a synonymous mutation (from codon GAG to GAA, for E (glutamic acid)). Hence, the sample from the original infection contained a drug-associated minor variant prior to the onset of drug treatment, and this minor variant differed from the dominant strain by only two nucleotide positions. Due to drug-associated selection pressure, this minor variant eventually became dominant in the host. The variant codons observed at the other positions are also possibly representative of other minor variants in the original virus population but, as they remained minor members of the viral population, they are unlikely to have a selective advantage.

Evidence for Transmission of Drug-Resistant Viruses in Household (Study 2)

In a separate study, we observed a similar phenomenon where oseltamivir resistance emerged quickly in the household contact (father) of an index case (son). Both family members were started on oseltamivir on the same day (Figure A4-1B): twice-daily treatment for the son and once-daily prophylaxis for the father. The latter developed influenza-like symptoms 24 hours after drug treatment was begun. Such a rapid clinical presentation suggests that he was already infected at the time prophylaxis was initiated, and that drug-resistant viruses were most likely already present.

We characterized the genetic diversity of the virus populations in both individuals by deep sequencing. An example for the HA and NA genes where most of the variants seen in the son are also observed in the father is shown in Figure A4-3. While the dominant viruses are drug-sensitive in the son and drug-resistant in the father, as is apparent from the switch from H275 to 275Y, it is striking that a minor population of viruses in the son already carries the drug resistance mutation: the minor drug-resistant variant residue 275Y is present in more than 2.4% of the reads in the son (and was not detected by conventional RT-PCR and Sanger sequencing). Hence, it is likely that viruses carrying this mutation were transmitted to the father along with drug-sensitive viruses, and became dominant in that individual following selection associated with a subtherapeutic (prophylactic) dose of oseltamivir.

Also of note was that the same minor variants were found in both the father and the son at 60 residue positions across all 10 viral proteins (Table S3). We estimate that there were 8 days of replication in the father from the time he was possibly infected by the son (assuming it occurred 24 hours before any symptoms) to the time the specimen was collected. Over that time, variant representation could have fluctuated such that the set of 60 variants seen in both samples is likely to underestimate the true number. While the number of conserved variants points


to possible transmission, and the probability that the same variants could appear in both the son and the father by chance alone is extremely low, we do not have other potential contacts or index cases to test in order to confirm this observation.
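One way to make "by chance alone" concrete is a hypergeometric model: if each host carried a set of minor variants drawn at random from the same pool of candidate positions, how likely is an overlap at least as large as the one observed? The sketch below is illustrative only. The pool size and per-host variant counts are hypothetical, the uniform-sampling assumption is a simplification (real variants are not evenly distributed across a genome), and this is not the calculation used by the authors.

```python
from math import comb

def prob_overlap_at_least(pool, n_a, n_b, shared):
    """P(overlap >= shared) when set A (size n_a) and set B (size n_b)
    are each drawn uniformly, without replacement, from `pool` positions."""
    total = comb(pool, n_b)
    upper = min(n_a, n_b)
    # Sum the hypergeometric probabilities for every overlap size >= shared.
    tail = sum(comb(n_a, k) * comb(pool - n_a, n_b - k)
               for k in range(shared, upper + 1))
    return tail / total

# Hypothetical inputs: 4,000 candidate positions, 100 minor variants per host,
# and 60 positions shared between the two hosts.
p_chance = prob_overlap_at_least(pool=4000, n_a=100, n_b=100, shared=60)
```

Under these assumed inputs the probability is vanishingly small, which is consistent with the qualitative statement in the text; the exact value depends entirely on the assumed pool size and variant counts.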


The most striking observation from both of these studies is that the mutation most commonly associated with resistance to oseltamivir (H275Y) is present in the viral population of some individuals prior to the onset of drug treatment. In addition, this minor drug-resistant population could not be revealed by conventional methods such as phenotypic resistance tests and Sanger sequencing. This observation is important for a number of reasons. First, the prior existence of Y275 means that the selection for drug resistance will proceed much more rapidly following the onset of drug selection pressure than if only wild-type viruses are present in the population, as there is no waiting time for the correct mutation to appear (Bonhoeffer and Nowak, 1997). Further, that the Y275 mutation is present in untreated hosts indicates that this mutation is not strongly deleterious in the absence of oseltamivir, and likely does not need compensatory mutations to enable its fixation (Hamelin et al., 2010; Memoli et al., 2010; Seibert et al., 2010). Indeed, in both cases studied here, we observed no amino acid changes that were fixed concordant with Y275, and only a single synonymous mutation (in NS1) in the case of the immunocompromised child. In these circumstances, the pre-existence of Y275 means that oseltamivir resistance will likely spread rapidly as soon as there is drug selection pressure, especially in immunocompromised individuals and when suboptimal antiviral dosage is used.
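The effect of a pre-existing variant on the speed of selection can be illustrated with a simple two-variant model in which the resistant variant has a constant per-day growth advantage s under drug pressure. This is a textbook logistic sketch, not the model of Bonhoeffer and Nowak (1997); the starting frequencies and the value of s are hypothetical.

```python
import math

def days_to_majority(f0, s):
    """Days until a variant at initial frequency f0 exceeds 50%,
    given a constant selective advantage s (per day).

    Solves f(t) = f0*exp(s*t) / (1 - f0 + f0*exp(s*t)) = 0.5 for t.
    """
    if not 0 < f0 < 1 or s <= 0:
        raise ValueError("need 0 < f0 < 1 and s > 0")
    return max(0.0, math.log((1 - f0) / f0) / s)

# Hypothetical comparison with the same assumed advantage s = 1.0 per day:
preexisting = days_to_majority(f0=0.024, s=1.0)  # variant already at 2.4%
de_novo = days_to_majority(f0=1e-6, s=1.0)       # variant starts as a rare new mutant
```

With these assumed values the pre-existing variant reaches a majority in roughly 4 days, against roughly 14 days for a variant that starts at one in a million, and the de novo case would additionally have to wait for the mutation to arise in the first place.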

If the Y275 mutation is present in individual hosts prior to the onset of drug treatment then it is also likely to have been transmitted between individuals as a minor variant. This in turn suggests that there may not often be a severe population bottleneck during the inter-host transmission of influenza virus. Indeed, mixed infections of multiple variants of influenza virus have been observed in both natural human infections (Ghedin et al., 2009; Ghedin et al., 2011; Pajak et al., 2011) and experimental animal infections (Murcia et al., 2010; Murcia et al., 2012), and hence may be commonplace. Co-infection with major and minor variants, captured by deep sequencing, was also observed during the course of human rhinovirus infections (Cordey et al., 2010), indicating that this phenomenon is not unique to influenza. In contrast, sequencing studies of HIV suggest that a small number of viral particles initiate infection, such that most variants are produced following replication within the newly infected host (Keele et al., 2008).

Such transmission of multiple variants is most clearly documented in the son-father case, where perhaps 60 mutational variants are passed between these two individuals, one of which confers oseltamivir resistance. However, the availability of only short sequence reads makes it impossible to determine the exact number of distinct viral haplotypes these correspond to. In addition, our sampling protocol in the son-father transmission case means that we cannot exclude the possibility that there was rapid selection of oseltamivir resistance in the son after we sampled his virus population, such that a majority Y275 population was in fact transmitted to the father. However, this would entail extremely rapid selection for resistance and does not change the central observation that multiple variants are transmitted between hosts, as both H275 and Y275 are found in the father.

That the Y275 mutation is present in the son prior to oseltamivir treatment and so soon after symptom onset suggests that this resistance mutation was also present in the viral population initially transmitted to the son. Similarly, the presence of Y275 in the immunocompromised child suggests that this mutation may have been transmitted to the child in a mixed infection containing both drug-sensitive and drug-resistant variants, although it cannot be excluded that the variant appeared de novo. If Y275 is indeed present in the founding population in both individuals then it is possible that this mutation is present as a low-frequency variant in many individuals infected with A(H1N1)pdm09, and that its presence reflects the combined action of past selection for drug resistance in patients receiving oseltamivir, incomplete reversion to the wild-type H275 residue in patients who are not on the drug, and a lack of strongly deleterious fitness effects in the absence of drug. The large-scale ultra-deep sequencing of additional A(H1N1)pdm09 patients who have not received oseltamivir will clearly be central to answering this question.

Next generation ultra-deep sequencing of intra-host viral populations such as that undertaken here promises to transform our understanding of the evolution of drug resistance in acute viral infections, allowing the dissection of the mutational spectrum at an unprecedented level of precision. Indeed, it is striking that in the two cases conventional RT-PCR failed to detect the presence of oseltamivir resistance even though Y275 was present in the viral population. However, despite its undoubted potential, ultra-deep sequencing also comes with a number of inherent analytical difficulties. First, because the sequencing protocol leads to the generation of short sequence reads, nucleotide positions cannot be linked either within or among individual genes except if they are close enough to appear on the same sequence read, or if they have the same pattern of prevalence. More fundamentally, it is critical to ensure that minor genetic variants are not the result of PCR/sequencing artefacts. Amplification leads to the well-known problem of “PCR duplicates,” sometimes resulting in severe distortion to the observed proportions of true variant subpopulations and the possible creation of false variant sequences through PCR errors. To address these problems, each specimen from our study was amplified in four independent reactions using different barcodes, allowing us to track amplification products and their respective sequence reads. Future work will employ a simpler and more cost-effective approach using modified primers that include unique tags for each template (Jabara et al., 2011).
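The logic of using independent barcoded amplifications can be sketched as a concordance filter: a minor variant is accepted only if it appears above a frequency threshold in every replicate, because a PCR or sequencing error is unlikely to recur at the same position in independent reactions. The threshold, the acceptance rule, and all variant names other than H275Y are hypothetical and are not taken from the study.

```python
def confirmed_variants(replicates, min_freq=0.01):
    """Keep variants seen at or above min_freq in every replicate.

    `replicates` is a list of dicts mapping variant name -> frequency,
    one dict per independently barcoded amplification.
    """
    if not replicates:
        return set()
    passing = [{v for v, f in rep.items() if f >= min_freq}
               for rep in replicates]
    return set.intersection(*passing)

# Hypothetical frequencies from four independent reactions.
reps = [
    {"NA:H275Y": 0.025, "HA:S220T": 0.012, "PB2:K340N": 0.011},
    {"NA:H275Y": 0.023, "HA:S220T": 0.015},
    {"NA:H275Y": 0.026, "HA:S220T": 0.011, "NP:V100I": 0.020},
    {"NA:H275Y": 0.024, "HA:S220T": 0.004},
]
confirmed = confirmed_variants(reps)
```

In this example only the variant present above the threshold in all four reactions survives; variants seen in a single reaction, or falling below the threshold in one of them, are treated as possible artefacts.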

Baz, M., Y. Abed, J. Papenburg, X. Bouhy, M. E. Hamelin, and G. Boivin. 2009. Emergence of oseltamivir-resistant pandemic H1N1 virus during prophylaxis. New England Journal of Medicine 361(23):2296-2297.

Bonhoeffer, S., and M. A. Nowak. 1997. Pre-existence and emergence of drug resistance in HIV-1 infection. Proceedings: Biological Sciences 264(1382):631-637.

Cordey, S., T. Junier, D. Gerlach, F. Gobbini, L. Farinelli, E. M. Zdobnov, B. Winther, C. Tapparel, and L. Kaiser. 2010. Rhinovirus genome evolution during experimental human infection. PLoS ONE 5(5):e10588.

Djikeng, A., R. Halpin, R. Kuzmickas, J. Depasse, J. Feldblyum, N. Sengamalay, C. Afonso, X. Zhang, N. G. Anderson, E. Ghedin, and D. J. Spiro. 2008. Viral genome sequencing by random priming methods. BMC Genomics 9:5.

Garten, R. J., C. T. Davis, C. A. Russell, B. Shu, S. Lindstrom, A. Balish, W. M. Sessions, X. Xu, E. Skepner, V. Deyde, M. Okomo-Adhiambo, L. Gubareva, J. Barnes, C. B. Smith, S. L. Emery, M. J. Hillman, P. Rivailler, J. Smagala, M. de Graaf, D. F. Burke, R. A. Fouchier, C. Pappas, C. M. Alpuche-Aranda, H. Lopez-Gatell, H. Olivera, I. Lopez, C. A. Myers, D. Faix, P. J. Blair, C. Yu, K. M. Keene, P. D. Dotson, Jr., D. Boxrud, A. R. Sambol, S. H. Abid, K. St George, T. Bannerman, A. L. Moore, D. J. Stringer, P. Blevins, G. J. Demmler-Harrison, M. Ginsberg, P. Kriner, S. Waterman, S. Smole, H. F. Guevara, E. A. Belongia, P. A. Clark, S. T. Beatrice, R. Donis, J. Katz, L. Finelli, C. B. Bridges, M. Shaw, D. B. Jernigan, T. M. Uyeki, D. J. Smith, A. I. Klimov, and N. J. Cox. 2009. Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science 325(5937):197-201.

Ghedin, E., A. Fitch, A. Boyne, S. Griesemer, J. DePasse, J. Bera, X. Zhang, R. A. Halpin, M. Smit, L. Jennings, K. St George, E. C. Holmes, and D. J. Spiro. 2009. Mixed infection and the genesis of influenza virus diversity. Journal of Virology 83(17):8832-8841.

Ghedin, E., J. Laplante, J. DePasse, D. E. Wentworth, R. P. Santos, M. L. Lepow, J. Porter, K. Stellrecht, X. Lin, D. Operario, S. Griesemer, A. Fitch, R. A. Halpin, T. B. Stockwell, D. J. Spiro, E. C. Holmes, and K. St George. 2011. Deep sequencing reveals mixed infection with 2009 pandemic influenza A (H1N1) virus strains and the emergence of oseltamivir resistance. Journal of Infectious Diseases 203(2):168-174.

Hamelin, M. E., M. Baz, Y. Abed, C. Couture, P. Joubert, E. Beaulieu, N. Bellerose, M. Plante, C. Mallett, G. Schumer, G. P. Kobinger, and G. Boivin. 2010. Oseltamivir-resistant pandemic A/H1N1 virus is as virulent as its wild-type counterpart in mice and ferrets. PLoS Pathogens 6(7):e1001015.

Jabara, C. B., C. D. Jones, J. Roach, J. A. Anderson, and R. Swanstrom. 2011. Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proceedings of the National Academy of Sciences of the United States of America 108(50):20166-20171.

Keele, B. F., E. E. Giorgi, J. F. Salazar-Gonzalez, J. M. Decker, K. T. Pham, M. G. Salazar, C. Sun, T. Grayson, S. Wang, H. Li, X. Wei, C. Jiang, J. L. Kirchherr, F. Gao, J. A. Anderson, L. H. Ping, R. Swanstrom, G. D. Tomaras, W. A. Blattner, P. A. Goepfert, J. M. Kilby, M. S. Saag, E. L. Delwart, M. P. Busch, M. S. Cohen, D. C. Montefiori, B. F. Haynes, B. Gaschen, G. S. Athreya, H. Y. Lee, N. Wood, C. Seoighe, A. S. Perelson, T. Bhattacharya, B. T. Korber, B. H. Hahn, and G. M. Shaw. 2008. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America 105(21):7552-7557.

Langmead, B., C. Trapnell, M. Pop, and S. L. Salzberg. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3):R25.

Memoli, M. J., R. J. Hrabal, A. Hassantoufighi, B. W. Jagger, Z. M. Sheng, M. C. Eichelberger, and J. K. Taubenberger. 2010. Rapid selection of a transmissible multidrug-resistant influenza A/H3N2 virus in an immunocompromised host. Journal of Infectious Diseases 201(9):1397-1403.

Minoche, A. E., J. C. Dohm, and H. Himmelbauer. 2011. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biology 12(11):R112.

Mishin, V. P., F. G. Hayden, and L. V. Gubareva. 2005. Susceptibilities of antiviral-resistant influenza viruses to novel neuraminidase inhibitors. Antimicrobial Agents and Chemotherapy 49(11):4515-4520.

Murcia, P. R., G. J. Baillie, J. Daly, D. Elton, C. Jervis, J. A. Mumford, R. Newton, C. R. Parrish, K. Hoelzer, G. Dougan, J. Parkhill, N. Lennard, D. Ormond, S. Moule, A. Whitwham, J. W. McCauley, T. J. McKinley, E. C. Holmes, B. T. Grenfell, and J. L. Wood. 2010. Intra- and interhost evolutionary dynamics of equine influenza virus. Journal of Virology 84(14):6943-6954.

Murcia, P. R., J. Hughes, P. Battista, L. Lloyd, G. J. Baillie, R. H. Ramirez-Gonzalez, D. Ormond, K. Oliver, D. Elton, J. A. Mumford, M. Caccamo, P. Kellam, B. T. Grenfell, E. C. Holmes, and J. L. N. Wood. 2012. Evolution of an Eurasian avian-like influenza virus in naive and vaccinated pigs. PLoS Pathogens 8(5):e1002730.

Nakamura, K., T. Oshima, T. Morimoto, S. Ikeda, H. Yoshikawa, Y. Shiwa, S. Ishikawa, M. C. Linak, A. Hirai, H. Takahashi, M. Altaf-Ul-Amin, N. Ogasawara, and S. Kanaya. 2011. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research 39(13):e90.

Pajak, B., I. Stefanska, K. Lepek, S. Donevski, M. Romanowska, M. Szeliga, L. B. Brydak, B. Szewczyk, and K. Kucharczyk. 2011. Rapid differentiation of mixed influenza A/H1N1 virus infections with seasonal and pandemic variants by multitemperature single-stranded conformational polymorphism analysis. Journal of Clinical Microbiology 49(6):2216-2221.

Pizzorno, A., Y. Abed, and G. Boivin. 2011a. Influenza drug resistance. Seminars in Respiratory and Critical Care Medicine 32(4):409-422.

Pizzorno, A., X. Bouhy, Y. Abed, and G. Boivin. 2011b. Generation and characterization of recombinant pandemic influenza A(H1N1) viruses resistant to neuraminidase inhibitors. Journal of Infectious Diseases 203(1):25-31.

Potier, M., L. Mameli, M. Belisle, L. Dallaire, and S. B. Melancon. 1979. Fluorometric assay of neuraminidase with a sodium (4-methylumbelliferyl-alpha-D-N-acetylneuraminate) substrate. Analytical Biochemistry 94(2):287-296.

Seibert, C. W., M. Kaminski, J. Philipp, D. Rubbenstroth, R. A. Albrecht, F. Schwalm, S. Stertz, R. A. Medina, G. Kochs, A. Garcia-Sastre, P. Staeheli, and P. Palese. 2010. Oseltamivir-resistant variants of the 2009 pandemic H1N1 influenza A virus are not attenuated in the guinea pig and ferret transmission models. Journal of Virology 84(21):11219-11226.

Semret, M., S. Fenn, H. Charest, J. McDonald, C. Frenette, and V. Loo. 2009 (18-21 June). A real-time RT-PCR assay for detection of influenza H1N1 (swine-type) and other respiratory viruses. Paper presented at 26th International Congress of Chemotherapy and Infection, Toronto, ON.

van der Vries, E., M. Jonges, S. Herfst, J. Maaskant, A. Van der Linden, J. Guldemeester, G. I. Aron, T. M. Bestebroer, M. Koopmans, A. Meijer, R. A. Fouchier, A. D. Osterhaus, C. A. Boucher, and M. Schutten. 2010. Evaluation of a rapid molecular algorithm for detection of pandemic influenza A (H1N1) 2009 virus and screening for a key oseltamivir resistance (H275Y) substitution in neuraminidase. Journal of Clinical Virology 47(1):34-37.

WHO. 2011. Weekly update on oseltamivir resistance in influenza A(H1N1)2009 viruses.

Zhou, B., M. E. Donnelly, D. T. Scholes, K. St George, M. Hatta, Y. Kawaoka, and D. E. Wentworth. 2009. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and swine origin human influenza A viruses. Journal of Virology 83(19):10309-10313.


Daniel P. Smith,14 John C. Alverdy,15 Jeffrey A. Siegel,16,17 and Jack A. Gilbert14,18

Milestones in Home and Hospital Microbiome Research

The populations of developed nations spend approximately 90 percent of their time indoors (Moschandreas, 1981), leading scientists and the public alike to take an interest in the microbial communities that share these spaces with us. This is especially true in healthcare environments, where hospital-acquired infections (HAIs) have long been among the leading causes of patient deaths (Anderson and Smith, 2005; Groseclose et al., 2004; Hall-Baker et al., 2010; Klevens et al., 2007). The first study of airborne pathogens in a hospital can be attributed to Bourdillon and Colebrook, who, in 1946, investigated the concentration of bacteria present in the air of a surgical changing room (Bourdillon and Colebrook, 1946). Their findings, and the findings of similar studies published over the following two decades, revealed levels of airborne bacteria that were cause for concern (Blowers and Wallace, 1960; Colebrook and Cawston, 1948; Cvjetanović, 1957; Greene et al., 1962a,b; Warner and Glassco, 1963) and prompted a rethinking of ventilation designs for hospitals.

The air-sampling techniques developed for hospitals were soon applied to other indoor spaces including subway trains (Williams and Hirch, 1950), classrooms (Williams et al., 1956), movie theaters (Cvjetanović, 1957), and apartments (Simard et al., 1983). Articles by Finch and colleagues in 1978 and Scott and colleagues in 1982 complemented these air-based studies with the first characterizations of bacteria living on bathroom and kitchen surfaces (Finch et al., 1978; Scott et al., 1982). The larger of the two studies, conducted by Scott and colleagues, examined 60 locations in 251 homes. Its conclusions agreed well with those of the earlier study by Finch and colleagues of 21 homes: the dominant species on the studied surfaces were enterobacteria, pseudomonads, micrococci, Bacillus, and Aeromonas hydrophila, with a lower incidence of Salmonella, Staphylococcus aureus, and Bacillus cereus.


14 Argonne National Laboratory, Institute for Genomic and Systems Biology, Argonne, IL, USA.

15 Department of Surgery, The University of Chicago Medical Center, Chicago, IL, USA.

16 Department of Civil, Architectural and Environmental Engineering, The University of Texas at Austin, Austin, TX, USA.

17 Department of Civil Engineering, The University of Toronto, Toronto, ON, Canada.

18 Department of Ecology and Evolution, The University of Chicago, Chicago, IL, USA.

Current literature regarding the relationship between the indoor environment and humans primarily explores the development of fungal contamination on damp surfaces (Hyvarinen et al., 2002; Jaffal et al., 1997; Lignell et al., 2008; Nevalainen and Seuri, 2005), the role of hygiene in removing microbial communities (Bright et al., 2009; Grice and Segre, 2011), and the length of time microbes can survive on surfaces (Kramer et al., 2006). A number of studies have explored the microbial diversity of communities associated with dust (Pitkäranta et al., 2008; Rintala et al., 2008; Sebastian and Larsson, 2003) and air (Huttunen et al., 2008; Tringe et al., 2008). One study of the temporal succession of microbial communities in indoor dust found seasonal patterns that were building specific, probably as a result of skin cells shed by the inhabitants of the buildings (Rintala et al., 2008). These existing studies demonstrate fundamental principles of experimental design, particularly the types of environmental conditions that need to be monitored (e.g., surface material, moisture, HVAC system), and support the observation that the architectural design of an indoor space influences the potential community structure and hence human health (Guenther and Vittori, 2008). The influence of air ventilation and of the number of people in a space on microbial community structure must also be explored (Hospodsky et al., 2012; Kembel et al., 2012; Qian and Li, 2010; Qian et al., 2012). Additionally, the variability associated with body sites (Costello et al., 2009; Fierer et al., 2010; Grice et al., 2009) will have a major impact on the interpretation of the analyses, because different body sites interact differently with different surfaces. The most diverse skin sites are the driest areas, and their communities are hence less likely to be transferred to a surface with sebaceous exudates. This will affect the length of time over which the microbial community on a surface maintains its structure, in terms of the relative abundance of its members (Kramer et al., 2006).

Studies of indoor microbiology are highly relevant today, because concern for protecting ourselves from microbial pathogens is ever-present. Studies in this field do much to put this threat in perspective by identifying incorrect preconceptions. For instance, using cleaning products containing the antibacterial agent triclosan over the course of a year does not result in an increase in antimicrobial drug-resistant bacteria in homes (Aiello et al., 2005; Cole et al., 2003). Several research groups have studied the beneficial effect of childhood exposure to dirty environments, which is particularly pronounced in children who live on farms, who are less likely to later develop asthma and other respiratory problems (Adler et al., 2005; Alfvén et al., 2006; Klintberg et al., 2001; Leynaert et al., 2001; Merchant et al., 2005; Remes et al., 2005; Riedler et al., 2001; Schram et al., 2005). Hospitals have found that opening the windows in patient rooms significantly reduces the percentage of potentially pathogenic microbes in the air (Escombe et al., 2007; Kembel et al., 2012). These are just a few examples of a larger trend toward questioning the culture of cleanliness, or as it has come to be called, the hygiene hypothesis (Bloomfield et al., 2006; Martinez, 2001; Rook, 2009; Rook and Stanford, 1998; Yazdanbakhsh and Matricardi, 2004). The study of pathogens in indoor environments is valuable, but perhaps even more valuable is gaining a better understanding of the competition between pathogenic and non-pathogenic microorganisms and how we might shift that balance in our favor.

A New Scientific Community

The expansion of indoor microbiome research from healthcare environments into homes and offices has been driven in large part by funding initiatives of the Alfred P. Sloan Foundation. This private foundation stipulates a high degree of collaboration between its grantees and as a result has brought about the development of Internet portals designed to enable any indoor microbiome researcher to exchange raw data and results with other scientists as well as with reporters and the general public. One such nexus, the Microbiology of the Built Environment Network website,19 tracks investigators, projects, publications, computational resources, protocols, standards, press releases, social media, conferences, and workshops relating to indoor microbiology. A community data archive is hosted by the Microbiome of the Built Environment Data Analysis Core (MoBEDAC), which integrates with data analysis tools including visualization and analysis of microbial population structures (VAMPS), quantitative insights into microbial ecology (QIIME), metagenomics rapid annotation using subsystem technology (MG-RAST), and FungiDB (Caporaso et al., 2010; Meyer et al., 2008; Stajich et al., 2011).

While the network website provides general information applicable to a wide range of indoor microbiology projects, working groups for specialists in this field have also emerged. The Berkeley Indoor Microbial Ecology Research Consortium (BIMERC) focuses on identifying the source populations and human influences on the microbial components of indoor air. Discovering the mechanisms and rates with which microbial communities spread throughout healthcare facilities is the goal of the Hospital Microbiome Consortium.20 The Biology and the Built Environment Center at the University of Oregon21 has begun training students in the investigation of how architecture can influence the structure of indoor microbiomes.

Indoor Microbiology Without Culturing?

Early studies of the indoor microbiome relied heavily on culture-based methods: agar plates were commonly pressed directly against the surface of interest, and the resultant colonies were counted and microscopically characterized in order to establish quantitative and taxonomic abundance metrics. However, with the introduction of next-generation sequencing technologies, rapid, high-throughput characterization of taxonomic marker genes (e.g., 16S/18S rRNA) and whole genomic DNA from environmental samples is now financially viable, altering the landscape of tools available to the researcher interested in the indoor microbiome.

21 BioBE.

The Pros and Cons of High-Throughput Sequencing

Since 2007, high-throughput sequencing technologies offered by Illumina, Roche, and ABI have enabled researchers to quickly and inexpensively profile the relative abundances of taxonomic groups in samples of environmental microbial communities. This approach to microbial ecology offers three advantages over culture-based methods. First, uncultivable species can be identified, thereby providing a more complete characterization of the microbial community. Second, organisms can be systematically classified by computer-aided alignment of DNA sequences to reference genes and genomes. And lastly, this entire process is relatively easily scalable from tens of samples to tens of thousands of samples.

Classical culturing, however, still yields important information that cannot be attained through high-throughput DNA sequencing of ribosomal genes. Measuring the absolute abundance of colony-forming units is very straightforward when working with agar press-plates, but it is a difficult metric to attain from nanogram quantities of DNA extracted from environmental samples. Growth of colonies on plates also provides concrete evidence that the cell taken from the environment was viable. In addition, sequencing ribosomal genes does not provide information on whether the detected microbial species harbor genetic cassettes encoding antibiotic-resistance genes, an important factor in evaluating the pathogenicity of microorganisms. The ability to retain bacterial colonies for further studies is a third advantage of plate-based culturing, allowing one to subject any detected species to thorough examination. In light of these considerations, a study seeking to characterize both currently uncultivable microorganisms and antibiotic-resistant human pathogens would need to draw on both classical and next-generation microbial community analysis techniques.
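The limitation regarding absolute abundance follows from the way sequencing profiles are normalized: once counts are converted to proportions, information about the total microbial load is lost. A minimal sketch with hypothetical counts for two surfaces (the taxa and numbers are illustrative assumptions only):

```python
def relative_profile(cell_counts):
    """Convert per-taxon counts to fractions that sum to 1."""
    total = sum(cell_counts.values())
    if total == 0:
        raise ValueError("sample contains no counts")
    return {taxon: n / total for taxon, n in cell_counts.items()}

# Two hypothetical surfaces with a 100-fold difference in microbial load.
sparse_surface = {"Staphylococcus": 30, "Bacillus": 10}
dense_surface = {"Staphylococcus": 3000, "Bacillus": 1000}

# The relative-abundance profiles are indistinguishable.
same_profile = relative_profile(sparse_surface) == relative_profile(dense_surface)
```

Both surfaces yield the same profile even though one carries a hundred times more cells, which is why a count of colony-forming units, or another absolute measure, complements a sequencing-based survey.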

Microarrays for Rapid Identification of Antibiotic Resistance

A new approach to rapidly screening microbial communities for multiple antibiotic resistance markers has recently been developed by Taitt et al. (2012). Their Antimicrobial Resistance Determinant Microarray chips are designed to detect more than 250 resistance genes covering 12 classes of antibiotics and have been shown to be compatible with low concentrations of DNA extracted from swabs (personal communication). Advances such as this are quickly closing the gap between the high-throughput sequencing technologies and classical culture-based phenotyping.

Sample and Metadata Collection

The selection of sampling locations and environmental parameters to monitor is fundamental to any microbiome project. Balancing the comprehensiveness of an investigation against logistical constraints has led to studies that examine highly specific aspects of the indoor microbiome (Hilton and Austin, 2000; Kelley et al., 2004; Kembel et al., 2012; Kopperud et al., 2004; Krogulski and Szczotko, 2011; Tang, 2009; Wiener-Well et al., 2011). As might be expected, the sampling locations chosen in these studies are those that humans most often come into contact with on a daily basis. Lists of these locations for homes (Table A5-1) and hospitals (Table A5-2) are provided below.


The air within built environments is arguably one of the most important media to consider when investigating the interaction between humans and the indoor microbiome. Air inside buildings is biologically distinct from outdoor air, containing a greater proportion of human-associated microflora shed by its occupants (Bouillard et al., 2005; Clark, 2009; Fox et al., 2010; Hospodsky et al., 2012; Korves et al., 2012; Noble et al., 1976; Noris et al., 2011; Qian et al., 2012; Rintala et al., 2008; Täubel et al., 2009). Low air exchange rates and the recycling of air for conditioning purposes can exacerbate the negative health effects of aerosolized microorganisms with pathogenic, toxic, and/or allergenic properties (D’Amato et al., 2005; Monto, 2002; Peccia et al., 2008; Pope et al., 1993).

Airborne particles are commonly collected using one of four methods: settle plates, impactors, impingers, and filtration, each of which offers differing advantages and efficiencies (Fahlgren et al., 2010; Griffin et al., 2010; Morrow et al., 2012). Settle plates offer a silent and inexpensive option for enumerating colony-forming units deposited by gravity onto agar petri dishes. Settle plates will preferentially sample larger particles, because they are more likely to be deposited by gravitational settling. Impactors increase the rate and control of particle deposition by accelerating air in an arc relative to the agar surface, utilizing centrifugal forces to select for a specific range of particle masses. The same design principle is used by impingers, where the deposition medium is liquid rather than solid. However, the mechanical stresses introduced by impactors and impingers can rupture cellular membranes, thereby reducing culturing viability. Filtration of air through a porous membrane is less mechanically stressful to cells, but may result in desiccation. Although active samplers (impactors, impingers, and filtering devices) offer the added benefit of providing a quantitative accounting of the volume of air sampled, the noise generated by the unit’s pump or fan may preclude their use in occupied buildings. Filters from central HVAC systems have also been used in lieu of portable sampling units (Bonetta et al., 2009; Drudge et al., 2011; Farnsworth et al., 2006; Hospodsky et al., 2012; Korves et al., 2012; Noris et al., 2009, 2011; Stanley et al., 2008).
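Because active samplers record the volume of air drawn through the device, colony counts from them can be expressed as a concentration. The sketch below shows that conversion under simple assumptions: the flow rate is constant, every colony arises from one deposited particle, and no correction is applied for sampler efficiency or overlapping colonies. The function name and example values are hypothetical.

```python
def cfu_per_cubic_meter(colonies, flow_l_per_min, minutes):
    """Airborne concentration in colony-forming units per cubic meter.

    colonies       -- colonies counted after incubation
    flow_l_per_min -- sampler flow rate in liters per minute
    minutes        -- sampling duration in minutes
    """
    volume_m3 = flow_l_per_min * minutes / 1000.0  # 1 m^3 = 1000 L
    if volume_m3 <= 0:
        raise ValueError("sampled volume must be positive")
    return colonies / volume_m3

# Hypothetical run: 50 colonies after sampling at 100 L/min for 10 minutes,
# i.e. exactly one cubic meter of air.
example = cfu_per_cubic_meter(colonies=50, flow_l_per_min=100.0, minutes=10.0)
```

Settle plates cannot be normalized this way, since the volume of air that contributed particles to the plate is unknown; their counts are usually reported per plate area and exposure time instead.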

TABLE A5-1 Home High-Touch Surfaces and Bacterial Reservoirs

All rooms: Light switches, air, dust, floor, rugs, door knobs

Kitchen: Countertop, sink, faucet handles, drain, u-pipe, refrigerator handle, refrigerator shelves, microwave buttons, dish sponge, drying towels, drying rack

Bathroom: Countertop, sink, u-pipes, shower floor, shower curtain, showerhead, shower poufs, bar soap, toilet bowl, toilet water, toilet seat, toilet flush handle, hand towels

Bedroom: Pillows, sheets

Living room: Seats, arm rests, head rests, pillows, blankets, remote controls

Office, etc.: Keyboard, mouse, water from water heater, mop head, HVAC filters

TABLE A5-2 Hospital High-Touch Surfaces and Bacterial Reservoirs

Patient area: Bed rails, tray table, call boxes, telephone, bedside tables, patient chair, IV pole, floor, light switches, glove box, air, air exhaust filter

Patient restroom: Sink, faucet handles, inside faucet head, hot tap water, cold tap water, light switches, door knob, handrails, toilet seats, flush lever, bed pan cleaning equipment, floor, air, air exhaust filter

Additional equipment: IV pump control panel, monitor control panel, monitor touch screen, monitor cables, ventilator control panel, blood pressure cuff, janitorial equipment

Water: Cold tap water, hot tap water, water used to clean floors

Patient: Stool sample, nasal swab, hand

Staff: Nasal swab, bottom of shoe, dominant hand, cell phone, pager, iPad, computer mouse, work phone, shirt cuff, stethoscope

Travel areas: Corridor floor, corridor wall, steps, stairwell door knobs, stairwell door kick plates, elevator buttons, elevator floor, handrails, air

Lobby: Front desk surface, chairs, coffee tables, floor, air

Public restroom: Floor, door handles, sink controls, sink bowl, soap dispenser, towel dispenser, toilet seats, toilet lever, stall door lock, stall door handle, urinal flush lever, air, air exhaust filter


Municipal water supplies have long been known to contain biofilm-forming and planktonic microorganisms, including Mycobacterium and Legionella (Angenent, 2005; Boe-Hansen et al., 2002; du Moulin et al., 1988; Embil et al., 1997; Falkinham et al., 2001; Le Dantec et al., 2002; Lee et al., 1988; Leoni et al., 1999, 2001; Thomas et al., 2006; Vaerewijck et al., 2001, 2005). Tap water therefore may be an important source of microbes in built environments. Cell counts can be obtained visually by microscopy using non-specific DNA stains such as SYBR Gold, or in an automated fashion with flow cytometry. Both methods can be adapted to fluorescent in situ hybridization (FISH) analysis, in which taxon-specific fluorescent probes replace or complement non-specific DNA stains. The particle size of aggregated cells can also be used to determine whether cells are biofilm-originating or planktonic. Because biofilms may form on faucet heads, sampling strategies may collect water samples as soon as water begins flowing from the tap (to favor collection of the tap’s biofilms), after water has been flowing through the tap for several minutes (to measure systemic contaminants), or both. Collection of both hot and cold tap water samples is crucial, because water temperature can affect which microorganisms are able to persist in these water systems.

External Factors Influencing the Indoor Microbiome

Microbial community structure is highly habitat dependent. Therefore, the collection of metadata is fundamental to any project seeking to characterize microbial community composition and structure. For constantly fluctuating parameters such as temperature, relative humidity, and brightness, one might consider recording not only the value at the time of sampling, but also the recent highs, lows, and averages for the sampled location. Table A5-3 lists parameters that may have an influence on microbial populations and communities. Many of these factors, such as temperature and humidity, are constantly changing; measuring these variables continuously in order to observe maximums, minimums, and averages can therefore be useful in generating a comprehensive site analysis.

TABLE A5-3 Environmental Parameters

Building: Latitude, longitude, altitude; foundation type; HVAC sterility; surrounding flora; construction materials
Room: Window closed vs. open; window direction (N, S, E, W); light bulb type; hours per day occupied; barefoot vs. shoe traffic; exposure to pets, vermin, etc.; plants or water features; number of occupants; connections to other rooms; air temperature; relative humidity; percentage recirculated air; air exchange rate
Surface: Material (carpet, granite, etc.); water activity; time since last cleaning; type of cleaning; light exposure; surface temperature; history of moisture events; occupant proximity and interaction

To facilitate the adoption of a consistent metadata ontology for these types of measurements, a Minimum Information about any Sequence (MIxS) standard specific to the built environment (MIxS-BE) was presented to the Genomic Standards Consortium (GSC) on March 7, 2012 (Gilbert et al., 2012). At the time of this writing, the MIxS-BE standard is available as a working draft on the Microbiology of the Built Environment website in order to solicit feedback and provide direction to early adopters. The MIxS GSC standards have been integral to generating comparable results among different research centers. The existing GSC standards for genomics (MIGS), metagenomics (MIMS), and genetic markers (MIMARKS) (Field et al., 2008; Kottmann et al., 2008; Yilmaz et al., 2011) have enabled tens of thousands of environmental samples from dozens of laboratories and hundreds of geographic locations to be included in the Earth Microbiome Project (Gilbert et al., 2010a,b, 2011).

Automated Monitoring

Many of the parameters in Table A5-3 can be automatically measured and recorded at regular intervals by specialized data loggers designed for this purpose. Temperature and humidity monitors are relatively inexpensive, whereas devices for assessing air exchange rate, fraction of recirculated air, HVAC system flows, and occupancy and activities to a high level of precision can be a significant portion of a research budget and can require substantial post-processing. In selecting monitoring equipment, it is also important to consider the options of using battery versus electrical outlet power, and whether to store data locally on memory cards versus transmitting readings off site. In hospital settings, it is often necessary to obtain permission from technical administrators before installing wireless transmitters, because such devices may interfere with sensitive medical equipment.
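As an illustration of how logged readings might be condensed into sample metadata, the sketch below reports the value at the time of sampling together with the recent minimum, maximum, and mean. The data layout (a list of time and value pairs), the function name, and the 24-hour default window are assumptions made for this example and do not correspond to the output of any particular data logger.

```python
from statistics import mean


def summarize_readings(readings, sample_time, window_hours=24.0):
    """Summarize logger readings in the window leading up to a sampling event.

    readings: list of (hours_since_start, value) pairs (hypothetical layout).
    Returns the most recent value at or before sample_time plus the minimum,
    maximum, and mean of all values inside the window.
    """
    window = [(t, v) for t, v in readings
              if sample_time - window_hours <= t <= sample_time]
    if not window:
        raise ValueError("no readings fall inside the requested window")
    values = [v for _, v in window]
    latest = max(window, key=lambda pair: pair[0])[1]
    return {
        "at_sampling": latest,
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical air temperatures (degrees C) logged every 6 hours
temperatures = [(0, 20.0), (6, 22.0), (12, 25.0), (18, 21.0), (30, 19.0)]
summary = summarize_readings(temperatures, sample_time=18, window_hours=12)
```

The same summary could be attached to each swab or air sample as metadata fields, whichever logging hardware is used.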

Personnel- and asset-tracking infrastructure can also be of immense value to a study of microbial communities in an environment where human movement is hypothesized to be a driving factor in the introduction of new microbial species to surfaces and airborne particles. Through a combination of uniquely identifiable radio-frequency identification (RFID) tags worn by hospital occupants and RFID sensors placed throughout the building, such a system can continuously monitor the location of personnel, thereby providing time-stamped information on person-to-person and person-to-room interactions. From these data, one can examine the connection between the movement of staff between rooms and the movement of bacterial populations between rooms. Furthermore, by combining the observed number of occupants per room with the air exchange rate, it is possible to estimate the CO2 and airborne microbe concentrations at each point in time. RFID systems are commonly used to track equipment and can therefore also provide information regarding which equipment is shared among patients, such as IVs and dialysis machines. These data enable researchers to observe not only microbial communities over time, but also the influence of human interaction on those communities.
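One way to turn occupancy counts and an air exchange rate into a CO2 estimate is a single-zone, well-mixed mass balance. The sketch below is an illustration under stated assumptions and is not the method used by the authors: the room is treated as perfectly mixed, the per-person CO2 generation rate (0.018 cubic meters per hour) and the outdoor concentration (400 ppm) are placeholder values, and all function and parameter names are hypothetical. Within each time step the occupancy is held constant, so the concentration relaxes exponentially toward the steady state for that occupancy.

```python
import math


def co2_series_ppm(occupants, air_changes_per_hour, room_volume_m3,
                   step_hours=0.25, outdoor_ppm=400.0,
                   generation_m3_per_hour=0.018, start_ppm=None):
    """Estimate indoor CO2 (ppm) at the end of each time step.

    occupants: number of people in the room during each step, e.g. from RFID logs.
    Assumes a well-mixed room; default rates are illustrative assumptions.
    """
    if air_changes_per_hour <= 0 or room_volume_m3 <= 0:
        raise ValueError("air exchange rate and room volume must be positive")
    conc = outdoor_ppm if start_ppm is None else start_ppm
    # Fraction of the gap to steady state that remains after one step
    decay = math.exp(-air_changes_per_hour * step_hours)
    series = []
    for n in occupants:
        # Steady state: outdoor level plus generation divided by ventilation flow
        steady = outdoor_ppm + 1e6 * n * generation_m3_per_hour / (
            air_changes_per_hour * room_volume_m3)
        conc = steady + (conc - steady) * decay
        series.append(conc)
    return series


# Hypothetical 50 cubic meter room, 2 air changes per hour, occupancy per 15 minutes
estimate = co2_series_ppm([0, 2, 2, 3, 1, 0], 2.0, 50.0)
```

An analogous balance could in principle be written for airborne microbes by replacing the CO2 generation term with a per-person shedding rate, although such rates are far less well constrained and deposition losses would also need to be considered.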

Seasonal changes in outdoor humidity and temperature have previously been found to influence the composition of microbial communities in indoor environments (Augustowska and Dutkiewicz, 2006; Eber et al., 2011; Kaarakainen et al., 2009; Park et al., 2000; Pitkäranta et al., 2008; Rintala et al., 2008, 2012; Yamada, 2007). Therefore, the recording of meteorological conditions is an important aspect of indoor microbiome studies. This can be accomplished by retrieving publicly available National Oceanic and Atmospheric Administration records through the National Climatic Data Center website. This collection offers hourly measurements of temperature, humidity, pressure, wind velocity, and precipitation gathered from a network of automated weather monitoring stations throughout the United States.

Special Considerations

Effect of Cleaning

Cleaning practices in indoor environments are inherently designed to affect the resident microbiota; therefore, it is important to take note of the strategies used to disinfect the environments under study and the time-points at which cleaning regimens were conducted. In previous studies, the effects of cleaning were found to be highly dependent upon the cleaning products used (Barker et al., 2004; Exner et al., 2004; Josephson et al., 1997; Marshall et al., 2012; Rusin et al., 1998; Rutala et al., 2000; Scott et al., 1984): antimicrobial agents, bleach, ethanol, peroxide, and Lysol were much more effective at sterilizing a surface than surfactants, detergents, vinegar, ammonia, or baking soda. Although there has been speculation that households using antimicrobial cleaning products may select for antibiotic-resistant bacteria, randomized studies investigating this hypothesis did not observe differences in bacterial population structure or antibiotic resistance in response to antimicrobial cleaning products (Aiello et al., 2005; Cole et al., 2003).

Healthcare Facility Sampling

Prior studies have identified numerous hospital-associated pathogens (HAPs) as well as routes of transmission between patients, staff, equipment, surfaces, and recycled air. HAPs that are of particular relevance are coagulase-negative staphylococci, Staphylococcus aureus, Enterococcus species, Candida species, Escherichia coli, Pseudomonas aeruginosa, Klebsiella pneumoniae, Enterobacter species, Acinetobacter baumannii, and Klebsiella oxytoca, which have been previously found to collectively account for 84 percent of healthcare-associated infections (HAIs) over a 21-month period in 463 hospitals (Hidron et al., 2008). These bacteria have been found on physicians’ and nursing staff’s clothing (Babb et al., 1983; Biljan et al., 1993; Loh et al., 2000; Lopez et al., 2009; Perry et al., 2001; Snyder et al., 2008; Treakle et al., 2009; Wiener-Well et al., 2011; Wong et al., 1991; Zachary et al., 2001), cell phones (Akinyemi et al., 2009; Brady et al., 2006, 2009; Datta et al., 2009; Hassoun et al., 2004; Kilic et al., 2009; Ulger et al., 2009), stethoscopes (Marinella et al., 1997; Zachary et al., 2001), computer keyboards (Bures et al., 2000; Doğan et al., 2008), faucet handles (Bures et al., 2000), telemetry leads (Safdar et al., 2012), electronic thermometers (Livornese et al., 1992), blood-pressure cuffs (Myers, 1978), X-ray cassettes (Kim et al., 2012), gels for ultrasound probes (Schabrun et al., 2006), and in the air of patient rooms (Berardi and Leoni, 1993; Fleischer, 2006; Genet et al., 2011; Huang et al., 2006; Sudharsanam et al., 2008).

Patient microflora are one of the most significant drivers of microbial ecology within a hospital room (Bhalla et al., 2004; Drees et al., 2008b). Microorganisms are readily transferred from patients to hospital staff (Bhalla et al., 2004) and to the next occupant of the room after it has been cleaned (Drees et al., 2008a; Huang et al., 2006). Human traffic that enters and leaves multi-specialty medical centers includes patients with active diseases and infections as well as healthy patients undergoing invasive medical and surgical procedures. The inherent risk of cross-contamination from interactions between healthcare workers, patients, and their families presents a major obstacle to protecting patient health using only the practices of isolation and containment. Several studies have found that regular washing of patients’ skin with the bactericide chlorhexidine can reduce the likelihood of the patient acquiring a nosocomial antibiotic-resistant infection (Bleasdale, 2007; Climo et al., 2009; Kassakian et al., 2011; O’Horo et al., 2012; Paulson, 1993; Popovich et al., 2010; Vernon et al., 2006). Diagnostic testing performed by hospital laboratories in the course of patient treatment produces a detailed accounting of specific patient-associated microorganisms that, with institutional review board approval, can be included in study metadata to identify point sources of microbial populations within the larger hospital environment under observation.

Characterization of the Microbial Community

Ribosomal RNA gene sequencing is a common method for identifying the microbial community structure in environmental samples. This approach involves PCR amplification of a variable region of the 16S (bacterial) or 18S (eukaryotic) rRNA gene, or of the fungal internal transcribed spacer (ITS) region, using multifunctional DNA oligos that contain not only a complementary nucleotide sequence for priming the PCR, but also a multiplex barcode for marking amplified sequences with a unique sample-specific DNA sequence and a 5′ region encoding the bases needed by the sequencing technology.

The amplified, barcoded sequences from multiple samples are pooled together in equimolar concentrations for sequencing and then demultiplexed with computer algorithms based on their barcode sequence. Several software suites are freely available for processing high-throughput sequencing data, including QIIME (Caporaso et al., 2010), MG-RAST (Meyer et al., 2008), mothur (Schloss et al., 2009), Galaxy (Goecks et al., 2010), HUMAnN (Abubucker et al., 2012), and MEGAN (Huson et al., 2011).
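The demultiplexing step can be illustrated with a deliberately simplified sketch. It assumes that each read begins with a fixed-length barcode and that only exact barcode matches are accepted; the function name, barcodes, and sample names are hypothetical, and this is not how any of the software suites named above is implemented.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Assign reads to samples using an exact match on the leading barcode.

    Returns (reads grouped by sample with the barcode trimmed off,
    list of reads whose barcode was not recognized).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode = read[:barcode_length]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[barcode_length:])
    return dict(by_sample), unassigned


# Hypothetical 4-base barcodes and sample names
barcodes = {"ACGT": "kitchen_sink", "TGCA": "bed_rail"}
pooled = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCCAAA", "NNNNACGTAC"]
grouped, unknown = demultiplex(pooled, barcodes, barcode_length=4)
```

Production pipelines typically also tolerate a limited number of sequencing errors in the barcode and apply quality filtering, both of which are omitted here.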

The influence of environmental parameters on microbial populations can be quantified using multivariate non-parametric algorithms (e.g., principal coordinate analysis, principal component analysis, non-metric multidimensional scaling) for community composition and univariate analysis of variance (ANOVA) tests for diversity measures. These statistical tools calculate the percentages of variation that can be explained by individual parameters such as treatment, temperature, building material, adjacent microbiome, and any other environmental characteristic measured in concert with sample collections. Multivariate-crossed analyses are particularly useful in determining whether specific combinations of environmental parameters (interactions) have a synergistic effect on population structure or composition. Univariate tests of diversity indices use higher-way ANOVA and are calculated with distribution-free, permutation-based (PERMANOVA) routines (Anderson, 2001). Additionally, following taxonomic characterization of the communities using the QIIME pipeline (Caporaso et al., 2010) and production of an abundance matrix of operational taxonomic units against experimental condition, community similarity between samples can be represented by calculating a Bray-Curtis similarity matrix and UniFrac distances (Lozupone and Knight, 2005). Non-metric multidimensional scaling can be used to visualize the relationships among the experimental factors, which can then be formally tested using a combination of permutation-based PERMANOVA and fully non-parametric ANOSIM tests (Clarke, 1993). The QIIME, MoBEDAC, and VAMPS web servers calculate these metrics as well as facilitate the public release of such data sets. State-of-the-art artificial neural network software developed by Larsen and colleagues can be employed to generate models for predicting the development of microbial communities based on the bacterial abundances observed in the study (Larsen et al., 2012). Source-tracking algorithms developed by Knights and colleagues identify transference of communities from one sampling site to another (Knights et al., 2011). Taken together, these analyses provide insights into the driving factors behind microbial community development.
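To make the Bray-Curtis calculation concrete, the sketch below computes the pairwise dissimilarity between samples from a small abundance table; the corresponding similarity is one minus this value. The sample names, taxon labels, and counts are invented for illustration, and the pure-Python implementation is only a minimal stand-in for the routines provided by the tools cited above.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count} dicts.

    0.0 means identical abundance profiles; 1.0 means no shared taxa.
    """
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


def dissimilarity_matrix(samples):
    """Return (sorted sample names, square matrix of pairwise dissimilarities)."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical abundance table of operational taxonomic units
table = {
    "bed_rail": {"OTU1": 6, "OTU2": 4},
    "door_knob": {"OTU1": 4, "OTU3": 6},
}
names, matrix = dissimilarity_matrix(table)
```

A matrix of this kind is the usual input to ordination methods such as non-metric multidimensional scaling and to permutation tests such as PERMANOVA and ANOSIM.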

In addition to examining the relationships between microbial community structure and environmental variables, it is often desirable to compare population structures directly to one another. Such metrics include the diversity of species in a sample (alpha diversity) and the closely related measures of richness and evenness that describe the quantity of species and the range of population sizes (Whittaker, 1960, 1972). Beta diversity takes into account the average alpha diversity and the combined diversity of species across all samples (gamma diversity) to evaluate the presence or absence of a core microbiome commonly shared across a significant subset of samples. This is particularly relevant in the study of disease-causing microorganisms, where it is important to differentiate between systemic populations of opportunistic pathogens that become disease-causing under specific conditions and microbes that are only associated with infections.
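These diversity measures can be sketched in a few lines. The example below uses richness and the Shannon index as alpha-diversity measures, Pielou's evenness, and Whittaker's multiplicative beta diversity (gamma richness divided by mean alpha richness). The choice of these particular indices, the function names, and the sample data are assumptions made for illustration; other indices are equally common.

```python
import math


def richness(counts):
    """Number of taxa observed at least once in a sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon diversity index (natural log) of a {taxon: count} sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, log(richness); NaN if undefined."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")


def whittaker_beta(samples):
    """Gamma richness across all samples divided by mean alpha richness."""
    observed = {t for counts in samples for t, c in counts.items() if c > 0}
    mean_alpha = sum(richness(counts) for counts in samples) / len(samples)
    return len(observed) / mean_alpha


# Hypothetical samples sharing one of three taxa
sink = {"A": 5, "B": 5}
drain = {"B": 5, "C": 5}
beta = whittaker_beta([sink, drain])
```

A beta value near 1 indicates that samples share most of their taxa, which is consistent with a common core microbiome, whereas larger values indicate greater turnover between samples.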

When describing the presence of potentially pathogenic microorganisms discovered in surveys of ribosomal sequences, care should be taken to place these results in context. Taxonomic assignment of reads is reliant on reference databases that contain a disproportionate number of sequences from disease-causing microbes, leading many novel operational taxonomic units to phylogenetically ordinate closest to a pathogen with which they may or may not share specific infectivity characteristics. The RDP Classifier (Wang et al., 2007) used in QIIME (Caporaso et al., 2010) and other taxonomic assignment software partially alleviates this issue by providing a confidence score for the assignment of a read into each taxonomic level—domain, kingdom, phylum, class, order, family, genus, and species. However, even correct species-level assignments fail to provide information on the presence or absence of genetic elements that are responsible for many pathogenic and antibiotic-resistant phenotypes.
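One cautious way to use per-rank confidence scores is to report a lineage only down to the deepest rank that meets a chosen threshold. The sketch below illustrates that idea; the tuple layout, the function name, the 0.8 cutoff, and the example scores are all assumptions for this example and do not reproduce the output format of the RDP Classifier or QIIME.

```python
def truncate_assignment(assignment, min_confidence=0.8):
    """Keep ranks from broadest to most specific until confidence drops.

    assignment: list of (rank, taxon, confidence) ordered from domain downward
    (hypothetical layout). Everything after the first low-confidence rank is
    discarded, even if a deeper rank happens to score higher.
    """
    kept = []
    for rank, taxon, confidence in assignment:
        if confidence < min_confidence:
            break
        kept.append((rank, taxon))
    return kept


# Hypothetical read whose species-level call is poorly supported
read_assignment = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Firmicutes", 0.99),
    ("genus", "Staphylococcus", 0.91),
    ("species", "Staphylococcus aureus", 0.42),
]
reported = truncate_assignment(read_assignment)
```

In this hypothetical case the read would be reported at the genus level, which avoids implying the presence of a specific pathogen on the basis of a weakly supported species call.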


Acknowledgments

The ideas presented in this article have been shaped in large part by Hospital Microbiome Consortium discussions held at the University of Chicago on June 7, 2012. We also thank the Alfred P. Sloan Foundation for funding the Home Microbiome Project (2011-6-05) and the Hospital Microbiome Consortium Workshop grant (2012-3-25).


References

Abubucker, S., N. Segata, J. Goll, A. M. Schubert, J. Izard, B. L. Cantarel, B. Rodriguez-Mueller, J. Zucker, M. Thiagarajan, B. Henrissat, O. White, S. T. Kelley, B. Methé, P. D. Schloss, D. Gevers, M. Mitreva, and C. Huttenhower. 2012. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Computational Biology 8(6):e1002358. doi:10.1371/journal.pcbi.1002358.

Adler, A., I. Tager, and D. R. Quintero. 2005. Decreased prevalence of asthma among farm-reared children compared with those who are rural but not farm-reared. Journal of Allergy and Clinical Immunology 115(1):67-73. doi:10.1016/j.jaci.2004.10.008.

Aiello, A. E., B. Marshall, S. B. Levy, P. Della-Latta, S. X. Lin, and E. Larson. 2005. Antibacterial cleaning products and drug resistance. Emerging Infectious Diseases 11(10):1565-1570. doi:10.3201/eid1110.041276.

Akinyemi, K. O., A. D. Atapu, O. O. Adetona, and A. O. Coker. 2009. The potential role of mobile phones in the spread of bacterial infections. Journal of Infection in Developing Countries 3(8):628-632.

Alfvén, T., C. Braun-Fahrländer, B. Brunekreef, E. von Mutius, J. Riedler, A. Scheynius, M. van Hage, M. Wickman, M. R. Benz, J. Budde, K. B. Michels, D. Schram, E. Ublagger, M. Wasser, G. Pershagen, and PARSIFAL study group. 2006. Allergic diseases and atopic sensitization in children related to farming and anthroposophic lifestyle—the PARSIFAL Study. Allergy 61(4):414-421. doi:10.1111/j.1398-9995.2005.00939.x.

Anderson, M. J. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26(1):32-46. doi:10.1111/j.1442-9993.2001.01070.pp.x.

Anderson, R. N., and B. L. Smith. 2005. Deaths: Leading causes for 2002. National Vital Statistics Reports: From the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System 53(17):1-89.

Angenent, L. T. 2005. Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proceedings of the National Academy of Sciences of the USA 102(13):4860-4865. doi:10.1073/pnas.0501235102.

Augustowska, M., and J. Dutkiewicz. 2006. Variability of airborne microflora in a hospital ward within a period of one year. Annals of Agricultural and Environmental Medicine: AAEM 13(1):99-106.

Babb, J. R., J. G. Davies, and G. A. J. Ayliffe. 1983. Contamination of protective clothing and nurses’ uniforms in an isolation ward. Journal of Hospital Infection 4(2):149-157. doi:10.1016/0195-6701(83)90044-0.

Barker, J., I. B. Vipond, and S. F. Bloomfield. 2004. Effects of cleaning and disinfection in reducing the spread of norovirus contamination via environmental surfaces. Journal of Hospital Infection 58(1):42-49. doi:10.1016/j.jhin.2004.04.021.

Berardi, B. M., and E. Leoni. 1993. Indoor air climate and microbiological airborne: Contamination in various hospital areas. Zentralblatt Für Hygiene Und Umweltmedizin = International Journal of Hygiene and Environmental Medicine 194(4):405-418.

Bhalla, A., N. J. Pultz, D. M. Gries, A. J. Ray, E. C. Eckstein, D. C. Aron, and C. J. Donskey. 2004. Acquisition of nosocomial pathogens on hands after contact with environmental surfaces near hospitalized patients. Infection Control and Hospital Epidemiology 25(2):164-167. doi:10.1086/502369.

Biljan, M. M., C. A. Hart, D. Sunderland, P. R. Manasse, and C. R. Kingsland. 1993. Multicentre randomised double bind crossover trial on contamination of conventional ties and bow ties in routine obstetric and gynaecological practice. BMJ 307(6919):1582-1584. doi:10.1136/bmj.307.6919.1582.

Bleasdale, S. C. 2007. Effectiveness of chlorhexidine bathing to reduce catheter-associated blood-stream infections in medical intensive care unit patients. Archives of Internal Medicine 167(19):2073. doi:10.1001/archinte.167.19.2073.

Bloomfield, S. F., R. Stanwell-Smith, R. W. R. Crevel, and J. Pickup. 2006. Too clean, or not too clean: The hygiene hypothesis and home hygiene. Clinical & Experimental Allergy 36(4):402-425. doi:10.1111/j.1365-2222.2006.02463.x.

Blowers, R., and K. R. Wallace. 1960. Environmental aspects of staphylococcal infections acquired in hospitals. III. Ventilation of operating rooms—bacteriological investigations. American Journal of Public Health and the Nation’s Health 50:484-490.

Boe-Hansen, R., H-J. Albrechtsen, E. Arvin, and C. Jørgensen. 2002. Bulk water phase and biofilm growth in drinking water at low nutrient conditions. Water Research 36(18):4477-4486.

Bonetta, S., S. Bonetta, S. Mosso, S. Sampò, and E. Carraro. 2009. Assessment of microbiological indoor air quality in an Italian office building equipped with an HVAC system. Environmental Monitoring and Assessment 161(1-4):473-483. doi:10.1007/s10661-009-0761-8.

Bouillard, L., O. Michel, M. Dramaix, and M. Devleeschouwer. 2005. Bacterial contamination of indoor air, surfaces, and settled dust, and related dust endotoxin concentrations in healthy office buildings. Annals of Agricultural and Environmental Medicine: AAEM 12(2):187-192.

Bourdillon, R. B., and L. Colebrook. 1946. Air hygiene in dressing-rooms for burns or major wounds. Lancet 1(6400):601.

Brady, R. R. W., A. Wasson, I. Stirling, C. McAllister, and N. N. Damani. 2006. Is your phone bugged? The incidence of bacteria known to cause nosocomial infection on healthcare workers’ mobile phones. Journal of Hospital Infection 62(1):123-125. doi:10.1016/j.jhin.2005.05.005.

Brady, R. R. W., J. Verran, N. N. Damani, and A. P. Gibb. 2009. Review of mobile communication devices as potential reservoirs of nosocomial pathogens. Journal of Hospital Infection 71(4):295-300. doi:10.1016/j.jhin.2008.12.009.

Bright, K. R., S. A. Boone, and C. P. Gerba. 2009. Occurrence of bacteria and viruses on elementary classroom surfaces and the potential role of classroom hygiene in the spread of infectious diseases. Journal of School Nursing 26(1):33-41. doi:10.1177/1059840509354383.

Bures, S., J. T. Fishbain, C. F. Uyehara, J. M. Parker, and B. W. Berg. 2000. Computer keyboards and faucet handles as reservoirs of nosocomial pathogens in the intensive care unit. American Journal of Infection Control 28(6):465-471. doi:10.1067/mic.2000.107267.

Caporaso, J. G., J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Peña, J. K. Goodrich, J. I. Gordon, G. A. Huttley, S. T. Kelley, D. Knights, J. E. Koenig, R. E. Ley, C. A. Lozupone, D. McDonald, B. D. Muegge, M. Pirrung, J. Reeder, J. R. Sevinsky, P. J. Turnbaugh, W. A. Walters, J. Widmann, T. Yatsunenko, J. Zaneveld, and R. Knight. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336. doi:10.1038/nmeth.f.303.

Clark, R. P. 2009. Skin scales among airborne particles. Journal of Hygiene 72(01):47. doi:10.1017/S0022172400023196.

Clarke, K. R. 1993. Non-parametric multivariate analyses of changes in community structure. Austral Ecology 18(1):117-143. doi:10.1111/j.1442-9993.1993.tb00438.x.

Climo, M. W., K. A. Sepkowitz, G. Zuccotti, V. J. Fraser, D. K. Warren, T. M. Perl, K. Speck, J. A. Jernigan, J. R. Robles, and E. S. Wong. 2009. The effect of daily bathing with chlorhexidine on the acquisition of methicillin-resistant Staphylococcus aureus, vancomycin-resistant Enterococcus, and healthcare-associated bloodstream infections: Results of a quasi-experimental multicenter trial. Critical Care Medicine 37(6):1858-1865. doi:10.1097/CCM.0b013e31819ffe6d.

Cole, E. C., R. M. Addison, J. R. Rubino, K. E. Leese, P. D. Dulaney, M. S. Newell, J. Wilkins, D. J. Gaber, T. Wineinger, and D. A. Criger. 2003. Investigation of antibiotic and antibacterial agent cross-resistance in target bacteria from homes of antibacterial product users and nonusers. Journal of Applied Microbiology 95(4):664-676. doi:10.1046/j.1365-2672.2003.02022.x.

Colebrook, L., and W. C. Cawston. 1948. Microbic content of air on roof of city hospital, at street level, and in wards. Medical Research Council, Special Report 262:233-241.

Costello, E. K., C. L. Lauber, M. Hamady, N. Fierer, J. I. Gordon, and R. Knight. 2009. Bacterial community variation in human body habitats across space and time. Science 326(5960):1694-1697. doi:10.1126/science.1177486.

Cvjetanović, B. 1957. Determination of bacterial air pollution in various premises. Journal of Hygiene 56(02):163. doi:10.1017/S0022172400037657.

D’Amato, G., G. Liccardi, M. D’Amato, and S. Holgate. 2005. Environmental risk factors and allergic bronchial asthma. Clinical & Experimental Allergy 35(9):1113-1124. doi:10.1111/j.1365-2222.2005.02328.x.

Datta, P., H. Rani, J. Chander, and V. Gupta. 2009. Bacterial contamination of mobile phones of health care workers. Indian Journal of Medical Microbiology 27(3):279-281. doi:10.4103/0255-0857.53222.

Doğan, M., B. Feyzioğlu, M. Ozdemir, and B. Baysal. 2008. Investigation of microbial colonization of computer keyboards used inside and outside hospital environments. Mikrobiyoloji Bülteni 42(2):331-336.

Drees, M., D. R. Snydman, C. H. Schmid, L. Barefoot, K. Hansjosten, P. M. Vue, M. Cronin, S. A. Nasraway, and Y. Golan. 2008a. Prior environmental contamination increases the risk of acquisition of vancomycin-resistant enterococci. Clinical Infectious Diseases 46(5):678-685. doi:10.1086/527394.

Drees, M., D. R. Snydman, C. H. Schmid, L. Barefoot, K. Hansjosten, P. M. Vue, M. Cronin, S. A. Nasraway, and Y. Golan. 2008b. Antibiotic exposure and room contamination among patients colonized with vancomycin-resistant enterococci. Infection Control and Hospital Epidemiology 29(8):709-715. doi:10.1086/589582.

Drudge, C. N., S. Krajden, R. C. Summerbell, and J. A. Scott. 2011. Detection of antibiotic resistance genes associated with methicillin-resistant Staphylococcus aureus (MRSA) and coagulase-negative staphylococci in hospital air filter dust by PCR. Aerobiologia 28(2):285-289. doi:10.1007/s10453-011-9219-x.

du Moulin, G. C., K. D. Stottmeier, P. A. Pelletier, A. Y. Tsang, and J. Hedley-Whyte. 1988. Concentration of Mycobacterium avium by hospital hot water systems. Journal of the American Medical Association 260(11):1599-1601.

Eber, M. R., M. Shardell, M. L. Schweizer, R. Laxminarayan, and E. N. Perencevich. 2011. Seasonal and temperature-associated increases in gram-negative bacterial bloodstream infections among hospitalized patients. PLoS ONE 6(9):e25298. doi:10.1371/journal.pone.0025298.

Embil, J., P. Warren, M. Yakrus, R. Stark, S. Corne, D. Forrest, and E. Hershfield. 1997. Pulmonary illness associated with exposure to mycobacterium-avium complex in hot tub water. Hypersensitivity pneumonitis or infection? Chest 111(3):813-816.

Escombe, A. R., C. C. Oeser, R. H. Gilman, M. Navincopa, E. Ticona, W. Pan, C. Martínez, J. Chacaltana, R. Rodríguez, D. A. Moore, J. S. Friedland, and C. A. Evans. 2007. Natural ventilation for the prevention of airborne contagion. PLoS Medicine 4(2):e68. doi:10.1371/journal.pmed.0040068.

Exner, M., V. Vacata, B. Hornei, E. Dietlein, and J. Gebel. 2004. Household cleaning and surface disinfection: New insights and strategies. Journal of Hospital Infection 56(Suppl 2):S70-S75. doi:10.1016/j.jhin.2003.12.037.

Fahlgren, C., G. Bratbak, R.-A. Sandaa, R. Thyrhaug, and U. Li Zweifel. 2010. Diversity of airborne bacteria in samples collected using different devices for aerosol collection. Aerobiologia 27(2):107-120. doi:10.1007/s10453-010-9181-z.

Falkinham, J. O., C. D. Norton, and M. W. LeChevallier. 2001. Factors influencing numbers of Mycobacterium avium, Mycobacterium intracellulare, and other mycobacteria in drinking water distribution systems. Applied and Environmental Microbiology 67(3):1225-1231. doi:10.1128/AEM.67.3.1225-1231.2001.

Farnsworth, J. E., S. M. Goyal, S. W. Kim, T. H. Kuehn, P. C. Raynor, M. A. Ramakrishnan, S. Anantharaman, and W. Tang. 2006. Development of a method for bacteria and virus recovery from heating, ventilation, and air conditioning (HVAC) filters. Journal of Environmental Monitoring 8(10):1006. doi:10.1039/b606132j.

Field, D., G. Garrity, T. Gray, N. Morrison, J. Selengut, P. Sterk, T. Tatusova, N. Thomson, M. J. Allen, S. V. Angiuoli, M. Ashburner, N. Axelrod, S. Baldauf, S. Ballard, J. Boore, G. Cochrane, J. Cole, P. Dawyndt, P. De Vos, C. dePamphilis, R. Edwards, N. Faruque, R. Feldman, J. Gilbert, P. Gilna, F. O. Glöckner, P. Goldstein, R. Guralnick, D. Haft, D. Hancock, H. Hermjakob, C. Hertz-Fowler, P. Hugenholtz, I. Joint, L. Kagan, M. Kane, J. Kennedy, G. Kowalchuk, R. Kottmann, E. Kolker, S. Kravitz, N. Kyrpides, J. Leebens-Mack, S. E. Lewis, K. Li, A. L. Lister, P. Lord, N. Maltsev, V. Markowitz, J. Martiny, B. Methe, I. Mizrachi, R. Moxon, K. Nelson, J. Parkhill, L. Proctor, O. White, S. Assunta-Sansone, A. Spiers, R. Stevens, P. Swift, C. Taylor, Y. Tateno, A. Tett, S. Turner, D. Ussery, B. Vaughan, N. Ward, T. Whetzel, I. San Gil, G. Wilson, and A. Wipat. 2008. The Minimum Information About a Genome Sequence (MIGS) Specification. Nature Biotechnology 26(5):541-547. doi:10.1038/nbt1360.

Fierer, N., C. L. Lauber, N. Zhou, D. McDonald, E. K. Costello, and R. Knight. 2010. Forensic identification using skin bacterial communities. Proceedings of the National Academy of Sciences of the USA 107(14):6477-6481. doi:10.1073/pnas.1000162107.

Finch, J. E., J. Prince, and M. Hawksworth. 1978. A bacteriological survey of the domestic environment. Journal of Applied Microbiology 45(3):357-364. doi:10.1111/j.1365-2672.1978.tb04236.x.

Fleischer, M. 2006. Microbiological control of airborne contamination in hospitals. Indoor and Built Environment 15(1):53-56. doi:10.1177/1420326X06062230.

Fox, K., A. Fox, T. Elssner, C. Feigley, and D. Salzberg. 2010. MALDI-TOF mass spectrometry speciation of staphylococci and their discrimination from micrococci isolated from indoor air of schoolrooms. Journal of Environmental Monitoring 12(4):917-923. doi:10.1039/b925250a.

Genet, C., G. Kibru, and W. Tsegaye. 2011. Indoor air bacterial load and antibiotic susceptibility pattern of isolates in operating rooms and surgical wards at Jimma University Specialized Hospital, southwest Ethiopia. Ethiopian Journal of Health Sciences 21(1):9-17.

Gilbert, J. A., F. Meyer, D. Antonopoulos, P. Balaji, C. T. Brown, N. Desai, J. A. Eisen, D. Evers, D. Field, W. Feng, D. Huson, J. Jansson, R. Knight, J. Knight, E. Kolker, K. Konstantinidis, J. Kostka, N. Kyrpides, R. Mackelprang, A. McHardy, C. Quince, J. Raes, A. Sczyrba, A. Shade, and R. Stevens. 2010a. Meeting report: The Terabase Metagenomics Workshop and the Vision of an Earth Microbiome Project. Standards in Genomic Sciences 3(3):243-248. doi:10.4056/sigs.1433550.

Gilbert, J. A., F. Meyer, J. Jansson, J. Gordon, N. Pace, J. Tiedje, R. Ley, N. Fierer, D. Field, N. Kyrpides, F-O. Glöckner, H-P. Klenk, K. E. Wommack, E. Glass, K. Docherty, R. Gallery, R. Stevens, and R. Knight. 2010b. The Earth Microbiome Project: Meeting report of the 1st EMP Meeting on Sample Selection and Acquisition at Argonne National Laboratory, October 6, 2010. Standards in Genomic Sciences 3(3):249-253. doi:10.4056/sigs.1443528.

Gilbert, J. A., M. Bailey, D. Field, N. Fierer, J. A. Fuhrman, B. Hu, J. Jansson, R. Knight, G. A. Kowalchuk, N. C. Kyrpides, F. Meyer, and R. Stevens. 2011. The Earth Microbiome Project: The Meeting Report for the 1st International Earth Microbiome Project Conference, Shenzhen, China, June 13-15, 2011. Standards in Genomic Sciences 5(2):243-247. doi:10.4056/sigs.2134923.

Gilbert, J. A., Y. Bao, H. Wang, S-A. Sansone, S. C. Edmunds, N. Morrison, F. Meyer, L. M. Schriml, N. Davies, P. Sterk, J. Wilkening, G. M. Garrity, D. Field, R. Robbins, D. P. Smith, I. Mizrachi, and C. Moreau. 2012. Report of the 13th Genomic Standards Consortium Meeting, Shenzhen, China, March 4-7, 2012. Standards in Genomic Sciences 6(2):276-286. doi:10.4056/sigs.2876184.

Goecks, J., A. Nekrutenko, J. Taylor, and The Galaxy Team. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 11(8):R86. doi:10.1186/gb-2010-11-8-r86.

Greene, V. W., D. Vesley, R. G. Bond, and G. S. Michaelsen. 1962a. Microbiological contamination of hospital air. I. Quantitative studies. Applied Environmental Microbiology 10:561-566.

Greene, V. W., D. Vesley, R. G. Bond, and G. S. Michaelsen. 1962b. Microbiological contamination of hospital air. II. Qualitative studies. Applied Environmental Microbiology 10:567-571.

Grice, E. A., H. H. Kong, S. Conlan, C. B. Deming, J. Davis, A. C. Young, NISC Comparative Sequencing Program, G. G. Bouffard, R. W. Blakesley, P. R. Murray, E. D. Green, M. L. Turner, and J. A. Segre. 2009. Topographical and temporal diversity of the human skin microbiome. Science 324(5931):1190-1192. doi:10.1126/science.1171700.

Grice, E. A., and J. A. Segre. 2011. The skin microbiome. Nature Reviews Microbiology 9(4):244-253. doi:10.1038/nrmicro2537.

Griffin, D. W., C. Gonzalez, N. Teigell, T. Petrosky, D. E. Northup, and M. Lyles. 2010. Observations on the use of membrane filtration and liquid impingement to collect airborne microorganisms in various atmospheric environments. Aerobiologia 27(1):25-35. doi:10.1007/s10453-010-9173-z.

Groseclose, S. L., W. S. Brathwaite, P. A. Hall, F. J. Connor, P. Sharp, W. J. Anderson, R. F. Fagan, J. J. Aponte, G. F. Jones, D. A. Nitschke, C. A. Worsham, N. Adekoya, M-H. Chang, T. Doyle, R. Dhara, and R. A. Jajosky. 2004. Summary of notifiable diseases—United States, 2002. Morbidity and Mortality Weekly Report 51(53):1-84.

Guenther, R., and G. Vittori. 2008. Sustainable healthcare architecture. Hoboken, NJ: John Wiley & Sons.

Hall-Baker, P. A., E. Nieves, R. A. Jajosky, D. A. Adams, P. Sharp, W. J. Anderson, J. J. Aponte, A. E. Aranas, S. B. Katz, M. Mayes, M. S. Wodajo, D. H. Onweh, J. Baillie, and M. Park. 2010. Summary of notifiable diseases—United States, 2008. Morbidity and Mortality Weekly Report 57(54):1-100.

Hassoun, A., E. M. Vellozzi, and M. A. Smith. 2004. Colonization of personal digital assistants carried by healthcare professionals. Infection Control and Hospital Epidemiology 25(11):1000-1001. doi:10.1086/502334.

Hidron, A. I., J. R. Edwards, J. Patel, T. C. Horan, D. M. Sievert, D. A. Pollock, S. K. Fridkin, National Healthcare Safety Network Team, and Participating National Healthcare Safety Network Facilities. 2008. NHSN annual update: Antimicrobial-resistant pathogens associated with healthcare-associated infections: Annual summary of data reported to the National Healthcare Safety Network at the Centers for Disease Control and Prevention, 2006–2007. Infection Control and Hospital Epidemiology 29(11):996-1011. doi:10.1086/591861.

Hilton, A. C., and E. Austin. 2000. The kitchen dishcloth as a source of and vehicle for foodborne pathogens in a domestic setting. International Journal of Environmental Health Research 10(3):257-261. doi:10.1080/09603120050127202.

Hospodsky, D., J. Qian, W. W. Nazaroff, N. Yamamoto, K. Bibby, H. Rismani-Yazdi, and J. Peccia. 2012. Human occupancy as a source of indoor airborne bacteria. PLoS ONE 7(4):e34867. doi:10.1371/journal.pone.0034867.

Huang, L. L., I. F. Mao, M. L. Chen, and C. T. Huang. 2006. The microorganisms of indoor air in a teaching hospital. Taiwan Journal of Public Health 25(4):315-322.

Huang, S. S., R. Datta, and R. Platt. 2006. Risk of acquiring antibiotic-resistant bacteria from prior room occupants. Archives of Internal Medicine 166(18):1945-1951. doi:10.1001/archinte.166.18.1945.

Huson, D. H., S. Mitra, H.-J. Ruscheweyh, N. Weber, and S. C. Schuster. 2011. Integrative analysis of environmental sequences using MEGAN4. Genome Research 21(9):1552-1560. doi:10.1101/gr.120618.111.

Huttunen, K., H. Rintala, M-R. Hirvonen, A. Vepsäläinen, A. Hyvärinen, T. Meklin, M. Toivola, and A. Nevalainen. 2008. Indoor air particles and bioaerosols before and after renovation of moisture-damaged buildings: The effect on biological activity and microbial flora. Environmental Research 107(3):291-298. doi:10.1016/j.envres.2008.02.008.

Hyvarinen, A., T. Meklin, A. Vepsäläinen, and A. Nevalainen. 2002. Fungi and actinobacteria in moisture-damaged building materials—concentrations and diversity. International Biodeterioration & Biodegradation 49(1):27-37. doi:10.1016/S0964-8305(01)00103-2.

Jaffal, A. A., I. M. Banat, A. A. El Mogheth, H. Nsanze, A. Bener, and A. S. Ameen. 1997. Residential indoor airborne microbial populations in the United Arab Emirates. Environment International 23(4):529-533. doi:10.1016/S0160-4120(97)00055-X.

Josephson, K. L., J. R. Rubino, and I. L. Pepper. 1997. Characterization and quantification of bacterial pathogens and indicator organisms in household kitchens with and without the use of a disinfectant cleaner. Journal of Applied Microbiology 83(6):737-750.

Kaarakainen, P., H. Rintala, A. Vepsäläinen, A. Hyvärinen, A. Nevalainen, and T. Meklin. 2009. Microbial content of house dust samples determined with qPCR. Science of the Total Environment 407(16):4673-4680. doi:10.1016/j.scitotenv.2009.04.046.

Kassakian, S. Z., L. A. Mermel, J. A. Jefferson, S. L. Parenteau, and J. T. Machan. 2011. Impact of chlorhexidine bathing on hospital-acquired infections among general medical patients. Infection Control and Hospital Epidemiology 32(3):238-243. doi:10.1086/658334.

Kelley, S. T., U. Theisen, L. T. Angenent, A. St Amand, and N. R. Pace. 2004. Molecular analysis of shower curtain biofilm microbes. Applied and Environmental Microbiology 70(7):4187-4192. doi:10.1128/AEM.70.7.4187-4192.

Kembel, S. W., E. Jones, J. Kline, D. Northcutt, J. Stenson, A. M. Womack, B. J. M. Bohannan, G. Z. Brown, and J. L. Green. 2012. Architectural design influences the diversity and structure of the built environment microbiome. ISME Journal. doi:10.1038/ismej.2011.211.

Kilic, I. H., M. Ozaslan, I. D. Karagoz, Y. Zer, and V. Davutoglu. 2009. The microbial colonisation of mobile phone used by healthcare staffs. Pakistan Journal of Biological Sciences 12(11):882-884. doi:10.3923/pjbs.2009.882.884.

Kim, J.-S., H.-S. Kim, J.-Y. Park, H.-S. Koo, C.-S. Choi, W. Song, H. C. Cho, and K. M. Lee. 2012. Contamination of X-ray cassettes with methicillin-resistant Staphylococcus aureus and methicillin-resistant Staphylococcus haemolyticus in a radiology department. Annals of Laboratory Medicine 32(3):206. doi:10.3343/alm.2012.32.3.206.

Klevens, R. M., J. R. Edwards, C. L. Richards Jr., T. C. Horan, R. P. Gaynes, D. A. Pollock, and D. M. Cardo. 2007. Estimating health care-associated infections and deaths in U.S. hospitals, 2002. Public Health Reports (Washington, D.C.: 1974) 122(2):160-166.

Klintberg, B., N. Berglund, G. Lilja, M. Wickman, and M. van Hage-Hamsten. 2001. Fewer allergic respiratory disorders among farmers’ children in a closed birth cohort from Sweden. European Respiratory Journal: Official Journal of the European Society for Clinical Respiratory Physiology 17(6):1151-1157.

Knights, D., J. Kuczynski, E. S. Charlson, J. Zaneveld, M. C. Mozer, R. G. Collman, F. D. Bushman, R. Knight, and S. T. Kelley. 2011. Bayesian community-wide culture-independent microbial source tracking. Nature Methods 8(9):761-763. doi:10.1038/nmeth.1650.

Kopperud, R. J., A. R. Ferro, and L. M. Hildemann. 2004. Outdoor versus indoor contributions to indoor particulate matter (PM) determined by mass balance methods. Journal of the Air & Waste Management Association (1995) 54(9):1188-1196.

Korves, T. M., Y. M. Piceno, L. M. Tom, T. Z. DeSantis, B. W. Jones, G. L. Andersen, and G. M. Hwang. 2012. Bacterial communities in commercial aircraft high-efficiency particulate air (HEPA) filters assessed by PhyloChip analysis. Indoor Air. doi:10.1111/j.1600-0668.2012.00787.x.

Kottmann, R., T. Gray, S. Murphy, L. Kagan, S. Kravitz, T. Lombardot, D. Field, and F. O. Glöckner. 2008. A standard MIGS/MIMS compliant XML schema: Toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS: A Journal of Integrative Biology 12(2):115-121. doi:10.1089/omi.2008.0A10.

Kramer, A., I. Schwebke, and G. Kampf. 2006. How long do nosocomial pathogens persist on inanimate surfaces? A systematic review. BMC Infectious Diseases 6:130. doi:10.1186/1471-2334-6-130.

Krogulski, A., and M. Szczotko. 2011. Microbiological quality of hospital indoor air. Determinant factors for microbial concentration in air of operating theatres. Roczniki Państwowego Zakładu Higieny 62(1):109-113.

Larsen, P. E., D. Field, and J. A. Gilbert. 2012. Predicting bacterial community assemblages using an artificial neural network approach. Nature Methods. doi:10.1038/nmeth.1975.

Le Dantec, C., J.-P. Duguet, A. Montiel, N. Dumoutier, S. Dubrou, and V. Vincent. 2002. Occurrence of mycobacteria in water treatment lines and in water distribution systems. Applied and Environmental Microbiology 68(11):5318-5325. doi:10.1128/AEM.68.11.5318-5325.2002.

Lee, T. C., J. E. Stout, and V. L. Yu. 1988. Factors predisposing to Legionella pneumophila colonization in residential water systems. Archives of Environmental Health: An International Journal 43(1):59-62. doi:10.1080/00039896.1988.9934375.

Leoni, E., P. Legnani, M. T. Mucci, and R. Pirani. 1999. Prevalence of mycobacteria in a swimming pool environment. Journal of Applied Microbiology 87(5):683-688. doi:10.1046/j.1365-2672.1999.00909.x.

Leoni, E., P. Legnani, M. A. Bucci Sabattini, and F. Righi. 2001. Prevalence of Legionella spp. in swimming pool environment. Water Research 35(15):3749-3753. doi:10.1016/S0043-1354(01)00075-6.

Leynaert, B., C. Neukirch, D. Jarvis, S. Chinn, P. Burney, and F. Neukirch. 2001. Does living on a farm during childhood protect against asthma, allergic rhinitis, and atopy in adulthood? American Journal of Respiratory and Critical Care Medicine 164(10 Pt 1):1829-1834.

Lignell, U., T. Meklin, H. Rintala, A. Hyvärinen, A. Vepsäläinen, J. Pekkanen, and A. Nevalainen. 2008. Evaluation of quantitative PCR and culture methods for detection of house dust fungi and streptomycetes in relation to moisture damage of the house. Letters in Applied Microbiology 47(4):303-308. doi:10.1111/j.1472-765X.2008.02431.x.

Livornese, L. L., Jr., S. Dias, C. Samel, B. Romanowski, S. Taylor, P. May, P. Pitsakis, G. Woods, D. Kaye, and M. E. Levison. 1992. Hospital-acquired infection with vancomycin-resistant Enterococcus faecium transmitted by electronic thermometers. Annals of Internal Medicine 117(2):112-116.

Loh, W., V. V. Ng, and J. Holton. 2000. Bacterial flora on the white coats of medical students. Journal of Hospital Infection 45(1):65-68. doi:10.1053/jhin.1999.0702.

Lopez, P.-J., O. Ron, P. Parthasarathy, J. Soothill, and L. Spitz. 2009. Bacterial counts from hospital doctors’ ties are higher than those from shirts. American Journal of Infection Control 37(1):79-80. doi:10.1016/j.ajic.2008.09.018.

Lozupone, C., and R. Knight. 2005. UniFrac: A new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71(12):8228-8235. doi:10.1128/AEM.71.12.8228-8235.2005.

Marinella, M. A., C. Pierson, and C. Chenoweth. 1997. The stethoscope: A potential source of nosocomial infection? Archives of Internal Medicine 157(7):786-790. doi:10.1001/archinte.1997.00440280114010.

Marshall, B. M., E. Robleto, T. Dumont, and S. B. Levy. 2012. The frequency of antibiotic-resistant bacteria in homes differing in their use of surface antibacterial agents. Current Microbiology 65(4):407-415. doi:10.1007/s00284-012-0172-x.

Martinez, F. D. 2001. The coming-of-age of the hygiene hypothesis. Respiratory Research 2(3):129-132. doi:10.1186/rr48.

Merchant, J. A., A. L. Naleway, E. R. Svendsen, K. M. Kelly, L. F. Burmeister, A. M. Stromquist, C. D. Taylor, P. S. Thorne, S. J. Reynolds, W. T. Sanderson, and E. A. Chrischilles. 2005. Asthma and farm exposures in a cohort of rural Iowa children. Environmental Health Perspectives 113(3):350-356.

Meyer, F., D. Paarmann, M. D’Souza, R. Olson, E. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards. 2008. The metagenomics RAST Server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9(1):386. doi:10.1186/1471-2105-9-386.

Monto, A. S. 2002. Epidemiology of viral respiratory infections. American Journal of Medicine 112(6):4-12. doi:10.1016/S0002-9343(01)01058-0.

Morrow, J. B., A. S. Downey, and J. Peccia. 2012. Challenges in microbial sampling in the indoor environment. National Institute of Standards and Technology.

Moschandreas, D. J. 1981. Exposure to pollutants and daily time budgets of people. Bulletin of the New York Academy of Medicine 57(10):845-859.

Myers, M. G. 1978. Longitudinal evaluation of neonatal nosocomial infections: Association of infection with a blood pressure cuff. Pediatrics 61(1):42-45.

Nevalainen, A., and M. Seuri. 2005. Of microbes and men. Indoor Air 15(s9):58-64. doi:10.1111/j.1600-0668.2005.00344.x.

Noble, W. C., J. D. F. Habbema, R. Van Furth, I. Smith, and C. De Raay. 1976. Quantitative studies on the dispersal of skin bacteria into the air. Journal of Medical Microbiology 9(1):53-61. doi:10.1099/00222615-9-1-53.

Noris, F., J. A. Siegel, and K. A. Kinney. 2009. Biological and chemical contaminants in HVAC filter dust. ASHRAE Transactions 115(2):484-491.

Noris, F., J. A. Siegel, and K. A. Kinney. 2011. Evaluation of HVAC filters as a sampling mechanism for indoor microbial communities. Atmospheric Environment 45(2):338-346. doi:10.1016/j.atmosenv.2010.10.017.

O’Horo, J. C., G. L. M. Silva, L. S. Munoz-Price, and N. Safdar. 2012. The efficacy of daily bathing with chlorhexidine for reducing healthcare-associated bloodstream infections: A meta-analysis. Infection Control and Hospital Epidemiology 33(3):257-267. doi:10.1086/664496.

Park, J. H., D. L. Spiegelman, H. A. Burge, D. R. Gold, G. L. Chew, and D. K. Milton. 2000. Longitudinal study of dust and airborne endotoxin in the home. Environmental Health Perspectives 108(11):1023-1028.

Paulson, D. S. 1993. Efficacy evaluation of a 4% chlorhexidine gluconate as a full-body shower wash. American Journal of Infection Control 21(4):205-209. doi:10.1016/0196-6553(93)90033-Z.

Peccia, J., D. K. Milton, T. Reponen, and J. Hill. 2008. A role for environmental engineering and science in preventing bioaerosol-related disease. Environmental Science & Technology 42(13):4631-4637. doi:10.1021/es087179e.

Perry, C., R. Marshall, and E. Jones. 2001. Bacterial contamination of uniforms. Journal of Hospital Infection 48(3):238-241. doi:10.1053/jhin.2001.0962.

Pitkäranta, M., T. Meklin, A. Hyvärinen, L. Paulin, P. Auvinen, A. Nevalainen, and H. Rintala. 2008. Analysis of fungal flora in indoor dust by ribosomal DNA sequence analysis, quantitative PCR, and culture. Applied and Environmental Microbiology 74(1):233-244. doi:10.1128/AEM.00692-07.

Pope, A. M., R. Patterson, H. Burge, and Institute of Medicine (U.S.). Committee on the Health Effects of Indoor Allergens. 1993. Indoor allergens: Assessing and controlling adverse health effects. Washington, DC: National Academy Press.

Popovich, K. J., B. Hota, R. Hayes, R. A. Weinstein, and M. K. Hayden. 2010. Daily skin cleansing with chlorhexidine did not reduce the rate of central-line associated bloodstream infection in a surgical intensive care unit. Intensive Care Medicine 36(5):854-858. doi:10.1007/s00134-010-1783-y.

Qian, H., and Y. Li. 2010. Removal of exhaled particles by ventilation and deposition in a multibed airborne infection isolation room. Indoor Air 20(4):284-297. doi:10.1111/j.1600-0668.2010.00653.x.

Qian, J., D. Hospodsky, N. Yamamoto, W. W. Nazaroff, and J. Peccia. 2012. Size-resolved emission rates of airborne bacteria and fungi in an occupied classroom. Indoor Air. doi:10.1111/j.1600-0668.2012.00769.x.

Remes, S. T., H. O. Koskela, K. Iivanainen, and J. Pekkanen. 2005. Allergen-specific sensitization in asthma and allergic diseases in children: The Study on Farmers’ and Non-farmers’ Children. Clinical and Experimental Allergy: Journal of the British Society for Allergy and Clinical Immunology 35(2):160-166. doi:10.1111/j.1365-2222.2005.02172.x.

Riedler, J., C. Braun-Fahrländer, W. Eder, M. Schreuer, M. Waser, S. Maisch, D. Carr, R. Schierl, D. Nowak, and E. von Mutius. 2001. Exposure to farming in early life and development of asthma and allergy: A cross-sectional survey. Lancet 358(9288):1129-1133. doi:10.1016/S0140-6736(01)06252-3.

Rintala, H., M. Pitkaranta, M. Toivola, L. Paulin, and A. Nevalainen. 2008. Diversity and seasonal dynamics of bacterial community in indoor environment. BMC Microbiology 8(1):56. doi:10.1186/1471-2180-8-56.

Rintala, H., M. Pitkäranta, and M. Täubel. 2012. Microbial communities associated with house dust. Advances in Applied Microbiology 78:75-120. Elsevier.

Rook, G. A. W. 2009. Review series on helminths, immune modulation and the hygiene hypothesis: The broader implications of the hygiene hypothesis. Immunology 126(1):3-11. doi:10.1111/j.1365-2567.2008.03007.x.

Rook, G. A. W., and J. L. Stanford. 1998. Give us this day our daily germs. Immunology Today 19(3):113-116. doi:10.1016/S0167-5699(98)80008-X.

Rusin, P., P. Orosz-Coughlin, and C. Gerba. 1998. Reduction of faecal coliform, coliform and heterotrophic plate count bacteria in the household kitchen and bathroom by disinfection with hypochlorite cleaners. Journal of Applied Microbiology 85(5):819-828.

Rutala, W. A., S. L. Barbee, N. C. Aguiar, M. D. Sobsey, and D. J. Weber. 2000. Antimicrobial activity of home disinfectants and natural products against potential human pathogens. Infection Control and Hospital Epidemiology 21(1):33-38. doi:10.1086/501694.

Safdar, N., J. Drayton, J. Dern, S. Warrack, M. Duster, and M. Schmitz. 2012. Telemetry leads harbor nosocomial pathogens. International Journal of Infection Control 8(2). doi:10.3396/ijic.v8i2.012.12.

Schabrun, S., L. Chipchase, and H. Rickard. 2006. Are therapeutic ultrasound units a potential vector for nosocomial infection? Physiotherapy Research International 11(2):61-71. doi:10.1002/pri.329.

Schloss, P. D., S. L. Westcott, T. Ryabin, J. R. Hall, M. Hartmann, E. B. Hollister, R. A. Lesniewski, B. B. Oakley, D. H. Parks, C. J. Robinson, J. W. Sahl, B. Stres, G. G. Thallinger, D. J. Van Horn, and C. F. Weber. 2009. Introducing Mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75(23):7537-7541. doi:10.1128/AEM.01541-09.

Schram, D., G. Doekes, M. Boeve, J. Douwes, J. Riedler, E. Ublagger, E. Mutius, J. Budde, G. Pershagen, F. Nyberg, J. Alm, C. Braun-Fahrländer, M. Waser, B. Brunekreef, and the PARSIFAL Study Group. 2005. Bacterial and fungal components in house dust of farm children, Rudolf Steiner school children and reference children—the PARSIFAL Study. Allergy 60(5):611-618. doi:10.1111/j.1398-9995.2005.00748.x.

Scott, E., S. F. Bloomfield, and C. G. Barlow. 1982. An investigation of microbial contamination in the home. Journal of Hygiene 89(2):279-293.

Scott, E., S. F. Bloomfield, and C. G. Barlow. 1984. Evaluation of disinfectants in the domestic environment under ‘in use’ conditions. Journal of Hygiene 92(2):193-203.

Sebastian, A., and L. Larsson. 2003. Characterization of the microbial community in indoor environments: A chemical-analytical approach. Applied and Environmental Microbiology 69(6):3103-3109. doi:10.1128/AEM.69.6.3103-3109.2003.

Simard, C., M. Trudel, G. Paquette, and P. Payment. 1983. Microbial investigation of the air in an apartment building. Journal of Hygiene 91(2):277-286.

Snyder, G. M., K. A. Thom, J. P. Furuno, E. N. Perencevich, M.-C. Roghmann, S. M. Strauss, G. Netzer, and A. D. Harris. 2008. Detection of methicillin-resistant Staphylococcus aureus and vancomycin-resistant enterococci on the gowns and gloves of healthcare workers. Infection Control and Hospital Epidemiology 29(7):583-589. doi:10.1086/588701.

Stajich, J. E., T. Harris, B. P. Brunk, J. Brestelli, S. Fischer, O. S. Harb, J. C. Kissinger, W. Li, V. Nayak, D. F. Pinney, C. J. Stoekert Jr., and D. S. Roos. 2011. FungiDB: An integrated functional genomics database for fungi. Nucleic Acids Research 40(D1):D675-D681. doi:10.1093/nar/gkr918.

Stanley, N. J., T. H. Kuehn, S. W. Kim, P. C. Raynor, S. Anantharaman, M. A. Ramakrishnan, and S. M. Goyal. 2008. Background culturable bacteria aerosol in two large public buildings using HVAC filters as long term, passive, high-volume air samplers. Journal of Environmental Monitoring 10(4):474. doi:10.1039/b719316e.

Sudharsanam, S., P. Srikanth, M. Sheela, and R. Steinberg. 2008. Study of the indoor air quality in hospitals in South Chennai, India—microbial profile. Indoor and Built Environment 17(5):435-441. doi:10.1177/1420326X08095568.

Taitt, C. R., T. Leski, D. Stenger, G. J. Vora, B. House, M. Nicklasson, G. Pimentel, D. V. Zurawski, B. C. Kirkup, D. Craft, P. E. Waterman, E. P. Lesho, U. Bangurae, and R. Ansumana. 2012. Antimicrobial resistance determinant microarray for analysis of multi-drug resistant isolates. SPIE 8371: 83710X-83710X-10.

Tang, J. W. 2009. The effect of environmental parameters on the survival of airborne infectious agents. Journal of The Royal Society Interface 6(Suppl_6):S737-S746. doi:10.1098/rsif.2009.0227.focus.

Täubel, M., H. Rintala, M. Pitkäranta, L. Paulin, S. Laitinen, J. Pekkanen, A. Hyvärinen, and A. Nevalainen. 2009. The occupant as a source of house dust bacteria. Journal of Allergy and Clinical Immunology 124(4):834-840.e47. doi:10.1016/j.jaci.2009.07.045.

Thomas, V., K. Herrera-Rimann, D. S. Blanc, and G. Greub. 2006. Biodiversity of amoebae and amoeba-resisting bacteria in a hospital water network. Applied and Environmental Microbiology 72(4):2428-2438. doi:10.1128/AEM.72.4.2428-2438.2006.

Treakle, A. M., K. A. Thom, J. P. Furuno, S. M. Strauss, A. D. Harris, and E. N. Perencevich. 2009. Bacterial contamination of health care workers’ white coats. American Journal of Infection Control 37(2):101-105. doi:10.1016/j.ajic.2008.03.009.

Tringe, S. G., T. Zhang, X. Liu, Y. Yu, W. H. Lee, J. Yap, F. Yao, S. T. Suan, S. K. Ing, M. Haynes, F. Rohwer, C. L. Wei, P. Tan, J. Bristow, E. M. Rubin, and Y. Ruan. 2008. The airborne metagenome in an indoor urban environment. PLoS ONE 3(4):e1862. doi:10.1371/journal.pone.0001862.

Ulger, F., S. Esen, A. Dilek, K. Yanik, M. Gunaydin, and H. Leblebicioglu. 2009. Are we aware how contaminated our mobile phones with nosocomial pathogens? Annals of Clinical Microbiology and Antimicrobials 8(1):7. doi:10.1186/1476-0711-8-7.

Vaerewijck, M. J. M., G. Huys, J. Carlos Palomino, J. Swings, and F. Portaels. 2005. Mycobacteria in drinking water distribution systems: Ecology and significance for human health. FEMS Microbiology Reviews 29(5):911-934. doi:10.1016/j.femsre.2005.02.001.

Vernon, M. O., M. K. Hayden, W. E. Trick, R. A. Hayes, D. W. Blom, and R. A. Weinstein. 2006. Chlorhexidine gluconate to cleanse patients in a medical intensive care unit: The effectiveness of source control to reduce the bioburden of vancomycin-resistant Enterococci. Archives of Internal Medicine 166(3):306-312. doi:10.1001/archinte.166.3.306.

Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73(16):5261-5267. doi:10.1128/AEM.00062-07.

Warner, P., and A. Glassco. 1963. Enumeration of air-borne bacteria in hospital. Canadian Medical Association Journal 88:1280-1283.

Whittaker, R. H. 1960. Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs 30(3):279. doi:10.2307/1943563.

Whittaker, R. H. 1972. Evolution and measurement of species diversity. Taxon 21(2/3):213. doi:10.2307/1218190.

Wiener-Well, Y., M. Galuty, B. Rudensky, Y. Schlesinger, D. Attias, and A. M. Yinnon. 2011. Nursing and physician attire as possible source of nosocomial infections. American Journal of Infection Control 39(7):555-559. doi:10.1016/j.ajic.2010.12.016.

Williams, R. E. O., and A. Hirch. 1950. Bacterial contamination of air in underground trains. Lancet 1(6595):128-131. doi:10.1016/S0140-6736(50)90081-X.

Williams, R. E. O., O. M. Lidwell, and A. Hirch. 1956. The bacterial flora of the air of occupied rooms. Journal of Hygiene 54(04):512. doi:10.1017/S002217240004479X.

Wong, D., K. Nye, and P. Hollis. 1991. Microbial flora on doctors’ white coats. BMJ 303(6817):1602-1604. doi:10.1136/bmj.303.6817.1602.

Yamada, K. 2007. A study on the behavior and control of indoor airborne microbe in a clinic. Journal of the National Institute of Public Health 56(3):300-302.

Yazdanbakhsh, M., and P. M. Matricardi. 2004. Parasites and the hygiene hypothesis: Regulating the immune system? Clinical Reviews in Allergy & Immunology 26(1):15-24. doi:10.1385/CRIAI:26:1:15.

Yilmaz, P., R. Kottmann, D. Field, R. Knight, J. R. Cole, L. Amaral-Zettler, J. A. Gilbert, I. Karsch-Mizrachi, A. Johnston, G. Cochrane, R. Vaughan, C. Hunter, J. Park, N. Morrison, P. Rocca-Serra, P. Sterk, M. Arumugam, M. Bailey, L. Baumgartner, B. W. Birren, M. J. Blaser, V. Bonazzi, T. Booth, P. Bork, F. D. Bushman, P. L. Buttigieg, P. S. G. Chain, E. Charlson, E. K. Costello, H. Huot-Creasy, P. Dawyndt, T. DeSantis, N. Fierer, J. A. Fuhrman, R. E. Gallery, D. Gevers, R. A. Gibbs, I. San Gil, A. Gonzalez, J. I. Gordon, R. Guralnick, W. Hankeln, S. Highlander, P. Hugenholtz, J. Jansson, A. L. Kau, S. T. Kelley, J. Kennedy, D. Knights, O. Koren, J. Kuczynski, N. Kyrpides, R. Larsen, C. L. Lauber, T. Legg, R. E. Ley, C. A. Lozupone, W. Ludwig, D. Lyons, E. Maguire, B. A. Methé, F. Meyer, B. Muegge, S. Nakielny, K. E. Nelson, D. Nemergut, J. D. Neufeld, L. K. Newbold, A. E. Oliver, N. R. Pace, G. Palanisamy, J. Peplies, J. Petrosino, L. Proctor, E. Pruesse, C. Quast, J. Raes, S. Ratnasingham, J. Ravel, D. A. Relman, S. Assunta-Sansone, P. D. Schloss, L. Schriml, R. Sinha, M. I. Smith, E. Sodergren, A. Spor, J. Stombaugh, J. M. Tiedje, D. V. Ward, G. M. Weinstock, D. Wendel, O. White, A. Whiteley, A. Wilke, J. R. Wortman, T. Yatsunenko, and F. O. Glöckner. 2011. Minimum Information about a Marker Gene Sequence (MIMARKS) and Minimum Information About Any (x) Sequence (MIxS) Specifications. Nature Biotechnology 29(5):415-420. doi:10.1038/nbt.1823.

Zachary, K. C., P. S. Bayne, V. J. Morrison, D. S. Ford, L. C. Silver, and D. C. Hooper. 2001. Contamination of gowns, gloves, and stethoscopes with vancomycin-resistant Enterococci. Infection Control and Hospital Epidemiology 22(9):560-564. doi:10.1086/501952.



Susan M. Huse,22 David B. Mark Welch, and Mitchell L. Sogin


Our understanding of microbial communities is in a time of rapid change. The application of polymerase chain reaction (PCR), cloning, and DNA sequencing to microbial diversity research has rapidly expanded our appreciation of the extent of the microbial world. In particular, analysis of PCR amplicons from various regions of the small subunit ribosomal RNA (SSU rRNA or 16S) gene generated from culture-independent samples is now the accepted standard for cataloguing microbial communities. As sequencing technologies improved, it became feasible to assess community membership from more than 1,000 individual SSU rRNA amplicons. With the advent of next-generation sequencing (NGS) that did not require the cloning of individual amplicons, researchers transitioned from generating thousands of ~800–1,100 nt reads to hundreds of thousands of 100–500 nt sequencing reads (454 technology). Illumina technology can now produce millions of 100 nt reads from hundreds of samples in a single run, potentially providing a nearly exhaustive survey of microbes present in a sample.

22 Brown University.

While the NGS technologies provide deeper sampling, the trade-off for depth has been shorter read lengths. The ~800–1,100 nt reads produced by the late 1990s using Sanger sequencing on ABI or LICOR platforms could be used to reconstruct the entire SSU rRNA gene through multiple sequencing of the same clone. To make use of the shorter reads produced by NGS technology, Sogin et al. (2006) capitalized on the structure of the SSU rRNA gene. The gene includes a series of regions that are highly conserved across the bacterial domain, interspersed with a series of nine hypervariable regions. This structure lends itself conveniently to NGS because oligonucleotide primers that target conserved regions on either side of the hypervariable regions can amplify DNA from across the bacterial domain. In contrast, the more rapidly evolving hypervariable regions are unique for most microbial genera and in many cases can differentiate below the genus level. Sogin et al. used primers targeting conserved flanking regions to amplify the V6 hypervariable region, which at 60–80 nt in length could reliably be completely sequenced on a 454 GS20. As with conventionally sequenced clone libraries, each read in principle represents an SSU rRNA operon and is a proxy for a microbe from the sample. By comparing the hypervariable region sequences against databases of SSU rRNA gene sequences of known taxonomy, such as RDP (Wang et al., 2007), SILVA (Pruesse et al., 2007), or Greengenes (DeSantis et al., 2006), the reads become tags for cataloging the taxonomy of the community being studied.

As the technology for microbial community research has evolved, so has our understanding of the communities we study. In the first published study implementing NGS in environmental samples, Sogin et al. (2006) examined several marine environments and discovered a richness and diversity in microbial community structures previously unknown. Each community exhibited a relatively small number of highly abundant taxa and a large number of low abundance taxa, a pattern often described as a long-tail distribution (Figure A6-1). Because of the unevenness of this community structure, previous studies using hundreds or thousands of sequencing reads were able to identify only the most abundant members and a small fraction of the taxa in the long tail. The greater sequencing depth of NGS methods revealed the breadth of the low abundance taxa—the “rare biosphere.”

Impact of Sequencing Errors and Clustering Methods

As NGS provided a means to explore ever deeper into microbial community structures, the gap between the number of named species and the number of sequence phylotypes increased. Unfortunately, the short read lengths of NGS technology, especially when applied to hypervariable regions with few stable phylogenetically informative positions, are poorly suited for the traditional phylogenetic analyses required for registering new taxa. Researchers turned to taxonomic-independent sequence clustering methods for characterizing microbial communities. By assuming that very similar sequences represent closely related organisms, and that more divergent sequences represent more distantly related organisms, the sequences can be clustered into groups of similar organisms, each cluster or “operational taxonomic unit” (OTU) presumed to represent a phylotype (Schloss and Handelsman, 2005). The width of the clustering, meaning the percent identity threshold for sequence tags to be placed in the same OTU, represents the similarity of the microbes in each OTU.

A critical element in taxonomy-independent analyses of diversity is sequencing error. Random errors can be tolerated in an assembly project where the goal is a consensus sequence. In OTU clustering, however, each read is assumed to represent an individual organism, and if a read has sufficient errors, then it will not cluster with its template, instead forming a new, spurious OTU. Thus, if not filtered out or otherwise accounted for, sequencing error can inflate the number of OTUs attributed to a community.

To address the issue of sequence quality, several authors developed quality filtering (Huse et al., 2007) and data de-noising (Quince et al., 2009) techniques for processing raw 454 sequencing reads to reduce sequencing errors and thereby reduce OTU inflation. In 2009, in a paper titled Wrinkles in the Rare Biosphere, Kunin et al. (2010) highlighted the impact of error rates on OTU analyses by sequencing a single strain of E. coli and generating more than 600 OTUs. Reeder and Knight followed this with a “News and Views” piece: The ‘rare biosphere’: A reality check (Reeder and Knight, 2009). The combination of these two publications spotlighted the very real concerns about the impact of sequencing error on microbial community diversity estimates. They posed a critical question: Is the rare biosphere real or simply an artifact of sequencing errors?

Even with very high-quality sequencing and stringent quality filtering, the depth of sampling afforded by NGS technology leads to more absolute OTU inflation than the earlier Sanger sequencing. This is for two reasons: First, when processing data from sequencing hundreds to thousands of Sanger capillary reads, individual chromatograms are often read by hand and confirmed by forward and reverse reads resulting in very high-quality sequence assemblies. While it is in principle possible to develop a similar skill reading 454 flowgrams, it is not conceivable to hand-edit hundreds of thousands of reads. In this regard it is worth noting that the generally accepted error rate in automated high-throughput capillary sequencing is 1 percent (i.e., an average Phred score of 20). The second reason is that the sheer number of reads produced by NGS will result in more spurious OTUs even if the error rate is much lower. Most OTU clustering methods are based only on the percent identity between sequences. If a read has sufficient errors that the difference between it and its template is greater than the clustering threshold, the clustering algorithm will place it in a new OTU. If a sequencing error rate leads to 1 read per thousand that fails to cluster with its template, a traditional clone library Sanger project with 1,000 reads will have, on average,

one spurious OTU. With NGS, with the same error rate, a data set of 100,000 reads will have, on average, 100 spurious OTUs. In practice, quality-controlled NGS reads tend to have a lower overall error rate (Schloss et al., 2011) than automated ABI capillary sequencing. So while the relative rate of OTU inflation per read may have dropped with NGS, the absolute number of spurious OTUs has increased considerably because of dramatically increased depths of sequencing.

As it turns out, though, much of the OTU inflation observed in NGS projects was not due to sequencing errors. Ironing out the Wrinkles in the Rare Biosphere (Huse et al., 2010) demonstrated that the most commonly used method for OTU generation dramatically compounded the problem of sequencing error. Following on the Kunin technique, Huse et al. clustered DNA amplified from a single E. coli gene and, using only sequences with an error rate below the clustering threshold, showed that switching from a single multiple sequence (MS) alignment to multiple pairwise (PW) alignments, and from complete linkage (CL) clustering to average linkage (AL) clustering, reduced the OTU inflation from 599 OTUs to 24. They introduced a single-linkage preclustering (SLP) step to smooth errors prior to clustering. Using SLP-PWAL for clustering brought the OTU count to 1.

The effect of reads with more errors than the clustering threshold still needs to be taken into consideration. With established, simple sequence quality filtering and SLP clustering of reads generated on the Roche Genome Sequencer FLX platform from amplicons of the V6 SSU rRNA hypervariable region, errant reads produce spurious singleton OTUs at a rate of ~1 spurious OTU per 1,000 reads. When applied to analysis of control communities of limited diversity, this rate can sound alarming. Processing 50,000 reads of a control community with 40 known members would generate 40 + 50,000/1,000 = 90 OTUs. However, it is important to recognize that the number of spurious OTUs produced scales with the sequencing depth, not the complexity of the community. If a biological community sampled to a depth of 50,000 reads were to have 1,000 observed OTUs, then 50 of these would be due to sequencing error. As shown in Table A6-1, algorithm choice has a much greater effect on OTU inflation than sequencing error.
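The arithmetic in this paragraph and in the note to Table A6-1 can be sketched in a few lines of Python. The function names are hypothetical, and the rate of 1 spurious OTU per 1,000 reads is the empirical estimate quoted above, not a universal constant. The algorithm inflation is computed here as the excess of the ratio over 1, which is our reading of how the table percentages were derived.

```python
def spurious_otus(n_reads, reads_per_spurious=1000):
    """Estimated spurious OTUs, assuming ~1 per 1,000 reads as quoted in the text."""
    return n_reads / reads_per_spurious


def inflation(n_reads, ms_cl_otus, slp_pwal_otus, reads_per_spurious=1000):
    """Return (true OTUs, sequencing inflation, algorithm inflation) as fractions.

    Follows the note to Table A6-1; 'algorithm inflation' is taken as the
    excess over 1 of (MS-CL OTUs - spurious OTUs) / true OTUs (an assumption).
    """
    spurious = round(spurious_otus(n_reads, reads_per_spurious))
    true_otus = slp_pwal_otus - spurious
    sequencing = spurious / true_otus
    algorithm = (ms_cl_otus - spurious) / true_otus - 1.0
    return true_otus, sequencing, algorithm


# Control community: 40 known members sampled to 50,000 reads.
expected_control_otus = 40 + spurious_otus(50_000)

# Deep-sea vent Archaea row of Table A6-1.
true_otus, seq_infl, alg_infl = inflation(63_133, 709, 470)
```

Under these assumptions the deep-sea vent row gives 407 estimated true OTUs, with inflation values that round to the 15 percent and 59 percent shown in the table.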

As NGS technologies produce longer reads (> 500 nt with Roche/454 and 300 nt with the Illumina MiSeq at the time of writing) the same error rate results in fewer errant reads generating new OTUs, because more errors per read are required for a sequence to fail to cluster with its template. Instead, a second type of error becomes of increasing importance: chimeric sequences from two or more templates during amplification. Chimeras are generated when a template is incompletely replicated during the elongation step of PCR. This truncated sequence can hybridize to other targets in subsequent PCR cycles and act as a primer, generating a single sequence from multiple templates. The frequency of chimeras scales with SSU rRNA amplicon length (Huber et al., 2009). The amplification of sample DNA is largely independent of sequencing platform (though the effect of platform-specific primer adapters has not been thoroughly explored) and several modifications of standard PCR result in reduced chimera formation (Acinas et al., 2004; Lahr and Katz, 2009; Qiu et al., 2001). However, chimeras remain

TABLE A6-1 OTU Inflation Due to Clustering Algorithm and Sequencing Error

Sample | Number of Reads | MS-CL OTUs | SLP-PWAL OTUs | Estimated Spurious OTUs | Estimated Sequencing Inflation | Estimated Algorithm Inflation
Deep-sea vent Archaea | 63,133 | 709 | 470 | 63 | 15% | 59%
English Channel | 12,851 | 1,154 | 859 | 13 | 1.5% | 35%
Human gut | 15,239 | 803 | 566 | 15 | 2.7% | 43%
Sewage | 33,082 | 2,383 | 1,831 | 33 | 1.8% | 31%
North Atlantic deep water | 15,497 | 1,713 | 1,339 | 15 | 1.1% | 28%

NOTE: OTUs found using multiple sequence alignment + complete linkage (MS-CL) and single linkage preclustering followed by pairwise average linkage (SLP-PWAL) in five example V6 data sets sequenced using a 454 GSFLX. The number of spurious SLP-PWAL OTUs is estimated to be 1 for every 1,000 sequence reads. The number of true OTUs is estimated to be the number of SLP-PWAL OTUs minus the estimated spurious OTUs. The estimated sequencing inflation is the ratio of estimated spurious OTUs to the number of estimated true OTUs. The estimated algorithm inflation is the ratio of the number of OTUs generated using MS-CL minus the number of spurious OTUs due to sequencing to the number of estimated true OTUs, expressed as the percentage by which that ratio exceeds 1.
SOURCE: Adapted from Table 1 in Huse et al. (2010).

in most heterogeneous amplicon pools. Several methods have been developed for identifying and removing chimeric reads. Haas et al. (2011) developed Chimera Slayer to remove chimeras by comparing each sequence read against a curated database of non-chimeric SSU rRNA genes. Quince et al. (2011) and Edgar et al. (2011) have developed chimera checkers (Perseus and UChime, respectively) using reference comparison, but optimized for shorter NGS reads. UChime also performs a de novo check by comparing triplets of reads in a data set to see if any reads appear to be a combination of two other reads from the same amplicon pool. Because chimeras have large sections of sequence substitutions, we can conservatively presume that each unique chimera will likely create a new OTU during clustering. Chimera checking, therefore, is just as important for improving OTU richness estimates as basic sequence quality filtering.
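The idea behind a de novo triplet check can be illustrated with a deliberately simplified toy. This is not the UChime algorithm: it assumes equal-length, already aligned reads and exact matches on each side of a single breakpoint, and the function name and `min_segment` parameter are hypothetical.

```python
def find_two_parent_chimera(query, candidates, min_segment=3):
    """Return (left_parent, right_parent, breakpoint) if `query` is an exact
    prefix of one candidate joined to an exact suffix of another, else None.

    Toy assumptions: all reads have the same length and are pre-aligned.
    """
    if query in candidates:
        return None  # identical to a candidate parent, so not a chimera
    n = len(query)
    for bp in range(min_segment, n - min_segment + 1):
        left, right = query[:bp], query[bp:]
        for a in candidates:
            if len(a) != n or a[:bp] != left:
                continue
            for b in candidates:
                if b != a and len(b) == n and b[bp:] == right:
                    return a, b, bp
    return None


parents = ["AAAAAAAA", "CCCCCCCC"]
hit = find_two_parent_chimera("AAAACCCC", parents)   # prefix of one, suffix of the other
miss = find_two_parent_chimera("AAAAAAAT", parents)  # a point error, not a chimera
```

Production tools additionally tolerate sequencing errors, score candidate parents by abundance, and handle reads of different lengths, none of which this sketch attempts.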

Sequencing and Clustering Best Practices

DNA Amplification

Given the known limitations of both NGS technologies and of clustering methods, it is particularly important to exercise best practices in both generation and use of NGS data. The first way to reduce the impact of sequencing errors on microbial ecology investigations is to reduce the rates of base incorporation errors and chimera formation in the DNA amplification step. This includes the use of a high-fidelity polymerase such as Platinum Taq, reducing contaminants that interfere with polymerase processivity or proofreading, minimizing the number of PCR cycles, and optimizing the amount of input DNA (Acinas et al., 2004; Lahr and Katz, 2009; Qiu et al., 2001). Technical replicates of the amplification step facilitate discrimination of novel sequences from high-frequency errors or chimeras.

Quality Filtering

Several authors have provided extensive analyses of quality filtering methods for both 454 and Illumina sequencing technologies (Huse et al., 2007; Meacham et al., 2011; Minoche et al., 2011; Quince et al., 2011; Schloss et al., 2011). In brief, removing reads with ambiguous bases (Ns), with low-quality scores, that are truncated, and that have known mismatches in the primer region are computationally simple means of decreasing the error rate several-fold. More computationally intensive algorithms such as AmpliconNoise (Quince et al., 2011) (run either directly or through QIIME [Caporaso et al., 2010] or mothur [Schloss et al., 2009]) can be used as the first step for quality filtering pyrosequencing data. Illumina paired-end technology allows another, more traditional approach to quality filtering: each amplicon is sequenced in both directions, so if the amplicon length is less than twice the read length, overlap of the complementary sequences can be used to assess accuracy (Bartram et al., 2011; Gloor et al., 2010). If the two reads overlap completely, meaning the amplicon length is less than or equal to the read length, requiring no mismatches could lead to data sets with little or no sequencing error, although systematic errors that are the same in both directions could still exist.
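The simple filters listed above, together with the complete-overlap check for paired-end reads, might be sketched as follows. All function names and threshold values (`min_length`, `min_mean_q`, `max_primer_mismatches`) are hypothetical placeholders; the cited papers should be consulted for values appropriate to a given platform.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def passes_quality_filter(seq, quals, primer,
                          min_length=50, min_mean_q=25,
                          max_primer_mismatches=0):
    """Apply the four simple filters: ambiguous bases, truncation,
    low average quality, and mismatches in the primer region."""
    seq = seq.upper()
    if "N" in seq:
        return False                      # ambiguous base call
    if len(seq) < min_length:
        return False                      # truncated read
    if not quals or sum(quals) / len(quals) < min_mean_q:
        return False                      # low average quality
    observed = seq[:len(primer)]
    if len(observed) < len(primer):
        return False
    mismatches = sum(a != b for a, b in zip(observed, primer.upper()))
    return mismatches <= max_primer_mismatches


def paired_reads_agree(forward, reverse):
    """Complete-overlap case: the reverse read must be the exact
    reverse complement of the forward read (no mismatches allowed)."""
    return forward.upper() == reverse_complement(reverse)


good = passes_quality_filter("ACGT" + "A" * 60, [30] * 64, primer="ACGT")
has_n = passes_quality_filter("ACGT" + "N" + "A" * 59, [30] * 64, primer="ACGT")
agree = paired_reads_agree("AACG", "CGTT")
```

As the text notes, the overlap check cannot catch systematic errors that occur identically in both read directions.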

Sequence quality filtering should be followed by chimera checking. Using a reference database for chimera detection is the standard method for identifying and removing chimeras and is very effective. Unfortunately, sequencing of novel environments is fast outstripping curated database growth, and novel genes that are parents of chimeras will be missed by reference comparison methods. A combination of both reference comparison and de novo chimera checking should always be used.

Smoothing Imperfect Data by Aggregation

Even with the best quality filtering and chimera checking, sequencing data will still contain reads with base incorporation and base calling errors, chimeric reads, reads from contaminating DNA, and reads from amplification of non-target areas of sample DNA. The methods chosen for downstream analysis of the data will determine the degree of impact these errors will have on research results.

Assigning reads to their closest match in a database of sequences annotated with defined taxonomy is a simple, straightforward way to minimize the impact of small sequence differences, segregate chimeras, identify contaminants, and

eliminate reads from non-target amplification. For instance, many of the reference SSU rRNA databases provide taxonomy primarily to the genus level. While this may not be sufficient resolution for some analyses, the use of genus-level taxonomy for analysis will assign similar sequences to the same genus, so that sequences with even a moderate number of errors are likely to still be classified together with their template. Chimeras of sequences within the same genus will remain in that genus, and chimeras of sequences from different genera within a family will tend to not be classified at the genus level; these cross-genera chimeras will aggregate as sequences classified to the family level but no further. The presence of unexpected genera, such as Ralstonia in deep-sea sediment samples, can indicate contamination.

Routinely assigning taxonomy to all sequence reads has the additional advantage of quickly identifying non-target reads. Occasionally, the SSU rRNA primers can amplify DNA from a section of the genome other than the SSU rRNA gene. The resulting amplicons will be quite divergent from any 16S reference gene. Finally, PCR primers designed to be specific for domains or other groups often amplify the SSU rRNA gene from a subset of species outside that group, generally in a non-quantitative manner. Reads that map to taxa outside the group for which the primers were designed can easily be eliminated from downstream analyses.

The other common way to aggregate similar sequences is to cluster them into OTUs based on percent similarity, as described previously. This is often done after assigning taxonomy to reads so that reads from non-template amplicons or from taxonomic groups outside the target range of the primers can be eliminated. Clustering reads based on a similarity score of 97 percent has become a common way of approximating “species” or phylotype in the absence of taxonomic resolution. However, it is important to note that different algorithms create very different 97 percent OTUs, and some of these methods lead to OTU inflation, as discussed previously. Among those that do not inflate OTU counts are SLP-PWAL clustering, which uses a nearest-neighbor approach to link sequences likely to be derived from base incorporation error with average neighbor linkage to form OTUs, and methods known as greedy clustering algorithms. One of the more popular of these is UClust (Edgar, 2010). Briefly, UClust ranks sequences in order of abundance and seeds the first OTU with the most abundant sequence. The next sequence is compared to the first; if it is within the clustering threshold, it is added to the OTU, and if not, it becomes the seed of a second OTU. Each sequence is compared to the OTU seeds, in order, until all reads have been assigned to an existing OTU or used to create a new OTU.
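The greedy procedure just described can be sketched in Python. This is an illustration of the general abundance-ranked, seed-based idea only, not the UClust implementation: the `identity` helper assumes equal-length, pre-aligned sequences and counts matching positions, whereas real tools compute identity from alignments and use heuristics to speed up the seed search. All names are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions (assumes equal-length, aligned sequences)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Return {seed sequence: number of reads assigned to that OTU}."""
    counts = Counter(reads)
    # Rank unique sequences by abundance (ties broken alphabetically).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    seeds = []
    otus = {}
    for seq, n in ranked:
        for seed in seeds:               # compare to seeds in order of creation
            if identity(seq, seed) >= threshold:
                otus[seed] += n          # within threshold: join existing OTU
                break
        else:
            seeds.append(seq)            # otherwise seed a new OTU
            otus[seq] = n
    return otus


base = "A" * 100
variant = "A" * 98 + "CC"                # 98 percent identical to base
distant = "C" * 100
otus = greedy_cluster([base] * 5 + [variant] * 2 + [distant] * 3)
```

In this toy data set the two-read variant joins the OTU seeded by the abundant sequence, while the distant sequence seeds its own OTU.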

Diversity in an Imperfect World

Ecologists measure the diversity of a microbial community in a variety of ways. Richness is the number of different members (OTUs, species, genera, phylotypes, etc.) in a community. Evenness describes the distribution of relative

abundances of the members, whether they have similar abundances or are skewed with some highly abundant and others rare. A community’s diversity combines richness and evenness (although richness is often called diversity as well). The richness of a single community is often referred to as alpha diversity. The degree of similarity or difference between two or more communities in richness or evenness is beta diversity. One important conceptual difference between these two measures is that alpha diversity is nearly always used to designate an estimate of the true richness of the community from which a data sample was taken, while beta diversity is generally a metric describing the similarity between the observed richness or evenness of two samples (though there are methods for estimating community similarity from sample data [Chao, 2004], they are rarely used in molecular microbial ecology).

Algorithms used to calculate diversity differ in the stability of their results in the presence of residual sequencing errors and chimeras (as reflected in OTU inflation) and in the depth of sampling. Estimates of sample richness (alpha diversity) are particularly susceptible to the impacts of both. Rarefaction curves, while not true estimators of alpha diversity, are often used to illustrate the relationship between the observed number of community members (i.e., OTUs) and sampling depth. The average number of OTUs observed is plotted against a range of subsample sizes, describing the number of new members discovered for an incremental increase in sampling effort. With very small subsamples, small increases in sampling depth lead to the discovery of many new members. As sampling depth increases, the number of new members found decreases and the curve approaches an asymptote as the sampling depth provides a more complete picture of the underlying community. If OTUs are created using a clustering method that inflates with sampling depth, the rarefaction curves cannot reach an asymptote and instead will increase linearly as a direct function of sampling depth. Superimposing multiple rarefaction curves from complete-linkage OTUs demonstrates the dependence of the slope of the rarefaction curve on the sample depth (Figure A6-2, Panel A). Rarefaction curves based on depth-independent OTUs (e.g., SLP-PWAL or UClust) have essentially identical slopes for sample depths ranging over two orders of magnitude (Figure A6-2, Panel B).
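A rarefaction curve of this kind can be approximated by repeated random subsampling without replacement. The sketch below is a minimal illustration under that assumption; the function name, the number of replicates, and the fixed random seed are arbitrary choices, and analytical rarefaction formulas exist that avoid the resampling altogether.

```python
import random


def rarefaction_curve(otu_labels, depths, n_reps=10, seed=0):
    """Return [(depth, mean number of OTUs observed)] for each subsample depth.

    `otu_labels` holds one OTU label per read; subsamples are drawn
    without replacement.
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        if depth > len(otu_labels):
            raise ValueError("subsample depth exceeds number of reads")
        total = 0
        for _ in range(n_reps):
            total += len(set(rng.sample(otu_labels, depth)))
        curve.append((depth, total / n_reps))
    return curve


# Toy long-tail community: one dominant OTU and two rarer ones.
reads = ["otu_a"] * 90 + ["otu_b"] * 9 + ["otu_c"]
curve = rarefaction_curve(reads, depths=[1, 10, 100])
```

Plotting the mean OTU count against depth for real data shows whether the curve flattens or keeps climbing with sampling effort.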

Some nonparametric methods of estimating alpha diversity remain sensitive to sampling depth independent of OTU inflation or other forms of error. Two of the most common estimators, ACE (Chao and Lee, 1992) and Chao1 (Chao, 1984), are well known to underestimate richness when used for populations where many of the members are “unseen” in the sample, and are more accurately considered lower bounds rather than true estimates. In other words, they do not perform well when a community is drastically undersampled, as is the case for most samples of microbial communities. Panel A of Figure A6-3 illustrates the impact of both sampling depth and clustering method on the estimated richness for a human gut microbiome sample. With increasing depth, the sample more

adequately reflects the underlying community, and the richness estimate can stabilize. In this example, Chao1 estimates rise rapidly with sampling depth from 1,000 to 20,000, but then the estimates level. Doubling the sample depth from 20,000 to 40,000 does not change the estimated richness beyond the error bars. Plots of richness against subsample depth may serve as a reality check on the stability (if not accuracy) of calculated richness for a given sample. Parametric estimators such as CatchAll (Bunge, 2011) are less sensitive to sample size and undersampling and also make use of a wider range of OTU sizes in extrapolating total richness.

Even with depth-independent OTU clustering methods, community richness estimates can be vulnerable to OTU inflation. Most algorithms for estimating richness heavily weight the number of OTUs with one or two reads (singleton and doubleton OTUs). We can assume that most spurious OTUs are singletons (this will not always be the case; for instance, early-round chimeras can be amplified in a sample, but it is a conservative assumption). By removing the estimated number of spurious OTUs (e.g., 1 in 1,000) from the count of singletons in the species abundance data used to calculate richness, we can compensate for OTU inflation in estimating alpha diversity. We do not need to know which of our OTUs are spurious and which are true; we only need an estimated number. Panel A of Figure A6-3 shows the impact of several OTU inflation rates on richness estimates for a subsample of 25,000.
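The singleton correction can be sketched with the Chao1 estimator. The code below uses the widely used bias-corrected form, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs; which variant was used for Figure A6-3 is not stated here, so that choice is an assumption, as are the function names and the default spurious rate.

```python
def chao1(otu_sizes):
    """Bias-corrected Chao1 richness from a list of OTU read counts."""
    s_obs = sum(1 for n in otu_sizes if n > 0)
    f1 = sum(1 for n in otu_sizes if n == 1)   # singleton OTUs
    f2 = sum(1 for n in otu_sizes if n == 2)   # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def chao1_singleton_corrected(otu_sizes, reads_per_spurious=1000):
    """Drop the estimated number of spurious singletons, then apply Chao1.

    Assumes all spurious OTUs are singletons and occur at roughly
    1 per `reads_per_spurious` reads.
    """
    n_spurious = round(sum(otu_sizes) / reads_per_spurious)
    kept = []
    for n in otu_sizes:
        if n == 1 and n_spurious > 0:
            n_spurious -= 1                    # remove one presumed-spurious singleton
            continue
        kept.append(n)
    return chao1(kept)


# Toy abundance data: 2,000 reads, 6 singletons, 2 doubletons.
sizes = [1990, 2, 2] + [1] * 6
raw = chao1(sizes)
corrected = chao1_singleton_corrected(sizes)
```

Note that the correction needs only the estimated count of spurious OTUs, not their identities, which is the point made in the paragraph above.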

At any sampling depth, microbial communities consistently display a long-tail distribution, and therefore evenness will be low at all sampling depths. Both Simpson’s and Shannon’s diversity estimates show very little direct susceptibility to the sampling depth (Figure A6-3, Panel B), but they are still affected by the OTU clustering method. Clearly, choosing a clustering algorithm that minimizes OTU inflation and that is stable to sample size is critical at all times.
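For reference, the two indices named here can be computed directly from OTU read counts. The sketch uses the standard definitions (Shannon's H = −Σ p ln p and Simpson's D = Σ p²); the function names are hypothetical, and the form of Simpson's index reported in the figure (D, 1 − D, or 1/D) is not specified in the text.

```python
import math


def shannon_index(counts):
    """Shannon's H using natural logarithms; zero counts are ignored."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson_index(counts):
    """Simpson's D: probability that two reads drawn at random (with
    replacement) belong to the same OTU. Diversity is often reported as 1 - D."""
    total = sum(counts)
    return sum((n / total) ** 2 for n in counts)


even = [25, 25, 25, 25]        # perfectly even community
skewed = [97, 1, 1, 1]         # long-tail community with the same richness
h_even, h_skewed = shannon_index(even), shannon_index(skewed)
d_even, d_skewed = simpson_index(even), simpson_index(skewed)
```

Both communities have four members, but the skewed one has a lower Shannon index and a higher Simpson's D, reflecting its low evenness.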

The list of beta diversity metrics that compare the degree of similarity or difference between two communities is very long. We highlight the importance of using metrics that are robust to both differences in sampling depth and under-sampling using three distance metrics: Jaccard presence/absence, Bray-Curtis, and Morisita-Horn. Results using the Yue-Clayton distance were consistently similar to Morisita-Horn (results not shown). We subsampled a large data set to provide pseudo-replicates that we expect to be similar. If we compare a subsample with itself, then the distance by any metric will be zero (equivalently, the similarity will be 1), and the expectation is that a comparison of two different random subsamples should give a very similar value. In Figure A6-4, Panel A, we take multiple subsamples of 5,000 reads from a sample of 50,000 reads and calculate the beta diversity of pairs of replicates. Although we would like the distances to approach zero, we know that microbial diversity is so great that there will still be differences between the replicates due to incomplete sampling. Morisita-Horn, Bray-Curtis, and Yue-Clayton (not shown) all return distances of about 5 percent or less. The Jaccard presence/absence metric, on the other hand, returns a community distance of 50-60 percent.

FIGURE A6-4 Panel A, Selecting multiple random subsamples of 5,000 reads from a larger data set of 50,000 reads, we created a set of pseudo-replicate samples. Because they all represent the same larger sample, the pairwise distances should be very small. The use of Jaccard presence/absence is highly affected by the membership within the rare biosphere. Bray-Curtis includes abundance and returns much smaller values. Morisita-Horn was specifically designed for smaller samples and to adjust for different sample sizes and returns values approaching zero. Panel B, Bray-Curtis uses absolute counts, rather than relative abundances, and displays increasing community distance values with increasing differences in sampling depth. Panel C, Even when considering only the Bray-Curtis distance between samples of the same depth, the values returned are still affected by the sampling depth. In a classic principal coordinate analysis plot, commonly used in microbial community studies, the clustering is affected by the sampling depth used. Panel D, The Morisita-Horn metric, which uses relative abundances and places more emphasis on the more abundant community members, is not appreciably affected by comparing samples of different sizes. The larger distances consistently reported for samples of 1,000 reads likely reflect the lower bound of sample depth required to be representative for this data set. Even so, the values returned are below 1 percent. Panel E, The distance values returned by Morisita-Horn cluster together for samples with more than 7,500 reads. The increasing spread of points for sampling depths less than 7,500 presumably reflects the lower bound of representative sampling for this data set, rather than an inherent limitation of the Morisita-Horn metric.

The test of presence or absence in a community, where fewer members have been detected than not, will always show large differences between the communities. The small number of abundant members will be consistent across replicates, but rare members detected in any given replicate will vary. It is not surprising that upwards of half the members of a replicate pair can be different.

In the case of similar subsample size, both Bray-Curtis and Morisita-Horn returned suitably low beta diversity values. Unfortunately, NGS data sets vary greatly in the number of reads depending on the amount of the amplicon library loaded on the sequencer and the quality of the particular sequencing run. Bray-Curtis and Morisita-Horn compare not only which members are present, but also the abundance of each. To adjust for undersampling, Morisita-Horn emphasizes the abundant members, assuming that if a member is abundant in one sample and not detected in the other sample, then this reflects a true difference in the communities, while if a rare member is detected in one and not the other, this may be an artifact of undersampling. Bray-Curtis includes but does not differentially weight the abundance information.

In Panels B through E in Figure A6-4, we illustrate the effect of different sample sizes on beta diversity values. Bray-Curtis returns small diversity values for subsamples of the same size and increasingly larger values when comparing samples of different sizes; the average distance between subsamples of 1,000 and 25,000 is about 90 percent (Figure A6-4, Panel B). This disparity comes in part because the Bray-Curtis method uses absolute rather than relative abundance. Even if data are subsampled to the same depth, Bray-Curtis can still cause misinterpretations of results when combined with one of the most common visualization tools for illustrating community similarity, principal coordinate analysis (PCoA). Panel C of Figure A6-4 shows that Bray-Curtis measures of subsample similarity group subsamples by read depth, even though they are subsamples from the very same community. Even though the absolute distances were low for pairs from the same subsampling depth, a PCoA plot does not report absolute differences but scales according to the set of distances used. For subtle changes in the community, a method such as Bray-Curtis could lead to erroneous interpretations of community shifts.

Morisita-Horn effectively compensates for sampling depth, returning beta diversity values less than 1 percent for all subsample comparisons (Figure A6-4, Panel D). Interestingly, the Morisita-Horn distances for pairs where one subsample was at 1,000 reads were noticeably higher than the other distances. With increasing depth this divergence disappears. In the PCoA plot, the 1,000 read depth pairs do not cluster either with each other or with any of the other data (Figure A6-4, Panel E). Depth pairs with 5,000 are still divergent but much less so than 1,000. In this particular data set, it appears that a minimum sampling depth of 10,000 is necessary to adequately reflect the community in the subsample.
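The behavior of the three metrics can be reproduced on toy data. The sketch below uses the standard textbook definitions, with Bray-Curtis computed on absolute counts as described in the Figure A6-4 caption; the function names and the dictionary representation (OTU label to read count) are assumptions made for illustration.

```python
def jaccard_distance(x, y):
    """Presence/absence distance: 1 - |shared OTUs| / |all OTUs|."""
    a = {k for k, v in x.items() if v > 0}
    b = {k for k, v in y.items() if v > 0}
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def bray_curtis_distance(x, y):
    """Bray-Curtis on absolute counts: 1 - 2 * sum(min) / (N_x + N_y)."""
    keys = set(x) | set(y)
    shared = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    return 1 - 2 * shared / (sum(x.values()) + sum(y.values()))


def morisita_horn_distance(x, y):
    """1 - Morisita-Horn similarity, which depends only on relative abundances."""
    nx, ny = sum(x.values()), sum(y.values())
    cross = sum(x.get(k, 0) * y.get(k, 0) for k in set(x) | set(y))
    dx = sum(v * v for v in x.values()) / nx ** 2
    dy = sum(v * v for v in y.values()) / ny ** 2
    return 1 - 2 * cross / ((dx + dy) * nx * ny)


# Same community composition sampled at 100 reads and at 1,000 reads.
shallow = {"otu_a": 90, "otu_b": 10}
deep = {"otu_a": 900, "otu_b": 100}
bc = bray_curtis_distance(shallow, deep)
mh = morisita_horn_distance(shallow, deep)

# Same dominant OTU, but a different rare member in each replicate.
rep1 = {"otu_a": 99, "otu_c": 1}
rep2 = {"otu_a": 99, "otu_d": 1}
jac = jaccard_distance(rep1, rep2)
```

With identical proportions at a tenfold difference in depth, Morisita-Horn returns essentially zero while Bray-Curtis on raw counts returns about 0.82; for the two replicates that differ only in a single rare member, Jaccard reports a distance of two-thirds.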

Continued Evidence for the Rare Biosphere

In evaluating both the extent of the rare biosphere and our ability to meaningfully sample it, it is helpful to put the word “rare” into perspective. Estimates vary, but let us assume that there are at least 1×10¹¹ bacterial cells in a single gram dry weight of human stool (Franks et al., 1998). If we sequence DNA from a single gram of stool and analyze a relatively large data set of 50,000 (5×10⁴) reads, we are sampling a tiny fraction of the census population. An OTU found as a singleton may be present at a frequency of only 1/50,000, but that is 2×10⁶ cells/g, which may not be an insignificant number.
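The arithmetic behind this estimate can be checked directly. This is a small sketch using only the two figures given in the text; the variable names are illustrative.

```python
cells_per_gram = 10**11   # assumed lower bound for cells per gram dry weight of stool
reads = 50_000            # size of the sequencing data set

# A singleton OTU has an observed frequency of 1 / reads.
# Scaled to the census population, it corresponds to this many cells per gram:
cells_represented = cells_per_gram // reads

# Fraction of the census population that the reads actually sample:
sampled_fraction = reads / cells_per_gram

print(f"singleton ~ {cells_represented:,} cells/g")
print(f"sampled fraction ~ {sampled_fraction:.0e}")
```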

But is the long-tail distribution, while consistent across bacterial communities sampled from human and other hosts, marine, freshwater, soil, sand, leaves, sewage, and any number of other environments, merely an artifact of the known phenomenon of OTU inflation caused by deep sequencing? Returning to the empirically derived estimate of 1 spurious OTU per 1,000 reads, we can remove a fraction of singleton OTUs equal to those attributed to OTU inflation (Figure A6-5, Panel A). What we see is that even if we remove 1 out of every 500 singleton OTUs, the distribution retains its characteristic shape, because the fraction of singletons removed compared to the number observed is relatively small. Any spurious OTUs are simply extending the end of the tail incrementally; they are not fundamentally altering the shape of the species abundance curve.
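The correction described above can be sketched as follows. This is an illustrative simplification rather than the procedure used to produce Figure A6-5: the function name, the toy abundance vector, and the decision to drop the assumed spurious OTUs from the singleton end of the tail are all assumptions.

```python
def remove_spurious_singletons(otu_counts, reads_per_spurious_otu=1000):
    """Drop as many singleton OTUs as the assumed inflation rate predicts.

    otu_counts: list of read counts, one per OTU.
    reads_per_spurious_otu: assumed number of reads per spurious OTU.
    """
    total_reads = sum(otu_counts)
    n_spurious = total_reads // reads_per_spurious_otu
    kept = sorted(otu_counts, reverse=True)   # rank abundance order
    removed = 0
    # Singletons sit at the end of the ranked list; trim them from there.
    while removed < n_spurious and kept and kept[-1] == 1:
        kept.pop()
        removed += 1
    return kept

# Hypothetical long-tailed sample: 50,000 reads across 194 OTUs.
counts = [20000, 15000, 10000, 4000] + [10] * 90 + [1] * 100
kept = remove_spurious_singletons(counts)
print(len(counts), "OTUs before;", len(kept), "OTUs after")
```

In this toy sample only the far end of the tail is shortened; the dominant and intermediate OTUs, which define the overall shape of the curve, are untouched.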

Everything May or May Not Be Everywhere, but Everything Is Rare Somewhere

One of the stronger pieces of evidence supporting the existence of the rare biosphere comes from comparing the sequences found in different microbial communities. The human gut microbiome, for example, varies greatly between subjects. In practice, this leads to a wide range of relative abundances of even the most common OTUs. Panels B and C in Figure A6-5 show the 100 most abundant OTUs across 208 Human Microbiome Project subjects in rank order (Huse et al., 2012). The maximum abundance in a single subject for each OTU is 1-100 percent (Figure A6-5, Panel B). The minimum abundance for each of these (except the first most abundant OTU) is within the rare biosphere for at least one subject. In other words, essentially all of the most abundant gut OTUs are highly abundant in some subjects and rare or not detected in others. Panel C of Figure A6-5 uses absolute rather than relative abundance and frequencies to portray the same data. Looking only at a single subject, we might be tempted to discount rare OTUs as noise in the data rather than a true rare biosphere signal. In the greater context of many samples, however, we realize that true rare members are prevalent across subjects, and these same members can dominate the microbiome in other subjects. This same pattern can be seen in other environments including the waters of the English Channel (Gilbert et al., 2009) (Figure A6-5, Panel D).

FIGURE A6-5 Panel A, The rank abundance curve shows only minor reductions in the long tail even assuming that as many as 1 in 500 reads generates a spurious OTU. Panel B, The relative abundance of the 100 most abundant OTUs in the Human Microbiome Project stool samples. Each dot represents a separate subject. Even the most abundant OTUs, which dominate some samples, are rare in other samples, indicating that an OTU that is rare in an individual is not necessarily either spurious or consistently rare. Low abundance taxa should not be dismissed based on only a few samples. Panel C, The same stool samples as plotted in Panel B, but here with absolute abundance in a sample. The blue expanse on the left represents the percentage of samples that do not contain the OTU. Even OTUs that are more absent across samples than present appear consistently in multiple samples with an abundance greater than 100. Panel D, Another example of the abundance graph in Panel C, with data from aquatic samples of the English Channel. Here the intersample variation is much lower.


The use of NGS methods has revolutionized microbial ecology. But, as with any new technology, new challenges must be met. For accurate results, great care must be taken to reduce the rates of sequencing errors and to remove DNA amplification chimeras, using high-quality de-noising or paired-end overlap filtering, and chimera detection. These initial steps, however, are not enough. Researchers must also select bioinformatics tools that avoid artificially inflating the number of OTU clusters, alpha diversity estimates, and beta diversity estimates. OTU clustering methods such as SLP-PWAL and UClust reduce inflation, whereas methods employing multiple sequence alignments and complete linkage clustering overestimate the appropriate number of OTUs.

The selection of diversity metrics affects the research results. Simple richness estimates are sample size dependent. Because the most common estimators are known to be affected by undersampling, larger sample sizes (in the absence of depth-dependent OTU inflation) may provide the most accurate richness estimates. One means to enhance the interpretation of richness estimates is to plot the richness at subsample depths for a given sample to see whether the sample depth is sufficient for the estimate to be stable or whether the sample depth is still within the zone of distinct depth-dependence. Fortuitously, both Simpson’s and Shannon’s diversity estimates show independence of sample depth.
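The suggestion to plot richness at several subsample depths can be sketched as below. This is a hedged illustration on simulated data: the sample (50,000 reads over 500 OTUs), the seed, and the helper names are assumptions, and Simpson's index is computed here in its 1 - Σp² form.

```python
import numpy as np

def shannon(counts):
    """Shannon diversity H' = -sum(p * ln p) over observed OTUs."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def simpson(counts):
    """Simpson's index of diversity, 1 - sum(p^2)."""
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from an OTU count vector."""
    return rng.multivariate_hypergeometric(counts, depth)

rng = np.random.default_rng(1)
# Hypothetical sample: 50,000 reads over 500 OTUs with a long tail.
full = rng.multinomial(50_000, rng.dirichlet(np.full(500, 0.2)))

results = {}
for depth in (1_000, 5_000, 10_000, 25_000, 50_000):
    sub = subsample(full, depth, rng)
    results[depth] = (int((sub > 0).sum()), shannon(sub), simpson(sub))
    richness, h, d = results[depth]
    print(f"depth={depth:>6}  observed OTUs={richness:>4}  Shannon={h:.3f}  Simpson={d:.4f}")
```

If observed richness is still climbing at the full depth, the sample remains in the depth-dependent zone. Simpson's index, which is dominated by the abundant OTUs, should change very little across depths in a simulation like this one.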

Beta diversity should only be calculated with an appropriately robust metric that can accommodate sample depth. Even in cases where multiple samples are subsampled to the same depth before calculating intercommunity distance, a depth-dependent metric such as Bray-Curtis will still be affected by depth and can, in cases of more subtle shifts in community structure, skew PCoA plots, resulting in possible misinterpretation of results. The practice of subsampling to the minimum can introduce artifacts of undersampling as demonstrated in Figure A6-5. A much more robust method is to select a beta diversity algorithm such as Morisita-Horn or Yue-Clayton that does not require subsampling. For all alpha and beta diversity calculations there are thresholds of undersampling that no metric selection can overcome. Therefore, in the absence of other depth-dependent overestimates (such as poor selection of clustering method), it is best to use full sample sizes rather than subselecting to a minimal and therefore less representative sample size.

Even in the best of all research worlds, errors, OTU inflation, chimeras, contamination, and other inaccuracies will still exist. In this light, multiple samples are necessary for determining when low abundance “errare” OTUs or taxa are errors and when they are true rare members. One straightforward criterion for trusting the validity of an OTU is whether it occurs abundantly in any sample. By clustering OTUs or using taxonomy and performing bioinformatics analyses across multiple samples at once, it is easy to detect abundant members in the set of samples, validating those members in communities where they are rare. Given our current techniques, context is the best method for discerning truth from fiction in the rare biosphere.
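The cross-sample criterion described above can be expressed as a short check on an OTU table. This is a minimal sketch under stated assumptions: the function name and both thresholds (`rare_max`, `abundant_min`) are hypothetical choices for illustration, not values from the text.

```python
import numpy as np

def validate_rare_otus(table, rare_max=0.001, abundant_min=0.01):
    """Mark OTUs that are rare in a sample but abundant in at least one sample.

    table: samples x OTUs matrix of read counts.
    rare_max, abundant_min: assumed relative-abundance thresholds.
    Returns a boolean matrix of the same shape.
    """
    table = np.asarray(table, dtype=float)
    rel = table / table.sum(axis=1, keepdims=True)      # per-sample relative abundance
    abundant_somewhere = (rel >= abundant_min).any(axis=0)
    rare_here = (rel > 0) & (rel <= rare_max)
    return rare_here & abundant_somewhere

# Hypothetical table: 3 samples (rows) x 4 OTUs (columns), 10,000 reads each.
table = [
    [9000,  995,    5, 0],
    [   5, 4995, 5000, 0],
    [4999, 5000,    0, 1],
]
validated = validate_rare_otus(table)
print(validated)
```

In this toy table, the third OTU is rare in the first sample but dominant in the second, so its rare occurrence is supported by context. The fourth OTU appears only once as a singleton in one sample, so nothing validates it.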


Acinas, S. G., V. Klepac-Ceraj, D. E. Hunt, C. Pharino, I. Ceraj, D. L. Distel, and M. F. Polz. 2004. Fine-scale phylogenetic architecture of a complex bacterial community. Nature 430:551-554.

Bartram, A. K., M. D. J. Lynch, J. C. Stearns, G. Moreno-Hagelsieb, and J. D. Neufeld. 2011. Generation of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by assembling paired-end Illumina reads. Applied and Environmental Microbiology 77(11):3846-3852.

Bunge, J. 2011. Estimating the number of species with CatchAll. Proceedings of the 2011 Pacific Symposium on Biocomputing.

Caporaso, J. G., J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Pena, J. K. Goodrich, J. I. Gordon, G. A. Huttley, S. T. Kelley, D. Knights, J. E. Koenig, R. E. Ley, C. A. Lozupone, D. McDonald, B. D. Muegge, M. Pirrung, J. Reeder, J. R. Sevinsky, P. J. Turnbaugh, W. A. Walters, J. Widmann, T. Yatsunenko, J. Zaneveld, and R. Knight. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336.

Chao, A. 1984. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11:265-270.

Chao, A., and S.-M. Lee. 1992. Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87(417):210-217.

DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu, and G. L. Andersen. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology 72(7):5069-5072.

Edgar, R. C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460-2461.

Edgar, R. C., B. J. Haas, J. C. Clemente, C. Quince, and R. Knight. 2011. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27(16):2194-2200.

Franks, A. H., H. J. M. Harmsen, G. C. Raangs, G. J. Jansen, F. Schut, and G. W. Welling. 1998. Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16S rRNA-targeted oligonucleotide probes. Applied and Environmental Microbiology 64(9):3336-3345.

Gilbert, J. A., D. Field, P. Swift, L. Newbold, A. Oliver, T. Smyth, P. J. Somerfield, S. Huse, and I. Joint. 2009. The seasonal structure of microbial communities in the western English Channel. Environmental Microbiology 11(12):3132-3139.

Gloor, G. B., R. Hummelen, J. M. Macklaim, R. J. Dickson, A. D. Fernandes, R. MacPhee, and G. Reid. 2010. Microbiome profiling by Illumina sequencing of combinatorial sequence-tagged PCR products. PLoS ONE 5(10):e15406.

Haas, B. J., D. Gevers, A. M. Earl, M. Feldgarden, D. V. Ward, G. Giannoukos, D. Ciulla, D. Tabbaa, S. K. Highlander, E. Sodergren, B. Methé, T. Z. DeSantis, The Human Microbiome Consortium, J. F. Petrosino, R. Knight, and B. W. Birren. 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research 21(3):494-504.

Huber, J. A., H. G. Morrison, S. M. Huse, P. R. Neal, M. L. Sogin, and D. B. Mark Welch. 2009. Effect of PCR amplicon size on assessments of clone library microbial diversity and community structure. Environmental Microbiology 11(5):1292-1302.

Huse, S., J. Huber, H. Morrison, M. Sogin, and D. Mark Welch. 2007. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology 8(7):R143.

Huse, S. M., D. Mark Welch, H. G. Morrison, and M. L. Sogin. 2010. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environmental Microbiology 12(7):1889-1898.

Huse, S. M., Y. Ye, Y. Zhou, and A. A. Fodor. 2012. A core human microbiome as viewed through 16S rRNA sequence clusters. PLoS ONE 7(6):e34242. doi:10.1371/journal.pone.0034242.

Kunin, V., A. Engelbrektson, H. Ochman, and P. Hugenholtz. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors lead to artificial inflation of diversity estimates. Environmental Microbiology 12(1):118-123.

Lahr, D. J. G., and L. A. Katz. 2009. Reducing the impact of PCR-mediated recombination in molecular evolution and environmental studies using a new-generation high-fidelity DNA polymerase. BioTechniques 47:857-866.

Meacham, F., D. Boffelli, J. Dhahbi, D. Martin, M. Singer, and L. Pachter. 2011. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics 12(1):451.

Minoche, A., J. Dohm, and H. Himmelbauer. 2011. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biology 12(11):R112.

Pruesse, E., C. Quast, K. Knittel, B. M. Fuchs, W. Ludwig, J. Peplies, and F. O. Glöckner. 2007. SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Research 35(21):7188-7196.

Qiu, X., L. Wu, H. Huang, P. E. McDonel, A. V. Palumbo, J. M. Tiedje, and J. Zhou. 2001. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S rRNA gene-based cloning. Applied and Environmental Microbiology 67:880-887.

Quince, C., A. Lanzen, T. P. Curtis, R. J. Davenport, N. Hall, I. M. Head, L. F. Read, and W. T. Sloan. 2009. Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods 6(9):639-641.

Quince, C., A. Lanzen, R. Davenport, and P. Turnbaugh. 2011. Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12(1):38.

Reeder, J., and R. Knight. 2009. The “rare biosphere”: A reality check. Nature Methods 6(9):636-637.

Schloss, P. D., D. Gevers, and S. L. Westcott. 2011. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE 6(12):e27310.

Schloss, P. D., and J. Handelsman. 2005. Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Applied and Environmental Microbiology 71(3):1501-1506.

Schloss, P. D., S. L. Westcott, T. Ryabin, J. R. Hall, M. Hartmann, E. B. Hollister, R. A. Lesniewski, B. B. Oakley, D. H. Parks, C. J. Robinson, J. W. Sahl, B. Stres, G. G. Thallinger, D. J. Van Horn, and C. F. Weber. 2009. Introducing mothur: Open source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75(23):7537-7541.

Sogin, M. L., H. G. Morrison, J. A. Huber, D. Mark Welch, S. M. Huse, P. R. Neal, J. M. Arrieta, and G. J. Herndl. 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proceedings of the National Academy of Sciences 103(32):12115-12120.

Wang, Q., G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. A naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73(16):5261-5267.



Amy J. Vogler,24 Fabien Chan,25 David M. Wagner,24
Philippe Roumagnac,26,¤a Judy Lee,24 Roxanne Nera,24
Mark Eppinger,27 Jacques Ravel,26 Lila Rahalison,24,¤b
Bruno W. Rasoamanana,24,¤c Stephen M. Beckstrom-Sternberg,24,28
Mark Achtman,24,29 Suzanne Chanteau,24,¤d and Paul Keim24,27,*


23 Reprinted from PLoS Neglected Tropical Diseases. Originally published as: Vogler AJ, Chan F, Wagner DM, Roumagnac P, Lee J, et al. (2011) Phylogeography and Molecular Epidemiology of Yersinia pestis in Madagascar. PLoS Negl Trop Dis 5(9): e1319. doi:10.1371/journal.pntd.0001319

Editor: Mathieu Picardeau, Institut Pasteur, France.

24 Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA.

25 Institut Pasteur de Madagascar, Antananarivo, Madagascar.

26 Max Planck Institut für Infektionsbiologie, Berlin, Germany.

27 Institute for Genomic Sciences (IGS), School of Medicine, University of Maryland, Baltimore, Maryland, USA.

28 Translational Genomics Research Institute, Phoenix, Arizona, United States of America.

29 Environmental Research Institute, University College Cork, Cork, Ireland.

Copyright: © 2011 Vogler et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail:

¤a Current address: Unite Mixte de Recherche 6191, Centre National de la Recherche Scientifique-Commissariat à l’Energie Atomique-Aix-Marseille Université, Commissariat à l’Energie Atomique Cadarache, Saint Paul Lez Durance, France.

¤b Current address: Centers for Disease Control and Prevention, Atlanta, Georgia, USA.

¤c Current address: Laboratoire de Biologie Médicale du Tampon, Le Tampon, Reunion Island, France.

¤d Current address: Institut Pasteur de Nouvelle-Calédonie, Nouméa, New Caledonia.

Background

Plague was introduced to Madagascar in 1898 and continues to be a significant human health problem. It exists mainly in the central highlands, but in the 1990s it was reintroduced to the port city of Mahajanga, where it caused extensive human outbreaks. Despite its prevalence, the phylogeography and molecular epidemiology of Y. pestis in Madagascar have been difficult to study due to the great genetic similarity among isolates. We examine island-wide geographic-genetic patterns based upon whole-genome discovery of SNPs, SNP genotyping, and hypervariable variable-number tandem repeat (VNTR) loci to gain insight into the maintenance and spread of Y. pestis in Madagascar.

Methodology and Principal Findings

We analyzed a set of 262 Malagasy isolates using a set of 56 SNPs and a 43-locus multi-locus VNTR analysis (MLVA) system. We then analyzed the geographic distribution of the subclades and identified patterns related to the maintenance and spread of plague in Madagascar. We find relatively high levels of VNTR diversity in addition to several SNP differences. We identify two major groups, Groups I and II, which are subsequently divided into 11 and 4 subclades, respectively. Y. pestis appears to be maintained in several geographically separate subpopulations. There is also evidence for multiple long distance transfers of Y. pestis, likely human mediated. Such transfers have resulted in the reintroduction and establishment of plague in the port city of Mahajanga, where there is evidence for multiple transfers both from and to the central highlands.

Conclusions and Significance

The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process that relies on the natural cycle between the primary host, the black rat, and its flea vectors as well as human activity.

Author Summary

Plague, caused by the bacterium Yersinia pestis, has been a problem in Madagascar since it was introduced in 1898. It mainly affects the central highlands, but it has also caused several large outbreaks in the port city of Mahajanga after being reintroduced there in the 1990s. Despite its prevalence, the genetic diversity and related geographic distribution of different genetic groups of Y. pestis in Madagascar have been difficult to study due to the great genetic similarity among isolates. We subtyped a set of Malagasy isolates and identified two major genetic groups that were subsequently divided into 11 and 4 subgroups, respectively. Y. pestis appears to be maintained in several geographically separate subpopulations. There is also evidence for multiple long distance transfers of Y. pestis, likely human mediated. Such transfers have resulted in the reintroduction and establishment of plague in the port city of Mahajanga, where there is evidence for multiple transfers both from and to the central highlands. The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process that relies on the natural cycle between the primary host, the black rat, and its flea vectors as well as human activity.


Throughout recorded history, Yersinia pestis, etiologic agent of plague, has spread multiple times from foci in central Asia in greatly widening swaths as human-mediated transport became more efficient (Morelli et al., 2010). Plague attained its current global distribution during the current “third” pandemic, which began in 1855 in the Chinese province of Yünnan, when it was introduced to many previously unaffected countries via infected rats on steam ships (Perry and Fetherston, 1997). Plague caused widespread outbreaks during this introduction period (~1900 A.D.), and though disease incidence has since largely decreased, plague remains a significant human health threat due to the severe and often fatal nature of the disease, the many natural plague foci (Perry and Fetherston, 1997), and its potential as a bioterror agent (it is currently classified as a Class A Select Agent [Rotz et al., 2002]). Plague is of particular significance in Madagascar, which has reported some of the highest human plague case numbers (18%–60% of the world total each year between 1995 and 2009) (WHO, 2010) and was the origin of a natural multi-drug resistant strain of Y. pestis (Galimand et al., 1997; Welch et al., 2007).

Plague has been a problem in Madagascar since its introduction during the current pandemic. It was first introduced to Toamasina in 1898 (Brygoo, 1966), likely via India (Morelli et al., 2010), with outbreaks in other coastal cities soon after. In 1921, plague reached the capital, Antananarivo, likely via infected rats transported on the railroad linking Toamasina and Antananarivo. Subsequent rat epizootics signaled the establishment of plague in the central highlands (Brygoo, 1966). Plague then disappeared from the coast and now exists within two large areas in the central and northern highlands above 800 m in elevation (Chanteau et al., 1998). This elevational distribution of plague is linked to the presence of the flea vectors Xenopsylla cheopis and Synopsyllus fonquerniei, which are less abundant and absent, respectively, below 800 m (Duplantier, 2001; Duplantier et al., 1999). Plague has never disappeared from this region, and although it was relatively controlled in the 1950s due to public hygiene improvements and the introduction of antibiotics and insecticides, disease incidence began increasing in 1989 (Chanteau et al., 1998, 2000; Migliani et al., 2006). Human plague cases peaked in 1997 but continue to occur at high frequencies, making Madagascar among the top three countries for human plague cases during the past 15 years (WHO, 2010).

A third, newly emerged plague focus outside the central and northern highlands is the port city of Mahajanga, located ~400 km by air from Antananarivo (Chanteau et al., 1998). Plague first appeared in Mahajanga during an outbreak in 1902. Subsequent outbreaks occurred in 1907 and between 1924 and 1928 (Brygoo, 1966). Plague then disappeared from Mahajanga for a period of 62 years before reappearing during a large outbreak in 1991 (Laventure et al., 1991). Subsequent outbreaks occurred from 1995–1999 (Boisier et al., 1997, 2000; Rasolomaharo et al., 1995). During this time, the Mahajanga focus was responsible for ~30% of the reported human plague cases in Madagascar (Boisier et al., 2002). Interestingly, this focus likely represents one of the only examples of plague being reintroduced to an area where it had gone extinct, rather than emergence from a silently cycling rodent reservoir without telltale human cases (Duplantier et al., 2005).

Molecular subtyping of Y. pestis for epidemiological tracking has been difficult due to a lack of genetic diversity (Achtman et al., 1999). SNP genotyping (Achtman et al., 2004; Eppinger et al., 2010; Morelli et al., 2010), ribotyping (Guiyoule et al., 1994), IS100 insertion element restriction fragment length polymorphism (RFLP) analysis (Achtman et al., 1999), PCR-based IS100 genotyping (Achtman et al., 2004; Motin et al., 2002) and pulsed-field gel electrophoresis (PFGE) (Lucier and Brubaker, 1992) have been used to differentiate global isolate collections; however, SNP genotyping provides the most robust phylogenetic reconstructions. SNP genotyping (Morelli et al., 2010), ribotyping (Guiyoule et al., 1997), IS100 insertion element RFLP analysis (Huang et al., 2002), different region (DFR) analysis (Li et al., 2008), clustered regularly interspaced short palindromic repeats (CRISPR) analysis (Cui et al., 2008), ERIC-PCR (Kingston et al., 2009), ERIC-BOX-PCR (Kingston et al., 2009), and PFGE (Huang et al., 2002; Zhang et al., 2009) have shown limited to moderate ability in differentiating isolates on a regional scale. Of these, ribotyping has been applied to a set of 187 Malagasy isolates, but only revealed four ribotypes, three of which were unique to Madagascar (Guiyoule et al., 1997). SNP genotyping of 82 Malagasy isolates provided greater and more phylogenetically informative resolution, revealing two major groups and an additional 10 subgroups derived from these two major groups that were mostly isolate-specific (Morelli et al., 2010). 
In contrast to these other molecular subtyping methods, multi-locus variable-number tandem repeat (VNTR) analysis (MLVA) has shown high discriminatory power at global (Achtman et al., 2004; Klevytska et al., 2001; Pourcel et al., 2004), regional (Girard et al., 2004; Klevytska et al., 2001; Li et al., 2009; Lowell et al., 2007; Zhang et al., 2009), and local scales (Girard et al., 2004), indicating its likely usefulness for further differentiation among Y. pestis isolates from Madagascar.

The use of SNPs and MLVA together, in a hierarchical approach, has been successfully applied to clonal, recently emerged pathogens (Keim et al., 2004; Van Ert et al., 2007; Vogler et al., 2009). Point mutations that result in SNPs occur at very low rates, making SNPs relatively rare in the genome, but discoverable through intensive sampling (i.e., whole genome sequencing). In addition, since each SNP likely occurred only once in the evolutionary history of an organism, SNPs represent highly stable phylogenetic markers that can be used for identifying key phylogenetic positions (Keim et al., 2004). However, SNPs discovered from a limited number of whole genome sequences will have limited resolving power (Keim et al., 2004) since they will only be able to identify phylogenetic groups along the evolutionary path(s) linking the sequenced genomes (Pearson et al., 2004). In contrast, VNTRs possess very high mutation rates and multiple allele states, allowing them to provide a high level of resolution among isolates. Unfortunately, these high mutation rates can lead to mutational saturation and homoplasy, which can obscure deeper phylogenetic relationships, leading to inaccurate phylogenies. Using these two marker types together, in a nested hierarchical approach, with SNPs used to identify major genetic groups followed by VNTRs to provide resolution within those groups, allows for both a deeply rooted phylogenetic hypothesis and high resolution discrimination among closely related isolates (Keim et al., 2004).
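The nesting logic of this hierarchical approach can be illustrated with a simple grouping routine. This is a deliberately simplified sketch, not the phylogenetic analysis used in the study: it groups isolates by exact profile identity rather than building trees, and the function name, isolate names, SNP alleles, and VNTR repeat counts are all hypothetical.

```python
from collections import defaultdict

def hierarchical_genotypes(isolates):
    """Group isolates by SNP profile first, then by MLVA profile within each SNP group.

    isolates: iterable of (name, snp_profile, mlva_profile) tuples.
    Returns {snp_profile: {mlva_profile: [names]}}.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for name, snp_profile, mlva_profile in isolates:
        groups[tuple(snp_profile)][tuple(mlva_profile)].append(name)
    return {snp: dict(mlva) for snp, mlva in groups.items()}

# Hypothetical isolates: (name, SNP alleles, VNTR repeat counts at three loci).
isolates = [
    ("iso1", "AAG", (7, 12, 4)),
    ("iso2", "AAG", (7, 13, 4)),
    ("iso3", "ATG", (7, 12, 4)),   # same MLVA profile as iso1, different SNP group
    ("iso4", "AAG", (7, 12, 4)),
]

nested = hierarchical_genotypes(isolates)
for snp, subgroups in nested.items():
    print("SNP group", "".join(snp))
    for mlva, names in subgroups.items():
        print("   MLVA", mlva, "->", names)
```

Because the stable SNP markers define the outer level, iso1 and iso3 are kept apart even though their VNTR profiles are identical, which is how the nested design limits the effect of VNTR homoplasy.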

We investigated the phylogeography and molecular epidemiology of Y. pestis in Madagascar through extensive genotyping and mapping of genetic groups. We genotyped 262 Malagasy isolates from 25 districts from 1939–2005 using 56 SNPs and a 43-marker MLVA system to identify island specific subclades. We then spatially mapped the subclades to examine island-wide geographic-genetic patterns and potential transmission routes.


Ethics Statement

The DNAs analyzed in this study (Table S1) were extracted from Y. pestis cultures that were previously isolated by the Malagasy Central Laboratory for plague and Institut Pasteur de Madagascar as part of Madagascar’s national plague surveillance plan. The Malagasy Ministry of Health, as part of this national plague surveillance plan, requires declaration of all suspected human plague cases and collection of biological samples from those cases. These biological samples are analyzed by the Malagasy Central Laboratory for plague and Institut Pasteur de Madagascar, which also maintains any cultures derived from these samples. These cultures are all de-linked from the patients from whom they originated and analyzed anonymously if used in any research study. Thus, for purposes of this study, all of the DNAs derived from Y. pestis cultures from human patients were analyzed anonymously. No Malagasy review board existed during the collection period of the cultures (1939–2001) from which the DNAs used in this study were derived. In addition, the Institutional Review Board of Northern Arizona University, where the DNA genotyping was done, did not require review of the research due to the anonymous nature of the samples.


DNA was obtained from 262 isolates from 25 different districts from 1939–2005 (Figure S1, Table S1). DNAs consisted of simple heat lysis preparations or whole genome amplification (WGA) (QIAGEN, Valencia, CA) products generated from the heat lysis preps. Most of the isolates were collected by the Malagasy Central Laboratory for plague supervised by the Institut Pasteur de Madagascar and were primarily isolated from human cases with a few isolated from other mammals or fleas. A handful of other isolates were from other institutions (still originally collected by the Malagasy Central Laboratory for plague) or represent publicly available whole genome sequences (Table S1).

SNP Genotyping

A total of 56 SNPs were chosen to genotype the Malagasy isolates because they either marked the branches leading to or from the Madagascar clades in a worldwide analysis (Morelli et al., 2010) or were polymorphic among Malagasy isolates (Table S2). These SNPs were either previously identified in a worldwide SNP study on Y. pestis using a combination of denaturing high performance liquid chromatography (dHPLC) and whole genome sequence comparisons (Morelli et al., 2010) or identified here through whole genome sequence comparisons among 2 Malagasy whole genome sequences (MG05-1020 [GenBank:AAYS00000000] and IP275 [GenBank:AAOS00000000] [Morelli et al., 2010]) and 14 other Y. pestis strain sequences (CO92 [GenBank:AL590842] (Parkhill et al., 2001), FV-1 [GenBank:AAUB00000000] (Touchman et al., 2007), CA88-4125 [GenBank:ABCD00000000] (Auerbach et al., 2007), Antiqua [GenBank:CP000308], Nepal 516 [GenBank:CP000305] (Chain et al., 2006), UG05-0454 [GenBank:AAYR00000000] (Morelli et al., 2010), KIM 10 [GenBank:AE009952] (Deng et al., 2002), F1991016 [GenBank:ABAT00000000], E1979001 [GenBank:AAYV00000000], K1973002 [GenBank:AAYT00000000], B42003004 [GenBank:AAYU00000000] (Eppinger et al., 2009), Pestoides F [GenBank:CP000668] (Garcia et al., 2007), Angola [GenBank:CP000901] (Eppinger et al., 2010) and 91001 [GenBank:AE017042] [Song et al., 2004]). These whole genome sequence comparisons involved comparing the predicted gene sequences of the closed genome of Y. pestis strain CO92 (Parkhill et al., 2001) to the completed and draft genomes of all other strains using MUMmer and in-house Perl scripts (Delcher et al., 2002). For genomes with deposited underlying Sanger sequencing read information, a polymorphic site was considered of high quality when its underlying sequence in the query comprised at least three sequencing reads with an average Phred quality score >30 (Eppinger et al., 2010; Ewing et al., 1998).
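The quality criterion in the last sentence (at least three reads, average Phred score above 30) can be written as a short filter. This is a sketch of that rule only, not the authors' in-house Perl scripts; the function and parameter names are assumptions. The conversion uses the standard Phred definition, Q = -10 log10(P).

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a base-call error."""
    return 10 ** (-q / 10)

def is_high_quality_site(read_quals, min_reads=3, min_mean_q=30):
    """Apply the stated rule to one polymorphic site.

    read_quals: Phred scores of the reads covering the site in the query genome.
    The site passes if it has at least `min_reads` reads and their average
    Phred score is strictly greater than `min_mean_q`.
    """
    if len(read_quals) < min_reads:
        return False
    return sum(read_quals) / len(read_quals) > min_mean_q

print(is_high_quality_site([35, 40, 32]))   # three reads, mean above 30
print(is_high_quality_site([40, 40]))       # too few reads
print(phred_to_error_probability(30))       # error probability at Q30
```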

A TaqMan-minor groove binding (MGB) assay or a melt mismatch amplification mutation assay (Melt-MAMA) was developed for each SNP for use in
genotyping the Malagasy DNAs. A TaqMan-MGB assay was designed around one SNP known to divide Malagasy isolates into two major groups (Mad-43, Table S2). Melt-MAMA assays were designed around the other 55 SNPs as previously described (Vogler et al., 2009). SNP locations, primer sequences, primer concentrations and other information for these assays are presented in Table S2. Primers and probes were designed using Primer Express 3.0 software (Applied Biosystems, Foster City, CA). Each 5 μl TaqMan-MGB reaction contained primers and probes (for concentrations see Table S2), 1× Platinum Quantitative PCR SuperMix-UDG with ROX (Invitrogen, Carlsbad, CA), water and 1 μl of template. Each 5 μl Melt-MAMA reaction contained 1× SYBR Green PCR Master Mix (Applied Biosystems) or 1× EXPRESS SYBR GreenER qPCR Supermix with Premixed ROX (Invitrogen) (for assay-specific master mix see Table S2), derived and ancestral allele-specific MAMA primers, a common reverse primer (for primer concentrations see Table S2), water and 1 μl of diluted DNA template. DNA templates were diluted 1/10 for heat lysis preparations or 1/50 for WGA products. All assays were performed on an Applied Biosystems 7900HT Fast Real-Time PCR System with SDS software v2.3. Thermal cycling conditions for the TaqMan-MGB assay were as follows: 50°C for 2 min, 95°C for 2 min and 50 cycles of 95°C for 15 s and 66°C for 1 min. Thermal cycling conditions for the Melt-MAMA assays were as follows: 50°C for 2 min, 95°C for 10 min and 40 cycles of 95°C for 15 s and 55–65°C for 1 min (see Table S2 for assay-specific annealing temperatures). Melt-MAMA results were interpreted as previously described (Vogler et al., 2009).


All 262 Malagasy isolates were also genotyped using a 43-marker MLVA system as previously described (Girard et al., 2004).

Node Assignment

In general, missing SNP data (<0.5% of dataset) were not a factor in node assignment (see SNP phylogenetic analysis below) since data were usually available for an equivalent SNP, thus leading to unambiguous node assignments for most isolates. However, there were four cases where the node assignment was potentially ambiguous. For three isolates missing data for SNP Mad-21 (branch 1.ORI3.k-1.ORI3.o, Table S2), the ancestral allele state was assumed for that SNP for those isolates, since in this and in a previous worldwide analysis (Morelli et al., 2010), only a single isolate, not included among these three, belonged to node “o.” For a single isolate missing data for SNP Mad-46 (branch 1.ORI3.d-1.ORI3.h1, Table S2), the derived state was assumed, due to the placement of that isolate in MLVA subclade II.B in a neighbor-joining analysis and the observed congruence between the “h” nodes and MLVA subclade II.B (see phylogenetic analyses below, Table S1).

FIGURE A7-2 Neighbor-joining dendrograms based upon MLVA data. Dendrograms for Group I (A) and Group II (B) are indicated. The SNP phylogeny from Figure A7-1 is also indicated (C) for comparison. Subclades within Groups I and II are collapsed in the full phylogenies (dotted boxes) for those groups (colored triangles) and are then individually expanded to show the structure within each subclade. The expanded subclades are labeled based upon their membership in Group I or II and by a capital letter (e.g., I.A) and are indicated by colored bars. Bootstrap values ≥50 supporting individual subclades are indicated on the expanded subclade phylogenies. SNP locations are indicated by vertical red lines. These red lines are labeled with the SNP ID numbers presented in Table S2 on the full phylogenies for unaffiliated isolate-specific SNPs and on the expanded phylogenies for all other SNPs. The years of isolation for isolates within each full and expanded phylogeny are indicated beside the panel label and underneath the individual phylogeny, respectively. The gray subcluster marked by the gray arrow in subclade I.A represents the “Mahajanga I.A subcluster,” a subcluster containing most of the isolates from the Mahajanga plague focus. Seven isolates from the central highlands that also fell within this subcluster are labeled with a “CH.” Five Mahajanga isolates that did not belong in this subcluster are labeled with a gray “M” (A). Black stars indicate the locations of the two sequenced Malagasy strains, MG05-1020 in subclade I.B and IP275 in subclade I.H.

Phylogenetic Analyses

A hierarchical approach was applied to the phylogenetic analysis of the Malagasy isolates. First, a SNP phylogeny was generated using data from all 56 SNPs (Figure A7-1). Second, neighbor-joining dendrograms based upon MLVA data were constructed using MEGA 3.1 (Kumar et al., 2001) for the two main groups in the SNP phylogeny, Groups I and II (Figure A7-2A–B). These groups corresponded to the two major Malagasy groups in a previous worldwide analysis (Morelli et al., 2010) and so were separated prior to analyzing with MLVA. The remaining SNPs showing variation among the Malagasy isolates mostly defined subclades observed in the MLVA phylogenies or were specific to single isolates, and so were not used to further separate the isolates prior to applying MLVA. The locations of these additional SNPs are marked on the two MLVA phylogenies where applicable (Figure A7-2A–B). A small set of SNPs provided very fine-scale resolution of the lineage leading to the whole genome sequenced MG05-1020 strain and are not marked on the MLVA phylogeny due to disagreement between the SNP and MLVA phylogenies on this small scale. Distance matrices for the two MLVA phylogenies were based upon mean character differences. Bootstrap values were based upon 1,000 simulations and were generated using PAUP 4.0b10 (D. Swofford, Sinauer Associates, Inc., Sunderland, MA). Branches with ≥50% bootstrap support and/or supported by one or more SNPs were identified as subclades. One other cluster (II.A) was also considered a subclade despite a lack of bootstrap support because of the proximity of a SNP-defined subclade (Figure A7-2B).
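
The distance measure named above, mean character difference, is the fraction of compared characters (here, VNTR loci) at which two isolates differ. A minimal sketch follows. It assumes that alleles are treated as unordered categories and that loci missing in either isolate are skipped; neither choice is confirmed by the text, and this is not PAUP's implementation. The isolate names and repeat counts are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Genotype = List[Optional[int]]  # repeat count per VNTR locus, None if missing


def mean_character_difference(a: Genotype, b: Genotype) -> float:
    """Fraction of loci scored in both isolates at which the alleles differ."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no loci scored in both isolates")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(genotypes: Dict[str, Genotype]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances keyed by (isolate, isolate)."""
    names = list(genotypes)
    return {
        (p, q): mean_character_difference(genotypes[p], genotypes[q])
        for p in names
        for q in names
    }


# Hypothetical repeat counts at four loci (the study used a 43-marker system)
toy = {
    "iso1": [5, 7, 3, 10],
    "iso2": [5, 8, 3, 10],
    "iso3": [6, 8, 4, None],
}
d = distance_matrix(toy)
print(d[("iso1", "iso2")])  # one of four loci differs
print(d[("iso2", "iso3")])  # two of three compared loci differ
```

A matrix of this kind is the input to neighbor-joining. Because the measure only records whether alleles differ, a one-repeat change and a five-repeat change at a locus contribute equally under the assumption made here.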

Geographic Distribution of Subclades

We mapped the geographic distributions of the Group I and II subclades we identified to determine their phylogeographic patterns (Figure A7-3).

Statistical Analyses

Analysis of similarity (ANOSIM) (Clarke, 1993) tests were performed using PRIMER software version 5 to test the hypotheses that 1) Groups I and II form distinct geographic groups and 2) the identified subclades form distinct geographic groups. These tests were performed on all subclades with ≥5 members (N = 221 isolates), thus excluding the unaffiliated isolates and subclades I.C, I.H, I.I and I.G (Table S1). The results of all 55 pairwise comparisons among the subgroups were evaluated at α = 0.000909 (global α of 0.05 divided by 55). To determine if there was a rank relationship between genetic distance and geographic distance, a Spearman correlation coefficient was generated using the RELATE function in PRIMER software with significance of the resulting statistics determined using 10,000 random permutations of the data. This analysis utilized all isolates with any geographic data (N = 256), with district centroids used as the geographic location for isolates for which only district level geographic information was available (N = 33); city/commune point geographic data were used for the remaining 223 isolates. Six isolates lacking any geographic information were excluded from both statistical analyses (Table S1).

FIGURE A7-3 Geographic distribution of MLVA subclades in Madagascar. The MLVA phylogenies for Groups I and II from Figure A7-2A–B are presented with labeled subclades. Light gray shaded districts indicate Madagascar districts where Y. pestis isolates used in this study were obtained. Colors within the mapped circles and squares correspond to the subclade color designations in the MLVA phylogenies. Divisions within those circles and squares indicate that multiple subclades were found at that location. Circles represent isolates where the city/commune of origin is known. Squares represent isolates where only the district of origin is known and are placed within their corresponding districts near to cities/communes containing the same subclade(s) where possible. Six isolates had unknown districts of origin and were not mapped. Unaffiliated Group I and II isolates are indicated by an “*” and a “+,” respectively; these symbols surrounded by a square indicate unaffiliated isolates where only the district of origin is known. The dark gray-shaded area indicates the geographic area where Group II subclades are found. Note that some Group I subclades are also found in this area.
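
Two steps in this section lend themselves to a short worked example: the Bonferroni-style threshold (11 tested subclades give 55 pairwise comparisons, so 0.05 / 55 ≈ 0.000909) and the permutation test of a rank correlation between a genetic and a geographic distance matrix. The sketch below is written in the spirit of a RELATE-type test but is not PRIMER's implementation. All function names and the toy matrices are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import comb, sqrt

# Bonferroni-style threshold from the text: 11 subclades, all pairwise tests
n_tests = comb(11, 2)
alpha_per_test = 0.05 / n_tests
print(n_tests, round(alpha_per_test, 6))


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_matrix_correlation(gen, geo, order=None):
    """Spearman correlation between the off-diagonal entries of two matrices.

    order relabels the isolates of the second matrix (rows and columns together).
    """
    n = len(gen)
    order = list(range(n)) if order is None else order
    pairs = list(combinations(range(n), 2))
    g = [gen[i][j] for i, j in pairs]
    d = [geo[order[i]][order[j]] for i, j in pairs]
    return pearson(average_ranks(g), average_ranks(d))


def permutation_p_value(gen, geo, n_perm=10_000, seed=0):
    """One-sided p-value: share of relabelings with correlation >= observed."""
    rng = random.Random(seed)
    observed = spearman_matrix_correlation(gen, geo)
    labels = list(range(len(gen)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if spearman_matrix_correlation(gen, geo, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: five hypothetical isolates on a line; genetic distance mirrors geography
pos = [0, 1, 2, 3, 10]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [row[:] for row in geo]
rho, p = permutation_p_value(gen, geo, n_perm=999)
print(round(rho, 3), p)
```

Whole isolates are relabeled rather than individual matrix entries because the pairwise distances are not independent observations: each isolate contributes to many pairs.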


Genetic Diversity of Y. pestis in Madagascar

Our hypervariable-locus and genome-based approaches identified a relatively high level of genetic diversity among the 262 Malagasy isolates from 25 districts from 1939–2005. We confirmed the presence of two major genetic groups, Groups I and II, differentiated by a single SNP, Mad-43 (Figure A7-1, Table S2), and many VNTR mutational steps. Groups I and II were further differentiated into eleven (I.A–I.K, Figure A7-2A, Table S1) and four (II.A–II.D, Figure A7-2B, Table S1) subclades, respectively, based upon MLVA and/or SNPs. All but one of these subclades was at least weakly supported by bootstrap values ≥50 and/or one or more SNPs (Figure A7-2A–B). The high mutation rates at VNTR loci can lead to homoplasy and, consequently, to low bootstrap support for deeper phylogenetic relationships when analyzing isolates from regional or worldwide collections (Achtman et al., 2004; Johansson et al., 2004; Keim et al., 2004; Lowell et al., 2007). Nevertheless, subsequent analyses using more phylogenetically stable molecular markers (i.e., SNPs) have confirmed MLVA-determined clades with weak or even no bootstrap support (Achtman et al., 2004; Vogler et al., 2009), leading us to use even weak bootstrap support to validate subclades in this analysis. Of the two MLVA identified subclades without bootstrap support, II.A and II.B, subclade II.B was supported by SNP Mad-46 (Table S2) and subclade II.A was designated due to its proximity to and clear separation from the SNP-identified subclade II.B (Figure A7-2B). Subclades I.B, I.F, and I.H were supported by SNPs Mad-26 to 31, Mad-42, and Mad-09 to 17 (Table S2), respectively, and bootstrap analysis (Figure A7-2A). MLVA also identified 23 and 5 isolates in Groups I and II, respectively, that did not belong to any of the identified subclades within those groups (hereafter referred to as unaffiliated isolates) (Figure A7-2A–B, I.NONE and II.NONE isolates in Table S1). 
Four of these unaffiliated isolates and isolates in subclades I.B, I.H and II.B were also identified by apparently isolate-specific SNPs (Figure A7-2A–B). Overall, MLVA identified 226 genotypes among the 262 isolates, constituting far better resolution than that achieved using ribotyping (Guiyoule et al., 1997).

The SNP and MLVA analyses showed remarkable congruence. Nearly all of the nodes in the SNP phylogeny either corresponded to MLVA subclades or were specific to individual isolates, allowing the combined analysis of SNP and MLVA data discussed above. Three nodes (f, m and n, Figure A7-1) did not have representatives in this study, but appeared to be specific for individual isolates in a previous analysis (Morelli et al., 2010). The only exception to this congruence was within the lineage leading to the whole genome sequenced strain, MG05-1020 (q nodes in Figure A7-1 and subclade I.B in Figure A7-2A). In this case, the SNP phylogeny (q nodes, Figure A7-1) was more accurate than and provided nearly as much resolution as the corresponding MLVA phylogeny (I.B, Figure A7-2A). This fine-scale phylogenetic resolution was due to the use of a high resolution SNP discovery method, whole genome sequence comparisons, to discover SNPs along this lineage as opposed to the lower resolution dHPLC method used to discover most of the other Malagasy SNPs (Morelli et al., 2010). Interestingly, comparable resolution was not seen in the lineage leading to the other whole genome sequenced strain, IP275 (l nodes in Figure A7-1 and subclade I.H in Figure A7-2A), likely due to the very low number of isolates (N = 2) within that lineage in this analysis.

Missing data for two SNP assays suggested a potential genomic rearrangement (e.g., deletion) in some of the Malagasy strains. Twenty-five of the 262 isolates were missing data for two SNP assays despite repeated attempts at amplification (Table S1). The two SNPs, Mad-28 and Mad-41, were located <850 bp apart at CO92 positions 2,208,345 and 2,207,531, respectively (Table S2), suggesting that there may have been a genomic rearrangement affecting this region in these strains. Intriguingly, IS100 elements were located flanking these SNPs at CO92 positions 2,135,459-2,137,412 and 2,236,265-2,238,215. IS elements are important facilitators of genomic rearrangements in Y. pestis (Auerbach et al., 2007; Chain et al., 2006) and may have played a role in this result. If so, the same or a similar genomic rearrangement must have occurred multiple times since the 25 isolates were members of six different nodes in the SNP phylogeny (Table S1). This hypothesis is supported by the fact that IS100 elements are known potential hotspots for genomic rearrangements and excisions in Y. pestis (Achtman, 2004; Auerbach et al., 2007).
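
The coordinate reasoning in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the CO92 positions quoted above. The variable names are illustrative, and the code says nothing about whether a rearrangement actually occurred.

```python
# CO92 coordinates quoted in the text
snps = {"Mad-28": 2_208_345, "Mad-41": 2_207_531}
is100_left = (2_135_459, 2_137_412)    # IS100 copy on one side of the SNPs
is100_right = (2_236_265, 2_238_215)   # IS100 copy on the other side

# Distance between the two SNPs (the text states < 850 bp)
snp_distance = abs(snps["Mad-28"] - snps["Mad-41"])

# Both SNPs lie strictly between the two IS100 copies
flanked = all(is100_left[1] < pos < is100_right[0] for pos in snps.values())

# Length of the interval bounded by the two IS100 copies
interval_bp = is100_right[0] - is100_left[1]

print(snp_distance, flanked, interval_bp)
```

The two SNPs are 814 bp apart and both sit inside an interval of roughly 99 kb bounded by the IS100 copies, so a single deletion or rearrangement within that interval could remove both assay targets at once.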

Geographic Distribution of Isolates

Significant geographic separation was observed among the identified subclades. Overall, there was a small, but highly significant relationship between genetic and geographic distance (Spearman correlation coefficient ρ = 0.226, p<0.0001). In addition, the two main genetic groups, Groups I and II, formed distinct geographic groups based upon an ANOSIM (R = 0.091, p = 0.0007). Group II isolates, which possessed the derived state for SNP Mad-43 (Table S2), were essentially restricted to three of the most active plague districts in the central
highlands, Betafo, Manandriana and Ambositra (Chanteau et al., 2000), and an adjacent district, Ambatofinandrahana (Figure A7-3, S1). The only exceptions to this were the five unaffiliated Group II isolates, which were scattered in districts to the east and north (+ symbols, Figure A7-3). In contrast, Group I isolates were found in all three foci, both the central and northern highlands and Mahajanga. Geographic separation among the individual Group I and II subclades was also apparent (Figure A7-3) and statistically supported in an ANOSIM (R = 0.232, p<0.0001). Post-hoc analyses of the pairwise comparisons among subclades indicated that most of the eleven tested subclades formed distinct geographic groups (data not shown). Indeed, several interesting geographic patterns were apparent for the different subclades, only some of which are described below. Separate Group I subclades were found in the northern (I.C, I.G, and I.I, Figure A7-3, Table S1) versus the central (I.A, I.B, I.D, I.E, I.F, I.H, I.J, and I.K, Figure A7-3, Table S1) highlands. Subclade I.A, the largest single subclade, was the dominant subclade found in the capital, Antananarivo, and the surrounding area (Figure A7-3, S1). With the exception of two isolates, it was also the only subclade found in Mahajanga (Figure A7-3, S1, Table S1), indicating a central highlands origin for the Y. pestis responsible for the series of Mahajanga plague outbreaks from 1991–1999 (Boisier et al., 1997, 2002; Laventure et al., 1991; Rasolomaharo et al., 1995). Subclade I.B was the only subclade found in the northeastern portion of the central highlands (Figure A7-3). Geographic analysis of the corresponding SNP phylogeny (q nodes, Figure A7-1) for this subclade revealed some additional geographic-genetic patterns. Isolates with the same SNP genotype tended to be clustered geographically, although no distinct spreading pattern could be discerned, possibly due to the limited number of isolates (Figure A7-4). 
Subclade I.E was predominantly found in the southern central highlands, in district Fianarantsoa, and also appears to be the subclade responsible for the reemergence of plague in the Ikongo district (Migliani et al., 2001), adjacent to Fianarantsoa on the southeast (Figure A7-3, S1).
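
For readers unfamiliar with the ANOSIM R values reported above, the statistic compares the mean rank of between-group dissimilarities with the mean rank of within-group dissimilarities, scaled by M/2, where M = n(n − 1)/2 is the number of pairs (Clarke, 1993). The sketch below is a minimal illustration that assumes planar coordinates and Euclidean distance as the dissimilarity. It is not PRIMER's implementation, and the points and group labels are hypothetical.

```python
from itertools import combinations
from math import dist


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(points, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(points)), 2))
    ranks = average_ranks([dist(points[i], points[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


# Hypothetical example 1: two groups that are fully separated in space
separated = anosim_r([(0, 0), (1, 0), (10, 0), (11, 0)], ["I", "I", "II", "II"])

# Hypothetical example 2: two groups interleaved along a line
interleaved = anosim_r([(0, 0), (1, 0), (2, 0), (3, 0)], ["I", "II", "I", "II"])

print(separated, interleaved)
```

R approaches 1 when every between-group dissimilarity exceeds every within-group one and is near 0 when grouping explains nothing. On that scale, the reported values of 0.091 and 0.232 indicate statistically supported but far from complete geographic separation.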

Three subclades, I.F, I.H and I.K, did not show distinct geographic patterns (Figure A7-3). In the cases of subclades I.F and I.H, this may be due to the limited numbers of isolates within those subclades (Figure A7-2A, Table S1). The geographically widespread nature of subclade I.K isolates, however, may be related to their older dates of isolation. All of the subclade I.K isolates were isolated between 1940 and 1955 (Figure A7-2A, Table S1), just 19–34 years after plague was introduced to the central highlands. Therefore, these isolates may represent a subclade that was formerly spread throughout much of the central highlands but that currently does not exist in nature in Madagascar. Similarly, subclade I.I, although it was not geographically widespread (Figure A7-3), only contained isolates isolated from 1971–1976 (Figure A7-2A, Table S1) and may represent a former, now extinct subclade from the northern highlands. However, the limited number of isolates makes this difficult to determine. Alternatively, these subclades may still exist, but may have decreased in frequency and/or be very rare in nature.

FIGURE A7-4 Geographic distribution of SNP-defined nodes in the strain MG05-1020 lineage. The strain MG05-1020 lineage portion of the SNP phylogeny from Figure A7-1 is indicated as well as an enlarged cutout of the map from Figure A7-3 showing the geographic distribution of isolates from this lineage. For an explanation of the mapped circles and squares see the figure legend for Figure A7-3. Circles, squares and pie chart slices in the map are numbered based upon the node number in the SNP phylogeny for the isolates represented by those shapes. The isolate in node “q7” is not mapped due to its geographic origin being unknown.

Interestingly, the other older isolates tended to be the unaffiliated isolates. Eighteen of the 28 unaffiliated isolates were isolated between 1939 and 1978. Another 3 had unknown dates of isolation (Table S1). Given their older dates of isolation, these unaffiliated isolates may also be representatives of older, now extinct subclades from Madagascar. The lack of comparable isolates to these unaffiliated isolates among the rest of the isolate collection could be due to the limited sampling from earlier years (Table S1). Alternatively, the unaffiliated isolates may simply be representatives of very rare subclades. A final possibility could involve the accumulation of VNTR mutations due to repeated passages associated with prolonged storage in the laboratory, which could lead to the older isolates being inaccurate representatives of the original isolates. This is unlikely, however, as the rate of VNTR evolution in the laboratory, even with passaging, should be much slower than in nature. Thus, while these isolates may not be exactly the same as when they were first isolated, they should be close. Also, multiple copies of a subset of the Malagasy isolates in this study that were stored at different temperatures showed identical MLVA genotypes (data not shown), indicating that these VNTR loci are relatively stable in these isolates under the storage conditions used. Regardless, the unaffiliated nature of many of the older isolates is consistent with and most likely related to their older dates of isolation.

Several cities and communes yielded isolates of subclades predominantly found elsewhere, suggesting importation from other locations. Antananarivo, in particular, contained isolates from five subclades in addition to the dominant subclade (Figure A7-3, S1). Commune Andina Firaisana in the Ambositra district is another example, containing representatives of four different subclades (Figure A7-3, S1). One of these, subclade I.A, was also found in the nearby surrounding area. However, this area is considerably south of the area where the majority of subclade I.A isolates were found, suggesting that this subclade may have been imported to this area from further north or vice versa (Figure A7-3). Of the other three subclades found in Andina Firaisana, subclades II.A and II.B are also found in nearby areas and so may be naturally occurring in Andina Firaisana rather than due to transfer events. Subclade II.C, in contrast, appears to have been transferred to Andina Firaisana from the Betafo district in the northwest or vice versa (Figure A7-3, S1). Another nearby commune, Ivato, contained a single subclade I.E isolate, suggesting a transfer event from district Fianarantsoa in the south (Figure A7-3, S1).

Plague in Mahajanga

Our data suggest that Y. pestis was reintroduced to Mahajanga from the central highlands. The majority of the Mahajanga isolates (39 of 44) belonged to a single subcluster within subclade I.A (hereafter referred to as the Mahajanga
I.A subcluster) (Figure A7-2A), suggesting that there was an introduction to Mahajanga from the central highlands that became established in Mahajanga and then underwent local cycling. Though this Mahajanga I.A subcluster did not have either SNP or MLVA support (Figure A7-2A), close examination of the isolates within this subcluster revealed very close genetic relationships, with most differences involving only a single repeat change at a single VNTR locus (data not shown). This is consistent with an outbreak scenario originating from a single introduction and strengthens the identification of this subcluster as a genetic group. In contrast, subclade I.A isolates outside of the Mahajanga I.A subcluster exhibited much greater variation both in the number of VNTR loci displaying polymorphisms and the number of alleles observed at those loci (data not shown), consistent with an older, more geographically dispersed and more differentiated set of isolates.

Our data also suggest that there have been multiple transfers of Y. pestis between Mahajanga and the central highlands. Specifically, seven isolates within the Mahajanga I.A subcluster were isolated from central highland locations rather than from Mahajanga (Figure A7-2A), suggesting that Y. pestis was also transferred back from Mahajanga to the central highlands. Of two other Mahajanga isolates, one belonged to subclade I.F and the other was unaffiliated (Figure A7-2A), suggesting that there has been more than one introduction of Y. pestis to Mahajanga as well. The final three Mahajanga isolates, although they belonged to subclade I.A, were not part of the Mahajanga I.A subcluster and were instead more closely related to subclade I.A isolates from the central highlands (Figure A7-2A), again suggesting multiple introductions. However, because no other Mahajanga isolates resemble these five outliers, it is unclear whether any of these other introductions became established in Mahajanga. Finally, although our data suggest that there have been multiple transfers of Y. pestis between Mahajanga and the central highlands, there is no evidence in these data for an introduction to Mahajanga from the northern highlands, as was previously suggested by PFGE analyses (Boisier et al., 2002; Duplantier et al., 2005).


Madagascar is one of the most active plague regions in the world. However, few studies have investigated the molecular epidemiology of Y. pestis from Madagascar and none have done so using very high resolution genomic methodologies. Here, we investigated the phylogeography and molecular epidemiology of Y. pestis in Madagascar by using a combination of SNPs and MLVA to analyze 262 Malagasy isolates from 25 districts from 1939–2005. In contrast with previous analyses that utilized ribotyping or SNPs alone (Guiyoule et al., 1997; Morelli et al., 2010), we identified a very high level of genetic diversity with 226 MLVA genotypes among the 262 isolates. These genotypes were distributed amongst 15
subclades that displayed significant geographic separation (Figure A7-3), leading to insights into the maintenance and spread of plague in Madagascar.

The use of MLVA was particularly effective at identifying genetic groups in Madagascar. SNPs, though useful, mostly provided confidence in genetic groups that were already apparent via MLVA. This is somewhat counter to the conventional hierarchical approach wherein SNPs are used first to identify major genetic groups followed by MLVA to provide resolution within those groups, thus minimizing the problems of mutational saturation/homoplasy that can occur with highly variable markers such as VNTRs (Keim et al., 2004). In this study, only SNP Mad-43 (Table S2), which differentiated Groups I and II, was useful in this conventional sense to identify “major genetic groups” that were obscured in the MLVA phylogeny (data not shown). All of the other subclades identified by SNPs were also identified by MLVA, suggesting that at this regional scale, MLVA alone may be effective at identifying robust genetic groups. Importantly, though MLVA was excellent at identifying these genetic groups, the relationships among those groups, such as the division between Groups I and II, remained unclear using MLVA alone (data not shown) whereas they were very clearly depicted as a star phylogeny in the SNP phylogeny (Figure A7-1). Where knowledge of deeper genetic relationships or fine-scale phylogenetic analysis of specific lineages (e.g., the strain MG05-1020 lineage here) is desired, SNPs will remain the preferred methodology for clonal pathogens such as Y. pestis. However, until whole genome sequencing for entire isolate collections becomes feasible, MLVA will continue to be a useful tool for examining genetic diversity whether used in conjunction with SNPs or alone.

Our analyses suggest that plague is being maintained in Madagascar in multiple geographically separated subpopulations. We revealed significant geographic separation among the identified subclades (Figure A7-3), suggesting that these subclades are undergoing local cycling with limited gene flow from other subclades. This is consistent with the population genetics and ecology of the black rat (Rattus rattus), the primary plague host in rural Madagascar (Brygoo, 1966; Duplantier, 2001). The black rat in Madagascar exhibits limited gene flow between subpopulations (Gilabert et al., 2007) as well as limited geographic ranges (Rahelinirina et al., 2010). This limited mobility, a high reproduction rate (Duplantier and Rakotondravony, 1999), and the development of some resistance to plague (Tollenaere et al., 2010) are all likely important factors that allow the black rat to maintain plague in these genetically distinct, geographically separated subpopulations. The two flea vectors, X. cheopis and S. fonquerniei (Duplantier, 2001; Duplantier and Rakotondravony, 1999), may also play a role in maintaining genetically distinct subpopulations (i.e., Groups I and II), though more data would be needed to confirm this hypothesis.

In contrast, transport of Y. pestis across longer distances in Madagascar is likely human-mediated. Historically, there is ample evidence for the influence of human traffic on the spread of plague, including transport along trade routes
such as the Silk Road in the early pandemics and transport via steam ship to numerous new locations during the “third” pandemic (Morelli et al., 2010; Perry and Fetherston, 1997). The SNP phylogeny determined by Morelli et al. (2010) suggests the progression of plague from Israel to Madagascar to Turkey (Figure A7-1), a series of transfer events that were almost certainly human-mediated, though the details remain unknown. In Madagascar, plague was most likely transported from its introduction point on the coast to the central highlands, where it became permanently established, via the railroad linking Toamasina and Antananarivo (Brygoo, 1966). More recently, plague was most likely reintroduced to Mahajanga via the transport of infected rats and fleas together with foodstuffs from the central highlands. Indeed, our data suggest multiple transfers between Mahajanga and the central highlands, all likely human-mediated. Additional long distance transfers of Y. pestis in Madagascar are suggested by the multiple subclades identified in cities/communes such as Antananarivo and Andina Firaisana (Figure A7-3, S1, Table S1).

Though long distance transfers of Y. pestis undoubtedly occur, it is unclear how often such transfers result in the successful establishment of the transferred genotypes in new locations. At least one transfer to Mahajanga became successfully established and underwent local cycling as evidenced by the Mahajanga I.A subcluster described here (Figure A7-2A). However, many of the other examples of long distance transfers where multiple subclades were found in a single location are not as clear regarding the establishment of the transferred subclade(s). Antananarivo, for example, is clearly dominated by subclade I.A with only 1–2 representatives of each of the other five subclades identified there (Figure A7-3, S1, Table S1), suggesting that the presence of these alternative subclades may have been only transitory.

Successful establishment of subclades in new locations following a long distance transfer may be related to adaptive advantages possessed by some genotypes (Keim and Wagner, 2009). For instance, subclade I.A appears to be particularly successful in our analysis. The earliest subclade I.A isolate in our dataset was collected in 1974 from the Ambositra district (Table S1), one of the most active plague districts in Madagascar (Chanteau et al., 2000). Subsequent isolates indicate that this subclade continued to exist in a small area of the Ambositra district but also became well established over a large geographic area including and surrounding the capital, Antananarivo. This subclade was also successfully introduced to and established in Mahajanga and appears to have been transferred to the Fianarantsoa district, though it is unclear whether or not it became established there (Figure A7-3, S1, Table S1). This widespread geographical success may indicate that this subclade possesses an adaptive advantage that enhances its ability to be transferred long distances and become established in new locations (Keim and Wagner, 2009). Alternatively, the particular success of this subclade may simply be due to chance.

The central highlands focus remains the most active plague focus in Madagascar (Chanteau et al., 2000) and is, consequently, a likely place for new genotypes to emerge. This is particularly true for those central highlands districts with the highest plague activity. For instance, the three unique ribotypes identified in a previous study belonged to isolates from two highly active districts, Ambositra and Ambohimahasoa (Guiyoule et al., 1997). Here, isolates belonging to Group II and its subclades were found in three highly active districts, Betafo, Manandriana, and Ambositra (Figure A7-3, S1). As discussed above, Ambositra may also have been the district of origin for the highly successful subclade I.A. Overall, the Ambositra district was one of the two most diverse districts in our analysis, containing representatives from six different subclades (Figure A7-3, Table S1). This diversity is consistent with the Ambositra district’s status as one of the three most important plague districts in Madagascar (Chanteau et al., 1998; 2000).

The maintenance and spread of Y. pestis in Madagascar is a dynamic and highly active process, depending on the natural cycle between the black rat and its flea vectors as well as human activity. Y. pestis in Madagascar is maintained in multiple, genetically distinct, geographically separated subpopulations, likely via the black rat. The exact geographic landscape of these subpopulations is probably ever changing, with some subclades going extinct or decreasing in frequency (e.g., subclade I.K), new subclades emerging and becoming established, and some subclades being transferred to new locations, where they may become established either temporarily or more long-term. Much of the long distance spread of Y. pestis in Madagascar is likely due to human activities that allow for the transport of plague infected rats and fleas from one location to another.


We would like to thank Dr. Kimothy Smith for initially suggesting the collaboration that led to this work. Note that the use of products/names does not constitute endorsement by the DHS of the United States.

Author Contributions

Conceived and designed the experiments: AJV DMW SC PK. Performed the experiments: AJV FC JL RN. Analyzed the data: AJV DMW. Contributed reagents/materials/analysis tools: FC PR ME JR LR BWR SMB-S MA SC. Wrote the paper: AJV PK.


This work was funded by the Department of Homeland Security Science and Technology Directorate (award numbers NBCH2070001 and HSHQDC-08-C-00158), the Cowden Endowment in Microbiology at Northern Arizona University, and the National Institute of Allergy and Infectious Diseases (NIAID), US National Institutes of Health (NIH), Department of Health and Human Services (HHS) (award number AI065359). This work was also supported by the Science Foundation of Ireland (award number 05/FE1/B882) (MA), the NIAID NIH HHS (award number N01 AI-30071) (ME JR), the Malagasy Ministry of Health (contract No. 01/95 IDA 2252-MAG) (FC LR BWR SC), and the French Cooperation (FAC No. 94008 300) (FC LR BWR SC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


Achtman M, Morelli G, Zhu P, Wirth T, Diehl I, et al. (2004) Microevolution and history of the plague bacillus, Yersinia pestis. Proc Natl Acad Sci U S A 101: 17837-17842.

Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, et al. (1999) Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A 96: 14043-14048.

Auerbach RK, Tuanyok A, Probert WS, Kenefic L, Vogler AJ, et al. (2007) Yersinia pestis evolution on a small timescale: comparison of whole genome sequences from North America. PLoS One 2: e770.

Boisier P, Rahalison L, Rasolomaharo M, Ratsitorahina M, Mahafaly M, et al. (2002) Epidemiologic features of four successive annual outbreaks of bubonic plague in Mahajanga, Madagascar. Emerg Infect Dis 8: 311-316.

Boisier P, Rasolomaharo M, Ranaivoson G, Rasoamanana B, Rakoto L, et al. (1997) Urban epidemic of bubonic plague in Majunga, Madagascar: epidemiological aspects. Trop Med Int Health 2: 422-427.

Brygoo ER (1966) Epidémiologie de la peste à Madagascar. Arch Inst Pasteur Madagascar 35: 9-147.

Chain PS, Hu P, Malfatti SA, Radnedge L, Larimer F, et al. (2006) Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen. J Bacteriol 188: 4453-4463.

Chanteau S, Ratsifasoamanana L, Rasoamanana B, Rahalison L, Randriambelosoa J, et al. (1998) Plague, a reemerging disease in Madagascar. Emerg Infect Dis 4: 101-104.

Chanteau S, Ratsitorahina M, Rahalison L, Rasoamanana B, Chan F, et al. (2000) Current epidemiology of human plague in Madagascar. Microbes Infect 2: 25-31.

Clarke KR (1993) Non-parametric multivariate analyses of changes in community structure. Aust J Ecol 18: 117-143.

Cui Y, Li Y, Gorgé O, Platonov ME, Yan Y, et al. (2008) Insight into microevolution of Yersinia pestis by clustered regularly interspaced short palindromic repeats. PLoS One 3: e2652.

Delcher AL, Phillippy A, Carlton J, Salzberg SL (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res 30: 2478-2483.

Deng W, Burland V, Plunkett G, 3rd, Boutin A, Mayhew GF, et al. (2002) Genome sequence of Yersinia pestis KIM. J Bacteriol 184: 4601-4611.

Duplantier JM (2001) The black rat’s role in spreading human plague in Madagascar. L’Institut de recherche pour le développement Scientific Bulletin 131: 1-3.

Duplantier JM, Duchemin JB, Chanteau S, Carniel E (2005) From the recent lessons of the Malagasy foci towards a global understanding of the factors involved in plague reemergence. Vet Res 36: 437-453.

Duplantier JM, Rakotondravony D (1999) The rodent problem in Madagascar: agricultural pest and threat to human health. In: Singleton G, Hinds L, Leirs H, Zhang Z, editors. Ecologically-based management of rodent pests. Canberra: Australian Centre for International Agricultural Research. pp. 441-459.

Eppinger M, Guo Z, Sebastian Y, Song Y, Lindler LE, et al. (2009) Draft genome sequences of Yersinia pestis isolates from natural foci of endemic plague in China. J Bacteriol 191: 7628-7629.

Eppinger M, Worsham PL, Nikolich MP, Riley DR, Sebastian Y, et al. (2010) Genome sequence of the deep-rooted Yersinia pestis strain Angola reveals new insights into the evolution and pangenome of the plague bacterium. J Bacteriol 192: 1685-1699.

Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res 8: 175-185.

Galimand M, Guiyoule A, Gerbaud G, Rasoamanana B, Chanteau S, et al. (1997) Multidrug resistance in Yersinia pestis mediated by a transferable plasmid. N Engl J Med 337: 677-680.

Garcia E, Worsham P, Bearden S, Malfatti S, Lang D, et al. (2007) Pestoides F, an atypical Yersinia pestis strain from the former Soviet Union. Adv Exp Med Biol 603: 17-22.

Gilabert A, Loiseau A, Duplantier JM, Rahelinirina S, Rahalison L, et al. (2007) Genetic structure of black rat populations in a rural plague focus in Madagascar. Can J Zool 85: 965-972.

Girard JM, Wagner DM, Vogler AJ, Keys C, Allender CJ, et al. (2004) Differential plague-transmission dynamics determine Yersinia pestis population genetic structure on local, regional, and global scales. Proc Natl Acad Sci U S A 101: 8408-8413.

Guiyoule A, Grimont F, Iteman I, Grimont PA, Lefévre M, et al. (1994) Plague pandemics investigated by ribotyping of Yersinia pestis strains. J Clin Microbiol 32: 634-641.

Guiyoule A, Rasoamanana B, Buchrieser C, Michel P, Chanteau S, et al. (1997) Recent emergence of new variants of Yersinia pestis in Madagascar. J Clin Microbiol 35: 2826-2833.

Huang XZ, Chu MC, Engelthaler DM, Lindler LE (2002) Genotyping of a homogeneous group of Yersinia pestis strains isolated in the United States. J Clin Microbiol 40: 1164-1173.

Johansson A, Farlow J, Larsson P, Dukerich M, Chambers E, et al. (2004) Worldwide genetic relationships among Francisella tularensis isolates determined by multiple-locus variable-number tandem repeat analysis. J Bacteriol 186: 5808-5818.

Keim P, Van Ert MN, Pearson T, Vogler AJ, Huynh LY, et al. (2004) Anthrax molecular epidemiology and forensics: using the appropriate marker for different evolutionary scales. Infect Genet Evol 4: 205-213.

Keim PS, Wagner DM (2009) Humans and evolutionary and ecological forces shaped the phylogeography of recently emerged diseases. Nat Rev Microbiol 7: 813-821.

Kingston JJ, Tuteja U, Kapil M, Murali HS, Batra HV (2009) Genotyping of Indian Yersinia pestis strains by MLVA and repetitive DNA sequence based PCRs. Antonie Van Leeuwenhoek 96: 303-312.

Klevytska AM, Price LB, Schupp JM, Worsham PL, Wong J, et al. (2001) Identification and characterization of variable-number tandem repeats in the Yersinia pestis genome. J Clin Microbiol 39: 3179-3185.

Kumar S, Tamura K, Jakobsen IB, Nei M (2001) MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17: 1244-1245.

Laventure S, Andrianaja V, Rasoamanana B (1991) Epidémie de peste à Majunga en 1991. Rapport de mission de l’Institut Pasteur de Madagascar 1991: 1-26.

Li Y, Cui Y, Hauck Y, Platonov ME, Dai E, et al. (2009) Genotyping and phylogenetic analysis of Yersinia pestis by MLVA: insights into the worldwide expansion of Central Asia plague foci. PLoS One 4: e6000.

Li Y, Dai E, Cui Y, Li M, Zhang Y, et al. (2008) Different region analysis for genotyping Yersinia pestis isolates from China. PLoS One 3: e2166.

Lowell JL, Zhansarina A, Yockey B, Meka-Mechenko T, Stybayeva G, et al. (2007) Phenotypic and molecular characterizations of Yersinia pestis isolates from Kazakhstan and adjacent regions. Microbiology 153: 169-177.

Lucier TS, Brubaker RR (1992) Determination of genome size, macrorestriction pattern polymorphism, and nonpigmentation-specific deletion in Yersinia pestis by pulsed-field gel electrophoresis. J Bacteriol 174: 2078-2086.

Migliani R, Chanteau S, Rahalison L, Ratsitorahina M, Boutin JP, et al. (2006) Epidemiological trends for human plague in Madagascar during the second half of the 20th century: a survey of 20,900 notified cases. Trop Med Int Health 11: 1228-1237.

Migliani R, Ratsitorahina M, Rahalison L, Rakotoarivony I, Duchemin JB, et al. (2001) [Resurgence of the plague in the Ikongo district of Madagascar in 1998. 1. Epidemiological aspects in the human population]. Bull Soc Pathol Exot 94: 115-118.

Morelli G, Song Y, Mazzoni CJ, Eppinger M, Roumagnac P, et al. (2010) Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet 42: 1140-1143.

Motin VL, Georgescu AM, Elliott JM, Hu P, Worsham PL, et al. (2002) Genetic variability of Yersinia pestis isolates as predicted by PCR-based IS100 genotyping and analysis of structural genes encoding glycerol-3-phosphate dehydrogenase (glpD). J Bacteriol 184: 1019-1027.

Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, et al. (2001) Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413: 523-527.

Pearson T, Busch JD, Ravel J, Read TD, Rhoton SD, et al. (2004) Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing. Proc Natl Acad Sci U S A 101: 13536-13541.

Perry RD, Fetherston JD (1997) Yersinia pestis-etiologic agent of plague. Clin Microbiol Rev 10: 35-66.

Pourcel C, André-Mazeaud F, Neubauer H, Ramisse F, Vergnaud G (2004) Tandem repeats analysis for the high resolution phylogenetic analysis of Yersinia pestis. BMC Microbiol 4: 22.

Rahelinirina S, Duplantier JM, Ratovonjato J, Ramilijaona O, Ratsimba M, et al. (2010) Study on the movement of Rattus rattus and evaluation of the plague dispersion in Madagascar. Vector Borne Zoonotic Dis 10: 77-84.

Rasolomaharo M, Rasoamanana B, Andrianirina Z, Buchy P, Rakotoarimanana N, et al. (1995) Plague in Majunga, Madagascar. Lancet 346: 1234.

Rotz LD, Khan AS, Lillibridge SR, Ostroff SM, Hughes JM (2002) Public health assessment of potential biological terrorism agents. Emerg Infect Dis 8: 225-230.

Song Y, Tong Z, Wang J, Wang L, Guo Z, et al. (2004) Complete genome sequence of Yersinia pestis strain 91001, an isolate avirulent to humans. DNA Res 11: 179-197.

Tollenaere C, Rahalison L, Ranjalahy M, Duplantier JM, Rahelinirina S, et al. (2010) Susceptibility to Yersinia pestis experimental infection in wild Rattus rattus, reservoir of plague in Madagascar. Ecohealth 7: 242-247.

Touchman JW, Wagner DM, Hao J, Mastrian SD, Shah MK, et al. (2007) A North American Yersinia pestis draft genome sequence: SNPs and phylogenetic analysis. PLoS One 2: e220.

Van Ert MN, Easterday WR, Huynh LY, Okinaka RT, Hugh-Jones ME, et al. (2007) Global genetic population structure of Bacillus anthracis. PLoS One 2: e461.

Vogler AJ, Birdsell D, Price LB, Bowers JR, Beckstrom-Sternberg SM, et al. (2009) Phylogeography of Francisella tularensis: global expansion of a highly fit clone. J Bacteriol 191: 2474-2484.

Welch TJ, Fricke WF, McDermott PF, White DG, Rosso ML, et al. (2007) Multiple antimicrobial resistance in plague: an emerging public health risk. PLoS One 2: e309.

World Health Organization (WHO) (2010) Human plague: review of regional morbidity and mortality, 2004–2009. Wkly Epidemiol Rec 85: 40-45.

Zhang X, Hai R, Wei J, Cui Z, Zhang E, et al. (2009) MLVA distribution characteristics of Yersinia pestis in China and the correlation analysis. BMC Microbiol 9: 205.

Zhang Z, Hai R, Song Z, Xia L, Liang Y, et al. (2009) Spatial variation of Yersinia pestis from Yunnan Province of China. Am J Trop Med Hyg 81: 714-717.



Folker Meyer30 and Elizabeth M. Glass30


Next-generation sequencing (NGS) has opened up access to genomic data from diverse microbial communities, and studies are emerging that cover a wide variety of systems (Gilbert et al., 2010; Human Microbiome Project, 2012; Nealson and Venter, 2007; Tara Expeditions, 2012; Terragenome Consortium, 2012). A number of techniques are used to extract genome-based information, either by using single reference genes (usually 16S rDNA) or by random shotgun metagenomics using entire genomes. There is an abundance of reviews of the subject (Desai et al., 2012; Thomas et al., 2012). Systems such as MG-RAST (Meyer et al., 2008) now provide access to thousands of metagenomic data sets (see Figure A8-1).

However, this newfound data richness is not without its challenges. The main problem stems from the dramatic change that converted an ecosystem that was until recently data poor into one that is now overflowing with data. Environmental biology and molecular ecology went from being overwhelmed by several hundred megabytes of data generated in 2005 by the Global Ocean Survey (Nealson and Venter, 2007) to generating many terabytes of data in 2012. Biology and medicine, however, lack the tradition and experience to handle big data; only a few areas, such as cancer diagnosis, have established stable pipelines and data formats that allow exchange.

Shotgun Metagenomics as an Example

Metagenomic shotgun sequencing using NGS technology can serve as a blueprint for how biology is and will be impacted by big data. With sequence data already significantly cheaper than the corresponding analysis (Wilkening et al., 2009), and as the cost of sequencing drops by a factor of 10 annually, it seems clear that a paradigm shift will be required to handle data analysis and storage.

There is significant value in the comparative analysis of metagenomic data sets, yet comparison requires data sets to have undergone more or less the same analytical processes. The existing paradigm of publishing the raw data and summary statistics in tabular form as auxiliary material does not allow any third parties to benefit from the work already done; instead it requires any future authors to re-analyze data. Consider the Human Microbiome Project (Turnbaugh et al., 2007), which has recently published more than 5 terabases of sequence data from 172 human subjects. Any researchers attempting to compare their findings to the data will discover that they need to re-analyze all of the data.

FIGURE A8-1 The MG-RAST system has more than 58,000 metagenomic data sets totaling over 16.5 terabase pairs of information.

30 Argonne National Laboratory, Institute for Genomics and Systems Biology, 9700 S. Cass Ave, Lemont, IL 60439, U.S.A.

As an interesting side note, the value of and need for comparative analyses also raise the question of how the review process was handled. Did the reviewers in fact take a look at any of the analysis performed, or did they take the results produced by a complex sequence analysis pipeline at face value? Often the information that was derived from the data is a product of a complex pipeline that was not described in sufficient detail. Further, the reviewer has no way to know whether the same results could be obtained from the same data. A rigorous review process cannot possibly be maintained under current mechanisms and requirements. The prohibitive cost of repeating such analyses renders them effectively irreproducible. Thus, it is all the more critical to require detailed documentation of data handling in analyses.

One of the key missing concepts is the notion of rigorous analysis of data quality prior to deriving any statements about biology from the data. Many factors contribute to data quality; in DNA sequencing, noise can be added at various steps in the pipeline. While some vendor-specific schemes exist to determine sequence noise, in the past there were no generally accepted vendor-neutral ways to characterize the noise in sequence data. For 16S rDNA-based amplicon metagenomic data, this has already led to a major debate over the amount of microbial diversity (Reeder and Knight, 2010), leading to a number of approaches that will de-noise sequence data prior to analysis (Quince et al., 2009; Reeder and Knight, 2010). For shotgun metagenomics the DRISEE approach (Keegan et al., 2012) provides a vendor-neutral estimate of sequence error. Interestingly, results show that the errors found are not specific to sequencing platforms; variations in quality within the platforms are significant, as highlighted by Figure A8-2.

Taking the data sets underlying Figure A8-2 as an example, the sequence analysis pipelines will be required to use different approaches for data with less than 1 percent error and for data with more than 45 percent error. The MG-RAST analysis pipeline was the first to include the examination of sequence quality systematically and to highlight sequence quality issues. A surprising number of data sets submitted to the system are rejected initially because they are, for example, too low in quality or contain contamination.

FIGURE A8-2 The DRISEE error profiles for two anonymous projects with three shotgun libraries. The predicted cumulative error per position is plotted showing dramatic variation with the green data set near perfect and the purple data exceeding 40 percent error after 70 bases.

FIGURE A8-3 A simple representation of average base abundance per base demonstrates that data are not distributed randomly. The four bases are represented green (A), blue (C), yellow (G), and red (T); black indicates a missing base call. In this (anonymous) data set, the first 53 base pairs contain an adapter. Note the peak of A’s (green) at positions 54–73, likely indicating the presence of poly-A artifacts in addition to the adapters.

In addition to “low-level” sequence error, a number of other significant sources of problems exist. Tom Schmidt’s group described the existence of artificially duplicated reads in 454 data (Gomez-Alvarez et al., 2009). (These artifacts also exist in Ion Torrent and Illumina data.) If left uncorrected, such duplicates lead to significant biases in the interpretation of sequence data, as some areas of sequence are misrepresented.
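To make the duplicate-read problem concrete, the following is a minimal sketch of how such reads might be flagged. It assumes that reads sharing an identical leading prefix are artificial duplicates, which is a deliberate simplification and not the procedure of Gomez-Alvarez et al. (2009) or of any particular pipeline; the function names and the `prefix_len` parameter are hypothetical.

```python
from collections import Counter


def duplicate_fraction(reads, prefix_len=50):
    """Fraction of reads whose first `prefix_len` bases were already seen.

    Assumption: an identical prefix marks an artificial duplicate.
    """
    if not reads:
        return 0.0
    counts = Counter(read[:prefix_len].upper() for read in reads)
    # Every read beyond the first in each prefix group counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


def deduplicate(reads, prefix_len=50):
    """Keep only the first read seen for each prefix, preserving input order."""
    seen = set()
    kept = []
    for read in reads:
        key = read[:prefix_len].upper()
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy reads (hypothetical data); three of the five share the same 8-base prefix.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTGACCAA", "acgtacgt"]
print(duplicate_fraction(reads, prefix_len=8))
print(deduplicate(reads, prefix_len=8))
```

Reporting the duplicate fraction before removing anything lets a reader judge how strongly a data set is affected, which matters when abundance estimates are compared across samples.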

Other frequently found artifacts include leftover adapter sequences or primer dimers (see Figure A8-3 for an example of both problems). Most researchers will agree that for a shotgun sample, a more or less even distribution of each base at each position of all reads is to be expected. Unfortunately, in cases that deviate from the expected distribution, as in Figure A8-3, the only information that can be derived from the sequence data is that the sequencing run has failed. Yet, the data shown in Figure A8-3 have been interpreted biologically and were accepted for publication.
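The per-position base distribution described here is simple to compute. The sketch below is one possible way to tabulate it and to flag positions dominated by a single base, as would happen with a leftover adapter or a poly-A run. The function names and the 0.5 threshold are assumptions chosen for illustration; real samples also carry genome-specific GC bias, so any threshold would need tuning.

```python
from collections import Counter


def base_composition(reads):
    """Per-position fraction of A, C, G, T and N (any other symbol) across reads."""
    max_len = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base if base in "ACGT" else "N"] += 1
    profile = []
    for c in counts:
        total = sum(c.values())  # never zero: each position is covered by a read
        profile.append({b: c[b] / total for b in "ACGTN"})
    return profile


def skewed_positions(profile, max_fraction=0.5):
    """0-based positions where one symbol exceeds `max_fraction` of the calls."""
    return [i for i, frac in enumerate(profile) if max(frac.values()) > max_fraction]


# Toy reads (hypothetical data): the first three positions are all 'A',
# mimicking an adapter; the last position is evenly mixed.
reads = ["AAAC", "AAAG", "AAAT", "AAAA"]
print(skewed_positions(base_composition(reads)))
```

A run of consecutive flagged positions at the start of the reads is the pattern to look for before any biological interpretation is attempted.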

Once the potential obstacles with data quality have been eliminated, a number of bioinformatics tools can be used to predict genes of interest for downstream analysis. While significant progress has been made in recent years for the prediction of genes in more or less complete microbial genomes (Overbeek et al., 2007), the same cannot be said for the state of the art for predicting genes in noisy metagenomic data. Trimble et al. (2012) show that only one of the existing tools accounts for the possibility that sequences might contain sequencing error, even though, as mentioned above, the presence of noise (or imperfect data) is a reality in shotgun metagenomics, where data are frequently not or only partially assembled.

A side effect of assuming all data to be perfect is the significant impairment of tool performance when tools are used on data with realistic error properties. This mismatch of assumption with reality leads to performance reductions of 10–20 percent accuracy in the presence of 3 percent error (see Trimble et al., 2012).

Following gene prediction, functional assignments are computed by mapping the predicted features against a database of known proteins in some form. Again, different pipelines apply different approaches; however, they all rely on some flavor of sequence similarity searching (e.g., BLAST [Altschul et al., 2009], BLAT [Kent, 2002], HMMer [Eddy, 1998]). What all these approaches have in common is their failure to identify novelty. Any unknown gene will remain uncharacterized, and a non-homologous replacement for a known protein function (no matter how important for the sample) will be represented as an unknown protein in the metagenomic analysis presented.

While this might present a problem in some cases, the majority of the protein databases contain a variety of annotations for proteins of the same function. Often, multiple annotations for proteins that are 100 percent sequence identical can be found (e.g., the alcohol dehydrogenase gene from various Streptococcus strains has 20 different annotations). While some annotations would be informative for a human reader, in the majority of cases, a computer would not be able to recognize the fact that the two functions described are identical. When comparing the relative abundance of alcohol dehydrogenase genes, results would be affected by the fact that several alcohol dehydrogenase genes would have slightly different names. As a result, annotations derived from similarity searches are less than useful for quantitative studies in microbial ecology unless carried out on a higher functional level, for example, KEGG (Kanehisa, 2002) pathways or the very successful SEED subsystems (Overbeek et al., 2005). These higher-level aggregations subsume a significant number of genes in categories (e.g., KEGG pathways).

Using those categories to represent gene function abundance, we can represent the genetic material in environmental samples as an abundance vector, allowing cross-sample comparison of gene abundance. With the newly minted BIOM (McDonald et al., 2012) format, various existing tools, such as MG-RAST (Meyer et al., 2008) and QIIME (Caporaso et al., 2010) can be used together to analyze functional abundance data.
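As an illustration of the abundance-vector idea, the sketch below collapses differently worded annotations into shared categories, builds a relative-abundance vector per sample, and compares two samples with the Bray-Curtis dissimilarity. The annotation strings, category names, and mapping table are hypothetical placeholders, not actual KEGG or SEED identifiers, and Bray-Curtis is just one commonly used comparison measure chosen here for the example.

```python
from collections import Counter

# Hypothetical mapping from free-text annotations to one functional category.
ANNOTATION_TO_CATEGORY = {
    "alcohol dehydrogenase": "Fermentation",
    "Alcohol dehydrogenase (EC 1.1.1.1)": "Fermentation",
    "ADH, zinc-containing": "Fermentation",
    "DNA polymerase III alpha subunit": "DNA replication",
}


def abundance_vector(annotations, categories):
    """Relative abundance of each category; unmapped annotations become 'Unknown'."""
    counts = Counter(ANNOTATION_TO_CATEGORY.get(a, "Unknown") for a in annotations)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in categories]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    denom = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


categories = ["Fermentation", "DNA replication", "Unknown"]
sample_a = [
    "alcohol dehydrogenase",
    "ADH, zinc-containing",
    "DNA polymerase III alpha subunit",
    "hypothetical protein",
]
sample_b = ["Alcohol dehydrogenase (EC 1.1.1.1)", "DNA polymerase III alpha subunit"]

vec_a = abundance_vector(sample_a, categories)
vec_b = abundance_vector(sample_b, categories)
print(vec_a, vec_b, bray_curtis(vec_a, vec_b))
```

Because all three alcohol dehydrogenase spellings fall into the same category, the two samples show the same relative abundance for that function even though no annotation string matches exactly between them.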

An Attempt to Estimate the Scale of the Problem

“Raw” sequence data can be transformed into abundance vectors that describe the abundance of specific gene categories in environmental samples. However, even slight variations in one of the transforming steps will introduce significant variations in the outcome, as would be the case for two data sets that have used different gene prediction algorithms with different levels of tolerance for sequence error.

Because there is no culture of sharing data beyond raw sequences in the INSDC archives, consumers of data from any specific study can either rely on tables published as auxiliary material with a study, or they can re-analyze the data. As mentioned before, the computational cost makes that undesirable (Wilkening et al., 2009). Imagine yourself in the position of comparing your data to the Human Microbiome Project jump-start project (described above). Instead of simply analyzing their own data, researchers would find themselves re-analyzing various other data sets they are using in comparison. While this type of approach was common in the early days of genomics and still is used for single microbial genomes (by systems such as SEED [Aziz et al., 2008], IMG [Markowitz et al., 2006], and GenDB [Meyer et al., 2003]), using the same approach for more computationally demanding data types like metagenomes will lead to a situation in which groups are no longer limited by their ability to acquire sequence data, but by their ability to analyze it. While it is likely that metagenomic data analysis will undergo significant improvements (as compared to the improvements for individual microbial genomes discussed in Overbeek et al. [2007]), that will not suffice. With sequencing costs continuing to drop, data acquisition costs will soon be a small fraction of the data analysis cost.

Sharing of computational results (e.g., gene calling results, computed similarities, and other intermediate data types) would alleviate this problem. But this only works if the community can agree on a small number of standard formats to represent the data, and only if the majority of the tools support those standards.

Ways Out of the Current Dilemma

While for the first time biology now has access to abundant data, the challenges in handling the data and using the data in a robust way to define new research hypotheses seem insurmountable. We have described the challenge of using big data in the context of metagenomics above.

A number of aspects to this challenge need to be addressed separately. Perhaps the most important aspect is a change of culture recognizing that in the presence of abundant data, the standard operating procedures from before are no longer sufficient. Among the things that need to change are data archives, which can no longer attempt to capture all data; computational approaches, which need to take computational costs into account; and, last but not least, individual researchers, who need to learn that data analysis must be planned and budgeted for appropriately.

One suggestion likely acceptable to most readers is that, in the presence of big data, computational data analysis needs to take a more prominent role in the training of the next generation of bio-scientists.

There are also some technical steps that can be taken to improve the current situation. Standards for describing sequence data have been established by the Genomic Standards Consortium (GSC) (Field et al., 2011). These standards are currently enabling data exchange at a hitherto unprecedented scale, enabling data consumption by many third parties for many purposes.

Based on these positive experiences, the GSC now has initiated a long-term project to define standards for sharing processed data. The results of this M5 project will enable researchers to consume data from a published study for their analysis without having to re-analyze everything from scratch and yet allowing them to both change and understand all the fine details of these studies.

In addition, the M5 project aims to define encodings for data that will allow existing and new analysis service providers to exchange analyzed data. This ability will allow comparison of different analytical approaches and will help with the evolution of analysis approaches. We predict that this new openness will lead to more acceptance for the established analysis approaches and reduce the number of ad hoc analysis pipelines that reinvent analysis processes. By embracing the needed cultural changes, adhering to standards, and promoting openness, many third parties will be liberated to innovate like never before.


Altschul, S. F., E. M. Gertz, R. Agarwala, A. A. Schaffer, and Y. K. Yu. 2009. PSI-BLAST pseudocounts and the minimum description length principle. Nucleic Acids Research 37(3):815-824.

Aziz, R. K., D. Bartels, A. A. Best, M. DeJongh, T. Disz, R. A. Edwards, K. Formsma, S. Gerdes, E. M. Glass, M. Kubal, F. Meyer, G. J. Olsen, R. Olson, A. L. Osterman, R. A. Overbeek, L. K. McNeil, D. Paarmann, T. Paczian, B. Parrello, G. D. Pusch, C. Reich, R. Stevens, O. Vassieva, V. Vonstein, A. Wilke, and O. Zagnitko. 2008. The RAST server: Rapid annotations using subsystems technology. BMC Genomics 9:75.

Caporaso, J. G., J. Kuczynski, J. Stombaugh, K. Bittinger, F. D. Bushman, E. K. Costello, N. Fierer, A. G. Pena, J. K. Goodrich, J. I. Gordon, G. A. Huttley, S. T. Kelley, D. Knights, J. E. Koenig, R. E. Ley, C. A. Lozupone, D. McDonald, B. D. Muegge, M. Pirrung, J. Reeder, J. R. Sevinsky, P. J. Turnbaugh, W. A. Walters, J. Widmann, T. Yatsunenko, J. Zaneveld, and R. Knight. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7(5):335-336.

Desai, N., D. Antonopoulos, J. A. Gilbert, E. M. Glass, and F. Meyer. 2012. From genomics to metagenomics. Current Opinion in Biotechnology 23(1):72-76.

Eddy, S. R. 1998. Profile hidden Markov models. Bioinformatics 14(9):755-763.

Field, D., L. Amaral-Zettler, G. Cochrane, J. R. Cole, P. Dawyndt, G. M. Garrity, J. Gilbert, F. O. Glöckner, L. Hirschman, and I. Karsch-Mizrachi. 2011. The Genomic Standards Consortium. PLoS Biology 9(6):e1001088.


Gilbert, J. A., F. Meyer, J. Jansson, J. Gordon, N. Pace, J. Tiedje, R. Ley, N. Fierer, D. Field, N. Kyrpides, F. O. Glöckner, H. P. Klenk, K. E. Wommack, E. Glass, K. Docherty, R. Stevens, and R. Knight. 2010. The Earth Microbiome Project: Meeting report of the “1st EMP meeting on sample selection and acquisition” at Argonne National Laboratory, October 6, 2010. Standards in Genomic Sciences 3(3):249-253.

Gomez-Alvarez, V., T. K. Teal, and T. M. Schmidt. 2009. Systematic artifacts in metagenomes from complex microbial communities. ISME Journal 3(11):1314-1317.

HMP (Human Microbiome Project). 2012. Human Microbiome Project. (accessed October 2, 2012).

Kanehisa, M. 2002. The KEGG database. Novartis Foundation symposium 247:91-101; discussion 101-103, 119-128, 244-252.

Keegan, K. P., W. L. Trimble, J. Wilkening, A. Wilke, T. Harrison, M. D’Souza, and F. Meyer. 2012. A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE. PLoS Computational Biology 8(6):e1002541.

Kent, W. J. 2002. BLAT—the BLAST-like alignment tool. Genome Research 12(4):656-664.

Markowitz, V. M., F. Korzeniewski, K. Palaniappan, E. Szeto, G. Werner, A. Padki, X. Zhao, I. Dubchak, P. Hugenholtz, I. Anderson, A. Lykidis, K. Mavromatis, N. Ivanova, and N. C. Kyrpides. 2006. The integrated microbial genomes (IMG) system. Nucleic Acids Research 34 (Database issue):D344-D348.

McDonald, D., J. C. Clemente, J. Kuczynski, J. Rideout, J. Stombaugh, D. Wendel, A. Wilke, S. Huse, J. Hufnagle, F. Meyer, R. Knight, and J. Caporaso. 2012. The Biological Observation Matrix (BIOM) format or: How I learned to stop worrying and love the ome-ome. Gigascience 1:7.

Meyer, F., A. Goesmann, A. C. McHardy, D. Bartels, T. Bekel, J. Clausen, J. Kalinowski, B. Linke, O. Rupp, R. Giegerich, and A. Pühler. 2003. GenDB—an open source genome annotation system for prokaryote genomes. Nucleic Acids Research 31(8):2187-2195.

Meyer, F., D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards. 2008. The metagenomics RAST server—A public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics [electronic resource] 9:386.

Nealson, K. H., and J. C. Venter. 2007. Metagenomics and the global ocean survey: What’s in it for us, and why should we care? ISME Journal 1(3):185-187.

Overbeek, R., T. Begley, R. M. Butler, J. V. Choudhuri, N. Diaz, H-Y. Chuang, M. Cohoon, V. de Crécy-Lagard, T. Disz, R. Edwards, M. Fonstein, E. D. Frank, S. Gerdes, E. M. Glass, A. Goesmann, A. Hanson, D. Iwata-Reuyl, R. Jensen, N. Hamshidi, L. Krause, M. Kubal, N. Larsen, B. Linke, A. C. McHardy, F. Meyer, H. Neuweger, G. Olsen, R. Olson, A. Osterman, V. Portnoy, G. D. Pusch, D. A. Rodionov, C. Rückert, J. Steiner, R. Stevens, I. Thiele, O. Vassieva, Y. Ye, O. Zagnitko, and V. Vonstein. 2005. The subsystems approach to genome annotation and its use in the Project to Annotate 1000 Genomes. Nucleic Acids Research 33(17).

Overbeek, R., D. Bartels, V. Vonstein, and F. Meyer. 2007. Annotation of bacterial and archaeal genomes: Improving accuracy and consistency. Chemical Reviews 107(8):3431-3447.

Quince, C., A. Lanzen, T. P. Curtis, R. J. Davenport, N. Hall, I. M. Head, L. F. Read, and W. T. Sloan. 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nature Methods 6(9):639-641.

Reeder, J., and R. Knight. 2010. Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nature Methods 7(9):668-669.

Tara Expeditions. 2012. (accessed October 2, 2012).

Terragenome Consortium. 2012. (accessed October 2, 2012).

Thomas, T., J. Gilbert, and F. Meyer. 2012. Metagenomics—a guide from sampling to data analysis. Microbial Informatics and Experimentation 2(1):3.

Trimble, W. L., K. P. Keegan, M. D’Souza, A. Wilke, J. Wilkening, J. Gilbert, and F. Meyer. 2012. Short-read reading-frame predictors are not created equal: Sequence error causes loss of signal. BMC Bioinformatics [electronic resource] 13(1):183.


Turnbaugh, P. J., R. E. Ley, M. Hamady, C. M. Fraser-Liggett, R. Knight, and J. I. Gordon. 2007. The human microbiome project. Nature 449(7164):804-810.

Wilkening, J., A. Wilke, N. Desai, and F. Meyer. 2009. Using clouds for metagenomics: A case study. In: IEEE Cluster 2009.



Nicholas J. Loman,32 Chrystala Constantinidou,32 Jacqueline Z. M. Chan,32 Mihail Halachev,32 Martin Sergeant,32 Charles W. Penn,32 Esther R. Robinson,33 and Mark J. Pallen32


Here, we take a snapshot of the high-throughput sequencing platforms, together with the relevant analytical tools, that are available to microbiologists in 2012, and evaluate the strengths and weaknesses of these platforms in obtaining bacterial genome sequences. We also scan the horizon of future possibilities, speculating on how the availability of sequencing that is ‘too cheap to meter’ might change the face of microbiology forever.

In bacteriology, the genomic era began in 1995, when the first bacterial genome was sequenced using conventional Sanger sequencing (Fleischmann et al., 1995). Back then, sequencing projects required six-figure budgets and years of effort. A decade later, in 2005, the advent of the first high-throughput (or “next-generation”) sequencing technologies signalled a significant advance in the ease and cost of sequencing (Metzker et al., 2005), delivering bacterial genome sequences in hours or days rather than months or years. High-throughput sequencing now delivers sequence data thousands of times more cheaply than is possible with Sanger sequencing. The availability of a growing abundance of platforms and instruments presents the user with an embarrassment of choice. Better still, vigorous competition between manufacturers has resulted in sustained technical improvements on almost all platforms. This means that in recent years our sequencing capability has been doubling every 6–9 months, much faster than Moore’s law.

31 Reprinted with kind permission from Nature Publishing Group.

32 Nicholas J. Loman, Chrystala Constantinidou, Jacqueline Z. M. Chan, Mihail Halachev, Martin Sergeant, Charles W. Penn and Mark J. Pallen are at the Institute of Microbiology and Infection, University of Birmingham, Birmingham B15 2TT, UK.

33 Esther R. Robinson is at the Nuffield Department of Clinical Laboratory Sciences, University of Oxford, Oxford OX3 9DU, UK.

Correspondence to: Mark J. Pallen.
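To put the doubling claim in perspective, the following minimal sketch compares cumulative growth over a fixed period for the 6- and 9-month doubling times quoted above with a 24-month doubling time. The 24-month figure is an assumption (a common reading of Moore's law; the text does not give one), and the function name and three-year window are illustrative choices.

```python
def growth_factor(months_elapsed: float, doubling_time_months: float) -> float:
    """Fold increase after months_elapsed, given a constant doubling time."""
    return 2.0 ** (months_elapsed / doubling_time_months)

HORIZON_MONTHS = 36  # illustrative three-year window (assumption)

scenarios = [
    ("sequencing, 6-month doubling", 6),
    ("sequencing, 9-month doubling", 9),
    ("Moore's law, assumed 24-month doubling", 24),
]

for label, doubling_time in scenarios:
    factor = growth_factor(HORIZON_MONTHS, doubling_time)
    print(f"{label}: {factor:.1f}-fold over {HORIZON_MONTHS} months")
```

Under these assumptions, three years of 6- to 9-month doubling gives a 16- to 64-fold increase, against less than 3-fold for the 24-month case.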

Here, we describe the sequencing technologies themselves, examine the practicalities of producing a sequence-ready template from bacterial cultures and clinical samples, and weigh up the costs of labour and kits. We look at the types of data that are delivered by each instrument, and describe the approaches, programs and pipelines that can be used to analyse these data and thus move from draft to complete genomes.

Several high-throughput sequencing platforms are now chasing the US$1,000 human genome (Venter, 2010). Given that the average bacterial genome is less than one-thousandth the size of the human genome, a back-of-the-envelope calculation suggests that a $1 bacterial genome sequence is an imminent possibility. In closing, we assess how close to reality the $1 bacterial genome actually is and explore the ways in which high-throughput sequencing might change the way that all microbiologists work.
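The back-of-the-envelope calculation can be written out as a short sketch. The genome sizes below are assumptions chosen for illustration (roughly 3.2 Gb for the human genome and 3 Mb for a bacterial genome); only the US$1,000 target comes from the text. The linear scaling deliberately ignores per-sample costs such as library preparation.

```python
HUMAN_GENOME_BP = 3.2e9         # approximate human genome size (assumption)
BACTERIAL_GENOME_BP = 3.0e6     # illustrative bacterial genome size (assumption)
HUMAN_GENOME_COST_USD = 1000.0  # target price quoted in the text

def scaled_cost(target_bp: float, reference_bp: float, reference_cost: float) -> float:
    """Scale a sequencing cost linearly with genome size, ignoring per-sample overheads."""
    return reference_cost * target_bp / reference_bp

size_ratio = BACTERIAL_GENOME_BP / HUMAN_GENOME_BP
bacterial_cost = scaled_cost(BACTERIAL_GENOME_BP, HUMAN_GENOME_BP, HUMAN_GENOME_COST_USD)

print(f"bacterial/human size ratio: {size_ratio:.5f}")
print(f"naive cost per bacterial genome: ${bacterial_cost:.2f}")
```

With these inputs the size ratio is just under one-thousandth, so the naive cost lands just under $1 per bacterial genome.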

A Variety of Approaches

High-throughput sequencing platforms can be divided into two broad groups depending on the kind of template used for the sequencing reactions. The earliest, and currently most widely used, platforms depend on the production of libraries of clonally amplified templates. These are produced through amplification of immobilized libraries made from a single DNA molecule in the initial sample. More recently, we have seen the arrival of single-molecule sequencing platforms, which determine the sequence of single molecules without amplification. Within these broad categories, there is considerable variation in performance—including in throughput, read length and error rate—as well as in factors affecting usability, such as cost and run time.

Template amplification technologies. In general terms, all of the platforms that are currently on the market rely on a three-stage workflow of library preparation, template amplification and sequencing (Figure A9-1). Library preparation begins with the extraction and purification of genomic DNA. Depending on the protocol, the amount of DNA required can vary from a few nanograms to tens of micrograms, meaning that success in this step depends on the ability to grow sufficient biomass. For some microorganisms, obtaining suitable DNA, in terms of quantity and quality, can prove difficult. Therefore, before using expensive reagents for library preparation and sequencing, it is advisable to confirm, by fluorometry, that DNA of sufficient quantity and quality has been obtained. However, purchasing a suitable instrument to do this adds to the costs of establishing a sequencing capability (Box A9-1).



FIGURE A9-1 High-throughput sequencing platforms. The schematic shows the main high-throughput sequencing platforms available to microbiologists today, and the associated sample preparation and template amplification procedures. For full details, see main text. PGM, Personal Genome Machine. The tagmentation schematic is modified, with permission, from Adey et al. © (2010) BioMed Central.

For shotgun sequencing, an initial fragmentation step is required to generate random, overlapping DNA fragments. Depending on the platform and application, these fragments can range from 150 bp to 800 bp in length; size selection either involves harvesting from agarose gels or exploits paramagnetic-bead-based technology. The selected fragments must also be sufficiently abundant to provide comprehensive and even coverage of the target genome. Two types of fragmentation are widely used: mechanical and enzymatic. Early protocols relied on mechanical methods such as nebulization or ultrasonication. Nebulization is an inexpensive method that can be easily adopted by any laboratory, but it results in large losses of input material and a broad range of fragment sizes, runs the risk of cross-contamination and cannot handle parallel processing. By contrast, ultrasonication instruments such as systems from Covaris or the Bioruptor systems from Diagenode allow parallel sample processing and minimize hands-on time and sample loss but come at a price that could be prohibitive for small laboratories. Mechanically generated fragments require repair and end-polishing before platform-specific adaptors can be ligated to the ends of the target molecules. These adaptors act as primer-binding sites for the subsequent template amplification reaction.

BOX A9-1
The Add-on Cost of Sequencing

The costs of sequencing instruments and reagents are not the only issues that need to be taken into account when setting up a sequencing facility for microbial applications. So, what else do you need? Well, first you have to buy a high-end fluorometer such as a Life Technologies Qubit (around US$2,000) and/or an Agilent Technologies 2100 Bioanalyzer (around $18,000). Then, if you want to save time by parallel processing, you should consider investing in an ultrasonicator (for example, from Covaris, at around $45,000) and a liquid-handling robot (for example, the Biomek FXP, at around $310,000, or one of the SPRIworks systems, at around $45,000; both from Beckman Coulter). To carry out sequencing on the 454 GS FLX+ instrument from Roche, you need a bead counter for emulsion PCR (up to $20,000), and for the Genome Analyzer IIx or HiSeq machines from Illumina, you need to buy an Illumina cBot (~$55,000). For some platforms, you may have to buy additional centrifuges and/or rotors; for example, the ULTRA-TURRAX Tube Drive system from IKA ($1,000) is required by the Ion Torrent platform (from Life Technologies) if the OneTouch system is not used. You also need to buy a server to take receipt of the data coming off your instrument (for example, a $5,000 desktop), and then a cluster of servers for analysing and storing the data (ranging from $20,000 upwards). In addition, you may have to update your laboratory infrastructure by investing in a dedicated electrical connection and appropriate air-conditioning units for your sequencing instrument, and uninterruptible power supplies for your sequencer and servers. Most laboratories also want to invest in a backup solution that is both fast and available. This may be a mirrored set of hard drives, or even a shelf full of disconnected USB drives. Illumina offers a cloud-based backup and basic-analysis solution called BaseSpace which can store sequence results as they are generated on the Illumina MiSeq. Currently, this is a free solution, but users are likely to have to pay a subscription in the future.
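The add-on costs listed in Box A9-1 can be tallied for a given configuration. The sketch below uses the approximate prices quoted in the box; the particular selection of items is a hypothetical example for a laboratory running an Illumina high-end instrument, and the dictionary and function names are illustrative.

```python
# Approximate add-on prices in US$, as quoted in Box A9-1.
ADD_ON_COSTS_USD = {
    "Qubit fluorometer": 2_000,
    "2100 Bioanalyzer": 18_000,
    "Covaris ultrasonicator": 45_000,
    "Illumina cBot": 55_000,
    "data-receipt desktop server": 5_000,
    "analysis cluster (lower bound)": 20_000,
}

def total_add_on_cost(selected_items):
    """Sum the quoted prices for the selected add-on items."""
    return sum(ADD_ON_COSTS_USD[item] for item in selected_items)

# Hypothetical selection: every item in the table above.
illumina_setup = list(ADD_ON_COSTS_USD)
total = total_add_on_cost(illumina_setup)
print(f"Add-on cost for this selection: ${total:,}")
```

Because the cluster price is a lower bound and infrastructure upgrades are not priced in the box, the total should be read as a floor rather than an estimate.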

More recently, enzymatic methods have provided an alternative approach to producing random fragments of the desired length. These require less input DNA and offer easier, faster sample processing. Fragmentase (from New England Biolabs) is a mixture of a nuclease, which randomly nicks double-stranded DNA, and a T7 endonuclease, which cleaves the DNA. Together, these enzymes generate random double-strand DNA breaks in a time-dependent manner, allowing the user to tailor protocols in order to obtain products of the required length. Adaptors can then be ligated to these fragments in the usual way. Tagmentation (Caruccio, 2011) is a promising transposase-based approach that, in a single step, fragments DNA and incorporates sequence tags, which then take the place of adaptors. Currently, the only available implementation of tagmentation is within the Nextera system, which is only available for the Illumina platform. Several companies have produced automated liquid-handling machines that greatly reduce the hands-on time required for fragmentation approaches but significantly increase costs (Box A9-1).

In addition to supporting fragment-based sequencing, all template amplification platforms support mate pair sequencing, in which the ends of DNA fragments of a certain size (typical sizes are 3 kb, 6 kb, 8 kb or 20 kb) are joined together to form circular molecules. These molecules are then fragmented a second time. Fragments flanking the joins are then selected and end adaptors added. Sequencing through the joins provides valuable information about the location of sequences dispersed across the genome, facilitating assembly.

Paired-end sequencing has similarities to mate pair sequencing, but DNA fragments are sequenced from each end without the need for additional library preparation steps. The Illumina platform has direct support for paired-end sequencing. Using fragments that are shorter than the combined length of the forward and reverse reads (for example, 180 bp fragments combined with 2 × 100 base sequencing) allows the two overlapping reads to be merged into pseudo long reads. Alternatively, fragments of up to ~800 bp can be used. Longer fragments may result in a loss of amplification efficiency. The Ion Personal Genome Machine (PGM) (using the Ion Torrent platform, from Life Technologies) also has a bidirectional sequencing protocol that requires the removal of the chip after the initial run, a digestion step and a second sequencing run using a different sequencing primer. All platforms can handle PCR products, allowing adaptor sequences to be incorporated into the 5′ ends of primers.
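The relationship between fragment length, read length and read overlap in paired-end sequencing can be sketched as follows. This is a simplified model that ignores adaptors and quality trimming; the function names are illustrative and are not taken from any sequencing toolkit.

```python
def paired_end_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Each read covers at most read_len bases from its own end of the fragment.
    """
    overlap = 2 * read_len - fragment_len
    return max(0, min(fragment_len, overlap))

def can_merge(fragment_len: int, read_len: int, min_overlap: int = 10) -> bool:
    """True if the reads overlap enough to be merged into one pseudo long read.

    The min_overlap default is an arbitrary illustrative threshold.
    """
    return paired_end_overlap(fragment_len, read_len) >= min_overlap

# Example from the text: 180 bp fragments with 2 x 100 base sequencing.
print(paired_end_overlap(180, 100), can_merge(180, 100))
# An ~800 bp fragment leaves a gap between the two reads instead.
print(paired_end_overlap(800, 100), can_merge(800, 100))
```

In the 180 bp case the two reads share 20 bases, so they can be merged into a single read spanning the whole fragment; in the 800 bp case the reads do not meet, and the pair instead provides positional information across the unsequenced gap.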

For all platforms, it is highly advisable to assess the quality and quantity of the sequence library before subjecting it to amplification. Different instruments for quality assessment are recommended by different manufacturers. Examples include the 2100 Bioanalyzer (from Agilent Technologies), fluorometers such as the NanoDrop 3300 (from Thermo Scientific) or the Qubit (from Life Technologies), and quantitative PCR using any of a number of available quantitative PCR machines along with either own-design or commercially available assays. Purchasing a suitable instrument for this step can add several thousand dollars to the costs of establishing a sequencing capability (Box A9-1).

In preparation for amplification, template molecules are immobilized on a solid surface, which is a flow cell for sequencing with the Illumina platform and solid beads or ion sphere particles for other approaches. Simultaneous solid-phase amplification of millions or billions of spatially separated template fragments prepares the way for massively parallel sequencing. For the Illumina platform, template amplification is automated and is performed either directly on the instrument (for the MiSeq, and the HiSeq 2500 sequencer in rapid-run mode) or using the cBot, a separate instrument that is dedicated to this task (used in conjunction with the Genome Analyzer IIx and the HiSeq 2000 machine). Clusters are generated by bridge amplification on the surface of the flow cell. For platforms that use bead-based immobilization (the SOLiD [from Life Technologies], 454 and Ion Torrent platforms), amplified template sequence libraries are prepared off-instrument, relying on an emulsion PCR, in which the beads are enclosed in aqueous-phase microreactors and are kept separated from each other in a water-in-oil emulsion.

Sequencing chemistry. Although these platforms rely on a sequencing-by-synthesis design, they differ in the details of the sequencing chemistry and the approach used to read the sequence. The Illumina sequencing platform depends on Solexa chemistry (Bentley et al., 2008), which includes reversible termination of sequencing products. In each sequencing cycle, a mixture of fluorescently labelled ‘reversible terminator’ nucleotides with protected 3′-OH groups (and a different emission wavelength for each nucleotide) is perfused across the flow cell. Wherever a complementary nucleotide is present on the template strand, the terminator is incorporated and imaged, and then the signal is quenched and the terminator nucleotide is chemically deprotected at the 3′-OH group.

The 454 and Ion Torrent sequencing platforms avoid the use of terminators. Instead, in each cycle a single kind of dNTP is flowed across the template. When there is base complementarity between the dNTP and the next available position in the template, the DNA polymerase incorporates the base onto the extending strand, liberating pyrophosphate and hydrogen ions. When there is no complementarity, DNA synthesis is halted temporarily; each type of dNTP is flowed across the template in turn according to the dispensing cycle, and DNA synthesis is thus re-initiated when the next complementary dNTP is added. The 454 platform exploits a pyrosequencing approach (Margulies et al., 2005; Ronaghi et al., 1998) whereby the presence of pyrophosphate is signalled by visible light as the result of an enzyme cascade. The order and intensity of the light peaks are recorded as “flowgrams.” The Ion Torrent platform relies on a modified silicon chip to detect hydrogen ions that are released during base incorporation; the resulting lack of reliance on imaging makes this platform the first “post-light” sequencing instrument (Rothberg et al., 2011).
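The flow-based readout described above can be illustrated with a toy decoder. This is a simplified sketch, not the base-calling algorithm of any instrument: it assumes a fixed cyclic flow order of T, A, C, G and signals already normalized so that one incorporated base gives an intensity near 1.0; the intensity values are made up for illustration.

```python
from itertools import cycle

def decode_flowgram(intensities, flow_order="TACG"):
    """Turn per-flow signal intensities into a base sequence.

    Each flow delivers one kind of dNTP; the rounded signal is taken as the
    number of bases incorporated in that flow (0 means no complementarity).
    """
    called = []
    for base, signal in zip(cycle(flow_order), intensities):
        count = int(signal + 0.5)  # round to the nearest whole number of bases
        called.append(base * count)
    return "".join(called)

# Hypothetical normalized signals for eight flows (T, A, C, G, T, A, C, G).
clean = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.04, 3.40]
noisy = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.04, 3.60]

print(decode_flowgram(clean))
print(decode_flowgram(noisy))
```

The two inputs differ only in the last flow, where a signal of 3.4 is called as three G bases and 3.6 as four. Because homopolymer length is inferred from signal magnitude, small intensity errors translate into insertions or deletions, which is consistent with the homopolymer error profile reported for these platforms.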

The SOLiD platform (Valouev et al., 2008) and the platform from Complete Genomics (Drmanac et al., 2010) depend on sequencing by ligation. In this approach, fluorescent probes undergo iterative steps of hybridization and ligation to complementary positions in the template strand at the 5′ end of the extending strand, followed by fluorescence imaging to identify the ligated probe.


Single-molecule sequencing. Single-molecule sequencing brings the promise of freedom from amplification artefacts as well as from onerous sample and library preparations. The HeliScope Single-Molecule Sequencer (from Helicos BioSciences) was the first platform for single-molecule sequencing to hit the marketplace, in 2009 (Bowers et al., 2009). This technology applies one-colour reversible-terminator sequencing to unamplified single-molecule templates. However, this platform has been hampered by its high price and poor instrument sales and, following the delisting of the company from the stock market, there are significant doubts over the future of the platform.

More recently, Pacific Biosciences has delivered “real-time sequencing,” in which dye-labelled nucleotides are continuously incorporated into a growing DNA strand by a highly processive, strand-displacing ø29-derived DNA polymerase (Eid et al., 2009). Each DNA polymerase molecule is tethered within a zero-mode waveguide detector, which allows continuous imaging of the labelled nucleotides as they enter the strand (Levene et al., 2003).

Choosing a Platform

High-end instruments. The high-throughput sequencing market presents the user with a challenging choice between bulky, expensive high-end instruments and the new generation of bench-top instruments (Tables A9-1, A9-2). The high-end machines include the PacBio RS (from Pacific Biosciences), the HiSeq instruments, the Genome Analyzer IIx, the SOLiD 5500 series and the 454 GS FLX+ system. These deliver a high throughput and/or long read lengths but come with set-up costs of hundreds of thousands of dollars, placing them beyond the reach of the average research laboratory or even department. These machines are thus only suitable for large sequencing centres or core facilities. This raises the important question of where an ‘average’ microbiologist should source sequencing from.

These instruments can deliver dozens to thousands of bacterial genomes per run, as illustrated by several high-impact publications on bacterial genomes and metagenomes (Harris et al., 2012; Hess et al., 2011; Mutreja et al., 2011; Qin et al., 2010). However, to achieve efficiencies in time and cost, optimum sequencing of microbial samples on such instruments requires onerous and expensive bar-coding and multiplexing of samples and/or subdivision of runs (for example, through gaskets or the use of single channels on the Illumina platform), as well as a sophisticated scheduling system. Compare sequencing a single human genome with the equivalent sequencing throughput for 1,000 average-sized bacterial genomes: although the sequencing run itself may be comparable in both scenarios, >1,000 samples and libraries need to be prepared for the bacterial run, compared with just one for the human genome. The costs and effort involved in sequencing 1,000 bacterial genomes therefore vastly outweigh the requirements for sequencing a single human genome, so the hasty calculation that one human


TABLE A9-1 Comparison of Next-Generation Sequencing Platforms

High-end instruments

454 GS FLX+ (Roche)
• Chemistry: Pyrosequencing
• Modal read length* (bases): 700–800
• Run time: 23 hours
• Gb per run: 0.7
• Current approximate cost† (US$): 500,000
• Advantages: long read lengths
• Disadvantages: appreciable hands-on time; high reagent costs; high error rate in homopolymers

HiSeq 2000/2500 (Illumina)
• Chemistry: Reversible terminator
• Modal read length* (bases): 2 × 100
• Run time: 11 days (regular mode) or 2 days (rapid run mode)§
• Gb per run: 600 (regular mode) or 120 (rapid run mode)§
• Current approximate cost† (US$): 750,000
• Advantages: cost-effectiveness; steadily improving read lengths; massive throughput; minimal hands-on time
• Disadvantages: long run time; short read lengths; HiSeq 2500 instrument upgrade not available at time of writing (available end 2012)

5500xl SOLiD (Life Technologies)
• Chemistry: Ligation
• Modal read length* (bases): 75+35
• Run time: 8 days
• Gb per run: 150
• Current approximate cost† (US$): 350,000
• Advantages: low error rate; massive throughput
• Disadvantages: very short read lengths; long run times

PacBio RS (Pacific Biosciences)
• Chemistry: Real-time sequencing
• Modal read length* (bases): 3,000 (maximum 15,000)
• Run time: 20 minutes
• Gb per run: 3 per day
• Current approximate cost† (US$): 750,000
• Advantages: simple sample preparation; low reagent costs; very long read lengths
• Disadvantages: high error rate; expensive system; difficult installation

Bench-top instruments

454 GS Junior (Roche)
• Chemistry: Pyrosequencing
• Modal read length* (bases): 500
• Run time: 8 hours
• Gb per run: 0.035
• Current approximate cost† (US$): 100,000
• Advantages: long read lengths
• Disadvantages: appreciable hands-on time; high reagent costs; high error rate in homopolymers

Ion Personal Genome Machine (Life Technologies)
• Chemistry: Proton detection
• Modal read length* (bases): 100 or 200
• Run time: 3 hours
• Gb per run: 0.01–0.1 (314 chip), 0.1–0.5 (316 chip) or up to 1 (318 chip)
• Current approximate cost† (US$): 80,000 (including OneTouch and server)
• Advantages: short run times; appropriate throughput for microbial applications
• Disadvantages: appreciable hands-on time; high error rate in homopolymers

Ion Proton (Life Technologies)
• Chemistry: Proton detection
• Modal read length* (bases): up to 200
• Run time: 2 hours
• Gb per run: up to 10 (Proton I chip) or up to 100 (Proton II chip)
• Current approximate cost† (US$): 145,000 + 75,000 for compulsory server
• Advantages: short run times; flexible chip reagents
• Disadvantages: instrument not available at time of writing

MiSeq (Illumina)
• Chemistry: Reversible terminator
• Modal read length* (bases): 2 × 150
• Run time: 27 hours
• Gb per run: 1.5
• Current approximate cost† (US$): 125,000
• Advantages: cost-effectiveness; short run times; appropriate throughput for microbial applications; minimal hands-on time
• Disadvantages: read lengths too short for efficient assembly

* Average read length for a fragment-based run.

† Approximate cost per machine plus additional instrumentation and service contract. See Glenn (2011).

§ Available only on the HiSeq 2500.


TABLE A9-2 The Applicability of the Major High-Throughput Sequencing Platforms

Example application in bacteriology Desirable characteristics Machine*
454 GS Junior 454 GS FLX+ Ion Personal Genome Machine (318 chip)§ MiSeq|| HiSeq 2000|| 5500xl SOLiD§ PacBio RS
De novo sequencing of novel strains to generate a single-scaffold reference genome

• Long reads

• Paired-end protocol and/or long mate-pair protocol

• Even coverage of genome

✓✓ ✓✓
Rapid characterization of a novel pathogen (draft de novo assembly of a genome for a single strain)

• Total run time (library preparation plus sequencing) of under 48 hours

• Sufficient coverage of a bacterial genome in a single run

✓✓ ✓✓ ✓✓ ✓✓
Rough-draft de novo sequencing of small numbers of strains (<20) for comparative analysis of gene content

• Long or paired-end reads

• High throughput

• Ease of library and sequencing workflow

• Cost-effective

✓✓ ✓✓
Re-sequencing of many similar strains (>50) for the discovery of single nucleotide polymorphisms and for phylogenetics

• Very high throughput

• Low-cost, high-throughput sequence library construction

• High accuracy

Small-scale transcriptomics-by-sequencing experiments (for example, two strains under four growth conditions with two biological replicates, so 16 strains)

• High per-isolate coverage

✓✓ ✓✓ ✓✓
Phylogenetic profiling to genus-level using partial 16S rRNA gene amplicon sequencing

• High coverage

• Long amplicon input (≥500 bp)

• Long reads

• High single-read accuracy (error rate <1%)

✓✓ ✓✓
Whole-genome metagenomics for the reconstruction of multiple genomes in a single sample

• Long reads or paired-end reads

• Very high throughput

• Low error rate


* image particularly well suited; image suitable; image not suitable.

From Roche; § From Life Technologies; || From Illumina; From Pacific Biosciences.


genome-sequencing project equates to 1,000 bacterial genome-sequencing projects starts to look rather optimistic.
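The raw-capacity side of this comparison can be sketched with the per-run yields from Table A9-1. The 5 Mb genome size and 50-fold target coverage are assumptions chosen for illustration, and the function name is hypothetical. The calculation shows how many genomes a run could hold in principle; it says nothing about the cost of preparing that many libraries.

```python
def genomes_per_run(gb_per_run: float, genome_bp: int, coverage: int) -> int:
    """Whole genomes that fit in one run at the target coverage, by yield alone."""
    bases_per_run = round(gb_per_run * 1e9)
    bases_per_genome = genome_bp * coverage
    return bases_per_run // bases_per_genome

GENOME_BP = 5_000_000  # illustrative bacterial genome size (assumption)
COVERAGE = 50          # illustrative target depth (assumption)

# Gb per run as listed in Table A9-1.
for machine, gb in [("HiSeq 2000 (regular mode)", 600), ("MiSeq", 1.5)]:
    n = genomes_per_run(gb, GENOME_BP, COVERAGE)
    # Each genome needs its own bar-coded library, so n is also the library count.
    print(f"{machine}: up to {n} genomes, requiring {n} libraries")
```

Under these assumptions a high-end run has capacity for thousands of genomes, but every one of them needs its own bar-coded library, which is where the effort and expense described above accumulate.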

Bench-top instruments. Three modestly priced bench-top instruments with throughputs and workflows that are well suited to microbial applications have recently hit the market. The 454 GS Junior was released in early 2010 and is a smaller, lower-throughput version of the 454 GS FLX+ machine, exploiting similar emulsion PCR and pyrosequencing approaches but with lower set-up and running costs (Loman et al., 2012). The Ion PGM was launched in early 2011 and saw almost immediate use in the crowd-sourced analysis of the Shiga toxin-producing Escherichia coli (STEC) outbreak in Germany (Rohde et al., 2011; Mellmann et al., 2011). This platform has also shown the greatest improvement in performance in recent months: an assembly for the STEC outbreak strain was generated in May 2011 using data from five Ion Torrent 314 chips and consisted of more than 3,000 contigs, whereas comparable data from a single newer 316 chip assembled into fewer than 400 contigs. The MiSeq, which began to ship to customers in late 2011, is based on the existing Solexa chemistry but has dramatically reduced run times compared with the HiSeq (hours rather than days). This is made possible by the use of a smaller flow cell, leading to a reduced imaging time and faster microfluidics.

Each of these bench-top instruments is capable of sequencing a whole bacterial genome in days. The performance of all three instruments was recently compared by sequencing a British isolate from the German STEC outbreak of 2011 (Loman et al., 2012). In this evaluation, all three bench-top sequencing platforms generated useful draft genome sequences with assemblies that mapped to ≥95% of the reference genome, so by these criteria all could be judged fit for purpose. However, no instrument was able to generate accurate one-contig-per-replicon assemblies that might equate to a finished genome.

The MiSeq was found to have the highest throughput per run, lowest error rate and most user-friendly workflow of the three instruments: hands-on time is low because template amplification is carried out directly on the instrument without manual intervention. However, a paired-end 150-base sequencing run took more than 27 hours. The MiSeq is notable for being able to sequence fragments from both ends (paired-end mode) without changes to the library preparation stage or additional intervention during sequencing.

The 454 GS Junior produced the longest reads (mean 522 bases) and generated the least fragmented assemblies but had the lowest throughput and a cost-per-base that was at least one order of magnitude higher than the cost for the other two platforms. The Ion PGM delivered the fastest throughput per hour (80–100 Mb) and had the shortest run time (around 3 hours) but also had the shortest reads (mean 121 bases), although kits producing 200 bases have since been made available for this instrument. The Ion PGM and 454 GS Junior were both prone to making mistakes in homopolymeric tracts, and these mistakes caused assembly errors that resulted in frame-shifts in coding regions, even when data were assembled at high read coverage.
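To see why a miscounted homopolymer matters, the following minimal sketch locates homopolymeric tracts in a toy coding sequence and shows how dropping a single base from one tract shifts every downstream codon. The sequence, the function names and the minimum tract length are illustrative assumptions, not taken from the evaluation described above.

```python
import re


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each single-base run of at least min_len."""
    return [(m.start(), m.group()[0], len(m.group()))
            for m in re.finditer(r"A+|C+|G+|T+", seq.upper())
            if len(m.group()) >= min_len]


def codons(seq):
    """Split a sequence into complete in-frame triplets, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Toy coding sequence (hypothetical): ATG GCC AAA AAA GGT TCA TAA
true_cds = "ATGGCCAAAAAAGGTTCATAA"
start, base, length = homopolymer_runs(true_cds)[0]

# Simulate an under-called homopolymer: one base of the run is missing.
miscalled = true_cds[:start] + true_cds[start + 1:]

print("true     :", codons(true_cds))
print("miscalled:", codons(miscalled))
```

In the miscalled version the codons after the tract no longer match and the stop codon is lost, which is the kind of frame-shift an annotation pipeline would then report as a truncated or extended gene.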

Coping with the Data

The high-end sequencing platforms make considerable demands on the local information technology infrastructure in terms of data tracking and analysis, short-term storage and long-term archiving. Bench-top instruments have more modest information technology requirements. However, each platform delivers data in a slightly different format, and saying that one has sequenced a bacterial genome means different things on different platforms and can create difficulties when comparing or combining data generated on different platforms (Table A9-2).

There are two main analytical approaches to the exploitation of high-throughput sequencing data: reads can be aligned—that is, mapped—to a known reference sequence or subjected to de novo assembly. The choice of strategy depends on the read length obtained (short reads are better mapped to a reference), the availability of a good reference sequence and the intended biological application (for example, genomic epidemiology versus pathogen biology).

To document genetic variation in the genomes of multiple highly related strains, a mapping approach is efficient and often sufficient. In this situation, sequence variants can be called by aligning reads to a reference genome using short-read-mapping tools (see Supplementary information S1 (table)). A mapping approach is problematic when dealing with reads from repetitive regions or from parts of the genome that are absent from the reference genome, or when a closely related reference genome is unavailable.
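The mapping approach can be sketched in a few lines. This is a deliberately simplified illustration, not how any of the short-read-mapping tools referred to above works internally: it assumes reads are already aligned without gaps and given as (start, sequence) pairs, and the depth and allele-fraction thresholds are arbitrary choices for the example.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Call simple SNPs from ungapped read alignments.

    aligned_reads: iterable of (start, read_sequence), 0-based start on the reference.
    Returns a list of (position, reference_base, called_base, depth).
    """
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, read_base in enumerate(read.upper()):
            pos = start + offset
            if 0 <= pos < len(reference) and read_base in "ACGT":
                pileup[pos][read_base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this site
        called, support = counts.most_common(1)[0]
        if called != reference[pos] and support / depth >= min_fraction:
            snps.append((pos, reference[pos], called, depth))
    return snps


# Toy data (hypothetical): four reads all carry T where the reference has C.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAT"), (2, "GTATGT"), (3, "TATGTA"), (4, "ATGTAC")]
print(call_snps(reference, reads))
```

Positions with fewer than min_depth reads are skipped rather than called as reference, which mirrors the point in the text that regions absent from, or poorly represented in, the reference cannot be assessed by mapping alone.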

De novo assembly is more informative when dealing with a new pathogen or a new strain of a well-known pathogen. Sequencing errors can have a significant impact on assembly. When platforms produce random errors, the effect of these errors on assembly can be overcome by increasing the depth of coverage. However, when errors are systematic and occur in predictable contexts (for example, in homopolymers), increasing the depth of coverage is unlikely to help, and it may be necessary to sequence the troublesome regions using an alternative technology. Very high-quality, near complete references may be obtained by a hybrid approach, such as in recent studies combining Pacific Biosciences and Illumina data (Bashir et al., 2012; Koren et al., 2012).
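The contrast between random and systematic errors can be made concrete with a binomial calculation. The sketch below assumes a simple model that is not from the text: each read is wrong at a given site independently with a fixed probability, and, as a worst case, all wrong reads agree on the same incorrect call, so the consensus is wrong whenever they form a strict majority. A systematic error is modelled here as a site where more than half of the reads are expected to be wrong.

```python
from math import comb


def majority_error_probability(depth, per_read_error):
    """Probability that a strict majority of `depth` reads carry the wrong call."""
    threshold = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


for depth in (5, 10, 30, 100):
    random_case = majority_error_probability(depth, 0.05)      # random error, 5% per read
    systematic_case = majority_error_probability(depth, 0.60)  # context-driven error, 60% per read
    print(f"depth {depth:>3}: random {random_case:.2e}, systematic {systematic_case:.3f}")
```

Under these assumptions the consensus error from random miscalls falls rapidly as depth increases, whereas at a site where most reads are wrong the consensus becomes more reliably wrong with additional coverage.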

A variety of commonly used assemblers is now available (see Supplementary information S1 (table)), ranging from the platform specific (for example, Newbler from Roche) to the more generally applicable (for example, MIRA (Chevreux et al., 2004), Velvet (Zerbino and Birney, 2008), and the CLC Genomics Workbench from CLC Bio). De novo assemblies can be compared using Mauve (Darling et al., 2004) or Mugsy (Angiuoli and Salzberg, 2011), and the assemblies can be manually examined using the Tablet viewer (Milne et al., 2010). For annotation of assemblies, Glimmer (Delcher et al., 1999) works well for coding-sequence prediction, while tRNAscan-SE (Schattner et al., 2005) and RNAmmer (Lagesen et al., 2007) work well for stable-RNA prediction. There are numerous pipelines for automatic annotation of de novo assemblies, including RAST (Aziz et al., 2008), IMG/ER (Markowitz et al., 2009) and the IGS Annotation Engine (developed by the Institute for Genome Sciences, University of Maryland School of Medicine, USA), although care must be taken when interpreting results from such services, as the public databases used contain annotation errors that are then propagated to newly sequenced genomes (Richardson and Watson, 2012).

For microbial applications, all of the above programs run quickly (in minutes or hours) and are not particularly processor intensive. Some workflows combine a series of programs and provide an accessible interface for microbiologists who are not bioinformatics specialists. For example, xBASE-NG provides a “one-stop shop” for assembly, annotation and comparison of bacterial genome sequences (Chaudhuri et al., 2008). Sophisticated phylogenetic analyses are more demanding and may be beyond the capability of the average research group. One particular issue when constructing bacterial whole-genome phylogenies is the clouding of phylogenetic signal by recombination events and homoplasy (Marttinen et al., 2012). Algorithms such as ClonalFrame (Didelot and Falush, 2007) and ClonalOrigin (Didelot et al., 2010) take multiple whole-genome alignments as input and attempt to identify blocks of recombination. These approaches are computationally very expensive, and there is no “off the shelf” solution to comparing hundreds or thousands of bacterial genomes. There is a growing interest in alignment-free approaches for constructing bacterial phylogenies, as it is thought that these approaches may help address the computational challenges of these analyses (Köser et al., 2012).
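As a rough illustration of what "alignment-free" means, the sketch below compares sequences by the sets of k-mers they contain and converts the overlap into a Jaccard distance. This is a generic stand-in chosen for brevity, not the method of any tool cited here; the value of k, the function names and the toy sequences are assumptions, and reverse complements are ignored for simplicity.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """1 minus the shared fraction of k-mers; 0.0 means identical k-mer content."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(sequences, k=4):
    """Symmetric matrix of pairwise distances, in the order the sequences are given."""
    return [[jaccard_distance(s, t, k) for t in sequences] for s in sequences]


# Toy sequences (hypothetical): the second differs from the first by one base.
genomes = ["ACGTTGCAACGTAGCT", "ACGTTGCAACGAAGCT", "TTTTGGGGCCCCAAAA"]
for row in distance_matrix(genomes):
    print(["%.2f" % d for d in row])
```

No alignment is computed at any point, so the cost grows with the number of k-mers rather than with alignment length; a distance matrix of this kind could then be passed to a standard tree-building method.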

A recurring problem with data from high-throughput sequencing is meeting the requirement, as stipulated by journals and funders, that data be lodged in the public domain. Unannotated assembled sequences can be uploaded to conventional sequence databases, such as GenBank, fairly easily. However, submission of annotated sequences can be onerous, slowing down the process of publication even further. Submission of sequence reads to short-read archives may be hampered by slow data transfer rates, and it remains uncertain how sustainable such archives will prove to be in the future. There may come a time when the easiest way to obtain such data will be to re-sequence the sample, rather than upload, archive and retrieve large data sets.

Current Applications and Future Prospects

High-throughput sequencing has already transformed microbiology. Rapid, low-cost genome sequencing has helped make genomic epidemiology a reality, allowing us to track the spread of pathogens through hospitals (Köser et al., 2012; Lewis et al., 2010), communities (Gardy et al., 2011; Mellmann et al., 2011; Rohde et al., 2011) and across the globe (Beres et al., 2010; Harris et al., 2011; Mutreja et al., 2011). High-throughput sequencing has already had a huge impact on our understanding of microbial evolution, whether within a single patient over years or decades (for example, Pseudomonas aeruginosa in a patient with cystic fibrosis [Cramer et al., 2011]) or globally across the centuries (for example, influenza virus in the 1918 influenza pandemic [Dunham et al., 2009] or mediaeval Yersinia pestis in the Black Death [Bos et al., 2011]). Genome sequences have even been obtained from single microbial cells (Woyke et al., 2009).

There are many applications beyond mere genome sequencing. High-throughput sequencing has opened up new avenues for sequence-based profiling and metagenomics of complex microbial communities, including those associated with human health and disease (Hess et al., 2011; Qin et al., 2010). Particularly exciting is the promise of culture-independent approaches to pathogen discovery and detection (Lipkin, 2010). In the research laboratory, sequencing is taking over from microarrays as the method of choice for studying gene expression (using RNA sequencing (RNA-seq)) (Passalacqua et al., 2009; Sharma et al., 2010; Sorek and Cossart, 2010), mutant libraries (using Tn-seq and transposon-directed insertion site sequencing (TraDIS)) (Langridge et al., 2009; van Opijnen et al., 2009) and protein–DNA interactions (using chromatin immunoprecipitation followed by sequencing (ChIP–Seq)) (Grainger et al., 2009).

So, what does the future hold? For current platforms, we can expect to see cheaper, easier library preparation methods and ever-higher sequencing throughputs. However, with the arrival of transformative new technologies (Branton et al., 2008) (Box A9-2), this might be seen as tinkering around the edges. The tipping point has already been reached such that the staff and infrastructure costs of handling and analysing sequence data outweigh the costs of generating that data. If the promise of portable, single-molecule, long-read-length sequencing bears fruit and these technologies show the same steady increase in functionality and cost-effectiveness that we have seen with earlier high-throughput sequencing platforms, we could be just a few years away from user-friendly, “$1-a-pop” bacterial genome sequencing.

As we have argued elsewhere (Pallen and Loman, 2011), high-throughput sequencing may well be poised to make a decisive impact on clinical microbiology, but there are still many difficulties to be overcome—for example, in presenting complex information to clinicians, in agreeing common formats for data sharing, in integrating genomics with clinical informatics and clinical practice, in benchmarking novel technologies and in gaining regulatory approval (from the US FDA and other bodies) for clinical applications of these technologies. One thing is certain: thanks to the expected relentless progress in sequencing technology, microbiology in the next 20 years will look nothing like it does now.

BOX A9-2
Oxford Nanopore: The Game Changer?

In February 2012, at a conference in the United States, the British company Oxford Nanopore Technologies announced a new, near-market “strand sequencing” technology that exploits protein nanopores embedded in an industrially fabricated polymer membrane. As a DNA strand is fed through a nanopore by a processive enzyme, the trinucleotides in contact with the pore are detected through electrochemistry.

The manufacturers have already claimed that they can sequence the 50 kb phage λ genome on both strands, and they claim that there is no theoretical read length limit. They also claim that sequencing can be paused, the sample recovered and replaced, and sequencing then started again. Plus, there is no need for onerous sample preparations: sequences can be read directly from blood (and probably also bacterial lysates).

Oxford Nanopore Technologies has announced two products, both scheduled to ship in late 2012. The MinION is a disposable US$900 sequencer housed in a USB stick, with 512 nanopores, each capable of running 120–1,000 bases per minute per pore for up to 6 hours. The MinION can generate 150 Mb of sequence per hour, all without fluidics or imaging, and bases are streamed live to a laptop through the USB connection. The GridION is a rack-mountable sequencer with 2,000 nanopores and is capable of generating tens of gigabases over 24 hours. Both machines promise astonishing read lengths at low cost and with minimal sample preparation. However, this technology currently suffers from a high error rate (~4%) that is chiefly due to deletion errors but, according to their February 2012 press conference, the manufacturers are confident that they can fix this.

How will access to a disposable sequencer change the way we do microbiology? With no capital costs or cumbersome set-up and installation, this technology certainly has the power to democratize sequencing even further. Will prices fall enough for it to be worth sequencing one bacterial genome per MinION, or will the long read lengths mean that we can mix samples and then disaggregate the genomes with little effort? If read lengths really can be obtained in the ≥100 kb range, then all the existing problems of short-read assembly in genomics and metagenomics will be rendered obsolete.

Furthermore, we can now take the sequencer to the patient’s bedside or out into the field. Microbial ecologists need no longer depend on molecular bar codes such as the 16S rRNA gene when they can have whole genomes instead, and latter-day John Snows can use disposable sequencing, not just to detect cholera, but also to track its evolution and spread.

Of course, the reality may not match the hype, and we eagerly await the first independent evaluation of this technology. But if the dream comes true, most of the rest of this article will soon be redundant.


The authors thank the anonymous reviewers for their help and suggestions.

Competing Interests Statement

The authors declare competing financial interests. Mark J. Pallen was a winner of an Ion Personal Genome Machine (PGM) (from Ion Torrent, part of Life Technologies) in the European Ion PGM Grant Programme. Nicholas J. Loman has received expenses to speak at an Ion Torrent meeting organized by Life Technologies and has received honoraria and expenses from Illumina to speak at Illumina meetings. Chrystala Constantinidou, Jacqueline Z. M. Chan, Mihail Halachev, Martin Sergeant, Charles W. Penn and Esther R. Robinson declare no competing financial interests.


Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119 (2010).

Angiuoli, S. V. & Salzberg, S. L. Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 27, 334–342 (2011).

Aziz, R. K. et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9, 75 (2008).

Bashir, A. et al. A hybrid approach for the automated finishing of bacterial genomes. Nature Biotech. 1 Jul 2012 (doi: 10.1038/nbt.2288).

Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).

Beres, S. B. et al. Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics. Proc. Natl Acad. Sci. USA 107, 4371–4376 (2010).

Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506–510 (2011).

Bowers, J. et al. Virtual terminator nucleotides for next-generation DNA sequencing. Nature Methods 6, 593–595 (2009).

Branton, D. et al. The potential and challenges of nanopore sequencing. Nature Biotech. 26, 1146–1153 (2008).

Caruccio, N. in High-Throughput Next Generation Sequencing: Methods and Applications. Methods in Molecular Biology Vol. 733 (eds Kwon, Y. M. & Ricke, S. C.) 241–255 (Humana Press, 2011).

Chaudhuri, R. R. et al. xBASE2: a comprehensive resource for comparative bacterial genomics. Nucleic Acids Res. 36, D543–D546 (2008).

Chevreux, B. et al. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 14, 1147–1159 (2004).

Cramer, N. et al. Microevolution of the major common Pseudomonas aeruginosa clones C and PA14 in cystic fibrosis lungs. Environ. Microbiol. 13, 1690–1704 (2011).

Darling, A. C., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403 (2004).

Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27, 4636–4641 (1999).

Didelot, X. & Falush, D. Inference of bacterial microevolution using multilocus sequence data. Genetics 175, 1251–1266 (2007).

Didelot, X., Lawson, D., Darling, A. & Falush, D. Inference of homologous recombination in bacteria using whole-genome sequences. Genetics 186, 1435–1449 (2010).

Domazet-Lošo, M. & Haubold, B. Alignment-free detection of local similarity among viral and bacterial genomes. Bioinformatics 27, 1466–1472 (2011).

Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010).

Dunham, E. J. et al. Different evolutionary trajectories of European avian-like and classical swine H1N1 influenza A viruses. J. Virol. 83, 5485–5494 (2009).

Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).

Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).

Gardy, J. L. et al. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. N. Engl. J. Med. 364, 730–739 (2011).

Glenn, T. C. A field guide to next generation DNA sequencers. Mol. Ecol. Res. 11, 759–769 (2011).

Grainger, D. et al. Direct methods for studying transcription regulatory proteins and RNA polymerase in bacteria. Curr. Opin. Microbiol. 12, 531–535 (2009).

Harris, S. R. et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science 327, 469–474 (2011).

Harris, S. R. et al. Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing. Nature Genet. 44, 413–419 (2012).

Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467 (2011).

Koren, S. et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nature Biotech. 1 Jul 2012 (doi: 10.1038/nbt.2280).

Köser, C. U. et al. Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak. N. Engl. J. Med. 366, 2267–2275 (2012).

Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).

Langridge, G. C. et al. Simultaneous assay of every Salmonella typhi gene using one million transposon mutants. Genome Res. 19, 2308–2316 (2009).

Levene, M. J. et al. Zero-mode waveguides for single-molecule analysis at high concentrations. Science 299, 682–686 (2003).

Lewis, T. et al. High-throughput whole-genome sequencing to dissect the epidemiology of Acinetobacter baumannii isolates from a hospital outbreak. J. Hosp. Infect. 75, 37–41 (2010).

Lipkin, W. I. Microbe hunting. Microbiol. Mol. Biol. Rev. 74, 363–377 (2010).

Loman, N. J. et al. Performance comparison of bench-top high-throughput sequencing platforms. Nature Biotech. 30, 434–439 (2012).

Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).

Markowitz, V. M. et al. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25, 2271–2278 (2009).

Marttinen, P. et al. Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res. 40, e6 (2012).

Mellmann, A. et al. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE 6, e22751 (2011).

Metzker, M. L. Emerging technologies in DNA sequencing. Genome Res. 15, 1767–1776 (2005).

Milne, I. et al. Tablet—next generation sequence assembly visualization. Bioinformatics 26, 401–402 (2010).

Mutreja, A. et al. Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477, 462–465 (2011).

Pallen, M. J. & Loman, N. J. Are diagnostic and public health bacteriology ready to become branches of genomic medicine? Genome Med. 3, 53 (2011).

Passalacqua, K. D. et al. Structure and complexity of a bacterial transcriptome. J. Bacteriol. 191, 3203–3211 (2009).

Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).

Richardson, E. J. & Watson, M. The automatic annotation of bacterial genomes. Brief. Bioinform. 9 Mar 2012 (doi: 10.1093/bib/bbs007).

Rohde, H. et al. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N. Engl. J. Med. 365, 718–724 (2011).

Ronaghi, M., Uhlen, M. & Nyren, P. A sequencing method based on real-time pyrophosphate. Science 281, 363–365 (1998).

Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352 (2011).

Schattner, P., Brooks, A. N. & Lowe, T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686–W689 (2005).

Sharma, C. M. et al. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464, 250–255 (2010).

Sorek, R. & Cossart, P. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nature Rev. Genet. 11, 9–16 (2010).

Valouev, A. et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res. 18, 1051–1063 (2008).

van Opijnen, T., Bodi, K. L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nature Methods 6, 767–772 (2009).

Venter, J. C. Multiple personal genomes await. Nature 464, 676–677 (2010).

Woyke, T. et al. Assembling the marine metagenome, one cell at a time. PLoS ONE 4, e5299 (2009).

Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).



Ankur Mutreja,35,* Dong Wook Kim,36,37,* Nicholas R. Thomson,35,*
Thomas R. Connor,35 Je Hee Lee,36,38 Samuel Kariuki,39
Nicholas J. Croucher,35 Seon Young Choi,36,38 Simon R. Harris,35
Michael Lebens,40 Swapan Kumar Niyogi,41 Eun Jin Kim,36 T. Ramamurthy,41
Jongsik Chun,38 James L. N. Wood,42 John D. Clemens,36 Cecil Czerkinsky,36
G. Balakrish Nair,41 Jan Holmgren,40 Julian Parkhill,35 and Gordon Dougan35

Vibrio cholerae is a globally important pathogen that is endemic in many areas of the world and causes 3–5 million reported cases of cholera every year. Historically, there have been seven acknowledged cholera pandemics; recent outbreaks in Zimbabwe and Haiti are included in the seventh and on-going pandemic (Chin et al., 2011). Only isolates in serogroup O1 (consisting of two biotypes known as “classical” and “El Tor”) and the derivative O139 (Chun et al., 2009; Hochhut and Waldor, 1999) can cause epidemic cholera (Chun et al., 2009). It is believed that the first six cholera pandemics were caused by the classical biotype, but El Tor has subsequently spread globally and replaced the classical biotype in the current pandemic (Chin et al., 2011). Detailed molecular epidemiological mapping of cholera has been compromised by a reliance on sub-genomic regions such as mobile elements to infer relationships, making El Tor isolates associated with the seventh pandemic seem superficially diverse. To understand the underlying phylogeny of the lineage responsible for the current pandemic, we identified high-resolution markers (single nucleotide polymorphisms; SNPs) in 154 whole-genome sequences of globally and temporally representative V. cholerae isolates. Using this phylogeny, we show here that the seventh pandemic has spread from the Bay of Bengal in at least three independent but overlapping waves with a common ancestor in the 1950s, and identify several transcontinental transmission events. Additionally, we show how the acquisition of the SXT family of antibiotic resistance elements has shaped pandemic spread, and show that this family was first acquired at least ten years before its discovery in V. cholerae.


34 Reprinted with kind permission from Nature Publishing Group.

35 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.

36 International Vaccine Institute, SNU Research Park, Bongchun 7 dong, Kwanak, Seoul 151-919, Korea.

37 Department of Pharmacy, College of Pharmacy, Hanyang University, Kyeonggi-do 426-791, Korea.

38 Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 151-742, Korea.

39 Centre for Microbiology Research, KEMRI at Kenyatta Hosp Compound, Off Ngong Road, PO Box 43640-00100, Kenya.

40 Department of Microbiology and Immunology and University of Gothenburg Vaccine Research Institute, The Sahlgrenska Academy at the University of Gothenburg, Box 435, 40530 Göteborg, Sweden.

41 National Institute of Cholera and Enteric Diseases, P-33, CIT Scheme XM, Beliaghata, Kolkata 700 010, India.

42 University of Cambridge, Department of Veterinary Medicine, Madingley Road, Cambridge CB3 0ES, UK.

* These authors contributed equally to this work.


Whole-genome analysis is perhaps the ultimate approach to building a robust phylogeny in recently emerged pathogens, through the identification of SNPs and other rare genetic variants (Harris et al., 2010). Therefore, we sequenced the genomes of 136 isolates of V. cholerae, the causative agent of several million cholera cases each year. These sequences, including 113 isolates from the seventh pandemic, were added to 18 previously published genomes (CDC, 2010; Chin et al., 2011; Chun et al., 2009) to produce a global genomic database from isolates collected in the course of a century. We included representative El Tor isolates collected in the past four decades and compared these to previously reported and novel genome sequences of both classical and non-O1 types (Chin et al., 2011; Chun et al., 2009).

The sequence reads were mapped to the reference sequence of El Tor N16961 (Heidelberg et al., 2000), a seventh-pandemic V. cholerae that was isolated in Bangladesh in 1975 (see footnote to Supplementary) and the resulting consensus tree identified eight distinct phyletic lineages (L1–L8, see Supplementary Fig. 1 and Supplementary Table 1 for strain and lineage information), six of which incorporated O1 clinical isolates. The classical isolates formed a distinct, highly clustered group (L1), distant from the El Tor isolates of the seventh pandemic (L2). It is clear from Supplementary Fig. 1 that the classical and El Tor clades did not originate from a recent common ancestor and instead seem to be independent derivatives with distinct phylogenetic histories, consistent with previous proposals (Chun et al., 2009). Isolates of L4 share a common ancestor with previously reported non-conventional O1 isolates (Chun et al., 2009) (Supplementary Fig. 2), and are likely to have acquired the O1 antigen genes by a recombination event onto a genetically distinct genome backbone. Isolates of L7 also have a distinct backbone, whereas L2, L3 (USA Gulf coast strains), L5, L6 and L8 share a more “El-Tor-like” genome backbone, and the L1 backbone is of the “classical” type.

Genome-wide SNP analysis showed that the 123 El Tor isolates in the L2 cluster (Supplementary Fig. 1) differed from the reference by only 50–250 SNPs. With this large sample size we were able to construct a high-resolution phylogeny that shows unequivocally that the current pandemic is monophyletic and originated from a single source, providing a framework for future epidemiological and phenotypic analysis of V. cholerae, including transmission-tracking and typing.

FIGURE A10-1 A maximum-likelihood phylogenetic tree of the seventh pandemic lineage of V. cholerae based on SNP differences across the whole core genome, excluding probable recombination events. The pre-seventh-pandemic isolate M66 was used as an outgroup to root the tree. Branches are coloured on the basis of the region of isolation of the strains. The branches representing the three major waves are indicated on the far right. The nodes representing the MRCAs of the seventh pandemic, and subsequent waves 2 and 3, are indicated with arrows and labelled with inferred dates. The presence and type of CTX and SXT elements in each strain are shown to the right of the tree. The presence of toxin-linked cryptic (TLC) and repeated sequence 1 (RS1) elements is shown, but their number and position, respectively, are arbitrarily assigned. Cases of sporadic intercontinental transmission are marked A–D. The dates shown are the median estimates for the indicated nodes, taken from the results of the BEAST analysis. The scale is given as the number of substitutions per variable site; asterisks indicate that no data were available.

Predicted recombined regions were identified, and along with genomic islands and mobile genetic elements, these were initially excluded from the phylogenetic analysis of seventh-pandemic isolates, to determine the underlying phylogeny. Notably, analysis of the tree (Figure A10-1; see Supplementary Fig. 3 for a tree with strain names) provides clear evidence of a clonal expansion of the lineage, with a strong temporal signature. This is most clearly illustrated by the fact that the most divergent isolates from the N16961 reference are represented by the oldest seventh-pandemic isolate in our collection, A6, collected in 1957, together with the most recent Haitian isolates (CDC, 2010) from late 2010. We performed a linear regression analysis on all the L2 isolates to calculate the rate of SNP accumulation on the basis of the date of isolation and the root-to-tip distance. The shape of the tree and temporal signatures in Figure A10-1 show a very consistent rate of SNP accumulation, 3.3 SNPs year⁻¹ (R² = 0.73, Supplementary Fig. 4) in the core genome, emphasizing the tree’s robustness and utility for transmission studies. The only exception to this is V. cholerae A4, a repeatedly passaged laboratory strain that was originally isolated in 1973 (Supplementary Figs 3 and 4). The estimated rate of mutation for our seventh-pandemic V. cholerae collection was 8.3 × 10⁻⁷ SNPs site⁻¹ year⁻¹: between 2.5 and 5 times slower than the rate estimated for recent clonal expansions of some other human-pathogenic bacteria (Croucher et al., 2011; Harris et al., 2010).
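The root-to-tip regression described above can be sketched as an ordinary least-squares fit of SNP distance against isolation year. The data points below are invented for illustration and are not the study's measurements; the function name is hypothetical. The final lines show how the two quoted rates relate: dividing 3.3 SNPs per year by 8.3 × 10⁻⁷ SNPs per site per year implies a core genome of roughly 4 million sites, a figure inferred here from the quoted numbers rather than stated in the text.

```python
def fit_clock(years, root_to_tip):
    """Least-squares fit of root-to-tip distance (SNPs) against isolation year.

    Returns (slope in SNPs per year, intercept, R squared).
    """
    n = len(years)
    if n != len(root_to_tip) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(root_to_tip) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in root_to_tip)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, root_to_tip))
    if sxx == 0:
        raise ValueError("isolation years must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical isolates: (year of isolation, SNPs from the root of the tree).
years = [1957, 1975, 1989, 1994, 2004, 2010]
distances = [12, 80, 118, 150, 170, 195]
slope, intercept, r2 = fit_clock(years, distances)
print(f"rate = {slope:.2f} SNPs per year, R^2 = {r2:.2f}")

# Relating the per-genome and per-site rates quoted in the text.
implied_core_sites = 3.3 / 8.3e-7
print(f"implied core genome size: about {implied_core_sites / 1e6:.2f} Mb")
```

An outlier such as a repeatedly passaged laboratory strain would show up in a fit of this kind as a point lying far from the regression line.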

The seventh-pandemic tree can be subdivided into three major groups or clades by clustering using Bayesian analysis of population structure (Corander et al., 2003, 2008) (shown as waves 1–3 in Figure A10-1); this clustering is mostly consistent with the cholera toxin (CTX) type of the three clades, which represent independent waves of transmission. Although examples of genetic determinants differentiating these three CTX types have previously been published (Safa et al., 2010), they have not been put into a phylogenetic context, undermining efforts to investigate the evolutionary aspects of their emergence. Perhaps as a result, there has been substantial uncertainty in naming new CTX types as they have been discovered. Our data show that the first CTX type is canonical CTX El Tor and we propose that it be renamed CTX-1; for the other two we propose a new expandable nomenclature and class them as CTX-2 and CTX-3 (Supplementary Table 2).

Isolates spanning A18 to PRL5 (the lower clade in Figure A10-1) represent wave 1, covering about 16 years (1977–1992). All isolates in this group lack the integrative and conjugative element (ICE) of the SXT/R391 family, encoding resistance to several antibiotics (Garriss et al., 2009; Wozniak et al., 2009). It is within this time period that seventh-pandemic cholera occurred in South America (Heidelberg et al., 2000). Our data show that the South American isolates form a discrete cluster, which also includes a single Angolan isolate collected in 1989. The position of the Angolan isolate at the base of the South American group indicates that transmission to South America may have been via Africa, as previously proposed (Lam et al., 2010). We used BEAST (Drummond et al., 2006) to translate evolutionary distance in SNPs into time (Supplementary Fig. 5) and this indicated that transmission to South America is likely to have occurred between 1981 and 1985. The branch harbouring this West African–South American (WASA) clade is distinguished from all other V. cholerae by the acquisition of novel VSP-2 genes (O’Shea et al., 2004) and a novel genomic island that we have denoted WASA1 (Supplementary Table 3). Notably, the Angolan isolate A5 and all the South American isolates are discriminated by just ten SNPs. Based on the accumulation rate of 3.3 SNPs year⁻¹ (Supplementary Fig. 4), the 3-year time period between the isolation of A5 and the oldest South American isolate included in this study, A32, is consistent with previous studies indicating that cholera spread as a single epidemic (Lam et al., 2010).
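The consistency check in this paragraph is simple arithmetic, sketched below. The function name `expected_years` is a hypothetical label, and the calculation is only a point estimate under a constant-rate assumption; it ignores the stochastic variation in SNP accumulation that a full analysis such as BEAST would model.

```python
def expected_years(snp_difference, snps_per_year):
    """Time needed to accumulate a given number of SNPs at a constant rate."""
    return snp_difference / snps_per_year


# Ten SNPs at the 3.3 SNPs per year reported in the text: roughly three years.
elapsed = expected_years(10, 3.3)
print(f"{elapsed:.1f} years")
```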

The first acquisition of an SXT/R391 ICE lies at the point of transition from the wave-1 cluster to the wave-2 cluster. Using our dated phylogeny (Supplementary Fig. 5) (Drummond et al., 2006), we were able to date this transition and the first acquisition of SXT/R391 ICE to 1978–84, ten years before its discovery in O139 strains, which also fits with the otherwise surprising discovery of SXT in a Vietnamese strain isolated before 1992 (Bani et al., 2007). This date would also correspond to the most recent common ancestor (MRCA) of the O1 and O139 serogroup isolates. Analysis of the diversity of the common regions of SXT/R391 ICEs in our seventh-pandemic collection (Supplementary Fig. 6) shows that they are discriminated by 3,161 SNPs, compared to only 1,757 SNPs used to define the core whole-genome phylogeny in Figure A10-1. This indicates either that there have been several recombination events within these ICEs, or that they have been acquired independently several times on the tree (Garriss et al., 2009). Isolates from wave 2 represent a discrete cluster that shows a complex pattern of accessory elements in the CTX locus (Figure A10-1) and a wide phylogeographical distribution. It is also notable that isolates collected in Vietnam in 1995–2004 and strain A109 are the only wave-2 isolates studied from this time period that lack an SXT/R391 ICE. We examined the genomic locus in these clones that marks the point of insertion of SXT/R391 ICE in all other V. cholerae isolates and found no remnants of this conjugative element, which may have been lost from this lineage (no “scar” in DNA sequence is expected after the precise excision of SXT/R391 ICE).

Ignoring the CTX-related genomic regions, the seventh-pandemic L2 isolates show relatively little evidence of recombination either within or from outside the tree. On the basis of the SNP distribution, 1,930 out of 2,027 SNPs (Supplementary Table 4) are congruent with the tree, leaving 97 homoplasies that could be due to selection or homologous recombination among the L2 isolates. Only 270 SNPs were predicted to be due to homologous recombination from outside the tree. The only two branches in which the SNP distribution indicated considerable recombination were those leading to the WASA cluster (Supplementary Fig. 7) and the O139 serogroup. Aside from the acquisitions of CTX and the SXT/R391 ICEs, we found evidence of gene flux affecting only 155 other genes (Supplementary Figs 8 and 9 and Supplementary Table 3).

Also represented in our collection are two isolates of serogroup O139, which are known to have arisen from a homologous replacement of their O-antigen determinant into an El Tor genomic backbone (Chun et al., 2009; Hochhut and Waldor, 1999; Lam et al., 2010). CTX types that are different from El Tor, classical, CTX-2 and CTX-3 have been reported for the O139 serogroup (Basu et al., 2000; Faruque and Mekalanos, 2003; Faruque et al., 2000; Nair et al., 1994); however, the phylogenetic position of the two strains included in this study shows that O139 was derived from O1 El Tor and therefore represents another distinct but spatially restricted wave from the common source.

We were also able to date the ancestor of the El Tor seventh-pandemic lineage, L2, as having existed in 1827–1936 (Supplementary Fig. 5), which is consistent with the predicted date of origin from the linear regression plot (1910, Supplementary Fig. 4). This also corresponds well with the date of isolation of the first El Tor biotype strain in 1905 (Cvjetanovic and Barua, 1972).

It is apparent from Figure A10-1 that V. cholerae wave 1, which spread globally, was later replaced by the more geographically restricted wave 2 and wave 3, a phenomenon supported by local clinical observations and phage analysis (Safa et al., 2010). This also reflects the fact that V. cholerae epidemics since 2003–2010 have been restricted to Africa and south Asia. Notably, the rates of SNP accumulation calculated independently for wave 1, wave 3 and wave 2 (2.3, 2.6 and 3.5 SNPs year⁻¹ respectively) are consistent with the rate calculated over the whole collection period (Supplementary Fig. 4).

The clonal clustering of L2 isolates, the constant rate of SNP accumulation and the temporal and geographical distribution support the concept that the seventh pandemic has spread by periodic radiation from a single source population located in the Bay of Bengal, followed by local evolution and ultimately local extinction in non-endemic areas. This is evidenced by the disappearance of wave-1 isolates, followed by the independent expansion of waves 2 and 3, both derived from the same original population, occurring within seven years of each other. These two waves are clearly distinguished from the first by the acquisition of SXT/R391 ICEs (Figure A10-1). Plotting the intercontinental spread of each wave onto the world map (Figure A10-2) clearly shows that the V. cholerae seventh pandemic is sourced from a single, restricted geographical location but has spread in overlapping waves. In these ancestral waves, there are at least four recent long-range transmission events (A–D in Figure A10-1), in which isolates clearly share a common ancestor with recent strains at distant locations, indicating that such events are not uncommon. The most recent example of this is the Haitian outbreak, in which strains share a very recent common ancestor with south-Asian strains at the tip of wave 3. The number of SNP differences, even at whole-genome resolution, between the Haitian and the most closely related Indian and Bangladeshi strains is very low. This demonstrates that the Haitian strains must have come from south Asia, at most within the last six years. However, the limited discrimination means that it may prove challenging to make country-specific inferences as to the origins of the Haitian strains on the basis of DNA sequence alone. For such conclusions to be robust, great care must be taken in the selection of samples for analysis.

Despite clear evidence of sporadic long-range transmission events that are likely to be associated with direct human carriage, the overall pattern seen in our data is one of continued local evolution of V. cholerae in the Bay of Bengal, with several independent waves of global transmission resulting in short-term epidemics in non-endemic countries. Although our sample set is substantial, there are clearly areas where geographical coverage is limited. However, the structure of the tree, with deep branches between the major waves, means that increasing the number of strains and the resolution further should only identify further independent waves of transmission. Indeed, we cannot rule out the possibility of an El Tor population persisting or evolving as a new wave of the seventh pandemic; for example, in areas such as China that were not sampled in this study.

One notable factor in the ongoing evolution of pandemic cholera was the acquisition of the SXT/R391-family antibiotic resistance element. The clinical use of the antibiotics tetracycline and furazolidone for cholera treatment started in 1963 and 1968 respectively, about 15 years before our prediction of the first acquisition of an SXT/R391 ICE (1978–1984). Our analysis provides a robust framework for elucidating the evolution of the seventh pandemic further, and for studying the local evolution, particularly in the Bay of Bengal, that has such a key role in the evolution of cholera.


Genomic Library Creation and Multiplex Sequencing

Unique index-tagged libraries for each sample were created, and up to 12 separate libraries were sequenced in each of eight channels in Illumina Genome Analyser GAII cells with 54-base paired-end reads. The index-tag sequence information was used for downstream processing to assign reads to the individual samples (Harris et al., 2010).
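The assignment of reads to samples by index tag can be sketched as a simple lookup. This is an illustration only: the tag sequences, sample names and the `demultiplex` helper are all hypothetical, and the sketch uses exact tag matching, whereas production pipelines usually tolerate a small number of sequencing errors in the tag.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample):
    """Group read sequences by sample using an exact match on the index tag.

    `reads` is an iterable of (index_tag, sequence) pairs. Reads whose tag is
    not in `tag_to_sample` are collected under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for tag, sequence in reads:
        sample = tag_to_sample.get(tag, "unassigned")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical tags and reads, for illustration only.
tags = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "GATTACA"),
    ("TGCATG", "CCGGTTA"),
    ("NNNNNN", "TTTTTTT"),
    ("ACGTAC", "ACACACA"),
]
assigned = demultiplex(reads, tags)
```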

Detection of SNPs in the Core Genome

The 54-base paired-end reads were mapped against the N16961 El Tor reference (accession numbers AE003852 and AE003853) and SNPs were identified as described in Croucher et al. (2011). The unmapped reads and the sequences that were not present in all genomes were not considered a part of the core genome, and therefore SNPs from these regions were not included in the analysis. Appropriate SNP cutoffs were chosen to minimize the number of false-positive and false-negative calls; SNPs were filtered to remove those at sites with a SNP quality score lower than 30, and SNPs at sites with heterogeneous mappings were filtered out if the SNP was present in fewer than 75% of reads at that site. From the seventh-pandemic data set, high-density SNP clusters indicating possible recombination were excluded (Croucher et al., 2011). In total, 2,027 SNPs were detected in the core genome of the El Tor lineage. Of these, 270 SNPs were predicted to be due to recombination. Removing these provided a data set characterized by 1,757 SNPs: these were used to produce the final phylogeny.
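The two filtering thresholds described in this section (SNP quality of at least 30, and the variant present in at least 75% of reads at the site) can be expressed as a small predicate. The sketch below is not the authors' pipeline: the `SnpCall` record, its field names and the example calls are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    position: int     # coordinate on the reference genome
    quality: float    # SNP quality score reported by the caller
    alt_reads: int    # reads supporting the variant base
    total_reads: int  # all reads mapped at the site


def passes_filters(call, min_quality=30, min_fraction=0.75):
    """Keep a SNP only if it meets both thresholds described in the text."""
    if call.quality < min_quality or call.total_reads == 0:
        return False
    return call.alt_reads / call.total_reads >= min_fraction


# Hypothetical calls, for illustration only.
calls = [
    SnpCall(101, 45, 20, 20),  # passes both filters
    SnpCall(202, 25, 20, 20),  # quality below 30
    SnpCall(303, 60, 14, 20),  # variant in only 70% of reads
    SnpCall(404, 30, 15, 20),  # exactly at both thresholds, so kept
]
kept = [call.position for call in calls if passes_filters(call)]
```

The final counts in the text follow the same subtractive logic: 2,027 core-genome SNPs minus the 270 attributed to recombination leaves the 1,757 used for the phylogeny.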

Comparative Genomics

Raw Illumina data were split to generate paired-end reads, and assembled using a de novo genome-assembly program, Velvet v0.7.03 (Zerbino and Birney, 2008), to generate a multi-contig draft genome for each of 133 V. cholerae strains (Harris et al., 2010). The overlap parameters were optimized to give the highest N50 value. Because seventh-pandemic V. cholerae strains are closely related in the core, Abacas (Assefa et al., 2009) was used to order the contigs using the N16961 El Tor strain as a reference, followed by annotation transfer from the reference strain to each draft genome (Harris et al., 2010). Using the N16961 sequence as a database to perform a TBLASTX (Altschul et al., 1990) for each draft genome, a genome comparison file was generated that was subsequently used in the Artemis comparison tool (Carver et al., 2008) to compare the genomes manually and search for novel genomic islands.
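The N50 statistic used to choose between assemblies is the contig length at which the longest contigs, taken in descending order, first cover at least half of the assembled bases. A minimal sketch follows; the contig lengths and the two candidate parameter settings are invented for illustration, and the code does not call Velvet itself.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


# Hypothetical assemblies of the same reads under two parameter settings.
candidates = {
    "setting_1": [100, 200, 300, 400, 500],
    "setting_2": [700, 500, 300],
}
best = max(candidates, key=lambda name: n50(candidates[name]))
```

Selecting the setting with the highest N50 favours fewer, longer contigs, which is the optimization criterion stated in the text.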

Phylogenetic Analysis

A phylogeny was drawn for V. cholerae using RAxML v0.7.4 (Stamatakis, 2006) to estimate the trees for all SNPs called from the core genome. The general time-reversible model with gamma correction was used for among-site rate variation for ten initial trees (Harris et al., 2010). USA Gulf coast strains A215 and A325, which have substantially different core genomes from all other strains in our collection, were used as an outgroup to root the global phylogeny (Supplementary Fig. 1), whereas a pre-seventh-pandemic strain, M66 (accession numbers CP001233 and CP001234), and strain A6 (from our collection), were used to root the seventh-pandemic phylogenetic tree (Figure A10-1).

CTX Prophage Analysis

For each strain, the CTX structure and the sequences of rstA, rstR and ctxB were determined as in Lee et al. (2009) and Nguyen et al. (2009).

Linear Regression and Bayesian Analysis

The phylogram for the seventh pandemic was exported to Path-O-Gen v1.3, and a linear regression plot for isolation date versus root-to-tip distance was generated. The same plot was also constructed individually for the three waves, but A4, being a laboratory strain, was excluded from the latter analysis.

The presence of three waves was checked, and their makeup was determined, using a BAPS analysis performed on the SNP alignment containing the unique SNP patterns from the seventh-pandemic isolates. The program was run using the BAPS individual mixture model, and three independent iterations were performed using an upper limit for the number of populations of 20, 21 and 22 to obtain optimal partitioning of the sample. The dates for the acquisition of SXT and the ancestors of the three waves were inferred using the Bayesian Markov chain Monte Carlo framework BEAST (Drummond and Rambaut, 2007). We used the final SNP alignment with recombinant sites removed and fixed the tree topology to the phylogeny produced by RAxML, as described above. We used BEAST to estimate the rates of evolution on the branches of the tree using a relaxed molecular clock (Drummond et al., 2006), which allows rates of evolution to vary amongst the branches of the tree. BEAST produced estimates for the dates of branching events on the tree by sampling dates of divergence between isolates from their joint posterior distribution, in which the sequences are constrained by their known date of isolation. The data were analysed using a coalescent constant population size and a general time-reversible model with gamma correction. The results were produced from three independent chains of 50 million steps each, sampled every 10,000 steps to ensure good mixing. The first 5 million steps of each chain were discarded as a burn-in. The results were combined using LogCombiner, and the maximum clade credibility tree was generated using TreeAnnotator, both parts of the BEAST package. Convergence and the effective sample-size (ESS) values were checked using Tracer 1.5. ESS values in excess of 200 were obtained for all parameters.
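The chain settings given here determine how many posterior samples remain once burn-in is discarded. The short calculation below makes that explicit; `retained_samples` is a hypothetical helper, and the sketch assumes one sample per sampling interval and ignores the initial state that some tools also log.

```python
def retained_samples(chain_length, sample_every, burn_in, n_chains):
    """Count posterior samples kept after burn-in across independent chains."""
    per_chain = chain_length // sample_every
    discarded = burn_in // sample_every
    return (per_chain - discarded) * n_chains


# Settings from the text: 3 chains of 50 million steps, sampled every
# 10,000 steps, with the first 5 million steps of each chain discarded.
total = retained_samples(50_000_000, 10_000, 5_000_000, 3)
print(total)
```

Each chain contributes 5,000 samples, of which 500 fall in the burn-in, so 4,500 are kept per chain. Because successive samples are autocorrelated, the effective sample size reported by Tracer is normally smaller than this raw count, which is why it is checked separately.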


The seventh-pandemic cholera strains were clearly distinguished by three waves and we therefore propose their CTX types to be CTX-1, CTX-2 and CTX-3 under the new nomenclature scheme (see Supplementary Table 2). Our nomenclature system is expandable and would be suitable for naming any new seventh-pandemic V. cholerae strains. With CTX-1 representing canonical El Tor, we followed the rationale: (1) for CTX-1 to CTX-2, because there was a shift of rstR from El Tor to classical, of rstA from El Tor to classical + El Tor, and of ctxB from El Tor to classical, we called it CTX-2; (2) for CTX-1 to CTX-3, because there was a shift of ctxB from El Tor to classical, we called it CTX-3; (3) for CTX-3 to CTX-3b, because there was only one SNP mutation in classical ctxB from CTX-2 and the rest was identical, we called it the next variant of CTX-3, which is CTX-3b.

In summary, if there is a shift of any gene from one biotype to another, the new CTX will be called CTX-n: thus the next strains fitting these criteria will be called CTX-4. However, if there is a mutation (or mutations) that does not lead to a shift of the gene to another biotype gene, a lettered variant name should be used as appropriate: CTX-1b, CTX-1c; CTX-2b, CTX-2c; CTX-3b, CTX-3c; and so on.
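The naming rule summarized here can be written as a small function. This is one possible reading of the rule, not a tool from the study: `next_ctx_name` and its `parent` argument are hypothetical, and the sketch does not handle more than 25 lettered variants of a single type.

```python
import re

_NAME = re.compile(r"^CTX-(\d+)([a-z]?)$")


def _parse(name):
    """Split a name such as 'CTX-3b' into its number and letter suffix."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a CTX type name: {name!r}")
    return int(match.group(1)), match.group(2)


def next_ctx_name(existing, parent=None):
    """Propose a name for a newly observed CTX type.

    With parent=None the new type involves a biotype shift in at least one
    gene, so it receives the next unused number. With a parent given, the new
    type differs only by mutation, so it receives the next unused letter.
    """
    parsed = [_parse(name) for name in existing]
    if parent is None:
        return f"CTX-{max(number for number, _ in parsed) + 1}"
    parent_number, _ = _parse(parent)
    # An unsuffixed name occupies the implicit first slot, "a".
    letters = [suffix or "a" for number, suffix in parsed if number == parent_number]
    if not letters:
        raise ValueError(f"parent type {parent!r} is not among the existing names")
    return f"CTX-{parent_number}{chr(ord(max(letters)) + 1)}"


known = ["CTX-1", "CTX-2", "CTX-3", "CTX-3b"]
shifted = next_ctx_name(known)
variant = next_ctx_name(known, parent="CTX-3")
```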

Methods Summary

Genomic libraries were created for each sample, followed by multiplex sequencing on an Illumina GAIIx analyser. The 54-base paired-end reads obtained were mapped against N16961 El Tor as a reference and SNPs in the core genome were identified as described in Methods. The SNPs were used to draw a whole core-genome phylogeny as described in Harris et al. (2010). The final SNP alignment was used to perform BEAST (Drummond et al., 2006) analysis and to confirm the output of linear regression analysis. The three cholera waves reported in the seventh-pandemic phylogeny were confirmed using BAPS (Corander et al., 2003, 2008). The raw Illumina data were also assembled de novo (see Methods) so that pairwise genome comparisons could be made. A new and expandable nomenclature system describing the CTX trends seen in the last 40 years was proposed following the rationale described in Methods.

Full methods and any associated references are available in the online version of the paper.


This work was supported by The Wellcome Trust grant 076964. The IVI is supported by the Governments of Korea, Sweden and Kuwait. D.W.K. was partially supported by grant RTI05-01-01 from the Ministry of Knowledge and Economy (MKE), Korea and by R01-2006-000-10255-0 from the Korea Science and Engineering Foundation; and J.L.N.W. was supported by the Alborada Trust and the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security. Thanks to A. Camilli at Tufts University Medical School for providing the corrected N16961 sequence, to B.M. Nguyen at NIHE, Vietnam, and M. Ansaruzzaman at ICDDR, Bangladesh for providing strains, and to M. Fookes at WTSI for training support.

Author Contributions

A.M., D.W.K. and N.R.T. collected the data, analysed it and performed phylogenetic analyses and comparative genomics. J.H.L., S.Y.C., E.J.K. and J.C. analysed the CTX types. S.K., S.K.N. and T.R. were involved in strain collection and serogroup analysis. T.R.C. performed Bayesian analysis; N.J.C. and S.R.H. did the computational coding. J.L.N.W., J.D.C., C.C., G.B.K., J.H., N.R.T., J.P. and G.D. were involved in the study design. A.M., N.R.T., J.P., G.D., J.H., G.B.K., N.J.C., S.R.H., T.R.C., D.W.K. and M.L. contributed to the manuscript writing.


Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

Assefa, S., Keane, T. M., Otto, T. D., Newbold, C. & Berriman, M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25, 1968–1969 (2009)

Bani, S. et al. Molecular characterization of ICEVchVie0 and its disappearance in Vibrio cholerae O1 strains isolated in 2003 in Vietnam. FEMS Microbiol. Lett. 266, 42–48 (2007)

Basu, A. et al. Vibrio cholerae O139 in Calcutta, 1992–1998: incidence, antibiograms, and genotypes. Emerg. Infect. Dis. 6, 139–147 (2000)

Carver, T. et al. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics 24, 2672–2676 (2008)

CDC. 2010. Update: cholera outbreak—Haiti, 2010. MMWR Morb. Mortal Wkly Rep. 59, 1473–1479 (2010)

Chin, C. S. et al. The origin of the Haitian cholera outbreak strain. N. Engl. J. Med. 364, 33–42 (2011)

Chun, J. et al. Comparative genomics reveals mechanism for short-term and long-term clonal transitions in pandemic Vibrio cholerae. Proc. Natl Acad. Sci. USA 106, 15442–15447 (2009)

Corander, J., Marttinen, P., Siren, J. & Tang, J. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics 9, 539 (2008)

Corander, J., Waldmann, P. & Sillanpaa, M. J. Bayesian analysis of genetic differentiation between populations. Genetics 163, 367–374 (2003)

Croucher, N. J. et al. Rapid pneumococcal evolution in response to clinical interventions. Science 331, 430–434 (2011)

Cvjetanovic, B. & Barua, D. The seventh pandemic of cholera. Nature 239, 137–138 (1972)

Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)

Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88 (2006)

Faruque, S. M. & Mekalanos, J. J. Pathogenicity islands and phages in Vibrio cholerae evolution. Trends Microbiol. 11, 505–510 (2003)

Faruque, S. M. et al. The O139 serogroup of Vibrio cholerae comprises diverse clones of epidemic and nonepidemic strains derived from multiple V. cholerae O1 or non-O1 progenitors. J. Infect. Dis. 182, 1161–1168 (2000)

Garriss, G., Waldor, M. K. & Burrus, V. Mobile antibiotic resistance encoding elements promote their own diversity. PLoS Genet. 5, e1000775 (2009)

Harris, S. R. et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science 327, 469–474 (2010)

Heidelberg, J. F. et al. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406, 477–483 (2000)

Hochhut, B. & Waldor, M. K. Site-specific integration of the conjugal Vibrio cholerae SXT element into prfC. Mol. Microbiol. 32, 99–110 (1999)

Lam, C., Octavia, S., Reeves, P., Wang, L. & Lan, R. Evolution of seventh cholera pandemic and origin of 1991 epidemic, Latin America. Emerg. Infect. Dis. 16, 1130–1132 (2010)

Lee, J. H. et al. Classification of hybrid and altered Vibrio cholerae strains by CTX prophage and RS1 element structure. J. Microbiol. 47, 783–788 (2009)

Nair, G. B., Bhattacharya, S. K. & Deb, B. C. Vibrio cholerae O139 Bengal: the eighth pandemic strain of cholera. Indian J. Public Health 38, 33–36 (1994)

Nguyen, B. M. et al. Cholera outbreaks caused by an altered Vibrio cholerae O1 El Tor biotype strain producing classical cholera toxin B in Vietnam in 2007 to 2008. J. Clin. Microbiol. 47, 1568–1571 (2009)

O’Shea, Y. A. et al. The Vibrio seventh pandemic island-II is a 26.9 kb genomic island present in Vibrio cholerae El Tor and O139 serogroup isolates that shows homology to a 43.4 kb genomic island in V. vulnificus. Microbiology 150, 4053–4063 (2004)

Safa, A., Nair, G. B. & Kong, R. Y. Evolution of new variants of Vibrio cholerae O1. Trends Microbiol. 18, 46–54 (2010)

Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006)

Wozniak, R. A. et al. Comparative ICE genomics: insights into the evolution of the SXT/R391 family of ICEs. PLoS Genet. 5, e1000786 (2009)

Zerbino, D. R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008)



Koty H. Sharp44,* and Kim B. Ritchie45


Recent research has explored the possibility that increased sea-surface temperatures and decreasing pH (ocean acidification) contribute to the ongoing decline of coral reef ecosystems. Within corals, a diverse microbiome exerts significant influence on biogeochemical and ecological processes, including food webs, organismal life cycles, and chemical and nutrient cycling. Microbes on coral reefs play a critical role in regulating larval recruitment, bacterial colonization, and pathogen abundance under ambient conditions, ultimately governing the overall resilience of coral reef systems. As a result, microbial processes may be involved in reef ecosystem-level responses to climate change. The development of new molecular technologies, together with multidisciplinary collaborative research on coral reefs, has led to rapid advances in our understanding of bacterially mediated reef responses to environmental change. Here we review new discoveries regarding (1) the onset of coral-bacterial associations; (2) the functional roles that bacteria play in healthy corals; and (3) how bacteria influence coral reef response to environmental change, leading to a model describing how reef microbiota direct ecosystem-level response to a changing global climate.


43 Sharp, K. H., and K. B. Ritchie. 2012. Biological Bulletin 174: 319–329. Used with permission from the Marine Biological Laboratory, Woods Hole, MA.

44 Eckerd College, 4200 54th Avenue South, St. Petersburg, Florida 33711.

45 Mote Marine Laboratory, 1600 Ken Thompson Parkway, Sarasota, Florida 34236.

* To whom correspondence should be addressed.


The health of coral reefs is declining on a global scale and continues to be threatened by overfishing and habitat destruction. Anthropogenically induced global climate change has been identified as a significant threat to these sensitive ecosystems. As temperatures rise, bleaching and diseases are increasing, and excess atmospheric carbon dioxide is greatly altering reef ecosystems by changing seawater chemistry through decreases in pH (Anthony et al., 2011).

In a recent review, Bosch and McFall-Ngai (2011) highlight the significance of viewing animals as “metaorganisms”—multicellular organisms consisting of a macroscopic host and multiple microorganisms that interact synergistically to shape the ecology and evolution of the entire association. In this sense, the term metaorganism can be applied to a broad range of animal-microbe symbioses, ranging from humans to sponges (Bosch and McFall-Ngai, 2011). Coral research within this perspective has revolutionized the way that researchers study corals. In scleractinian (hard) corals, the term “holobiont” (Knowlton and Rohwer, 2003) was adapted to indicate that corals are dynamic, multi-domain assemblages consisting of an animal host, symbiotic dinoflagellates in the genus Symbiodinium, bacteria, archaea, fungi, and viruses (Rohwer et al., 2001, 2002; Stat et al., 2006; Wegley et al., 2007; Thurber et al., 2009). The term metaorganism is especially useful for describing corals and reflecting that corals’ response to environmental change is driven by physiological interactions among the various microorganisms associated with the tissue, skeleton, and mucous layer. Corals harbor Symbiodinium, which provides fixed carbon to the host via photosynthesis, serving as the trophic foundation for coral reef ecosystems. It has been proposed that corals have additionally evolved to exploit specific bacterial metabolic capabilities that, in turn, directly modulate the survival of the coral holobiont in the marine environment (Zilber-Rosenberg and Rosenberg, 2008). An extensive characterization of the diverse microorganisms in corals will guide our understanding of the ecology of corals and coral reef ecosystems in response to a changing global climate.

Coral microbiology is a rapidly growing area of study. Early culture-based studies of coral-associated bacteria provided a foundation from which genomics, metagenomics, and transcriptomics approaches were established in corals, leading to exciting new advances in our current understanding of the diversity and dynamics of coral-associated bacterial communities. Evidence is accumulating that bacteria have an enormous influence on coral health and resilience, particularly with respect to changing reef environments (Azam and Worden, 2004; Rosenberg et al., 2007; Bourne et al., 2009; Ainsworth et al., 2010; Garren and Azam, 2012). The field of marine microbial ecology underwent a revolution in the 1990s, when culture-independent molecular techniques revealed that bacterial diversity from culture-based assessments was largely underestimated (Azam, 1998). Studies of persistent associations between corals and bacteria, both beneficial and pathogenic, were enhanced by new methods and approaches from this revolution. Those techniques were adopted by coral microbiologists, resulting in the discovery that particular components of bacterial communities are specific to some coral host species (Rohwer et al., 2002).

The cost and time associated with characterizing these complex bacterial assemblages initially posed a challenge to scientists attempting to identify patterns of diversity across a large scale. However, the gradually decreasing cost and increasing efficiency of high-throughput methods, including 454 pyrosequencing technology, allowed researchers to perform community 16S rRNA gene profiling and metagenome sequencing in a broad range of coral specimens. Recent applications of 16S pyrosequencing in corals have produced hundreds of thousands of 16S sequences—in contrast to hundreds of sequences from cloning methods. Results from pyrosequencing-based studies provide evidence of the presence of “coral-specific” groups of bacterial ribotypes (Reis et al., 2009; Kvennefors et al., 2010; Sunagawa et al., 2010; Ceh et al., 2011). Experiments investigating the bacterial component of coral surface mucous layers suggest that the composition of bacterial communities in coral mucus is distinct from other surface-associated biofilms and is influenced by the physical and biochemical properties of the mucus (Barott et al., 2011; Sweet et al., 2011b). Although corals maintain specific groups of bacteria, variation among individuals of a coral species may occur according to location (Guppy and Bythell, 2006; Littman et al., 2009; Kvennefors et al., 2010; Ceh et al., 2011).

Bacterial communities are maintained in microhabitats within an individual coral host, spatially structured within chemical micro-niches, or compartments, in the skeleton, tissues, and surface mucous layer of corals (Rohwer et al., 2001, 2002; Daniels et al., 2011; Sweet et al., 2011a). This spatial microheterogeneity is similar to previously described trends in the speciation of the dinoflagellate Symbiodinium in branching acroporid corals (Rowan and Knowlton, 1995). With that in mind, new collection techniques and apparatuses have recently been developed to enable collection from specific compartments of the coral, with minimized contamination by bacteria from other compartments (Sweet et al., 2011a).

Recent research surveying bacterial communities in a large number of marine sponges suggests that bacteria detected in sponges can be classified in three categories (Schmitt et al., 2011): core (groups of bacteria that are shared across many sponges), species-specific (groups of bacteria that are specific to certain sponge hosts), and variable (groups of bacteria that are transiently associated with the host, probably due to passive attachment from seawater). The recent composition analyses of bacterial assemblages in corals indicate that a similar classification scheme can be applied to coral-associated bacteria. An interesting difference between corals and sponges is that while many sponges have been documented to transmit diverse, specific bacterial communities in their gametes or larvae (Schmitt et al., 2007; Sharp et al., 2007), most corals appear to acquire specific bacteria from the seawater each generation (Apprill et al., 2009; Sharp et al., 2010). The mechanisms by which corals selectively and specifically recruit their core and specific bacterial components are largely undescribed, but they
likely involve the physical properties and the chemical structure of the mucous layer, which is thought to be unique in specific coral species (Bythell and Wild, 2011). Bacteria that successfully colonize the mucus are, in turn, involved in cycling nutrients and organic compounds in corals and on the reefs, and the resident microbes have the potential to modulate the bacterial community structure in coral mucus and tissue.
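The three-category scheme described above (core, species-specific, variable) can be illustrated with a minimal sketch. The prevalence cutoff, the two-host rule for "core", and all host and group names below are hypothetical choices made for illustration; they are not the criteria used by Schmitt et al. (2011) or by any of the coral studies cited.

```python
def classify_groups(samples_by_host, min_prevalence=0.8):
    """Label each bacterial group as core, species-specific, or variable.

    samples_by_host maps a host species name to a list of samples, where
    each sample is the set of bacterial groups detected in one individual.
    The 0.8 prevalence cutoff is an assumed, illustrative threshold.
    """
    all_groups = set().union(
        *(sample for samples in samples_by_host.values() for sample in samples)
    )
    labels = {}
    for group in sorted(all_groups):
        # Hosts in which the group is detected in most individuals.
        consistent_hosts = [
            host
            for host, samples in samples_by_host.items()
            if samples
            and sum(group in s for s in samples) / len(samples) >= min_prevalence
        ]
        if len(consistent_hosts) >= 2:
            labels[group] = "core"              # shared across host species
        elif len(consistent_hosts) == 1:
            labels[group] = "species-specific"  # consistent in one host only
        else:
            labels[group] = "variable"          # sporadic, likely transient
    return labels


# Hypothetical presence data: two host species, two individuals each.
example = {
    "host_1": [{"group_A", "group_B", "group_C"}, {"group_A", "group_B"}],
    "host_2": [{"group_A"}, {"group_A", "group_D"}],
}
labels = classify_groups(example)
print(labels)
```

In this toy data set, group_A is labeled core, group_B species-specific, and group_C and group_D variable. Real surveys would need abundance data, replication, and seawater controls to separate transient from resident groups.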

Here we review recent advances in the study of the coral metaorganism and specifically address (1) the onset of coral-bacterial associations; (2) the functional roles that bacteria play in healthy corals; and (3) how bacteria influence coral reef response to environmental change. These new discoveries are the basis for a model of how coral-associated and reef-inhabiting microbiota influence ecosystem-level responses to global climate change.

Onset of Coral-Bacterial Associations

The Caribbean coral Porites astreoides has been shown to transmit a bacterial component to its offspring (Sharp et al., 2012). However, this seems to be an exception to the rule in scleractinian corals. In eight other coral species that have been examined (Apprill et al., 2009; Sharp et al., 2010), corals do not appear to inherit bacteria from parents; rather, bacterial colonization occurs in planula larvae or post-settlement stages. Many bacterial phylotypes detected in planulae and post-settlement stages of P. astreoides have also been documented in the adult (Wegley et al., 2007), suggesting that corals acquire specific bacterial phylotypes.

Exploration of bacterial communities in early life stages of corals has not only provided new information about bacterial infection in corals, but it has also simplified analysis of diversity and dynamics of bacterial communities in corals across spatiotemporal scales. In contrast to their adult counterparts, swimming planula larvae of most corals have not yet accumulated a high bacterial load from the surrounding environment or by feeding (Apprill et al., 2009; Sharp et al., 2010); as a result, it is more tractable to characterize and quantify the associated bacterial component in these larvae. Similar phylogenetic clades of bacteria were detected in 16S rRNA gene sequence clone libraries from multiple larval specimens of the Caribbean coral Porites astreoides (Sharp et al., 2012) and in the Pacific coral Pocillopora meandrina (Apprill et al., 2009), suggesting that some groups of bacteria are common across different coral species. A number of bacterial types have been commonly detected in multiple species of corals, but of particular interest are those belonging to the class α-proteobacteria (Apprill et al., 2009; Raina et al., 2009; Sharp et al., 2012). The α-proteobacteria (particularly the Roseobacteriales) are abundant in the oceans, often constituting a third of the bacterioplankton (Wagner-Dobler and Biebl, 2006). This same group of bacteria is also closely associated with phytoplankton, including the dinoflagellate coral endosymbiont Symbiodinium (Webster et al., 2004). Many of these bacteria, now classified as Ruegeria spp., were originally designated Silicibacter spp. (Yi et al., 2007). It is unknown whether these bacteria play a functional role in corals, but their consistent detection in early life stages of corals and in seawater during coral spawning may be an indication that they are significant to the health of larvae, or even to adult colonies (Apprill et al., 2009; Apprill and Rappe, 2011; Sharp et al., 2012).

New research focusing on the molecular basis of bacterial colonization of the coral tissues or surface mucous layer indicates that coral mucous biofilm communities are a result of selection processes driven by the coral holobiont rather than by incidental attachment by bacteria in the seawater (Sweet et al., 2011b). This is consistent with recent findings from studies in the cnidarian Hydra, in which researchers found that the composition of the surface-associated bacterial community is driven directly by host metabolism and production of compounds in the surface layer of Hydra (Augustin et al., 2010). It is likely that there are specific molecules that influence colonization in the coral mucous layer. Lectin-mediated uptake of Symbiodinium has been demonstrated in corals (Wood-Charlson et al., 2006), but very little is known about bacterial uptake or invasion in corals.

Functional immunological molecules with bacterial binding capacity have been found in corals, suggesting a means by which the host may control associated microbial composition (Kvennefors et al., 2008; Kvennefors and Roff, 2009). Molecules that control the activities of other coral-associated microbes are thought to be derived from the coral host and in some cases from the associated bacteria (Ritchie, 2006; Teplitski and Ritchie, 2009; Vidal-Dupiol et al., 2011a,b). As previously described in a broad range of other animal-microbe systems (McFall-Ngai et al., 2012), molecules that direct infection of animal tissue by associated bacteria may be conserved, regardless of whether the bacteria are beneficial, commensal, or pathogenic.

Role of Bacteria in Health of Coral and Coral Reefs

Recent coral microbiology research has described how bacterial communities contribute to the overall physiology and ecology of apparently healthy corals. These discoveries were made possible both by new molecular technologies and by novel fieldwork-based approaches. Bacteria within corals govern the biogeochemical cycling within coral tissues. In addition, bacteria on surfaces in the reef environment influence and facilitate settlement of coral larvae, and resident microbes in corals play a role in defining the composition of the bacterial community in corals.

Studies over the past several years indicate that coral-associated bacteria influence biogeochemical cycling within corals and on reefs. Metagenomic data from the bacterial fraction of DNA from the coral Porites astreoides indicate the presence of numerous genes capable of degrading diverse aromatic compounds (Wegley et al., 2007). Coral-associated bacteria have been shown to be involved in cycling mucus-derived particulate and dissolved organic compounds in the reef environment (Wild et al., 2004, 2009; Huettel et al., 2006). In addition, the bacterial metagenome of P. astreoides includes genes encoding enzymes involved in cycling nitrogen via nitrogen fixation, ammonification, nitrification, and denitrification (Wegley et al., 2007). The detection of bacterial nitrogen fixation genes is consistent with previous biochemical research in which cyanobacterial nitrogen fixation was detected (Lesser et al., 2007). Further research focusing on nifH gene diversity in two species of Montipora suggests that nitrogen-fixing bacteria in corals are not limited to cyanobacteria but also belong to taxa representing the α-, β-, γ-, and δ-proteobacterial classes (Olson et al., 2009). Bacteria have been shown to be significant players in transforming nitrogen (Fiore et al., 2010) as well as sulfur and carbon compounds (Ferrier-Pages et al., 2001; Raina et al., 2009; Kimes et al., 2010) in corals and on coral reefs.

Bacteria outside of the coral animal also exert influence on the behavior of corals during their early life stages. Particular species of crustose coralline algae (CCAs) have been shown to facilitate larval settlement of the threatened coral species Acropora cervicornis and A. palmata in the Florida Keys and the Caribbean (Ritson-Williams et al., 2010). The integration of microbiological and chemical ecology approaches suggests that the facilitation of larval settlement by CCAs may be regulated by bacteria growing in biofilms on the surface of CCAs (Negri et al., 2001; Webster et al., 2004; Tebben et al., 2011). To date, all of the CCA-associated bacteria implicated in inducing coral metamorphosis and settlement belong to the γ-proteobacteria. A strain of the γ-proteobacterium Pseudoalteromonas sp. isolated from the surface of the CCA species Hydrolithon onkodes induces significant levels of larval metamorphosis in the corals Acropora willisae and A. millepora in laboratory experiments (Negri et al., 2001). Researchers have recently shown that exposure to Pseudoalteromonas isolates cultured from Neogoniolithon fosliei and Hydrolithon onkodes significantly increases rates of metamorphosis in the Pacific coral Acropora millepora (Tebben et al., 2011). Bioassay-guided isolation identified the inductive molecule as tetrabromopyrrole (Tebben et al., 2011). Other strains of Pseudoalteromonas and Thalassomonas have also been shown to induce larval settlement and metamorphosis in the coral Pocillopora damicornis (Tran and Hadfield, 2011). Not all tested isolates of Pseudoalteromonas and Thalassomonas were inductive in that study, indicating that the ability to induce settlement is taxon-specific. In addition, the isolation source of the bacteria (algal surface vs. coral surface) was not linked to the strains’ inductive properties (Tran and Hadfield, 2011).
Together, these studies indicate that coral recruitment and successful larval attachment and metamorphosis (which are crucial for continued repopulation of coral reef ecosystems) are strongly governed by the activity of specific bacteria in reef environments.

Recent research has focused on the role of bacteria native to the coral surface mucous layer that control bacterial colonization within the mucus, ultimately regulating resistance to disease. Corals have been shown to protect themselves against pathogen infection via allelopathic properties of the mucus (Geffen and Rosenberg, 2005; Ritchie, 2006) or the coral tissue (Koh, 1997; Kelman et al., 2006; Gochfeld and Aeby, 2008). However, antimicrobial assays with numerous Red Sea corals reveal that the capabilities of coral species for antibiotic production are highly variable (Kelman et al., 2006). Bacteria isolated from corals are able to inhibit the colonization and growth of many other types of bacteria, including potentially invasive coral pathogens (Reshef et al., 2006; Ritchie, 2006; Wegley et al., 2007; Gochfeld and Aeby, 2008; Nissimov et al., 2009; Shnit-Orland and Kushmaro, 2009; Sharon and Rosenberg, 2010; Kvennefors et al., 2012). In addition, a high number of genes involved in antibacterial compound biosynthesis have been detected in metagenomes from multiple corals (Wegley et al., 2007; Thurber et al., 2009). It is not clear to what extent these bacteria and the metabolites they produce play a role in community structure. In situ antibiotic production by bacteria is known to be a means of securing a niche by controlling microbial populations competing for the same resources (Nielsen et al., 2000; Rao et al., 2005). It is therefore likely that bacteria in and on the coral host govern the dynamics of coral microbiota.

Although the mechanisms by which mucous-associated bacteria prevent pathogenic infection are still unknown, the data indicate that a sophisticated system of bacterial cell-cell chemical signaling known as quorum sensing (QS) may be involved in microbial pathogenesis in corals. QS is modulated by small diffusible compounds called autoinducers, which are molecules that, when accumulated to a threshold concentration within a diffusion-limited environment, result in synchronized group behaviors. This density-dependent regulation allows bacterial populations to act in unison, effectively magnifying their ecological impact. Though the cell-cell communication systems differ among bacterial species, QS has been demonstrated to regulate many bacterial behaviors, including biofilm formation, antibiotic production, bioluminescence, and pathogenesis (Ng and Bassler, 2009), and it commonly drives important interactions between bacterial communities and their hosts (Rasmussen and Givskov, 2006; Dobretsov et al., 2009).
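The threshold behavior described above can be made concrete with a minimal sketch. It assumes a deliberately simple steady state in which autoinducer production (proportional to cell density) is balanced by a first-order loss term standing in for diffusion out of the local environment. The function names, parameter values, and arbitrary units are hypothetical and are not drawn from any of the cited studies.

```python
def autoinducer_steady_state(cell_density, production_per_cell, loss_rate):
    """Steady-state autoinducer concentration (arbitrary units).

    Assumes production = production_per_cell * cell_density and
    loss = loss_rate * concentration, so the two balance at
    concentration = production_per_cell * cell_density / loss_rate.
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    return production_per_cell * cell_density / loss_rate


def quorum_reached(cell_density, production_per_cell=1.0,
                   loss_rate=10.0, threshold=5.0):
    """True when the signal has accumulated to the response threshold."""
    concentration = autoinducer_steady_state(
        cell_density, production_per_cell, loss_rate
    )
    return concentration >= threshold


# Density-dependent switch: the response turns on only above a critical density.
for density in (10, 40, 50, 200):
    print(density, quorum_reached(density))

# A less diffusion-limited setting (higher loss rate) needs more cells.
print(quorum_reached(50, loss_rate=20.0))
```

With these assumed values the switch occurs at a density of 50; doubling the loss rate moves it to 100. This illustrates why signal accumulation in open natural environments cannot be inferred directly from signal production under laboratory conditions.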

Quorum sensing in bacterial pathogens is the mechanism by which virulence genes are expressed relative to pathogen density in the host, thereby initiating a coordinated attack once bacterial cell numbers reach a critical mass (Dobretsov et al., 2009). Both eukaryotes and prokaryotes have evolved to recognize and counter QS in pathogens, and there is evidence that eukaryotic signal-mimics can stimulate QS responses in bacteria (Teplitski et al., 2011). Other bacteria can counter-attack by producing quorum-quenching acylases or lactonases that break down signaling molecules (Teplitski et al., 2011). In addition to the signal-degrading enzymes, eukaryotes can inhibit or activate bacterial QS by producing compounds that mimic QS signals. For example, Rajamani et al. (2008) demonstrated that lumichrome, a derivative of the vitamin riboflavin that is produced by the unicellular alga Chlamydomonas reinhardtii (as well as by prokaryotes and other eukaryotes), can interact with the bacterial receptor for QS signals and elicit QS responses.

Quorum sensing may inhibit or activate pathogenesis, antibiotic production, exoenzyme production, and attachment by beneficial bacteria within coral tissues and on surfaces. Coral extracts contain compounds capable of interfering with QS activities (Skindersoe et al., 2008; Alagely et al., 2011) that may be involved in regulating the colonization of coral mucus by pathogens, commensal bacteria, or beneficial bacteria. The source of this activity is difficult to pinpoint and could originate from the coral, the dominant endosymbiont, or any associated bacteria. Alagely et al. (2011) recently showed that both coral- and Symbiodinium-associated bacteria alter swarming and biofilm formation in the coral pathogen Serratia marcescens. These phenotypes are typically controlled by QS, although inhibition of QS by these isolates remains to be demonstrated. There are few studies on the in situ roles of QS in corals, but this process is likely to be used in both pathogenesis and mutualistic interactions (Krediet et al., 2009a,b; Teplitski and Ritchie, 2009; Tait et al., 2010). While it is clear that at least some coral-associated commensals and pathogens produce QS signals under laboratory conditions (Tait et al., 2010; Alagely et al., 2011), it is not clear whether these signals accumulate to threshold concentrations in natural environments.

It is feasible that Symbiodinium spp. also produce signaling molecules that control bacterial cell-cell communication, which would influence the specific complement of bacteria that associate with corals. Perhaps bacterial species-specificity in corals is, in part, driven by Symbiodinium within the coral, but this has yet to be tested. The potential for Symbiodinium to be a source of antibacterial compounds in corals represents an aspect of bioactive compound production that is not yet described. It is likely that the source of antibacterial activity in corals is a combination of allelopathic chemicals produced by the coral, by associated bacteria, or by endosymbiotic dinoflagellates. In a study conducted by Marquis et al. (2005), eggs from 11 coral species were tested for antibacterial activity, and the only species exhibiting antibiotic activity was the one coral species in the study that incorporates Symbiodinium into the egg before the egg is released, suggesting a potential allelopathic contribution of Symbiodinium. It is also possible that coral-associated bioactive compounds are derived from bacteria whose presence or activity is influenced by Symbiodinium, but this has yet to be tested.

Role of Bacteria in Reef Ecosystem Responses to Environmental Change

The latest research on how coral-associated bacterial communities mediate responses of corals and coral reef ecosystems to environmental change addresses shifts in both the phylogenetic structure and metabolic capabilities of bacterial assemblages in corals. Multiple approaches and tools from microbiology, molecular biology, microscopy, and chemical ecology have been used to identify the role
of bacterial communities in response to threats such as increased sea-surface temperature, increased organic carbon and nutrient levels in seawater, increased macroalgal and cyanobacterial cover on reefs, and decreased seawater pH.

Rising sea-surface temperatures are linked to increases in coral diseases worldwide. However, the study of microbial coral diseases has been challenging due to many factors including microbial dynamics in the marine environment, the complications of proving unequivocal disease causation, and insufficient diagnostic tools (Pollock et al., 2011; Weil and Rogers, 2011). Some bacteria identified as coral pathogens include Serratia marcescens (Sutherland et al., 2011), Aurantimonas coralicida (Denner et al., 2003), and a consortium of bacterial and cyanobacterial phylotypes that make up what is known as Black Band Disease (Sekar et al., 2008). The most common bacteria present and problematic for corals are members of the Vibrionaceae that have been implicated in coral bleaching (Kushmaro et al., 1997; Ben-Haim and Rosenberg, 2002) and a myriad of coral diseases (Patterson et al., 2002; Frias-Lopez et al., 2003, 2004; Kline et al., 2006; Cervino et al., 2008). The Vibrionaceae are a common but diverse group of heterotrophic marine bacteria, collectively referred to as vibrios. Vibrios have been shown to be present in higher abundance on coral surfaces before obvious signs of distress (Ritchie, 2006; Mao-Jones et al., 2010). This group includes human pathogens and benign planktonic and animal-associated marine bacteria. Bleaching of the scleractinian coral Oculina patagonica in the eastern Mediterranean Sea was shown to be caused by Vibrio shiloi (Kushmaro et al., 1997). Vibrio coralliilyticus was isolated from bleached corals of the species Pocillopora damicornis and shown to cause coral bleaching and tissue sloughing (Ben-Haim and Rosenberg, 2002). In these pathogens, toxin production and the ability to infect coral tissue have a strong temperature dependence (Kushmaro et al., 1997; Ben-Haim and Rosenberg, 2002).
Vibrio dynamics are affected by water temperature and salinity, yet little else is known about environmental drivers of their abundance and distribution in the marine environment (Johnson et al., 2010). These organisms are often cultured rapidly and are able to utilize a wide range of carbon sources, suggesting that the biogeochemical significance of vibrios may vary with the nutrient state of the environment (Thompson et al., 2004). Some reef organisms are thought to be vectors for coral disease agents, specifically vibrios. These include organisms that come into contact with, or feed on, corals such as fireworms, snails, and corallivorous fishes (Weil and Rogers, 2011). Several recent reviews offer a comprehensive summary of the occurrence and possible environmental determinants of coral diseases (Rosenberg et al., 2009; Pollock et al., 2011; Weil and Rogers, 2011). Research on processes governing pathogen dynamics, abundance, and pathogenesis has informed our understanding of coral defense mechanisms.

The coral surface mucous layer and its resident microbes appear to be significant in defending corals from microbial diseases. Mucus harvested from the coral Acropora palmata during a period of increased seawater temperatures does
not exhibit significant antibiotic activity compared to mucus sampled at lower temperatures (Ritchie, 2006). This suggests that the protective capacity of some corals may be lost when temperatures increase, providing a mechanism to explain how increased temperatures lower coral resistance and increase susceptibility to diseases. In addition, when temperatures increase, the dominant bacterial flora in coral mucus shifts from antibiotic-producing bacteria to pathogens (Ritchie, 2006). This finding indicates that a balance of potentially beneficial microbes may be important for the overall physiological health of reef corals. Rising sea-surface temperatures can cause a breakdown of coral-Symbiodinium symbiosis. In addition, shifting seawater temperatures can simultaneously affect interactions among other microbes, particularly bacteria present in or on the coral, rendering the host susceptible to opportunistic or secondary infection by certain bacteria (Ritchie, 2006; Lesser et al., 2007). Research on the Pacific coral Acropora millepora indicates that after bleaching (the loss of Symbiodinium) there is a dramatic shift to a Vibrio-dominated community (Bourne et al., 2007), but it is unclear whether the bacterial communities are responding to the absence of the Symbiodinium, to physiological changes in the coral host, or to the increased light and sea-surface temperature. Following bleaching-induced coral mortality, nitrogen-fixing bacteria increase in abundance on coral skeletons (Holmes and Johnstone, 2010). The resulting increase in available nitrogen in the seawater has the potential to affect the growth of macroalgae and other nitrogen-limited primary producers, including benthic cyanobacteria (Holmes and Johnstone, 2010). 
Taken together, these results demonstrate that temperature stress and coral bleaching have the potential to alter the composition and metabolism of coral-associated bacterial assemblages, with significant impacts on the health of corals and coral reef communities.

As a result of heightened fishing pressure, decline in herbivore populations, and increased nutrient levels, reefs are undergoing a “phase shift” from coral-dominated ecosystems to algal-dominated ecosystems (Pandolfi et al., 2003). Overgrowth by turf macroalgae and benthic cyanobacteria has been documented on adult coral colonies on reefs (Ritson-Williams et al., 2005). Concern is growing for how this shift in ecosystems affects bacterial communities within coral reefs (Dinsdale et al., 2008). Recent research demonstrates that allelochemicals from macroalgae and benthic cyanobacteria have the potential to mediate shifts in abundance and community composition of microbiota associated with adult corals (Morrow et al., 2011). When tested against a library of strains isolated from algal surfaces, from mucus of the Caribbean corals Montastraea faveolata and Porites astreoides in direct contact with algal surfaces, and from coral mucus not in direct contact with algae, chemical extracts from six species of macroalgae and two species of benthic cyanobacteria stimulated the growth of some strains but inhibited the growth of other strains (Morrow et al., 2011). While some of the algal extracts had broad-spectrum activity against the collection of test isolates from phylogenetically diverse environmental bacteria, other extracts specifically increased the growth rates of the bacterial genus Vibrio (Morrow et al., 2011). Many of the active compounds in the study were hydrophilic, indicating that the bioactive compounds from algae or cyanobacteria may be readily solubilized and transported throughout seawater, providing a potential mechanism for algae to regulate microbial activity without direct contact, especially in low-flow benthic systems (Morrow et al., 2011). Allelopathic interactions among algae and corals have been shown to have detrimental effects on coral larval behavior, recruitment, and survival (Kuffner and Paul, 2004; Kuffner et al., 2006; Ritson-Williams et al., 2009). It is unknown how the bioactive compounds influence health of the early life stages, but it is feasible that the observed effects are linked to shifting bacterial communities associated with the coral planulae and recruits.

Smith et al. (2006) explored the effects of macroalgae on bacterial growth in the coral surface mucopolysaccharide layer. The results of that research, together with prior work on controlled exposure of coral fragments to seawater with increased dissolved organic carbon (DOC) levels (Kline et al., 2006), suggest that an excess of DOC, exuded from macroalgae, leads to coral mortality (Smith et al., 2006). In addition, Barott et al. (2011) found that the community composition of bacteria on surfaces of multiple reef macroalgal species is distinct from that found on coral surface mucous layers.

On the basis of these studies, it is clear that macroalgae have the potential to act as reservoirs of specific bacteria (beneficial, commensal, or pathogenic) not usually native to the coral mucous layer. Macroalgae also release compounds into the surrounding seawater that can have direct inhibitory or stimulatory effects on the coral-associated microbiota and, hence, on the health of the coral host.

Ocean acidification is a major concern for marine ecosystems in general, and particularly for those dependent on calcifying organisms, because secretion of calcium carbonate skeletons depends directly on the carbonate saturation state of seawater (Caldeira et al., 2007). Recent research suggests that a decrease in seawater pH can alter marine bacterial communities, but very little is known about the large-scale impacts of those changes (Joint et al., 2011). Laboratory manipulations of seawater pH have shown that acidification can result in loss of Symbiodinium endosymbionts, decrease in calcification, depression of overall net productivity in corals (Anthony et al., 2008), and dissolution or slowed deposition of coral skeletons (Fine and Tchernov, 2007). In addition, decreased seawater pH has been linked to a decline in overall abundance of crustose coralline algae (Kuffner et al., 2008), some of which have been shown to facilitate coral recruitment in reefs (Ritson-Williams et al., 2010). Experiments demonstrate that elevated PCO2 levels in seawater have significant detrimental effects on early life stages of the coral Porites astreoides, affecting fertilization success, larval settlement rates, post-settlement growth, and post-settlement skeleton deposition (Albright et al., 2008, 2010).
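For reference, the carbonate saturation state mentioned above is conventionally defined as the ion product of calcium and carbonate divided by the stoichiometric solubility product of the mineral phase (aragonite, in the case of scleractinian coral skeletons). The notation below is standard seawater carbonate chemistry and is not taken from the studies cited here.

```latex
\Omega = \frac{[\mathrm{Ca^{2+}}]\,[\mathrm{CO_3^{2-}}]}{K'_{sp}},
\qquad
\mathrm{CO_2} + \mathrm{H_2O} + \mathrm{CO_3^{2-}} \rightleftharpoons 2\,\mathrm{HCO_3^{-}}
```

Values of Ω above 1 indicate supersaturation, which favors mineral precipitation, and values below 1 favor dissolution. Because added CO2 consumes carbonate ions through the second reaction, rising PCO2 lowers both seawater pH and Ω.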

Several laboratory-based studies have focused specifically on the impacts of ocean acidification on coral microbiota. Meron et al. (2011) explored shifts in microbial assemblages associated with the coral Acropora eurystoma exposed to ambient seawater or to seawater at pH 7.3 over a period of 2 months using denaturing gradient gel electrophoresis profiles and 16S rRNA gene clone libraries. According to the resulting cluster analysis, a decrease in pH results in an increase in detection of Rhodobacteraceae and a decrease in detection of Bacteroidetes and Deltaproteobacteria (Meron et al., 2011). Relative to libraries from corals exposed to ambient seawater, clone libraries from A. eurystoma exposed to pH 7.3 conditions exhibited a higher percentage of clones representing bacteria closely related to those detected in stressed, injured, or diseased invertebrates (Meron et al., 2011). In another study with the Pacific coral Porites compressa, individuals exposed to an extremely low pH (6.7) exhibited shifts in bacterial community diversity (Thurber et al., 2009). Though the mechanism by which this occurs is not yet clear, it has been suggested that the altered seawater pH indirectly causes a shift in the bacterial diversity by impacting host metabolism, which results in a shift of nutrients and carbon available to the associated microbiota (Meron et al., 2011).

Metagenomic analysis of P. compressa mucus revealed potential functional shifts in the associated microbiota as a result of decreased pH and increased temperature (Thurber et al., 2009), most notably an increase in the number of detected genes for antibiotic and toxin production. Mucus from corals exposed to a decreased pH exhibits low antimicrobial activity (Meron et al., 2011), and mucus of Acropora palmata exhibits lower antibacterial activity after prolonged warm periods (Ritchie, 2006). Together, these results warn that even slight changes in seawater pH and temperature can have ecologically significant effects on coral-associated microbiota and, hence, on corals’ susceptibility to bacterial pathogens. The shift in the coral microbiome phylogenetic profile has been proposed as a potential indicator for declining coral health before the corals exhibit more obvious signs of stress or disease (Thurber et al., 2009; Ainsworth et al., 2010; Garren and Azam, 2012).

A Model for Climate-Change-Induced Shifts in the Coral Metaorganism

The research reviewed here suggests that alterations in sea surface temperature, algal and cyanobacterial abundance on reefs, and seawater pH can have detrimental effects on corals by decreasing protective qualities of the coral mucous layer, via inhibition of growth or compound production in beneficial bacteria or by alteration of host-associated compound biosynthesis. Another aspect of coral-bacterial interactions that has garnered much attention is the ability of bacteria on reef substrates to influence successful larval recruitment. These surfaces include crustose coralline algae (CCAs), which are coated with microbial biofilms and are thought to be involved in mediating coral larval settlement (Webster et al., 2001, 2011; Ritson-Williams et al., 2009, 2010; Tebben et al., 2011).

Figure A11-1 represents the current model of corals and their interdependence on associated microbes. Both coral tissue and coral mucus contain abundant and diverse microbial communities (Figure A11-1a). When sea-surface temperatures increase, antibacterial compounds in the coral mucus disappear. Simultaneously, antibacterial-producing bacteria normally associated with healthy corals decrease while bacteria with pathogenic capabilities increase (Figure A11-1b).


FIGURE A11-1 Schematic of coral surfaces and associated microbes. (a) Under normal conditions, the coral animal, associated endosymbiotic algae, or native bacteria may produce allelopathic compounds that regulate the abundance and activities of other microbes that come into contact with the coral. (b) Under conditions of coral stress (such as increased temperature or decreased pH), production of allelopathic compounds may be altered within the coral holobiont, either by affecting production by the coral host or by the associated microbes. Simultaneously, native beneficial bacteria are replaced by pathogenic bacteria on the coral surfaces. (c) Crustose coralline algae (CCA) and biofilm microbial communities facilitate attachment and settlement of coral larvae via inductive compounds (settlement cues) produced by the CCA or by recruiting specific bacteria that release these cues. (d) Certain types of environmental change (decreased pH, elevated pCO2, increased temperature) may alter the abundance of the inductive bacteria or their production of settlement cue compounds, resulting in reduction of attachment, metamorphosis, and successful settlement of coral larvae.

Mathematical modeling of this system suggests that once the shift to pathogen dominance is established, this state persists long after conditions return to those favorable for the reestablishment of beneficial microbes (Mao-Jones et al., 2010). Recent data on coral-associated bacterial communities exposed to decreased pH (Thurber et al., 2009; Meron et al., 2011) indicate that ocean acidification may cause a similar shift in the protective properties of coral mucus.
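The persistence of a pathogen-dominated state after conditions recover can be illustrated with a deliberately simple toy system. The sketch below is not the model of Mao-Jones et al. (2010); it is a generic one-variable bistable equation, dp/dt = p(1 - p)(p - threshold), chosen only to show how a temporary stress can leave a lasting shift. The variable p (pathogen fraction of the mucus community), the stress term, and every numerical value are hypothetical assumptions, not measured quantities.

```python
def step(p, stress, dt=0.01, base_threshold=0.3):
    """Advance the (hypothetical) pathogen fraction p by one Euler step."""
    # Assumption: stress lowers the fraction pathogens must exceed to spread.
    threshold = base_threshold - stress
    return p + dt * p * (1.0 - p) * (p - threshold)


def simulate(p0, phases, dt=0.01):
    """Run consecutive (stress, duration) phases; return p at the end of each."""
    p = p0
    ends = []
    for stress, duration in phases:
        for _ in range(int(round(duration / dt))):
            p = step(p, stress, dt)
        ends.append(p)
    return ends


# Ambient conditions, then a warm (stressed) period, then ambient again.
before, during, after = simulate(0.05, [(0.0, 20.0), (0.5, 100.0), (0.0, 100.0)])

# Control run: the same total time with no stress at all.
control = simulate(0.05, [(0.0, 220.0)])[0]

print(f"before stress: {before:.4f}")
print(f"during stress: {during:.4f}")
print(f"after stress is removed: {after:.4f}")
print(f"control (never stressed): {control:.4f}")
```

In this sketch the pathogen fraction stays low in the control run, rises toward 1 during the stressed phase, and remains high after the stress is removed, because the system has crossed the threshold separating its two stable states. That qualitative behavior (hysteresis) is the only point of the example; the published model is considerably more detailed.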

On the basis of this model and the data reviewed in this paper, we present a second model of coral-bacterial interactions in which environmental changes lead to shifts in bacterial communities on reef surfaces (Figure A11-1c and d). It has been shown that increased temperatures change the phylogenetic composition of CCA-associated bacterial communities and the success of larval recruitment (Webster et al., 2011). In addition, it was recently shown that decreased pH inhibits settlement of the coral Porites astreoides (Albright et al., 2008, 2010). Temperature may affect the growth, abundance, or bioactive metabolite biosynthesis of beneficial bacteria, particularly Pseudoalteromonas spp., on reef surfaces that are important for successful recruitment, which can ultimately result in a decline of new recruitment on reefs. Though the effects of decreased pH on surface biofilms have not been well described, this condition may alter the bacterial biofilm community and influence larval settlement success. Figure A11-1c and d shows a schematic model of reef surface-associated microbes before (c) and after (d) increased sea-surface temperature or ocean acidification. In ambient conditions on the reef, CCAs, or bacteria growing on CCA surfaces, produce compounds that facilitate larval settlement (Figure A11-1c). When sea-surface temperatures increase, bacterial communities on CCAs change, resulting in lower larval recruitment rates (Figure A11-1d). Similarly, as pH decreases, larval settlement decreases (Albright et al., 2008, 2010). It is hypothesized that the inductive properties of CCAs, whether they are due to compounds released by bacterial biofilms on CCAs or by the CCAs themselves, decrease (Figure A11-1d). As in the coral mucus (Figure A11-1a and b), there is a shift in the bacterial community of the reef surfaces. In this case, under increased sea-surface temperatures, the bacterial community dominated by inductive bacteria, such as Pseudoalteromonas and Thalassomonas, moves to a community dominated by bacteria that may not have inductive properties.

Next Questions: Microbe-Microbe Interactions in Corals

One of the next steps in increasing our understanding of coral fitness is a comprehensive characterization of coral-associated microbial interactions. For example, it is unclear if Symbiodinium plays a role in selectively recruiting bacteria to corals, if Symbiodinium affects bacterial physiology or secondary metabolite biosynthesis, or if bacterial metabolism influences Symbiodinium activity.

Little is known about the nature of free-living Symbiodinium, including what bacterial mutualisms may be present before coral acquisition of Symbiodinium, in the case that the algal symbiont is not transmitted vertically. Members of the Roseobacteriales group are specifically present in association with Symbiodinium cultures and are able to increase Symbiodinium growth rates in vivo (Ritchie, 2011). This observed association between α-proteobacteria and dinoflagellates may be a true mutualism with benefits for both the bacteria and the algal host. The bacteria may benefit by having a readily available source of organic compounds such as dimethylsulfoniopropionate (DMSP), a preferred source of reduced sulfur (Miller and Belas, 2004; Raina et al., 2010). The algae may derive benefits from the bacterial production of antimicrobials such as tropodithietic acid (Geng and Belas, 2010) and bioactive compounds such as vitamin B-12 (Geng and Belas, 2010). A genomic comparison of the Roseobacter clade of α-proteobacteria indicates that some type of surface-associated lifestyle is central to the ecology of all members of the group (Slightom and Buchan, 2009).

Very little is known about how Symbiodinium affects bacterial communities in corals (or vice versa) or how these interactions impact the fitness of the coral host. Recent studies suggest that bacterial communities in juvenile corals differ significantly if they were initially colonized by different strains of Symbiodinium (Littman et al., 2009) with different photosynthetic efficiencies (Littman et al., 2010). It has been hypothesized that DMSP production by Symbiodinium plays a role in structuring bacterial communities in corals by attracting certain bacteria to the surface mucous layer of corals (Raina et al., 2009, 2010).

An important adaptive property of many α-proteobacteria is the presence of a bacterial system for diversity generation facilitated by gene transfer agents (GTAs) (Paul, 2008). GTAs are defective bacteriophages that are able to randomly package bacterial host DNA and transfer DNA to other α-proteobacteria (Paul, 2008). It has recently been shown that Symbiodinium-associated α-proteobacteria produce GTAs and are able to transfer genes to a range of bacteria in the marine environment (McDaniel et al., 2010). Furthermore, gene transfer via this mechanism is much higher in the coral reef environment than in other marine environments, suggesting an alternate mode of adaptation via swapping of potentially beneficial genes among marine bacteria (McDaniel et al., 2010) and possibly the coral holobiont.

A fundamental requirement of model systems is that they address interspecies interactions in a metaorganism. Research on host-microbe interactions can greatly benefit from a well-documented host-microbe study that spans the spectrum from pathogenicity to mutualism. Much work has been done on the basal metazoan Hydra to illustrate the value of a model systems approach (Weis et al., 2008; Bosch et al., 2009). Because Hydra is associated with a limited number of bacteria, it has provided valuable insight into the molecular basis of immunity and symbiosis in simple animals. Cnidarian and dinoflagellate models can also be used to elucidate roles of bacteria in both coral and Symbiodinium biology. Ideally, these models require cultured symbionts (bacterial and dinoflagellate) and an easily maintained cnidarian host (Weis et al., 2008). Our ability to culture many of these bacterial symbionts will aid in exploring functions that are otherwise impossible to study due to the complex nature of the coral holobiont. Generation of genome sequence data from animal hosts and their associated microorganisms will greatly enhance our basic understanding of symbiotic associations at the molecular level. This includes reconstruction of host-symbiont phylogenies, analysis of genes important in specific interactions, comparative genomics, and advanced technologies. The sea anemone Aiptasia pallida has recently been proposed as a model for coral biology for a number of reasons (Weis et al., 2008). While corals are difficult to grow in captivity, this species tolerates laboratory manipulation and grows quickly in aquaria. Many protocols have been developed to manipulate Symbiodinium density in A. pallida without lethal effects on the host, and as a result, this organism has successfully been used to describe mechanisms of coral bleaching (Dunn et al., 2007) and disease (Alagely et al., 2011). Aiptasia pallida represents an opportunity to integrate a model systems approach with novel technologies from the “omics age” to learn more about multipartner interactions in corals at a time of great environmental change.


Acknowledgments

This work was funded in part by the Mote Marine Laboratory Protect Our Reefs Grants Program and by the Dart Foundation. We thank Cathleen Sullivan (MML) for assistance with EndNote formatting, and two anonymous reviewers for improvements in the manuscript.


References

Ainsworth, T. D., R. V. Thurber, and R. D. Gates. 2010. The future of coral reefs: a microbial perspective. Trends Ecol. Evol. 25: 233-240.

Alagely, A., C. J. Krediet, K. B. Ritchie, and M. Teplitski. 2011. Signaling-mediated cross-talk modulates swarming and biofilm formation in a coral pathogen Serratia marcescens. ISME J. 5: 1609-1620.

Albright, R., B. Mason, and C. Langdon. 2008. Effect of aragonite saturation state on settlement and post-settlement growth of Porites astreoides larvae. Coral Reefs 27: 485-490.

Albright, R., B. Mason, M. Miller, and C. Langdon. 2010. Ocean acidification compromises recruitment success of the threatened Caribbean coral Acropora palmata. Proc. Natl. Acad. Sci. USA 107: 20400-20404.

Anthony, K. R. N., D. I. Kline, G. Diaz-Pulido, S. Dove, and O. Hoegh-Guldberg. 2008. Ocean acidification causes bleaching and productivity loss in coral reef builders. Proc. Natl. Acad. Sci. USA 105: 17442-17446.

Anthony, K. R. N., J. A. Kleypas, and J.-P. Gattuso. 2011. Coral reefs modify their seawater carbon chemistry—implications for impacts of ocean acidification. Glob. Change Biol. 17: 3655-3666.

Apprill, A., and M. S. Rappe. 2011. Response of the microbial community to coral spawning in lagoon and reef flat environments of Hawaii, USA. Aquat. Microb. Ecol. 62: 251-266.

Apprill, A., H. Q. Marlow, M. Q. Martindale, and M. S. Rappe. 2009. The onset of microbial associations in the coral Pocillopora meandrina. ISME J. 3: 685-699.

Augustin, R., S. Fraune, and T. C. Bosch. 2010. How Hydra senses and destroys microbes. Semin. Immunol. 22: 54-58.

Azam, F. 1998. Microbial control of oceanic carbon flux: the plot thickens. Science 280: 694-696.

Azam, F., and A. Z. Worden. 2004. Microbes, molecules, and marine ecosystems. Science 303: 1622-1624.

Barott, K. L., B. Rodriguez-Brito, J. Janouskovec, K. L. Marhaver, J. E. Smith, P. Keeling, and F. L. Rohwer. 2011. Microbial diversity associated with four functional groups of benthic reef algae and the reef-building coral Montastraea annularis. Environ. Microbiol. 13: 1192-1204.

Ben-Haim, Y., and E. Rosenberg. 2002. A novel Vibrio sp. pathogen of the coral Pocillopora damicornis. Mar. Biol. 141: 47-55.

Bosch, T. C. G., and M. J. McFall-Ngai. 2011. Metaorganisms as the new frontier. Zoology 114: 185-190.

Bosch, T. C., R. Augustin, F. Anton-Erxleben, S. Fraune, G. Hemmrich, H. Zill, P. Rosenstiel, G. Jacobs, S. Schreiber, M. Leippe et al. 2009. Uncovering the evolutionary history of innate immunity: the simple metazoan Hydra uses epithelial cells for host defence. Dev. Comp. Immunol. 33: 559-569.

Bourne, D., Y. Iida, S. Uthicke, and C. Smith-Keune. 2007. Changes in coral-associated microbial communities during a bleaching event. ISME J. 2: 350-363.

Bourne, D. G., M. Garren, T. M. Work, E. Rosenberg, G. W. Smith, and C. D. Harvell. 2009. Microbial disease and the coral holobiont. Trends Microbiol. 17: 554-562.

Bythell, J. C., and C. Wild. 2011. Biology and ecology of coral mucus release. J. Exp. Mar. Biol. Ecol. 408: 88-93.

Caldeira, K., D. Archer, J. P. Barry, R. G. J. Bellerby, P. G. Brewer, L. Cao, A. G. Dickson, S. C. Doney, H. Elderfield, V. J. Fabry et al. 2007. Comment on “Modern-age buildup of CO2 and its effects on seawater acidity and salinity” by Hugo A. Loaiciga. Geophys. Res. Lett. 34: 10.1029/2006gl027288. L18608.

Ceh, J., M. van Keulen, and D. G. Bourne. 2011. Coral-associated bacterial communities on Ningaloo Reef, Western Australia. FEMS Microbiol. Ecol. 75: 134-144.

Cervino, J. M., F. L. Thompson, B. Gomez-Gil, E. A. Lorence, T. J. Goreau, R. L. Hayes, K. B. Winiarski-Cervino, G. W. Smith, K. Hughen, and E. Bartels. 2008. The Vibrio core group induces yellow band disease in Caribbean and Indo-Pacific reef-building corals. J. Appl. Microbiol. 105: 1658-1671.

Daniels, C. A., A. Zeifman, K. Heym, K. B. Ritchie, C. A. Watson, I. Berzins, and M. Breitbart. 2011. Spatial heterogeneity of bacterial communities in the mucus of Montastraea annularis. Mar. Ecol. Prog. Ser. 426: 29-40.

Denner, E. B. M., G. W. Smith, H. J. Busse, P. Schumann, T. Narzt, S. W. Polson, W. Lubitz, and L. L. Richardson. 2003. Aurantimonas coralicida gen. nov., sp nov., the causative agent of white plague type II on Caribbean scleractinian corals. Int. J. Syst. Evol. Microbiol. 53: 1115-1122.

Dinsdale, E. A., O. Pantos, S. Smriga, R. A. Edwards, F. Angly, L. Wegley, M. Hatay, D. Hall, E. Brown, M. Haynes et al. 2008. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS One 3: e1584.

Dobretsov, S., M. Teplitski, and V. Paul. 2009. Mini-review: quorum sensing in the marine environment and its relationship to biofouling. Biofouling 25: 413-427.

Dunn, S. R., C. E. Schnitzler, and V. M. Weis. 2007. Apoptosis and autophagy as mechanisms of dinoflagellate symbiont release during cnidarian bleaching: every which way you lose. Proc. R. Soc. B Biol. Sci. 274: 3079-3085.

Ferrier-Pages, C., V. Schoelzke, J. Jaubert, L. Muscatine, and O. Hoegh-Guldberg. 2001. Response of a scleractinian coral, Stylophora pistillata, to iron and nitrate enrichment. J. Exp. Mar. Biol. Ecol. 259: 249-261.

Fine, M., and D. Tchernov. 2007. Ocean acidification and scleractinian corals—Response. Science 317: 1032-1033.

Fiore, C. L., J. K. Jarett, N. D. Olson, and M. P. Lesser. 2010. Nitrogen fixation and nitrogen transformations in marine symbioses. Trends Microbiol. 18: 455-463.

Frias-Lopez, J., G. T. Bonheyo, Q. S. Jin, and B. W. Fouke. 2003. Cyanobacteria associated with coral black band disease in Caribbean and Indo-Pacific reefs. Appl. Environ. Microbiol. 69: 2409-2413.

Frias-Lopez, J., J. S. Klaus, G. T. Bonheyo, and B. W. Fouke. 2004. Bacterial community associated with black band disease in corals. Appl. Environ. Microbiol. 70: 5955-5962.

Garren, M., and F. Azam. 2012. New directions in coral reef microbial ecology. Environ. Microbiol. 14: 833-844.

Geffen, Y., and E. Rosenberg. 2005. Stress-induced rapid release of antibacterials by scleractinian corals. Mar. Biol. 146: 931-935.

Geng, H. F., and R. Belas. 2010. Molecular mechanisms underlying roseobacter-phytoplankton symbioses. Curr. Opin. Biotechnol. 21: 332-338.

Gochfeld, D., and G. Aeby. 2008. Antibacterial chemical defenses in Hawaiian corals provide possible protection from disease. Mar. Ecol. Prog. Ser. 362: 119-128.

Guppy, R., and J. C. Bythell. 2006. Environmental effects on bacterial diversity in the surface mucus layer of the reef coral Montastraea faveolata. Mar. Ecol. Prog. Ser. 328: 133-142.

Holmes, G., and R. W. Johnstone. 2010. The role of coral mortality in nitrogen dynamics on coral reefs. J. Exp. Mar. Biol. Ecol. 387: 1-8.

Huettel, M., C. Wild, and S. Gonelli. 2006. Mucus trap in coral reefs: formation and temporal evolution of particle aggregates caused by coral mucus. Mar. Ecol. Prog. Ser. 307: 69-84.

Johnson, C. N., A. R. Flowers, N. F. Noriea III, A. M. Zimmerman, J. C. Bowers, A. DePaola, and D. J. Grimes. 2010. Relationships between environmental factors and pathogenic Vibrios in the northern Gulf of Mexico. Appl. Environ. Microbiol. 76: 7076-7084.

Joint, I., S. C. Doney, and D. M. Karl. 2011. Will ocean acidification affect marine microbes? ISME J. 5: 1-7.

Kelman, D., Y. Kashman, E. Rosenberg, A. Kushmaro, and Y. Loya. 2006. Antimicrobial activity of Red Sea corals. Mar. Biol. 149: 357-363.

Kimes, N. E., J. D. Van Nostrand, E. Weil, J. Z. Zhou, and P. J. Morris. 2010. Microbial functional structure of Montastraea faveolata, an important Caribbean reef-building coral, differs between healthy and yellow-band diseased colonies. Environ. Microbiol. 12: 541-556.

Kline, D. I., N. M. Kuntz, M. Breitbart, N. Knowlton, and F. Rohwer. 2006. Role of elevated organic carbon levels and microbial activity in coral mortality. Mar. Ecol. Prog. Ser. 314: 119-125.

Knowlton, N., and F. Rohwer. 2003. Multispecies microbial mutualisms on coral reefs: the host as a habitat. Am. Nat. 162: S51-S62.

Koh, E. G. L. 1997. Do scleractinian corals engage in chemical warfare against microbes? J. Chem. Ecol. 23: 379-398.

Krediet, C. J., K. B. Ritchie, M. Cohen, E. K. Lipp, K. P. Sutherland, and M. Teplitski. 2009a. Utilization of mucus from the coral Acropora palmata by the pathogen Serratia marcescens and by environmental and coral commensal bacteria. Appl. Environ. Microbiol. 75: 3851-3858.

Krediet, C. J., K. B. Ritchie, and M. Teplitski. 2009b. Catabolite regulation of enzymatic activities in a white pox pathogen and commensal bacteria during growth on mucus polymers from the coral Acropora palmata. Dis. Aquat. Org. 87: 57-66.

Kuffner, I. B., and V. J. Paul. 2004. Effects of the benthic cyanobacterium Lyngbya majuscula on larval recruitment of the reef corals Acropora surculosa and Pocillopora damicornis. Coral Reefs 23: 455-458.

Kuffner, I. B., L. J. Walters, M. A. Becerro, V. J. Paul, R. Ritson-Williams, and K. S. Beach. 2006. Inhibition of coral recruitment by macroalgae and cyanobacteria. Mar. Ecol. Prog. Ser. 323: 107-117.

Kuffner, I. B., A. J. Andersson, P. L. Jokiel, K. S. Rodgers, and F. T. Mackenzie. 2008. Decreased abundance of crustose coralline algae due to ocean acidification. Nat. Geosci. 1: 114-117.

Kushmaro, A., E. Rosenberg, M. Fine, and Y. Loya. 1997. Bleaching of the coral Oculina patagonica by Vibrio AK-1. Mar. Ecol. Prog. Ser. 147: 159-165.

Kvennefors, E. C. E., and G. Roff. 2009. Evidence of cyanobacteria-like endosymbionts in Acroporid corals from the Great Barrier Reef. Coral Reefs 28: 547-547.

Kvennefors, E. C. E., W. Leggat, O. Hoegh-Guldberg, B. M. Degnan, and A. C. Barnes. 2008. An ancient and variable mannose-binding lectin from the coral Acropora millepora binds both pathogens and symbionts. Dev. Comp. Immunol. 32: 1582-1592.

Kvennefors, E. C. E., E. M. Sampayo, T. Ridgway, A. C. Barnes, and O. Hoegh-Guldberg. 2010. Bacterial communities of two ubiquitous Great Barrier Reef corals reveals both site- and species-specificity of common bacterial associates. PLoS One 5: e10401.

Kvennefors, E., E. Sampayo, C. Kerr, G. Vieira, G. Roff, and A. Barnes. 2012. Regulation of bacterial communities through antimicrobial activity by the coral holobiont. Microb. Ecol. 63: 605-618.

Lesser, M. P., L. I. Falcon, A. Rodriguez-Roman, S. Enriquez, O. Hoegh-Guldberg, and R. Iglesias-Prieto. 2007. Nitrogen fixation by symbiotic cyanobacteria provides a source of nitrogen for the scleractinian coral Montastraea cavernosa. Mar. Ecol. Prog. Ser. 346: 143-152.

Littman, R. A., B. L. Willis, and D. G. Bourne. 2009. Bacterial communities of juvenile corals infected with different Symbiodinium (dinoflagellate) clades. Mar. Ecol. Prog. Ser. 389: 45-59.

Littman, R. A., D. G. Bourne, and B. L. Willis. 2010. Responses of coral-associated bacterial communities to heat stress differ with Symbiodinium type on the same coral host. Mol. Ecol. 19: 1978-1990.

Mao-Jones, J., K. B. Ritchie, L. E. Jones, and S. P. Ellner. 2010. How microbial community composition regulates coral disease development. PLoS Biol. 8: e1000345.

Marquis, C. P., A. H. Baird, R. de Nys, C. Holmstrom, and N. Koziumi. 2005. An evaluation of the antimicrobial properties of the eggs of 11 species of scleractinian corals. Coral Reefs 24: 248-253.

McDaniel, L. D., E. Young, J. Delaney, F. Ruhnau, K. B. Ritchie, and J. H. Paul. 2010. High frequency of horizontal gene transfer in the oceans. Science 330: 50-50.

McFall-Ngai, M., E. A. Heath-Heckman, A. A. Gillette, S. M. Peyer, and E. A. Harvie. 2012. The secret languages of coevolved symbioses: insights from the Euprymna scolopes-Vibrio fischeri symbiosis. Semin. Immunol. 24: 3-8.

Meron, D., E. Atias, L. Iasur Kruh, H. Elifantz, D. Minz, M. Fine, and E. Banin. 2011. The impact of reduced pH on the microbial community of the coral Acropora eurystoma. ISME J. 5: 51-60.

Miller, T. R., and R. Belas. 2004. Dimethylsulfoniopropionate metabolism by Pfiesteria-associated Roseobacter spp. Appl. Environ. Microbiol. 70: 3383-3391.

Morrow, K. M., V. J. Paul, M. R. Liles, and N. E. Chadwick. 2011. Allelochemicals produced by Caribbean macroalgae and cyanobacteria have species-specific effects on reef coral microorganisms. Coral Reefs 30: 309-320.

Negri, A. P., N. Webster, R. T. Hill, and A. J. Heyward. 2001. Metamorphosis of broadcast spawning corals in response to bacteria isolated from crustose algae. Mar. Ecol. Prog. Ser. 223: 121-131.

Ng, W.-L., and B. L. Bassler. 2009. Bacterial quorum-sensing network architectures. Annu. Rev. Genet. 43: 197-222.

Nielsen, A. T., T. Tolker-Nielsen, K. B. Barken, and S. Molin. 2000. Role of commensal relationships on the spatial structure of a surface-attached microbial consortium. Environ. Microbiol. 2: 59-68.

Nissimov, J., E. Rosenberg, and C. B. Munn. 2009. Antimicrobial properties of resident coral mucus bacteria of Oculina patagonica. FEMS Microbiol. Lett. 292: 210-215.

Olson, N. D., T. D. Ainsworth, R. D. Gates, and M. Takabayashi. 2009. Diazotrophic bacteria associated with Hawaiian Montipora corals: diversity and abundance in correlation with symbiotic dinoflagellates. J. Exp. Mar. Biol. Ecol. 371: 140-146.

Pandolfi, J. M., R. H. Bradbury, E. Sala, T. P. Hughes, K. A. Bjorndal, R. G. Cooke, D. McArdle, L. McClenachan, M. J. H. Newman, G. Paredes, R. R. Warner, and J. B. C. Jackson. 2003. Global trajectories of the long-term decline of coral reef ecosystems. Science 301: 955-958.

Patterson, K. L., J. W. Porter, K. B. Ritchie, S. W. Polson, E. Mueller, E. C. Peters, D. L. Santavy, and G. W. Smith. 2002. The etiology of white pox, a lethal disease of the Caribbean elkhorn coral, Acropora palmata. Proc. Natl. Acad. Sci. USA 99: 8725-8730.

Paul, J. H. 2008. Prophages in marine bacteria: dangerous molecular time bombs or the key to survival in the seas? ISME J. 2: 579-589.

Pollock, F. J., P. J. Morris, B. L. Willis, and D. G. Bourne. 2011. The urgent need for robust coral disease diagnostics. PLoS Pathog. 7: e1002183.

Raina, J.-B., D. Tapiolas, B. L. Willis, and D. G. Bourne. 2009. Coral-associated bacteria and their role in the biogeochemical cycling of sulfur. Appl. Environ. Microbiol. 75: 3492-3501.

Raina, J.-B., E. A. Dinsdale, B. L. Willis, and D. G. Bourne. 2010. Do the organic sulfur compounds DMSP and DMS drive coral microbial associations? Trends Microbiol. 18: 101-108.

Rajamani, S., W. D. Bauer, J. B. Robinson, J. M. Farrow III, E. C. Pesci, M. Teplitski, M. Gao, R. T. Sayre, and D. A. Phillips. 2008. The vitamin riboflavin and its derivative lumichrome activate the LasR bacterial quorum-sensing receptor. Mol. Plant Microbe Interact. 21: 1184-1192.

Rao, D., J. S. Webb, and S. Kjelleberg. 2005. Competitive interactions in mixed-species biofilms containing the marine bacterium Pseudoalteromonas tunicata. Appl. Environ. Microbiol. 71: 1729-1736.

Rasmussen, T. B., and M. Givskov. 2006. Quorum-sensing inhibitors as anti-pathogenic drugs. Int. J. Med. Microbiol. 296: 149-161.

Reis, A. M., S. D. Araujo, Jr., R. L. Moura, R. B. Francini-Filho, G. Pappas, Jr., A. M. Coelho, R. H. Kruger, and F. L. Thompson. 2009. Bacterial diversity associated with the Brazilian endemic reef coral Mussismilia braziliensis. J. Appl. Microbiol. 106: 1378-1387.

Reshef, L., O. Koren, Y. Loya, I. Zilber-Rosenberg, and E. Rosenberg. 2006. The coral probiotic hypothesis. Environ. Microbiol. 8: 2068-2073.

Ritchie, K. B. 2006. Regulation of microbial populations by coral surface mucus and mucus-associated bacteria. Mar. Ecol. Prog. Ser. 322: 1-14.

Ritchie, K. B. 2011. Bacterial symbionts of corals and Symbiodinium. Pp. 139-150 in Beneficial Microorganisms in Multicellular Life Forms, E. Rosenberg and U. Gophna, eds. Springer, Heidelberg.

Ritson-Williams, R., V. J. Paul, and V. Bonito. 2005. Marine benthic cyanobacteria overgrow coral reef organisms. Coral Reefs 24: 629-629.

Ritson-Williams, R., S. N. Arnold, N. D. Fogarty, R. S. Steneck, M. J. A. Vermeij, and V. J. Paul. 2009. New perspectives on ecological mechanisms affecting coral recruitment on reefs. Pp. 437-457 in Proceedings of the Smithsonian Marine Science Symposium: Smithsonian Contributions to the Marine Sciences, M. A. Lang, I. G. Macintyre, and K. Rützler, eds. Smithsonian Institution Scholarly Press, Washington, D.C.

Ritson-Williams, R., V. J. Paul, S. N. Arnold, and R. S. Steneck. 2010. Larval settlement preferences and post-settlement survival of the threatened Caribbean corals Acropora palmata and A. cervicornis. Coral Reefs 29: 71-81.

Rohwer, F. R., M. B. Breitbart, J. J. Jara, F. A. Azam, and N. K. Knowlton. 2001. Diversity of bacteria associated with the Caribbean coral Montastraea franksi. Coral Reefs 20: 85-91.

Rohwer, F., V. Seguritan, F. Azam, and N. Knowlton. 2002. Diversity and distribution of coral-associated bacteria. Mar. Ecol. Prog. Ser. 243: 1-10.

Rosenberg, E., O. Koren, L. Reshef, R. Efrony, and I. Zilber-Rosenberg. 2007. The role of micro-organisms in coral health, disease and evolution. Nat. Rev. Microbiol. 5: 355-362.

Rosenberg, E., A. Kushmaro, E. Kramarsky-Winter, E. Banin, and Y. Loya. 2009. The role of microorganisms in coral bleaching. ISME J. 3: 139-146.

Rowan, R., and N. Knowlton. 1995. Intraspecific diversity and ecological zonation in coral-algal symbiosis. Proc. Natl. Acad. Sci. USA 92: 2850-2853.

Schmitt, S., J. B. Weisz, N. Lindquist, and U. Hentschel. 2007. Vertical transmission of a phylogenetically complex microbial consortium in the viviparous sponge Ircinia felix. Appl. Environ. Microbiol. 73: 2067-2078.

Schmitt, S., P. Tsai, J. Bell, J. Fromont, M. Ilan, N. Lindquist, T. Perez, A. Rodrigo, P. J. Schupp, J. Vacelet, N. Webster, U. Hentschel, and M. W. Taylor. 2011. Assessing the complex sponge microbiota: core, variable and species-specific bacterial communities in marine sponges. ISME J. 6: 564-576.

Sekar, R., L. T. Kaczmarsky, and L. L. Richardson. 2008. Microbial community composition of black band disease on the coral host Siderastrea siderea from three regions of the wider Caribbean. Mar. Ecol. Prog. Ser. 362: 85-98.

Sharon, G., and E. Rosenberg. 2010. Healthy corals maintain Vibrio in the VBNC state. Environ. Microbiol. Rep. 2: 116-119.

Sharp, K. H., B. Eam, D. J. Faulkner, and M. G. Haygood. 2007. Vertical transmission of diverse microbes in the tropical sponge Corticium sp. Appl. Environ. Microbiol. 73: 622-629.

Sharp, K. H., K. B. Ritchie, P. J. Schupp, R. Ritson-Williams, and V. J. Paul. 2010. Bacterial acquisition in juveniles of several broadcast spawning coral species. PLoS One 5: e10898.

Sharp, K. H., D. Distel, and V. J. Paul. 2012. Diversity and dynamics of bacterial communities in early life stages of the Caribbean coral Porites astreoides. ISME J. 6: 790-801.

Shnit-Orland, M., and A. Kushmaro. 2009. Coral mucus-associated bacteria: a possible first line of defense. FEMS Microbiol. Ecol. 67: 371-380.

Skindersoe, M. E., P. Ettinger-Epstein, T. B. Rasmussen, T. Bjarnsholt, R. de Nys, and M. Givskov. 2008. Quorum sensing antagonism from marine organisms. Mar. Biotechnol. 10: 56-63.

Slightom, R. N., and A. Buchan. 2009. Surface colonization by marine roseobacters: integrating genotype and phenotype. Appl. Environ. Microbiol. 75: 6027-6037.

Smith, J. E., M. Shaw, R. A. Edwards, D. Obura, O. Pantos, E. Sala, S. A. Sandin, S. Smriga, M. Hatay, and F. L. Rohwer. 2006. Indirect effects of algae on coral: algae-mediated, microbe-induced coral mortality. Ecol. Lett. 9: 835-845.

Stat, M., D. Carter, and O. Hoegh-Guldberg. 2006. The evolutionary history of Symbiodinium and scleractinian hosts—symbiosis, diversity, and the effect of climate change. Perspect. Plant Ecol. Evol. Syst. 8: 23-43.

Sunagawa, S., C. M. Woodley, and M. Medina. 2010. Threatened corals provide underexplored microbial habitats. PLoS One 5: e9554.

Sutherland, K. P., S. Shaban, J. L. Joyner, J. W. Porter, and E. K. Lipp. 2011. Human pathogen shown to cause disease in the threatened elkhorn coral Acropora palmata. PLoS One 6: e23468.

Sweet, M. J., A. Croquer, and J. C. Bythell. 2011a. Bacterial assemblages differ between compartments within the coral holobiont. Coral Reefs 30: 39-52.

Sweet, M. J., A. Croquer, and J. C. Bythell. 2011b. Development of bacterial biofilms on artificial corals in comparison to surface-associated microbes of hard corals. PLoS One 6: e21195.

Tait, K., Z. Hutchison, F. L. Thompson, and C. B. Munn. 2010. Quorum sensing signal production and inhibition by coral-associated vibrios. Environ. Microbiol. Rep. 2: 145-150.

Tebben, J., D. M. Tapiolas, C. A. Motti, D. Abrego, A. P. Negri, L. L. Blackall, P. D. Steinberg, and T. Harder. 2011. Induction of larval metamorphosis of the coral Acropora millepora by tetrabromopyrrole isolated from a Pseudoalteromonas bacterium. PLoS One 6: e19082.

Teplitski, M., and K. Ritchie. 2009. How feasible is the biological control of coral diseases? Trends Ecol. Evol. 24: 378-385.

Teplitski, M., K. Warriner, J. Bartz, and K. R. Schneider. 2011. Untangling metabolic and communication networks: interactions of enterics with phytobacteria and their implications in produce safety. Trends Microbiol. 19: 121-127.

Thompson, J. R., M. A. Randa, L. A. Marcelino, A. Tomita-Mitchell, E. Lim, and M. F. Polz. 2004. Diversity and dynamics of a north Atlantic coastal Vibrio community. Appl. Environ. Microbiol 70: 4103-4110.

Thurber, R. V., D. Willner-Hall, B. Rodriguez-Mueller, C. Desnues, R. A. Edwards, F. Angly, E. Dinsdale, L. Kelly, and F. Rohwer. 2009. Metagenomic analysis of stressed coral holobionts. Environ. Microbiol. 11: 2148-2163.

Suggested Citation:"Appendix A." Institute of Medicine. 2013. The Science and Applications of Microbial Genomics: Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/18261.

Tran, C., and M. G. Hadfield. 2011. Larvae of Pocillopora damicornis (Anthozoa) settle and metamorphose in response to surface-biofilm bacteria. Mar. Ecol. Prog. Ser. 433: 85-96.

Vidal-Dupiol, J., O. Ladriere, D. Destoumieux-Garzon, P. E. Sautiere, A. L. Meistertzheim, E. Tambutte, S. Tambutte, D. Duval, L. Foure, M. Adjeroud, and G. Mitta. 2011a. Innate immune responses of a scleractinian coral to vibriosis. J. Biol. Chem. 286: 22688-22698.

Vidal-Dupiol, J., O. Ladriere, A. L. Meistertzheim, L. Foure, M. Adjeroud, and G. Mitta. 2011b. Physiological responses of the scleractinian coral Pocillopora damicornis to bacterial stress from Vibrio coralliilyticus. J. Exp. Biol. 214: 1533-1545.

Wagner-Dobler, I., and H. Biebl. 2006. Environmental biology of the marine Roseobacter lineage. Annu. Rev. Microbiol. 60: 255-280.

Webster, N. S., R. I. Webb, M. J. Ridd, R. T. Hill, and A. P. Negri. 2001. The effects of copper on the microbial community of a coral reef sponge. Environ. Microbiol. 3: 19-31.

Webster, N. S., L. D. Smith, A. J. Heyward, J. E. M. Watts, R. I. Webb, L. L. Blackall, and A. P. Negri. 2004. Metamorphosis of a scleractinian coral in response to microbial biofilms. Appl. Environ. Microbiol. 70: 1213-1221.

Webster, N. S., R. Soo, R. Cobb, and A. P. Negri. 2011. Elevated seawater temperature causes a microbial shift on crustose coralline algae with implications for the recruitment of coral larvae. ISME J. 5: 759-770.

Wegley, L., R. Edwards, B. Rodriguez-Brito, H. Liu, and F. Rohwer. 2007. Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ. Microbiol. 9: 2707-2719.

Weil, E., and C. S. Rogers. 2011. Coral reef diseases in the Atlantic-Caribbean. Pp. 465-491 in Coral Reefs: An Ecosystem in Transition, Z. Dubinsky and N. Stambler, eds. Springer, Science Business Media, Dordrecht, The Netherlands.

Weis, V. M., S. K. Davy, O. Hoegh-Guldberg, M. Rodriguez-Lanetty, and J. R. Pringe. 2008. Cell biology in model systems as the key to understanding corals. Trends Ecol. Evol. 23: 369-376.

Wild, C., M. Huettel, A. Klueter, S. G. Kremb, M. Y. M. Rasheed, and B. B. Jorgensen. 2004. Coral mucus functions as an energy carrier and particle trap in the reef ecosystem. Nature 428: 66-70.

Wild, C., M. S. Naumann, A. Haas, U. Struck, F. W. Mayer, M. Y. Rasheed, and M. Huettel. 2009. Coral sand O2 uptake and pelagicbenthic coupling in a subtropical fringing reef, Aqaba, Red Sea. Aquat. Biol. 6: 133-142.

Wood-Charlson, E. M., L. L. Hollingsworth, D. A. Krupp, and V. M. Weis. 2006. Lectin/glycan interactions play a role in recognition in a coral/dinoflagellate symbiosis. Cell. Microbiol. 8: 1985-1993.

Yi, H., Y. W. Lim, and J. Chun. 2007. Taxonomic evaluation of the genera Ruegeria and Silicibacter: a proposal to transfer the genus Silicibacter Petursdottir and Kristjansson 1999 to the genus Ruegeria Uchino et al. 1999. Int. J. Syst. Evol. Microbiol. 57: 815-819.

Zilber-Rosenberg, I., and E. Rosenberg. 2008. Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol. Rev. 32: 723-735.

Suggested Citation:"Appendix A." Institute of Medicine. 2013. The Science and Applications of Microbial Genomics: Workshop Summary. Washington, DC: The National Academies Press. doi: 10.17226/18261.



Suzanne Joneson,47 Jason E. Stajich,48 Shin-Han Shiu,49 and Erica Bree Rosenblum47,*


Understanding the molecular mechanisms of pathogen emergence is central to mitigating the impacts of novel infectious disease agents. The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen of amphibians that has been implicated in amphibian declines worldwide. Bd is the only member of its clade known to attack vertebrates. However, little is known about the molecular determinants of, or evolutionary transition to, pathogenicity in Bd. Here we sequence the genome of Bd’s closest known relative, the non-pathogenic chytrid Homolaphlyctis polyrhiza (Hp). We first describe the genome of Hp, which is comparable to other chytrid genomes in size and number of predicted proteins. We then compare the genomes of Hp, Bd, and 19 additional fungal genomes to identify unique or recent evolutionary elements in the Bd genome. We identified 1,974 Bd-specific genes, a gene set that is enriched for protease, lipase, and microbial effector gene ontology terms. We describe significant lineage-specific expansions in three Bd protease families (metallo-, serine-type, and aspartyl proteases). We show that these protease gene family expansions occurred after the divergence of Bd and Hp from their common ancestor and thus are localized to the Bd branch. Finally, we demonstrate that the timing of the protease gene family expansions predates the emergence of Bd as a globally important amphibian pathogen.

46 Reprinted from PLoS Pathogens. Originally published as Joneson S, Stajich JE, Shiu S-H, Rosenblum EB (2011) Genomic Transition to Pathogenicity in Chytrid Fungi. PLoS Pathogens 7(11): e1002338. doi:10.1371/

Copyright: ©2011 Joneson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: We acknowledge NIH funding from the COBRE Program administered by the Initiative for Bioinformatics and Evolutionary Studies (P20RR0116454) and the INBRE Program of the National Center for Research Resources (P20RR016448) to E.B.R., Initial Complement funding provided by the University of California to J.E.S., and NSF funding (DBI-0939454) to the BEACON Center for the Study of Evolution in Action. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail:

47 Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA.

48 Department of Plant Pathology and Microbiology, University of California, Riverside, California, USA.

49 Department of Plant Biology, Michigan State University, East Lansing, Michigan, USA.

Author Summary

The chytrid fungus Batrachochytrium dendrobatidis (Bd) is an emerging pathogen that has been implicated in decimating amphibian populations around the world. Bd is the only member of an ancient group of fungi (called the Chytridiomycota) that is known to attack vertebrates. The question of how an amphibian-killing fungus evolved from non-pathogenic ancestors is vital to protecting the world’s remaining amphibians from Bd. We sequenced the genome of Bd’s closest known relative—a non-pathogenic chytrid named Homolaphlyctis polyrhiza (Hp). We compared the genomes of Bd, Hp and 18 additional fungi to identify what makes Bd unique. We identified a large number of Bd-specific genes, a gene set that contains a number of possible pathogenicity factors. In particular, we describe a large number of protease genes in the Bd genome and show that these genes were duplicated after the divergence of Bd and Hp from their common ancestor. Studying Bd’s pathogenesis in an evolutionary context provides new evidence for the role of protease genes in Bd’s ability to kill amphibians.

Introduction


Understanding the emergence of novel pathogens is a central challenge in epidemiology, disease ecology, and evolutionary biology. Emerging pathogens of humans, wildlife, and agriculturally important crops generally have a dynamic recent evolutionary past. For example, many emerging pathogens have become adapted to new environmental conditions, shifted their host range, and/or evolved more virulent forms (Hoskisson and Trevors, 2010; Smith and Guegan, 2010; Woolhouse and Gaunt, 2007). Identifying the genetic basis of these evolutionary shifts can lend insight into the mechanisms of pathogen emergence.

Studies of the amphibian-killing fungus Batrachochytrium dendrobatidis (Bd) provide an opportunity to better understand evolutionary transitions to pathogenicity. Bd is considered the leading cause of amphibian declines worldwide and is found on every continent where amphibians occur (Berger et al., 1998; Lips et al., 2006). Bd infects amphibian skin and the resulting disease, chytridiomycosis, is responsible for population declines and extirpations in hundreds of amphibian species (Lötters et al., 2004; Skerratt et al., 2007). Bd is the only documented vertebrate pathogen in a diverse, early-branching lineage of fungi called the Chytridiomycota. Some chytrids are pathogens of plants, but most chytrids are primarily known to survive on decaying organic material as saprobes (James et al., 2006). The question of how an amphibian-killing fungus evolved from an ancestor that was not a vertebrate pathogen is vital to understanding and mitigating the chytridiomycosis epidemic and will also shed light on the evolution of novel pathogens more broadly.

Investigating the transition to pathogenicity in chytrid fungi requires an explicitly evolutionary perspective. Specifically, identifying elements of the genome that have undergone recent evolution in the branch leading to Bd may help us determine how Bd attacks its amphibian hosts. Previously we identified several families of proteases that may be involved in Bd’s ability to infect amphibian skin. Specifically, we found expanded gene families of metallo- and serine proteases in the Bd genome that exhibit life-stage specific gene expression patterns (Rosenblum et al., 2008). These proteases have been hypothesized to play a role in the ability of other fungal pathogens to invade and degrade host tissue (Burmester et al., 2011; da Silva et al., 2006; Monod, 2008; Monod et al., 2002). However, previous studies could not resolve if these gene family expansions occurred along the branch leading to Bd because the fungal genomes available for comparison were only distantly related to Bd.

To determine what unique features of the Bd genome might relate to its ability to colonize amphibian skin, we compared genomes of Bd and its closest known relative, Homolaphlyctis polyrhiza (Hp) (this isolate has been described by Joyce Longcore [pers. comm.] and has been referred to as “JEL142” in previous publications [James et al., 2006]). Bd and Hp are in the same Rhizophydiales order (Letcher et al., 2006), and Bd is the only member of this clade known to be a vertebrate pathogen (James et al., 2006). We first confirmed that Hp cannot survive on amphibian skin alone. We then sequenced and characterized the genome of Hp using Roche-454 pyrosequencing. Finally, we used a comparative genomics approach to identify differences between Bd and Hp using additional fungal species as outgroups. Based on identified unique elements of the Bd genome, we develop hypotheses for the mechanisms and evolution of Bd pathogenicity.

Materials and Methods

Taxon Sampling

Our focal isolates were the JAM81 strain of Bd and the JEL142 strain of Hp. JAM81 was isolated from Rana muscosa in the Sierra Nevada Mountains in California, where Bd has caused catastrophic declines in R. muscosa populations (Rachowicz et al., 2006). Hp was collected from leaf litter in Maine and is a presumed saprobe. We also used the information from publicly available genomes of an additional Bd isolate, JEL423, and an additional chytrid, Spizellomyces punctatus, a terrestrial saprobe (Origins of Multicellularity Sequencing Project, Broad Institute of Harvard and MIT). Finally, we used the genome information from 17 additional publicly available fungal genomes (Table S1). We chose these outgroups to represent a broad phylogenetic survey of fungi that span four additional fungal phyla: Blastocladiomycota (Allomyces macrogynus), Zygomycota (Phycomyces blakesleeanus), Basidiomycota (Coprinopsis cinerea, Cryptococcus neoformans, Puccinia graminis f. sp. tritici, Ustilago maydis), and Ascomycota (Arthroderma benhamiae, Aspergillus nidulans, Blastomyces dermatitidis, Botrytis cinerea, Coccidioides immitis, Fusarium graminearum, Microsporum canis, Neurospora crassa, Pyrenophora tritici-repentis, Trichophyton rubrum, and Uncinocarpus reesii). Arthroderma benhamiae, M. canis, and T. rubrum were chosen in particular because they are dermatophytes (i.e., fungal pathogens that infect skin).

We reconstructed the phylogenetic relationships among the 19 taxa used in this study using Bayesian phylogenetic analyses of 51 single-copy genes. The orthologous sequences were aligned with T-Coffee (Notredame et al., 2000), concatenated, and trimmed with trimAl (Capella-Gutiérrez et al., 2009). The final alignment comprised 21,182 trimmed amino acid residues. The Basidiomycota phylum was constrained by members Ustilago maydis and Puccinia graminis, and the tree was rooted with the Chytridiomycota clade based on James et al. (2006). Bayesian posterior probabilities are shown below internal nodes and ML bootstrap values from 100 replicates above the nodes.

Growth of Bd and Hp on Amphibian Skin

We grew Bd (JAM81) and Hp on the standard growth medium PmTG (made from peptonized milk, tryptone and glucose) (Barr, 1986). After one week of growth, we transferred 3.8×10⁶ zoospores from each isolate to 3 mL of two liquid growth conditions: standard growth media and amphibian skin. For standard growth media we used 1% liquid PmTG, and for amphibian skin we used 10% w/v pulverized and autoclaved cane-toad skin in water. We established six technical replicates of each isolate in each condition. Liquid cultures were gently shaken in 6-well tissue culture plates. To test how long Bd and Hp survived in each growth condition, we tested an aliquot from each culture every day for 14 days. Each day we removed 15 μL from each of the technical replicates, pooled aliquots for each isolate in each treatment group, and inoculated PmTG-agar growth plates. We inspected growth plates every day using 200× magnification to visualize whether active zoospores were produced.

Hp Genome Sequence, Assembly, and Annotation

We grew Hp at room temperature (23–25 °C) in liquid PmTG medium with gentle agitation for approximately 2 weeks. We extracted Hp DNA using the protocol of Zolan and Pukkila (1986), modified by the use of 2% sodium dodecyl sulphate as extraction buffer in place of CTAB. We sequenced the Hp genome using a Roche 454 Genome Sequencer FLX with Titanium chemistry and the standard Roche protocol. We screened and trimmed 1,100,797 reads of vector sequences and assembled them with Roche’s GS De Novo Assembler. We improved the assembly by synteny-based alignment to the JAM81 genome sequence with Mercator (Dewey, 2007).

We annotated the Hp genome with predicted proteins using the MAKER annotation pipeline (Cantarel et al., 2008). MAKER predicts proteins based on homology with protein-coding sequences of other species, and with the consensus of the ab initio gene prediction algorithms GeneMark, AUGUSTUS, and SNAP. GeneMark is self-training so we simply applied it to determine ab initio parameters. We trained AUGUSTUS using parameters provided in the MAKER package and previously determined Bd training parameters. We trained SNAP by iteratively running MAKER with SNAP Bd models and then retraining on the most confident gene model parameters from the initial run. All parameters files are available in Because MAKER’s final set of predicted proteins (referred to hereafter as “Hp_Maker”) is a conservative estimate that relies upon the consensus of different prediction algorithms, we also used the set of ab initio predicted proteins in MAKER by GeneMark-ES (Ter-Hovhannisyan et al., 2008) as an upper limit (referred to hereafter as “Hp_GeneMark”). Hp_Maker is not a perfect subset of Hp_GeneMark, so we considered both datasets when characterizing the proteome of Hp. We annotated Hp protein models by comparison to the Pfam database of protein domains (Finn et al., 2010) using HMMER 3.0 (

We used two methods that rely on different algorithms to confirm that we successfully identified the majority of Hp proteins. First, we used the eukaryotic genome annotation pipeline CEGMA to predict the number of core eukaryotic genes in the Hp alignment (Parra et al., 2007). Second we determined the number of “chytrid-specific” orthologous groups that were present in the Hp genome. We defined chytrid-specific orthologous groups as those groups shared between all available Chytridiomycota genomes: two Bd isolates (JAM81 and JEL423) and one Spizellomyces punctatus isolate (DAOM BR117) (Table S1). We identified chytrid-specific orthologous groups using BLASTP (Altschul et al., 1990) and OrthoMCL (Li et al., 2003), and determined how many of these were also found within either set of Hp predicted proteins (i.e., Hp_Maker and Hp_GeneMark).

Bd Unique Genomic Features

We also used BLASTP and OrthoMCL to determine orthologous groups for all sampled taxa. These orthologous groups were used to determine “Bd-specific” genes, which we defined as those groups or genes that were present in both sequenced Bd genomes (JAM81 and JEL423) but absent from all other sampled fungi. [Note that the Bd-specific gene set is distinct from the more broadly defined chytrid-specific gene set discussed above.] We used GO::TermFinder (Boyle et al., 2004) to determine if the Pfam annotations for the set of Bd-specific genes showed enrichment for particular GO terms.
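
The Bd-specific definition above amounts to a set filter over ortholog clusters. The following is a minimal sketch of that filter, assuming each orthologous group is summarized as the set of genomes contributing at least one member. The group IDs, genome labels, and toy data are hypothetical and are not taken from the OrthoMCL output.

```python
# Illustrative only: group IDs, genome labels, and memberships are made up.
BD_GENOMES = {"Bd_JAM81", "Bd_JEL423"}

def bd_specific_groups(groups):
    """Return IDs of groups found in both Bd genomes and in no other genome."""
    return sorted(
        gid for gid, genomes in groups.items() if set(genomes) == BD_GENOMES
    )

toy_groups = {
    "OG1": {"Bd_JAM81", "Bd_JEL423"},        # both Bd isolates only: Bd-specific
    "OG2": {"Bd_JAM81", "Bd_JEL423", "Hp"},  # shared with Hp: not Bd-specific
    "OG3": {"Bd_JAM81"},                     # missing from JEL423: not Bd-specific
    "OG4": {"Hp", "S_punctatus"},            # absent from Bd: not Bd-specific
}

print(bd_specific_groups(toy_groups))
```

Requiring membership in both Bd isolates, rather than either one, is what excludes groups such as the hypothetical OG3 in this toy example.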

Bd Gene Family Expansions

We identified several gene family expansions in Bd through inspection of the ten largest Bd-specific orthologous groups and inspection of enriched GO categories. We found gene family expansions in families with genes containing M36, S41, and Asp (both Asp and Asp_protease) protease Pfam signature domains (see Table S2 for sequences and their Pfam domain delimitation). We conducted an exhaustive search in the focal genomes for M36, S41, and Asp domains using HMMER3. For Hp we conducted the HMMER3 search in both the MAKER and GeneMark datasets. For S41 and Asp, the predicted proteins from MAKER were subsets of those from GeneMark, so we only report GeneMark names. For M36 there were several MAKER predicted proteins that were not included in the GeneMark set, so we report both MAKER and GeneMark names. We then aligned the sequences of the protein domains for all members in each expanded family for the three Chytridiomycota genomes (Bd, Hp, and Spizellomyces punctatus) and one Blastocladiomycota outgroup (Allomyces macrogynus). We generated these alignments using the iterative alignment program MUSCLE (Edgar, 2004). After inspecting the alignments, we found that 8 M36 and 13 Asp protein sequences were missing >50% of their domain sequences. These partial sequences were likely mis-annotations or pseudogenes, so we excluded them from further analysis (see Table S2B for identities of excluded partial sequences). After aligning the protein domain sequences of the remaining proteins (see Figure S1 for alignments), we reconstructed gene trees for each family using the Maximum Likelihood method implemented in RAxML (Stamatakis et al., 2005). We used the rapid bootstrap algorithm (400 replicates) with the Jones-Taylor-Thornton substitution matrix assuming a gamma model of rate heterogeneity. We report the Maximum Likelihood trees with the highest log likelihood score and bootstrap support values.

We calculated synonymous and non-synonymous substitution rates (Ks and Ka, respectively) with the yn00 program implemented in the PAML package (Yang, 2007) using full length annotated coding sequences. For each expanded protease gene family (containing M36, S41, and Asp domains) we calculated Ks and Ka of putative orthologs between all pairs of focal taxa [i.e., the chytrids Bd, Hp, and Spizellomyces punctatus] and between each focal taxon and the outgroup (Allomyces macrogynus). We identified putative orthologs based on a cross-species reciprocal best match between each species pair (Hanada et al., 2008). In addition, we used a second, more stringent approach that required sequence distances between reciprocal best matches to follow the relationships between the four focal species. Because the rate distributions from these two approaches were similar, we only report results from the first approach. Because yn00 does not robustly correct for multiple substitutions (Yang and Bielawski, 2000), and because Ks values are large between our focal taxa, we use Ks values to make a general comparison (within versus between species) for rates of molecular evolution.
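
The reciprocal best match criterion can be illustrated with a short sketch. It assumes a single symmetric similarity score per cross-species gene pair, which simplifies real search output where the two directions are scored separately. The function name, gene names, and scores are hypothetical.

```python
def reciprocal_best_matches(scores):
    """Return gene pairs that are each other's best-scoring cross-species match.

    scores maps (gene_in_species1, gene_in_species2) -> similarity score,
    where a higher score means a better match (an assumption of this sketch).
    """
    best_for_1, best_for_2 = {}, {}
    for (g1, g2), s in scores.items():
        if g1 not in best_for_1 or s > best_for_1[g1][1]:
            best_for_1[g1] = (g2, s)
        if g2 not in best_for_2 or s > best_for_2[g2][1]:
            best_for_2[g2] = (g1, s)
    return sorted(
        (g1, g2)
        for g1, (g2, _) in best_for_1.items()
        if best_for_2[g2][0] == g1
    )

# Hypothetical scores between two Bd genes and two Hp genes.
toy_scores = {
    ("bd1", "hp1"): 90,
    ("bd1", "hp2"): 50,
    ("bd2", "hp1"): 70,
    ("bd2", "hp2"): 40,
}

print(reciprocal_best_matches(toy_scores))
```

In this toy case bd2's best match is hp1, but hp1's best match is bd1, so only the pair (bd1, hp1) is retained as a putative ortholog pair.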

We made rough divergence time estimates for the duplication events in the three expanded protease gene families using “node-Ks” as a proxy of time. The node Ks is defined as follows: for each node N in the mid-point rooted phylogeny, its Ks is the averaged Ks values between all operational taxonomic unit pairs across the two lineages that originated from N. There are no empirical estimates of chytrid substitution rates, so we do not propose specific dates for the duplication events. However, we do use a rough approximation for a reasonable substitution rate (following previous molecular evolution studies in fungi [Lynch and Conery, 2000]) to test whether the timing of gene duplications was likely coincident with the emergence of Bd as a deadly amphibian pathogen.
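
The node-Ks definition above can be sketched as a recursive average over cross-lineage leaf pairs. This sketch assumes a strictly bifurcating rooted tree written as nested tuples and a lookup of pairwise Ks values keyed by unordered leaf pairs. The gene names and Ks values are hypothetical.

```python
from itertools import product

def leaves(tree):
    """Collect leaf names from a nested-tuple binary tree."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def node_ks(tree, ks):
    """Map each internal node (keyed by its sorted leaf names) to its node-Ks.

    The node-Ks is the mean Ks over all leaf pairs that straddle the two
    lineages descending from that node.
    """
    if isinstance(tree, str):
        return {}
    left, right = tree
    pairs = list(product(leaves(left), leaves(right)))
    out = {
        tuple(sorted(leaves(tree))): sum(ks[frozenset(p)] for p in pairs) / len(pairs)
    }
    out.update(node_ks(left, ks))
    out.update(node_ks(right, ks))
    return out

# Hypothetical three-gene family: geneA and geneB are recent duplicates.
toy_tree = (("geneA", "geneB"), "geneC")
toy_ks = {
    frozenset(("geneA", "geneB")): 0.2,
    frozenset(("geneA", "geneC")): 1.0,
    frozenset(("geneB", "geneC")): 1.4,
}

result = node_ks(toy_tree, toy_ks)
```

Here the root node averages the two pairs that cross its lineages (1.0 and 1.4), while the inner node uses only the geneA-geneB pair, so a smaller node-Ks marks the more recent duplication.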


Results

Taxon Sampling

The phylogenetic relationship among all 19 taxa in this study can be seen in Figure A12-1. As described above, we sampled genomes from across the diversity of five fungal phyla (i.e., Chytridiomycota, Blastocladiomycota, Zygomycota, Basidiomycota, Ascomycota). Our sampling scheme allowed us to determine, in a phylogenetic context, which elements of Bd’s genome are shared with Hp and other fungal taxa.

Growth of Bd and Hp on Amphibian Skin

Both Bd and Hp grew well in standard PmTG growth media and produced viable zoospores throughout the entire 14 day observation period. However, only Bd survived on frog skin alone. Bd produced viable zoospores in the cane-toad skin treatment throughout the entire observation period, and after 14 days of incubation the Bd—frog skin solution was cloudy with chytrid growth and degraded skin (Figure A12-2). Conversely, Hp did not survive and reproduce on cane-toad skin alone. We observed viable zoospores for Hp in the cane-toad skin treatment only for the first three days (these zoospores most likely persisted from the initial inoculation), and after 14 days of incubation the Hp—cane-toad skin solution remained clear of chytrid growth and the cane-toad skin remained intact and not further degraded (Figure A12-2). We did not observe the growth of any bacterial or fungal contaminants in any of the treatments.



FIGURE A12-2 Chytrid growth on cane-toad skin. A. Negative control (no chytrid): intact skin after 14 days. B. Hp treatment: intact skin and no Hp growth after 14 days. C. Bd treatment: degraded skin and Bd growth after 14 days.

Hp Genome Sequence, Assembly, and Annotation

We achieved a roughly 11.2× coverage of the Hp genome (total number of aligned bases divided by final genome length, assuming that most of the genome is represented in the aligned reads). We assembled 922,085 screened and trimmed sequencing reads into 16,311 contigs (N50 = 36,162). We inferred a haploid genome size for Hp of 26.7 Mb, comparable to other Chytridiomycota genomes [Bd (JAM81) = 24.3 Mb, and Spizellomyces punctatus = 24.1 Mb]. We have deposited the Hp 454 reads in GenBank through the NCBI Sequence Read Archives under the accession SRA037431.1, and we have deposited the Whole Genome Shotgun project at DDBJ/EMBL/GenBank under the accession AFSM00000000 (the version described here is the first version, AFSM01000000).
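
The two assembly statistics quoted above follow from simple definitions. The sketch below uses the coverage definition given in the text and the conventional N50 definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the assembly). The function names and toy contig lengths are illustrative, not values from the Hp assembly.

```python
def coverage(total_aligned_bases, genome_length):
    """Mean depth: total aligned bases divided by final genome length."""
    return total_aligned_bases / genome_length

def n50(contig_lengths):
    """Length of the contig at which the running total reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")

toy_contigs = [100, 80, 60, 40, 20]  # hypothetical lengths, total = 300

print(coverage(120, 10))
print(n50(toy_contigs))
```

As a rough consistency check on the reported figures, 11.2× coverage of a 26.7 Mb genome implies on the order of 299 Mb of aligned bases.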

We generated 5,355 high confidence MAKER predictions and 11,857 GeneMark ab initio predictions for Hp’s protein coding genes. The number of predicted Hp proteins falls within the range of other annotated chytrid genomes (8,732 predicted proteins in Bd (JAM81) and 8,804 in Spizellomyces punctatus). The difference in the number of predicted Hp proteins between MAKER and GeneMark is due to MAKER’s conservative approach, which relies upon homology with protein-coding sequences of other species and the consensus of multiple ab initio gene prediction algorithms. We did not directly validate the number of expressed genes in our predicted protein sets with EST or RNA sequencing. However, we did compare the Hp predicted protein set to gene content in other species, which provides confidence in the Hp annotation and assembly. We recovered 92% (228/249) of the core eukaryotic genes using CEGMA in the Hp_Maker dataset. Similarly, we identified 3,216 orthologous groups of “chytrid-specific” proteins shared among both Bd isolates and S. punctatus (Table S3). Of the predicted chytrid-protein set we recovered 90% (2,885/3,216) in one or both Hp predicted protein sets (2,271 in Hp_Maker and 2,817 in Hp_GeneMark). Together, these results indicate that our sequencing efforts recovered a large proportion of genes that are predicted to occur in the Hp genome.
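
The recovery figures above can be checked with a few lines of arithmetic. The sketch below recomputes the two percentages and uses inclusion-exclusion to derive how many chytrid-specific groups must be present in both Hp protein sets. The overlap is not reported in the text and is inferred here from the three counts that are.

```python
# Counts reported in the text.
cegma_found, cegma_total = 228, 249
recovered_either, chytrid_groups = 2885, 3216
in_maker, in_genemark = 2271, 2817

# Percent recovery, rounded to whole percentages as in the text.
cegma_pct = round(100 * cegma_found / cegma_total)
chytrid_pct = round(100 * recovered_either / chytrid_groups)

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
in_both = in_maker + in_genemark - recovered_either
only_maker = in_maker - in_both
only_genemark = in_genemark - in_both

print(cegma_pct, chytrid_pct, in_both, only_maker, only_genemark)
```

Under these counts, 2,203 groups are found in both sets, 68 only in Hp_Maker, and 614 only in Hp_GeneMark, which is consistent with Hp_Maker not being a perfect subset of Hp_GeneMark.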

Bd Unique Genomic Features

We identified Bd-specific genes using the genomes of Hp and 17 additional fungi. We considered genes to be Bd-specific if they were present in orthologous groups in both sequenced Bd genomes (JAM81 and JEL423) and absent from all other fungi including Hp. Using OrthoMCL-clustered proteins, we defined 6,556 orthologous groups in Bd (Table S4). Of the 6,556 orthologous groups in Bd, 1,700 were Bd-specific by the above definition. The Bd-specific orthologous groups comprised 1,974 protein-encoding genes, 417 (21%) of which could be functionally categorized by a Pfam domain (with an e-value <0.01) (Table S4). We did not find any orthologous groups uniquely shared between Bd and the dermatophytes to the exclusion of all other fungal outgroups (Table S4). Although we defined orthologous groups using the sequenced genomes of both Bd isolates (JAM81 and JEL423), below we report gene IDs from JAM81 for simplicity.

We conducted enrichment analyses using gene ontology (GO) terms from the set of 417 Bd-specific genes associated with a Pfam domain and found enrichment in all three GO structured vocabularies: Cellular Component, Biological Process, and Molecular Function. We present all significantly enriched GO terms (with a corrected p-value ≤ 0.05) for the Bd-specific gene set in Table A12-1. Briefly, in the Biological Process ontology we found enrichment for genes involved in metabolic processes and regulation of carbohydrates, proteins, and transcription. In the Cellular Component ontology we found enrichment of genes located extracellularly, in the nucleus, and in membranes. In the Molecular Function ontology we found enrichment for genes involved in zinc-ion binding, protein dimerization, DNA-binding, hydrolase activity, and protease and triglyceride lipase activity.

Within the set of Bd-specific and GO-enriched genes were several functional groups of particular interest for their possible role in Bd pathogenesis. First, many Bd-specific genes were proteases and were found in expanded gene families (see below). Second, the Bd-specific gene set was enriched for genes containing the Lipase_3 Pfam domain found in triacylglyceride lipases (6 of 417 in the Bd-specific gene list, vs 20 of 8,732 in the genome, p<0.03) (BATDEDRAFT_93190, BATDEDRAFT_26490, BATDEDRAFT_86691, BATDEDRAFT_93191, BATDEDRAFT_89307, BATDEDRAFT_26489). Third, we identified 62 genes from the Bd-specific gene set that encode Crinkler or CRN-like microbial effectors (CRN), a class of genes previously reported only in oomycetes and not found in any of the other fungi considered here (Figure A12-3 and Table S5).
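
To make the enrichment comparison concrete, the sketch below computes an uncorrected hypergeometric upper-tail probability for the Lipase_3 counts quoted above (6 of 417 versus 20 of 8,732). It assumes a plain hypergeometric model with no multiple-testing correction, so it is not expected to reproduce the reported p-value. The function name is illustrative.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return hits / total

# Lipase_3 counts from the text: 6 of 417 Bd-specific genes, 20 of 8,732 genome-wide.
p_lipase = hypergeom_upper_tail(6, 417, 20, 8732)
print(p_lipase)
```

With about one Lipase_3 gene expected by chance in a sample of 417 (417 × 20 / 8,732 ≈ 0.96), observing six corresponds to a small uncorrected tail probability under this model.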


TABLE A12-1 The Enrichment of Cellular Component, Biological Process and Molecular Function GO Terms of 417 Bd Specific Genes Associated with a Pfam Domain

GOID Term Corrected p-value # in Bd specific gene set # in Bd genome
Biological Process
GO:0006508 Proteolysis 1.3E-57 99 292
GO:0019538 Protein metabolic process 1.6E-31 126 845
GO:0044238 Primary metabolic process 5.5E-29 179 1644
GO:0043170 Macromolecule metabolic process 1.0E-26 147 1229
GO:0008152 Metabolic process 4.5E-24 196 2075
GO:0019219 Regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process 6.6E-17 46 205
GO:0051171 Regulation of nitrogen compound metabolic process 6.6E-17 46 205
GO:0009889 Regulation of biosynthetic process 8.2E-17 43 180
GO:0010556 Regulation of macromolecule biosynthetic process 8.2E-17 43 180
GO:0031326 Regulation of cellular biosynthetic process 8.2E-17 43 180
GO:2000112 Regulation of cellular macromolecule biosynthetic process 8.2E-17 43 180
GO:0031323 Regulation of cellular metabolic process 1.8E-16 46 210
GO:0080090 Regulation of primary metabolic process 1.8E-16 46 210
GO:0045449 Regulation of transcription 2.3E-16 42 176
GO:0060255 Regulation of macromolecule metabolic process 6.0E-16 43 189
GO:0010468 Regulation of gene expression 2.1E-15 42 186
GO:0019222 Regulation of metabolic process 4.9E-15 46 227
GO:0065007 Biological regulation 6.8E-15 67 454
GO:0050789 Regulation of biological process 1.5E-14 66 449
GO:0050794 Regulation of cellular process 3.3E-14 62 409
GO:0006355 Regulation of transcription, DNA-dependent 3.4E-14 35 139
GO:0051252 Regulation of RNA metabolic process 3.4E-14 35 139
GO:0051704 Multi-organism process 3.6E-2 5 12
GO:0005975 Carbohydrate metabolic process 4.5E-2 26 253
Cellular Component
GO:0005623 Cell 1.6E-17 150 1571
GO:0044464 Cell part 1.6E-17 150 1571
GO:0005622 Intracellular 5.5E-10 105 1149
GO:0043231 Intracellular membrane-bounded organelle 4.4E-08 51 425
GO:0043227 Membrane-bounded organelle 5.2E-08 51 427
GO:0005634 Nucleus 1.2E-07 38 272
GO:0043229 Intracellular organelle 1.9E-05 61 659
GO:0043226 Organelle 2.1E-05 61 661
GO:0016021 Integral to membrane 3.9E-05 33 271
GO:0016020 Membrane 1.0E-04 58 645
GO:0044425 Membrane part 1.2E-04 40 381
GO:0031224 Intrinsic to membrane 1.6E-04 33 288
GO:0005576 Extracellular region 2.3E-04 14 69
GO:0044421 Extracellular region part 1.4E-03 12 60