
Appendix A

A1

THE MICROBIAL FORENSICS PATHWAY FOR USE OF
MASSIVELY PARALLEL SEQUENCING TECHNOLOGIES

Bruce Budowle,1,2 Sarah E. Schmedes,1,2 and Randall S. Murch1,3

1 Institute of Applied Genetics, University of North Texas Health Science Center, Fort Worth, TX.

2 Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX.

3 Virginia Tech, National Capital Region, Arlington, VA.

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

The Challenge

Eliminating the threat of terrorist or criminal attacks with microorganisms or toxin weapons is a continual challenge for biodefense and biosecurity programs. The task is difficult for several reasons: (1) the relative ease of access to a variety of effective source materials (Srivatsan et al., 2008) and options for the delivery of a bioweapon, (2) the minute quantities of materials that can be transferred and yet still be effective, (3) the difficulties in detection and analysis of microbiological evidence, and (4) the lack of well-defined approaches regarding credible inferences that can be made from microbial forensic evidence given extant data. At the onset of an event, it may be difficult to distinguish between a deliberate attack and a naturally occurring outbreak of an infectious disease (Morse and Budowle, 2006; Morse and Khan, 2005). Even if evidence strongly supports the hypothesis of a deliberate attack, it may still be very difficult to attribute the attack with certainty to those responsible (i.e., attribution). Attempts to resolve the crime will require advanced methods for characterizing microbial agents, as well as a combination of traditional investigation and intelligence gathering activities.

The Approach

In response to the need to determine the nature of the threat and the source of the weapon and to identify those who perpetrated the crime, the scientific community rose to the occasion beginning in 1996 and developed the field of microbial forensics. Microbial forensics is the scientific discipline dedicated to analyzing evidence from a bioterrorism act, biocrime, hoax, or inadvertent microorganism/toxin release for attribution purposes (Budowle et al., 2003, 2005a; Köser et al., 2012; Morse and Budowle, 2006). Another goal can be to support analysis of potential bioweapons capabilities for counter-proliferation, treaty verification, and/or interdiction. A forensics investigation initially will attempt to determine the identity of the causal agent and/or source of the bioweapon in much the same manner as in an epidemiological investigation. The epidemiological concerns are identification and characterization of specific disease-causing pathogens or their toxins, their modes of transmission, and any manipulations that may have been performed intentionally to increase their effects against human, animal, or plant targets (Morse and Budowle, 2006; Morse and Khan, 2005).
A microbial forensics investigation proceeds further in that evidence is characterized to assist in determining the specific source of the sample, as individualizing as possible, and the methods, means, processes, and locations involved to determine the identity of the perpetrator(s) of the attack or to determine that an act is in preparation. A systems analysis may be able to determine the processes used to generate the weapon or how it was delivered, which also can help inform the investigation and attribution decision. The ultimate goal is attribution: to identify the perpetrator(s) or to reduce the potential perpetrator population to as few individuals as possible so investigative and intelligence methods can be effectively and efficiently applied to “build the case” (Figure A1-1).

FIGURE A1-1 The microbial forensics attribution continuum. [Figure labels: Source of Evidence/Perpetrator, running from Exclusion (“Could not have common origin”; “Not Guilty”) through “Consistent With…” to Attribution (“High confidence in association”; “Guilty”); Power of and Confidence in Analysis, Interpretation, Inferences; Integrate with Other Evidences and Intelligence.]

Forensic Targets

Microbial forensic evidence may include the microbe, toxin, nucleic acids, protein signatures, inadvertent microbial contaminants, stabilizers, additives, dispersal devices, and indications of the methods used in a preparation. In addition, traditional types of forensic evidence may be informative and should be part of the toolbox of potential analyses of evidence from an act of bioterrorism or biocrime. Traditional evidence includes fingerprints, body fluids and tissues, hair, fibers, documents, photos, digital evidence, videos, firearms, glass, metals, plastics, paint, powders, explosives, tool marks, and soil. Other types of relevant evidence must be considered to exploit avenues to better achieve attribution, including proteins and chemical signatures. These types of signatures can only be obtained from crimes where the weaponized material or delivery device is found; they have little use in covert attacks where the biological agent is derived from the victims. Many of these methods are based on sound technologies and are complementary. They can be combined to identify signatures of sample growth, processing, and chronometry (Morse and Budowle, 2006). Matching of sample properties can help to establish the relatedness of disparate incidents. Furthermore, mismatches might have exclusionary power or signify a more complex causal relationship between the events under investigation. The results of these analyses can provide information on how, when, and/or where microorganisms were grown and weaponized.

While the goal of a microbial forensic analysis is to characterize a sample such that it can be traced to a unique source or at least eliminate other sources, it is unlikely that microbial forensic evidence alone is currently adequate to meet this goal.

Emerging Science and Technology

To enhance attribution capabilities with microbial evidence, considerable attention is being invested in molecular genetics, genomics, and bioinformatics. These fields are essential to microbial species/strain identification, fine genome variation, virulence determination, pathogenicity characterization, possible genetic engineering, and attaining source attribution to the highest degree possible.
The various tools that have been, or are being, developed in these areas will help to narrow the potential sources from which the pathogen used in an attack may have originated. Indeed, sequencing of an entire genome has been demonstrated as feasible in epidemiological investigations, such as the recent studies of outbreaks of E. coli O104:H4 in Germany and cholera in Haiti (Brzuszkiewicz et al., 2008; Chin et al., 2011; Grad et al., 2012; Hasan et al., 2012; Hendriksen et al., 2011; Mellmann et al., 2011; Rasko et al., 2011; Rohde et al., 2011). In addition, metagenomics studies may become foundational for describing diversity and endemicity. Endemicity becomes important when the relationship between microbes or their genetic residues in samples collected from a site of interest and microbes in the environmental background needs to be defined.

While the inferential capacity of microbial forensics genetics has yet to reach its full power, the phenomenal new generations of sequencing technology and the concomitant developments in bioinformatics capabilities to handle and extract the explosion of data offer potential for enhancing microbial forensic investigations. Indeed, the science and technology supporting microbial forensics are advancing at an inconceivable rate. For example, in 2002 in response to the anthrax letter attack, whole genomes of a few isolates were sequenced using shotgun sequencing by TIGR (Budowle et al., 2005b; NRC, 2009; Ravel et al., 2009; Read et al., 2002, 2003). That analysis, seemingly nominal by today’s capabilities, cost approximately $250,000 for one genome, took several weeks, and could characterize only a few samples. Today, such enterprises cost a fraction of that amount (and costs continue to drop dramatically), are becoming more automatable, and provide gigabases and terabytes of data in a matter of days (Bentley et al., 2008; Holt et al., 2008; Loman et al., 2012; MacLean et al., 2009; Margulies et al., 2005). Given the enhanced capabilities of nucleic acid sequencing of microbes, the microbial forensics community will embrace these molecular tools.
Although developments are needed, one can envision identification of microbes at the species, strain, and isolate levels being transformed using next- (or better termed “current-”) generation sequencing (CGS). Fine genome detail could become available for routine microbial forensic use. Because CGS provides whole genome characterization capabilities with high depths of coverage (100s to 1,000s fold and beyond), the technology will serve a critical role for research, such as genetic diversity and endemicity studies via metagenomics, and become a rapid diagnostic tool initially when viable and culturable microbes are available. Indeed, whole genome sequencing will reduce the need for a priori design of assays directed at defined species. The technology should apply at some resolution level to any genome without knowledge of the target. In addition, whole genome sequencing offers the capability to evaluate a sample for indications of genetic engineering.

Current Realities

However, not all microbial forensic evidence will present itself in a manner where copious quantities of target are available. Some samples will be highly degraded and/or contaminated. Thus, there will be challenges to extract the most information possible from limited materials and non-viable organisms. To meet these challenges, improved sample collection and extraction methods will be needed, nucleic acid repair methods will be sought, target amplification strategies such as whole genome amplification and selective target capture will be sought, and sequencing chemistries will be enhanced. Because of the throughput, CGS technologies can analyze multiple samples and not even begin to exploit the full throughput of the systems (Brzuszkiewicz et al., 2011; Cummings et al., 2012; Eisen, 2007; Hasan et al., 2012; Holt et al., 2008; Howden et al., 2011; Loman et al., 2012; MacLean et al., 2009; Relman, 2011; Rohde et al., 2011). However, the technology still is evolving and currently does not offer the sensitivity of detection to analyze low-quantity and low-quality DNA samples without some amplification approach prior to sequencing. Nonetheless, CGS is sufficiently mature to be considered useful for microbial forensic applications. Alternatively, technologies such as mass spectrometry analyses of nucleic acids and real-time PCR will continue to be used because they offer rapid detection (at species and strain levels) at substantially lower costs (Jacob et al., 2012; Kenefic et al., 2008; Sampath et al., 2005, 2009; U’ren et al., 2005; Vogler et al., 2008).

There are a number of CGS instruments and different chemistries. They include the MiSeq® System and HiSeq™ Sequencing Systems (Illumina, Inc., San Diego, CA), Ion Personal Genome Machine™ (PGM™) Sequencer, Ion Proton™ Sequencer and SOLiD® Systems (Life Technologies, Foster City, CA), and the 454 Genome Sequencer FLX and GS Junior Systems (Roche Diagnostics Corporation, Indianapolis, IN) (Bentley et al., 2008; Cummings et al., 2010; Loman et al., 2012; Margulies et al., 2005).
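The earlier remark that multiple samples need not exhaust a platform's throughput can be put in rough numbers. The sketch below estimates how many bar-coded samples could share one run while each still reaches a chosen depth of coverage. The run yield, genome size, and target depth are illustrative assumptions rather than figures from this paper, and `samples_per_run` and `mean_depth` are hypothetical helper names.

```python
def samples_per_run(run_yield_bases: float, genome_size_bases: float, target_depth: float) -> int:
    """Largest number of evenly pooled samples that still reach target_depth."""
    if min(run_yield_bases, genome_size_bases, target_depth) <= 0:
        raise ValueError("all inputs must be positive")
    return int(run_yield_bases // (genome_size_bases * target_depth))


def mean_depth(run_yield_bases: float, genome_size_bases: float, n_samples: int) -> float:
    """Average depth of coverage per sample when a run is shared evenly."""
    return run_yield_bases / (genome_size_bases * n_samples)


# Assumed values for illustration only; none come from the paper.
RUN_YIELD = 5e10      # bases from one run (hypothetical)
GENOME_SIZE = 5.2e6   # bases; roughly a Bacillus anthracis chromosome
TARGET_DEPTH = 100    # fold coverage wanted for every sample

n = samples_per_run(RUN_YIELD, GENOME_SIZE, TARGET_DEPTH)
print(f"{n} samples at about {mean_depth(RUN_YIELD, GENOME_SIZE, n):.0f}x each")
```

This treats pooling as perfectly even and every base as usable; real runs lose reads to adapters, low-quality bases, and uneven bar-code representation, so the estimate is an upper bound.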
In addition, single molecule detection platforms, such as those from Pacific Biosciences (Chin et al., 2011; Eid et al., 2009) and possibly Oxford Nanopore (Branton et al., 2008), are on the horizon. Each system offers some advantages and limitations for sequencing that will need to be defined with considerations of library preparation, read length, and accuracy. The evaluations should be based on the needs of application-oriented laboratories and not necessarily those of a research laboratory. Initially, microbial forensics instruments will be maintained in controlled laboratory environments.

Library preparation is one of the critical limiting factors for transferring CGS technology from a research environment to that of an operational laboratory. Currently, only a few samples can be prepared at any given time. Thus, while the sequencing throughput of the platforms is high, a sufficient number of samples cannot be readily prepared in an appropriate amount of time to meet the full capacity of the system. Library preparation needs to be simplified. Haloplex (Agilent, Santa Clara, CA) is an example of a library preparation process that potentially can reduce the preparation work required (www.halogenomics.com). This library preparation approach is a single-tube target amplification methodology that enables a large number of library samples to be prepared manually. The general process is: (1) restriction digest and denature the sample; (2) hybridize probes to targeted ends of the digested fragments; (3) circularize and ligate the molecules; and (4) introduce bar codes and amplify the targets by polymerase chain reaction (PCR). Eventually with automation the process might accommodate the number of samples that may be encountered by high-throughput operational laboratories. As many as 96 bar codes are available, which fits well with the 96-well format and reduces the preparation time from 2 weeks or several days to 6 hours. However, currently Haloplex is not available for use with non-human nucleic acids. One constraint is that the Haloplex system employs restriction digestion of the DNA. The restriction enzymes can potentially cleave a target site of interest (either a single nucleotide polymorphism (SNP) site or within a repeat motif) and render the marker untypable. Unfortunately, the enzymes used in Haloplex are proprietary, and one cannot readily scan for the restriction sites that would be incompatible with the designated targets (although palindromes can be sought for potential sites that may be obliterated).

Another strategy for simplifying library preparation and decreasing sample input is that of the Nextera XT DNA Sample Preparation Kit (www.illumina.com). Strategies such as the Haloplex system and the Nextera XT DNA kit hold promise for simplifying and possibly automating library preparation.

Another factor to consider with CGS technology is sequencing read length and accuracy. Current read lengths for the most widely used CGS instruments typically do not exceed 200 bases, and when they do, the quality of base calling decreases substantially along the length of a read. Longer reads with higher accuracy are necessary. Advances in technology for some platform systems suggest that reads up to 400 bases will be feasible in 2012.

Another consideration of platform selection is for situations where rapid responses are required (such as in military operations, some pandemics, and bioterrorism acts).
Initially, platforms will be placed in laboratories with controlled environments. One can envision the technology being taken to the field for immediate response and exigent circumstances. Robustness of the instrumentation, supply lines of reagents, and service support will be part of the decision process for the instrumentation/chemistry of choice. Fortunately, the technology and supporting interpretation tools continue to evolve and likely will become more robust.

Seeking More Power and Depth

For design and selection of systems and diagnostics, different diagnostic-based strategies can be considered. They can be based on the sample type, the sample matrix, the amount of work, or the question that one is attempting to address. The latter may be the best suited for conceiving workflow systems. The different scenarios in which nucleic acid analyses may be applied should be considered, because these will help define the needs of the microbial forensic community. They likely are (1) identification of species/strain (i.e., similar to epidemiological needs), (2) attribution, (3) genetic engineering, (4) sample-to-sample comparisons, and (5) metagenomics for endemicity (or a modified metagenomics for sample characterization) (Figure A1-2).

FIGURE A1-2 A general overview of the work and information flow from sample to analysis to information developed based on use of second-generation sequencing technology. [Figure labels: Sample Types (filtrates, water, soil, food, swabs, bulk materials, trace materials, database samples, clinical samples, agricultural materials, background samples); Second-Generation Sequencing-Capable Microbial Forensics Laboratory; Analysis Focus (Identification [Species, Strain, Isolate, Fine Variation]; Laboratory Evolution & Genetic Engineering; Sample-to-Sample Comparison; Metagenomics/Endemicity/Diversity); Intelligence/Investigation; Environmental Background; Epidemiological Data; Additional Forensic Evidence; Rapid, Sequential Response; Endemic Database Generation, Environmental Context; Inform Intelligence/Investigation; Narrow Potential Sources (Attribution).]

Sample identification generally would be direct characterization to identify the agent for immediate determination of potential threat and probable cause to investigate further. The process of attribution would drill down to the finest resolution possible and make comparisons to other reference samples, databases, or repositories to reduce the possible sources from which the sample originated or to a recent common ancestor. Genetic engineering could be detected by whole genome sequencing.

Metagenomics studies have been performed on several platforms, and they will likely provide some foundational data on diversity and endemicity (Eisen, 2007; Relman, 2011; Tringe et al., 2005). The value could be searching various niches for select agents. Suppose that in every sample tested certain select agents are identified.
Then there can be two consequences: one is that it may be more difficult to elucidate natural outbreaks versus intentional releases (although strain resolution may reduce the uncertainty); the second could be that such high resolution may be less informative at some threshold depth of coverage.

Most metagenomic work to date has exploited a small, single sequence target (16S rRNA) at a very high depth of coverage (Rusch et al., 2007; Venter et al., 2004). These studies often cannot provide resolution beyond family to genus levels. Clearly such broad range definition will not enable individualization or identify select agents. The anthrax investigation could have benefited from a modified metagenomics characterization. The putative common source of the material (RMR1029) was composed of a population of very similar cells. The colony morphological variants found in the evidence from the 2001 anthrax letter attacks were minority components, and because of sample preparation and stochastic effects the minor variants potentially could be difficult to detect with the PCR-based assays that were developed for the investigation. Because of the high depth of coverage with CGS, the population of low-level variants may be more readily detected, especially if an amplification enrichment step was included that focused only on the known variant sites that defined the morphology types. Such high depth of coverage would substantially reduce the false-positive rate and improve confidence in the potential relationship of the most similar samples to focus investigative leads (Cummings et al., 2010). Indeed, the depth of coverage could be in the millions. While exquisitely sensitive, platform- and chemistry-specific errors may confound interpretation, and thus thresholds of reliability may be necessarily invoked.

One could envision extending this population depth analysis, which in essence is a simplified metagenomic analysis, and exploiting the concept of using a multi-locus sequence typing (MLST) approach to provide a species-level identification capability (Maiden et al., 1998; Spratt, 1999). A few loci (perhaps the seven typically applied to MLST to 15) could be selected as a standard (e.g., for bacteria). If there is a combination of sufficiently stable sites and evolutionarily rapid sites, the loci could indicate species- to strain-level presence in mixed and metagenomic samples.
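One way to picture the locus-panel idea is as a set comparison between the loci detected in a mixed sample and a standard panel for each species. The following is a minimal sketch under stated assumptions: the panels, locus names, and the 0.7 reporting threshold are placeholders invented for illustration, and real loci would be matched by sequence rather than by name.

```python
from typing import Dict, Iterable, List, Tuple


def panel_support(detected: Iterable[str], panel: Iterable[str]) -> Tuple[float, List[str]]:
    """Return the fraction of panel loci seen in the sample and the loci still missing."""
    panel_set = set(panel)
    missing = sorted(panel_set - set(detected))
    return (len(panel_set) - len(missing)) / len(panel_set), missing


def candidate_species(detected: Iterable[str],
                      panels: Dict[str, List[str]],
                      min_fraction: float = 0.7) -> List[str]:
    """List species whose panel support meets an assumed reporting threshold."""
    hits = set(detected)
    return sorted(name for name, loci in panels.items()
                  if panel_support(hits, loci)[0] >= min_fraction)


# Hypothetical seven-locus panels; the names are placeholders, not real genes.
PANELS = {
    "species_A": [f"A_locus{i}" for i in range(1, 8)],
    "species_B": [f"B_locus{i}" for i in range(1, 8)],
}

# Loci observed in one mixed sample (invented for the example).
sample_hits = [f"A_locus{i}" for i in range(1, 7)] + ["B_locus2"]

print(candidate_species(sample_hits, PANELS))
print(panel_support(sample_hits, PANELS["species_A"]))
```

Because the loci are not physically linked in a mixed sample, a high fraction only suggests that a species is present; choosing and validating the threshold would be part of the work.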
Using the core seven loci used for MLST could allow some questions to be addressed regarding time and place of isolation, host or niche, serotype, and some clinical or drug resistance profiles. This will not be a trivial process because each of the sites will not be physically linked. However, one could determine, if the complete set or a reasonable subset of targets are in a sample, whether there is confidence that a particular species or sets of species are present. In theory this approach could be extended to strain levels. There certainly is enough throughput to consider this capability. The potential already has been established with electrospray ionization mass spectrometry of targeted genes for rapid bacterial species identification (and even for viruses such as influenza). There are sufficient bacterial genomes that have been sequenced to test our hypothesis, and work is under way.

Inferences about the significance of genetic evidence may not reach the ultimate goal of attribution. The most confounding constraint on reaching the full power of attribution is scant data on diversity and endemicity. The vast diversity of the microbial world is unknown and will not be defined substantially with current approaches in the area where a biocrime or bioterrorist attack has occurred. This limitation is not the sole purview of the microbial forensics community; it plagues the epidemiologists as well. Another limitation that evidentiary samples will likely have is an unknown history. Lack of knowledge on how a sample was manipulated (e.g., number of passages, exposure to mutagenic agents, length of storage) will complicate providing inferences about the significance or strength of sequencing results, especially because the distance between samples will be determined by the degree of similarity or dissimilarity. Indeed, even defining what is a “match” or “similar” may not be straightforward. Keim (personal communication) has stressed this uncertainty and proffered new terminology, a “member,” for the microbial forensic lexicon, based on phylogenetics, for the relationship of a sample to some reference samples. Regardless of the terminology used, some data will be needed to define the uncertainty of a “membership” or “association.” In 2006, the need for reconciliation between microbial genomics and systematics was described; microbial forensics and epidemiology were seen to offer useful, practical venues to frame the gaps and priorities (Buckley and Roberts, 2006). This challenge remains.

Some assessment of the strength or significance of an analytical result and subsequent comparison also is needed (Budowle et al., 2008; Chakraborty and Budowle, 2011). Of course, because of scant supporting data, such an endeavor will be challenging. Qualitative and/or quantitative statements of the significance of the finding will need to be developed. As an example, consider a forensic analysis of whole genome sequence data that compared two or more sequences, such as an evidence sample profile with that of a reference sample that may be considered a possible direct link or have a common ancestor. The evolutionary rates of the variants will need to be known. But perhaps as consequential, sequencing error and other factors could inflate the dissimilarity between samples and add a degree of “uncertainty.” Thus, efforts in defining and quantifying the error rates associated with each CGS platform and chemistry are critically important.
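A small calculation shows why error rates matter for sample-to-sample comparison. The sketch below estimates how many apparent differences between two consensus genomes could arise from error alone. It assumes independent per-base errors and counts any position where at least one sequence is wrong as a difference, which slightly overstates the effect; the genome size and error rates are illustrative values, not measurements from any platform.

```python
def expected_false_differences(genome_size: int, err_a: float, err_b: float) -> float:
    """Expected number of positions where at least one of two consensus
    sequences carries an error, assuming independent per-base errors."""
    if not (0.0 <= err_a <= 1.0 and 0.0 <= err_b <= 1.0):
        raise ValueError("error rates must lie between 0 and 1")
    return genome_size * (1.0 - (1.0 - err_a) * (1.0 - err_b))


# Illustrative inputs only: a 5 Mb genome and one consensus error per million bases.
GENOME_SIZE = 5_000_000
ERR_EVIDENCE = 1e-6
ERR_REFERENCE = 1e-6

spurious = expected_false_differences(GENOME_SIZE, ERR_EVIDENCE, ERR_REFERENCE)
print(f"about {spurious:.1f} differences expected from error alone")
```

If closely related isolates truly differ at only a handful of sites, a background of this size is comparable to the signal, which is why the error rate of each platform and chemistry needs to be quantified before a distance between samples is interpreted.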
Beyond comparison of samples for identification purposes are inferences by whole genome sequencing of phenotypic (i.e., functional) properties of a microbe. For example, even with a whole genome sequence, the ability to determine whether a microbe phenotypically displays antimicrobial resistance or susceptibility is still limited. Bacteria may contain multiple pathways, and how the different genes interact is far from being completely understood (Eisen, 2007; Köser et al., 2012; Relman, 2011). Substantial research will be needed such that genotype can be used reliably to predict phenotype.

Making Sense of Data

The ever-increasing amount of microbial genomic sequence data presents a variety of challenges related to the handling and storage of data and the development of bioinformatics methods that can accommodate such large numbers of whole genomes. Being able to analyze the vast amounts of data in a timely fashion is a key challenge to leveraging the power of these newer sequencing platforms. Software, hardware, and IT support may be the greatest barrier to use of CGS technology. It is unlikely that dedicated bioinformaticists will reside in every microbial forensics laboratory. Data cannot be sent to web-based clouds and be analyzed because the results may be classified. Instead, some standardization and standard operating data analysis and interpretation approaches will be needed. Pipeline and interpretation software will need to be evaluated for reliability and seamless diagnostic flow without bioinformatics expert intervention. The output of results must be intelligible to the microbial forensic analyst as well. The ideal software should be a comprehensive tool (or set of tools) covering analyses from microbe detection through determination of engineering.

The government should rely heavily on industry and well-established genome centers. The commercial competitive environment is driving down costs and improving informatics pipelines without the need for extensive investment. Leveraging these efforts will help meet the needs of microbial forensics more expeditiously than going it alone. The centers (to include the national laboratories) are evaluating platforms and chemistries and are generating data at unprecedented levels. They are providing solutions to massive data handling, including storage, curation of reference data, annotation, and data analysis.

Collections and databases are needed to house the microbial genomic data and, when possible, the accompanying meta-data. No standards yet exist for building databases to meet the needs of the microbial forensic community. Requirements for storage and retrieval of raw sequence data in microbial forensics cases and supporting inferential data must be developed. Given the high throughput and anticipated speed of analyses, it is conceivable that meaningful databases can be developed “on the fly” that better reflect the diversity where the crime was committed (to include the preparation laboratory to the crime scene).

The power of the microbial forensics techniques, tools, software, and databases that are used needs to be understood, and their limitations even more so.
To achieve this goal, methods need to be validated, and validation should be a requisite of any forensic repertoire. Indeed, the forensic sciences in general are facing well-deserved criticism for not necessarily having sound foundations and overstating the strength of the evidence (NRC, 2009). Attempts to attribute any attack to a person(s) or group should rely on accurate and credible results. The interpretation of such results might seriously impact the course or focus of an investigation, thus affecting the liberties of individuals or even being used as a justification for a government’s military response to an attack or threat of an attack. Therefore, the methods for collection, extraction, and analysis of microbial evidence that could generate key results need to be as scientifically robust as possible, so the methods can be high performing and the results defensible for decision makers and to the legal, international government, law enforcement, and scientific communities, as well as under scrutiny by the media.

Validation Is Essential

Validation is frequently used to connote confidence in a test or process, but it may be better thought of as defining the limitation of a method, process, or assay (Budowle et al., 2003, 2006, 2008). It still is common for the term validation to be used vaguely or to remain undefined when applied to process performance evaluation. The degree of validation varies from nominal to rigorous. The consequences of such varied requirements can be catastrophic if methods used in microbial forensic investigations are poorly constructed, under-developed, or generate results that are difficult to interpret. The validation process needs to be defined as to what is expected to be achieved by a validation study.

Validation determines the limits of a test. It does not mean that a test must be 100 percent accurate or have no cross-reactivity, false-positive results, or false-negative results to be considered useful. It is often thought of as a process applied to the analytical portion of a system. This concept is only partly correct. The limits that the methods can provide must be demonstrated and documented for all steps of the process to include sample collection, preservation, extraction, analytical characterization, and data interpretation. Furthermore, it is recognized that as new technologies and capabilities are developed to address the needs of the microbial forensics community, key principles and performance parameters including accuracy, precision, bias, reliability, sensitivity, and robustness will need to be determined. Robust quality assurance and data control systems are required to achieve confidence in results by diverse users of the information. It is imperative that both technical and interpretation limitations (and thus accuracy and error) be defined.

Additionally, a key resource for microbial forensic research, validation, and analysis is access to well-defined and curated microbial collections and data sets that are as comprehensive as is possible to the task. This effort includes the structure, content, and quality of the data sets.
While some collections have been started for use in research, or created for case-specific use, no comprehensive repository exists to support microbial forensics, and standards are not codified for meta-data and data curation. The implications of highly technical data, epidemiological data, traditional evidence data, and investigative or intelligence information are complex and need to be appreciated for their strengths and limitations. Because scientific data can affect the decision-making process for retaliation, preemptive actions, and/or courtroom deliberations, it is imperative that those directly involved in microbial forensics or those who may use the results for investigative lead value or more direct associations be properly educated (or at least properly apprised) of the implications of such data. To meet this necessary goal, education and training are critical to disseminate the principles, development, and applications of the evolving field of microbial forensics. Educational strategies and programs need to be constructed and training programs developed on the varied scientific foundations that support microbial forensics.

If validation processes are not defined and not followed and proper training or communication is not provided, then it is possible that a false sense of confidence may be associated with a poor method or process or from a result of limited significance. There are myriad methods, processes, targets, platforms, and applications. Yet some basic requirements transcend individual differences in methods, and these can be reinforced by contextual description (Table A1-1).
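The point that validation defines limits rather than demanding a perfect test can be made concrete with a small calculation. The sketch below is only an illustration: it uses the conventional confusion-matrix definitions of sensitivity, specificity, and accuracy, which may be narrower than the senses intended in the text (for example, analytical sensitivity as a limit of detection), and every count in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Outcomes of testing known positive and known negative samples.

    All numbers used below are hypothetical and only illustrate the arithmetic.
    """
    true_pos: int
    false_neg: int
    true_neg: int
    false_pos: int


def performance(c: ValidationCounts) -> dict[str, float]:
    """Return basic performance estimates from validation outcome counts."""
    positives = c.true_pos + c.false_neg
    negatives = c.true_neg + c.false_pos
    return {
        "sensitivity": c.true_pos / positives,
        "specificity": c.true_neg / negatives,
        "false_positive_rate": c.false_pos / negatives,
        "false_negative_rate": c.false_neg / positives,
        "accuracy": (c.true_pos + c.true_neg) / (positives + negatives),
    }


# Hypothetical study: 100 known positives and 100 known negatives.
counts = ValidationCounts(true_pos=95, false_neg=5, true_neg=98, false_pos=2)
metrics = performance(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A test with these hypothetical error rates could still be useful, provided the rates are documented and carried into the interpretation of a result.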

TABLE A15-4  Illumina Sequences with Remote Homologies to Astroviruses

Virus                           Number of sequences
Astrovirus from dog feces       14
Bat astrovirus                  47
Bottlenose dolphin astrovirus   1
California sea lion astrovirus  4
Human-mink astrovirus           3
Human astrovirus                59
Rat astrovirus                  17
Sheep astrovirus                2
Swine astrovirus                14
Turkey astrovirus               1

The top alignment from tblastx, excluding MLB1 and MLB2, is reported if the alignment was to an astrovirus.

that one of the rhinoviruses we detected was most similar to the recently discovered group C rhinovirus QPM (McErlean et al., 2007).

Do Viral Sequences Correlate with Fever Without a Source?

To compare afebrile children and children with UF, the number of sequence reads was normalized to 3 million per sample. After the adjustment, samples from children with UF had 1.5- to 5-fold more viral sequences than samples from afebrile children in NP and plasma samples, respectively (Figure A15-5A). Although sequencing is not strictly quantitative, the number of sequences generated was inversely correlated with Ct values from the real-time PCR assays

TABLE A15-5  Genome Coverage

Genome                       Contigs  Sequences (input/incorporated into contigs)  Smallest contig  Largest contig  Genome size  Coverage
Human bocavirus              5        2733                                         100 nt           1886 nt         5299 nt      92.6%
Respiratory syncytial virus  24       2588                                         102 nt           1882 nt         15,191 nt    58.4%
Human rhinovirus QPM         2        7159                                         798 nt           5962 nt         6948 nt*     94.5%
Human parainfluenza virus    10       189                                          105 nt           803 nt          15,462 nt    14.2%

*Full genome sequence not available. Largest GenBank sequence used.

FIGURE A15-5  Febrile children have more viral sequences from a greater range of viruses than do afebrile children. The number of sequences was scaled to 3 million per sample before comparisons were made between groups. (A) The average numbers of viral sequences found in plasma and NP samples from the subjects are represented by gray bars for samples from afebrile children and black bars for samples from febrile children. The percentage of samples in each group for which 0, 1, 2, 3, 4, or 5 viruses was detected is plotted for (B) sequencing data and (C) PCR data.

(Figure S2), which suggests that the number of virus reads correlates with the amount of viral genomic material present.

More than one viral genus was found in some samples, and samples from children with fever had a greater number of viruses compared to those from afebrile children (Figure A15-5B). No plasma sample from an afebrile child had more than 1 viral genus detected, compared to 2 to 5 genera detected in 61% of the plasma samples from febrile children (Figure A15-5B). The difference in the percentages of samples with multiple viruses was not as striking in NP samples from febrile and afebrile children, although only 12% of samples from afebrile children had 2 or more viruses compared with 26% of samples from febrile children (Figure A15-5B). In all groups, sequencing detected multiple viral genera in a larger proportion of the samples than did directed PCR assays (Figure A15-5C).

More plasma samples from febrile children were positive for viral sequences than were samples from afebrile children. Anellovirus sequences were the only group found in the plasma from afebrile children. They were found in 80% of all plasma samples, and there was no significant difference between the presence of anellovirus sequences in febrile and afebrile children (P = 0.2837, Fisher’s Exact Test). The presence of anelloviruses is not surprising, as they infect the majority of children by 1 year of age and establish chronic infections that are detectable in the blood of healthy individuals (Breitbart and Rohwer, 2005; Ninomiya et al., 2008; Vasilyev et al., 2009). When the ubiquitous anellovirus sequences were removed from the analysis, the difference between the febrile and afebrile groups became even more striking (Figure A15-6A). Most viruses were detected in only a few samples, so differences between the febrile and afebrile groups were not statistically significant for individual viruses in this limited sample set.
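The presence/absence comparison reported above relies on Fisher's exact test on a 2x2 table. The following is a minimal standard-library sketch of the two-sided test, not the authors' analysis code. The per-group counts are an assumption: the text gives only the overall prevalence (80%) and the group sizes (23 febrile and 22 afebrile plasma samples), so the split of 20 and 16 anellovirus-positive samples is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are positive/negative. The p-value sums the
    hypergeometric probabilities of all tables with the same margins that
    are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weights, kept as exact integers.
    weights = {
        x: comb(row1, x) * comb(row2, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())


# Hypothetical split of anellovirus-positive plasma samples:
# 20 of 23 febrile and 16 of 22 afebrile (36 of 45 overall, i.e. 80%).
p_value = fisher_exact_two_sided(20, 3, 16, 6)
print(f"P = {p_value:.4f}")  # approximately 0.284 for these assumed counts
```

With these assumed counts the result is close to the reported P = 0.2837, but the source does not state the per-group numbers, so the agreement should not be read as confirmation of them.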
However, the enterovirus and roseolovirus sequences were more likely to be found in the febrile subjects than the afebrile subjects (Figure A15-6A), consistent with their roles as pathogens that can cause fever. Viral sequences were also detected more commonly in NP samples from febrile children compared with those from afebrile children (Figure A15-6B). Again, anellovirus sequences were ubiquitous. Enterovirus sequences were found in similar proportions in samples from febrile and afebrile children. Excluding the ubiquitous anelloviruses and enteroviruses, the less common viral sequences were detected more frequently in the febrile subjects compared to the afebrile subjects (Figure A15-6B). Specifically, adenovirus and parechovirus were more commonly associated with NP samples from febrile children (Figure A15-6B). These data indicate that viruses are more commonly associated with samples from febrile children and suggest that viruses are the cause of many fevers in young children for which a source is not determined. Sequences from febrile children revealed a greater range of viral genera compared to sequences from afebrile children. The difference was most striking in plasma, with sequences from 9 genera found as a result of screening the 23 samples from febrile children and 1 genus found as a result of screening the 22

FIGURE A15-6  Prevalence of viruses in samples from febrile compared with afebrile children. The total number of reads per sample was scaled to 3 million to make samples more comparable, and all counts of ≥1 virus sequence were reported. The percent virus-positive samples are graphed for plasma and NP samples, respectively (A and B). P-values were determined using Fisher’s exact test. Heatmaps representing the number of virus reads for each virus detected (x-axis) and each sample evaluated (y-axis) are presented for plasma and NP samples, respectively (C and D). The light yellow area is 0 reads, with more intense red representing larger numbers of reads.

samples from afebrile children (Figure A15-6C). In NP samples, sequences from 14 genera were detected by screening the 50 samples from subjects with UF compared with 10 genera detected by screening the 81 samples from afebrile subjects (Figure A15-6D). These data indicate that fever is likely associated with a broad range of viruses, and further studies with larger sample sizes may be important for elucidating the roles of particular viruses in febrile illness.

Discussion

Although it has long been suspected that virus infection is the cause of many unexplained fevers in children under 3 years old, this is the first comprehensive analysis of viruses in samples from children with UF and controls using deep sequencing. We show that more viral sequences from a greater diversity of viruses are found in plasma and NP samples from children with UF than in corresponding samples from afebrile children, which supports the idea that viruses are the cause of many of these unexplained fevers. Children with UF are frequently hospitalized or treated with antibiotics without a positive test for a bacterial infection. The evidence we provide indicates that viruses are commonly associated with UF, and further studies should be done to confirm and elaborate on their role in this clinical syndrome. Ultimately, it would be helpful to identify specific clinical features or tests that could aid diagnosis of virus infection to improve the treatment of children with UF and minimize the unnecessary use of antibiotics.

As expected, the virome of the nasopharynx, which is directly exposed to the environment, is much more complex than the virome detected in plasma. Some viruses found in NP swabs were detected in both febrile and afebrile children. Of particular interest are the Enterovirus sequences, which include rhinoviruses that are known to cause colds.
The presence of an enterovirus or rhinovirus in an NP sample from a child with fever would likely lead a physician to conclude that the enterovirus or rhinovirus was the cause of the fever, but we show that Enteroviruses are equally prevalent in the NP samples of afebrile children. These data suggest that in a microbial habitat that is exposed to the environment, the presence of a known pathogen should be interpreted with caution. These data also suggest that we are exposed to a number of known pathogens without showing symptoms of infection, either because the presence of the virus is transient or the particular virus species or strain does not cause symptoms. These observations indicate the importance of future experiments to evaluate the microbiome of the airways over time to look for indicators that a viral infection will become symptomatic, such as correlation of symptoms with specific viral subtypes, correlation with specific biomarkers, or shifts in the larger microbial community structure.

The detection of viruses in the plasma has different implications than in NP samples. Plasma is not generally exposed to the environment, so the presence of a known viral pathogen in the plasma is most likely the result of a disseminated infection. While this study was not designed to determine causation of fever, the

complete absence of known viral pathogens in the plasma of afebrile subjects suggests the viral pathogens detected in the plasma of febrile subjects were the sources of their fevers. While it is more invasive to collect blood than other samples, these data suggest blood samples may provide clearer assessment of viruses that are directly associated with disease in contrast to NP samples where viral pathogens are detected in asymptomatic individuals. Additional studies will need to be done to confirm these ideas. Other viruses, such as anelloviruses, are present chronically in the plasma of healthy people. It remains to be determined what kind of effects long-term exposure to these viruses has on the immune response and human health.

This study could be expanded in several ways in order to better characterize the role of viruses in UF, including detecting viruses in children in whom no viruses have been detected thus far. The first would be to include additional sample types, such as stool. The second would be deeper sequencing of samples, particularly plasma, in which the presence of virus sequences is most likely to be clinically significant. We confirmed that additional sequencing improved detection of low-abundance virus sequences, and as sequencing costs decrease and analysis tools improve it may be practical to generate and analyze 10 times the number of sequences for each sample to enhance virus detection. It is notable that the use of the Illumina platform in this study enabled the detection of many rare virus sequences, which would likely have been missed using sequencing platforms that generate fewer sequencing reads per unit cost. The third way to improve the study would be further examination of existing sequence data for novel viruses, focusing especially on samples from febrile children with no pathogen detected.
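The argument that 10 times more sequence per sample would improve detection of low-abundance viruses can be illustrated with a simple sampling model. This is a sketch under stated assumptions, not a model from the study: it treats every read as an independent draw that is viral with a fixed probability, which ignores amplification bias and duplicate reads, and both the viral fraction (1 in 10 million reads) and the 3 million read baseline are chosen only for illustration.

```python
from math import expm1, log1p


def detection_probability(fraction: float, depth: int) -> float:
    """Probability of observing at least one viral read.

    Assumes each of `depth` reads is independently viral with probability
    `fraction` (a simplifying assumption, not a claim about real libraries).
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Equivalent to 1 - (1 - fraction) ** depth, computed stably for tiny fractions.
    return -expm1(depth * log1p(-fraction))


VIRAL_FRACTION = 1e-7  # hypothetical abundance of a rare virus
baseline = detection_probability(VIRAL_FRACTION, 3_000_000)
deeper = detection_probability(VIRAL_FRACTION, 30_000_000)
print(f"3 million reads:  {baseline:.2f}")
print(f"30 million reads: {deeper:.2f}")
```

Under these assumptions a tenfold increase in depth moves such a virus from being missed in most samples to being detected in most samples, which is the qualitative effect described above.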
Virus discovery using high-throughput sequencing methods has been very productive in recent years (Briese et al., 2009; Felix et al., 2011; Finkbeiner et al., 2008, 2009; Holtz et al., 2008; Loh et al., 2009). While short-read Illumina sequencing has not been widely adapted for virus discovery in metagenomic samples to date, our findings suggest that this 100-base platform can be applied to virus discovery. For example, the sequences we obtained from the recently discovered astrovirus MLB2 and rhinovirus QPM would have allowed discovery of those viruses based on alignment to other more remotely related reference genomes. In addition, the depth of sequencing gained using the Illumina platform gives the advantage of detecting more virus sequences compared to the 454 platform, which could be advantageous by allowing alignment over different parts of a reference genome, some of which may be more conserved, and by generating enough sequences to enable longer, contiguous sequences to be assembled for further analysis.

An important outcome of this study is to show that deep, Illumina-based sequencing has at least two advantages over targeted, PCR-based assays for the assessment of viruses in clinical samples. First, sequencing does not require prior knowledge of which viruses might be in the sample, thus allowing the detection

of unexpected and novel viruses. Second, sequencing can often provide information such as virus subtype or sequence variation from reference genomes, which adds detail to our understanding of the viruses present. Our study illustrates both of these advantages. First, we identified viruses that would not have been routinely queried by PCR assays for known pathogens. For example, we detected the astrovirus MLB2 in plasma and NP samples from a febrile child, which were subsequently confirmed by PCR in both samples ([Holtz et al., 2011] and data not shown). Because no other cause of the fever has been detected, these data suggest MLB2 is the cause of this subject’s fever and further examination of the role of this virus in pediatric fever is warranted.

The second advantage of sequencing, the ability to determine virus subtype or sequence variation from reference genomes, is also evident in our study. For example, we were able to identify specific types or subtypes or strains of rhinovirus and bocavirus. Notably, this can often be accomplished without sequencing most of the viral genome. In the case of HHV-6, all of the positive plasma samples were determined to be serotype 6B, even though 4 of the 8 samples had fewer than 15 HHV-6 sequences. We were also able to make distinctions between anellovirus species TTV, TTMDV, and TTMV with as little as one read. In future studies we will examine how different virus species and subtypes correlate with clinical symptoms.

One challenge in analyzing the virome in metagenomic samples is the speed of alignment tools available. Aligners designed for large data sets with short sequences generally gain processing speed by sacrificing the ability to identify sequences that differ more than slightly from the reference genome.
Thus, many of these very fast aligners cannot be used effectively for analysis of virus sequences, which frequently differ considerably from their most closely related reference sequences. We are implementing new tools to be used for virome analysis that improve the speed of nucleotide and amino acid sequence alignments while retaining most of the sensitivity, which will allow the efficient analysis of a greater number of sequences. A second challenge for virome analysis is the use of a more inclusive reference database (such as NCBI’s NT) because this would allow identification of more virus sequences based on sequence similarity; however, alignment results from a large database can be problematic for several reasons: (a) taxonomy can be irregular causing computational problems and (b) some of the viral entries contain sequences from the human genome or bacterial cloning vectors, which cause false positive alignments. We have addressed these problems in the present study by manually reviewing the data, but our goal is to develop an easily updated, semi-curated database that would minimize these problems. Future versions of this analysis protocol will be improved with faster alignment tools and improved databases.

This study of deep sequencing of samples from febrile and afebrile children indicates that viruses are frequently detected in both groups, but with greater frequency and diversity in the samples from children with fever of unknown

cause. A causal role for these viruses would have important implications for the medical treatment of these children, since the children would not require antibiotic therapy. In evaluating viral causes of fever, sequencing appears to be advantageous in that it frequently reveals the presence of multiple viruses in a given sample, including unexpected viruses. Highly sensitive and specific PCR assays for a subset of viruses complement the sequencing analysis. As sequencing continues to become less expensive and the speed of computational tools improves, it is possible that its sensitivity could match that of PCR. This could lead to a powerful diagnostic approach: rapid, unbiased sequence analysis of the microbiome in patient samples, which could identify potentially pathogenic viruses and other microbes, followed by confirmation of the results using highly targeted and extremely specific PCR assays.

Methods

Ethics Statement

Samples were collected from human subjects using a protocol that was approved by the Washington University Human Research Protection Office. Written informed consent was obtained from the parents or legal guardians of all subjects.

Sample Collection

The subjects included were febrile and afebrile children 2 to 36 months of age seen at St. Louis Children’s Hospital. The group of febrile children was comprised of patients seen in the emergency room who had fever without an obvious source. In order to be included in the study, the physicians must have elected to obtain blood for testing. The afebrile group was comprised of children undergoing surgery. NP swabs and plasma samples were collected as described (Colvin et al., manuscript submitted). NP swabs were collected by inserting flocked swabs into the nasopharyngeal area, rotating the swab, and holding the swab in place for 10–15 seconds to increase specimen collection.
Swabs were submerged in Universal Transport Medium (Copan), and the shafts of the swabs were cut or broken off and discarded. The medium containing the swab was briefly vortexed, and then the swabs were removed without wringing out any absorbed medium. Tubes were centrifuged, and the supernatant was aliquoted and frozen at −70°C. Total nucleic acid was extracted using the Qiagen BioRobot M48 and the Roche MagNA Pure automated extractor for NP and plasma samples, respectively.

Sample Preparation and Sequencing

For samples from afebrile and febrile children, sequencing libraries were prepared to look for DNA and RNA viruses. DNA and RNA were prepared as

previously described (Wang et al., 2002, 2003). In brief, using total nucleic acid templates, RNA was primed with Primer A for reverse transcription. Sequenase DNA polymerase was used for second strand synthesis. DNA and RNA fragments were amplified with Primer B for 40 cycles. Samples were sequenced on the Roche 454 GS FLX Titanium or Illumina GAIIX. For samples in which additional sequencing reads were generated, the Illumina GAIIX or Illumina HiSeq 2000 was used (Figure S3). The SRR accession numbers for the sequence data are provided in Figure S4.

Sequence Analysis

A pipeline was developed for the analysis of large numbers of short sequence reads. This was adapted from that used for the analysis of 454 sequences, which used BLASTn and tBLASTx (Altschul et al., 1997) to align sequences to references in the NT database,59 followed by a manual review of the viral alignments. The details of the protocol for analysis of short reads follow. After removal of primer sequences, completely identical sequences were collapsed into a single representative sequence to minimize the number of sequences to be analyzed. Low complexity sequences were then masked using Dust (Morgulis et al., 2006). Sequences with greater than 20 N nucleotides (either from sequencing error or as a result of Dust) were removed. Human sequences were identified for removal by aligning sequences to the Genome Reference Consortium’s human build 36 60 including unplaced, human mitochondrial, and 5.8S, 18S, and 28S rDNA sequences using cross_match (Green, 1994) with the following alignment parameters: minscore 70, bandwidth 3, penalty –1, gap_init –1, gap_ext –1, masklevel 0. Non-human sequences were aligned to a metagenomic database consisting of all virus and phage sequences in NCBI NT plus full genomes from other microbes including bacteria, archaea, and small eukaryotes (Mitreva, et al., unpublished).
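Two of the preprocessing steps described above, collapsing completely identical sequences and removing sequences with more than 20 N nucleotides, can be sketched in a few lines. This is an illustrative sketch rather than the authors' pipeline code: the function names and the example reads are hypothetical, and primer removal and Dust masking (which would run before and between these steps) are assumed to have been done by external tools.

```python
MAX_N = 20  # reads with more than 20 N bases are removed (threshold from the text)


def collapse_identical(reads: list[str]) -> list[str]:
    """Keep one representative per distinct sequence, in first-seen order."""
    return list(dict.fromkeys(reads))


def drop_n_rich(reads: list[str], max_n: int = MAX_N) -> list[str]:
    """Remove reads whose count of N bases exceeds `max_n`."""
    return [read for read in reads if read.upper().count("N") <= max_n]


# Hypothetical reads; in the described protocol, N bases come from
# sequencing error or from low-complexity masking.
raw_reads = [
    "ACGTACGT",
    "ACGTACGT",          # exact duplicate, collapsed
    "N" * 21 + "ACGT",   # 21 N bases, removed
    "N" * 20 + "ACGT",   # exactly 20 N bases, kept
    "TTGACA",
]
unique_reads = collapse_identical(raw_reads)
filtered_reads = drop_n_rich(unique_reads)
print(len(raw_reads), len(unique_reads), len(filtered_reads))
```

Collapsing duplicates first keeps the later, more expensive alignment steps from repeating work on identical reads, which is the stated purpose of that step.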
Cross_match was used with the same parameters used for the human alignments. Any sequences that were unaligned using nucleotide alignment were then aligned to NR (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) using WU-BLAST (BlastX) (Altschul et al., 1990) with the following parameters: filter seg, W 6, WINK 6, nogap. Sequences that aligned to microbial references using either cross_match or WU-BLAST were confirmed by WU-BLAST alignment to the larger NT database. Virus alignments were then manually evaluated, and ambiguous alignments were removed. The same protocol was used for the analysis of the 75-mer data, except a minscore of 50 was used in the cross_match alignments. Detailed sequence statistics are presented in Figure S5. Figure S6 shows the number of virus sequences found with cross_match and BlastX, without scaling.

_______________________

59 ftp://ftp.ncbi.nlm.nih.gov/blast/db

60 http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/data.shtml
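Because unscaled counts (as in Figure S6) depend on how deeply each sample was sequenced, the group comparisons in this chapter use counts scaled to 3 million reads per sample. The sketch below shows one way such scaling could be done. Simple proportional scaling is an assumption here, since the text does not say whether counts were rescaled or reads were subsampled, and the sample names and read counts are hypothetical.

```python
TARGET_DEPTH = 3_000_000  # scaling depth reported in the text


def scale_count(viral_reads: int, total_reads: int, target: int = TARGET_DEPTH) -> float:
    """Proportionally rescale a viral read count to a common sequencing depth."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * target / total_reads


# Hypothetical samples: (viral reads, total reads).
samples = {
    "sample_A": (40, 6_000_000),
    "sample_B": (12, 1_500_000),
}
scaled = {name: scale_count(viral, total) for name, (viral, total) in samples.items()}
for name, value in scaled.items():
    print(f"{name}: {value:.1f} viral reads per 3 million")
```

In this example the sample with fewer raw viral reads has the higher scaled count, which shows why unscaled counts can mislead when sequencing depth differs between samples.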

Viral sequences were assembled into contigs using Tigra (Chen L and Weinstock G, unpublished).

Acknowledgments

We thank David Wang for helpful discussion, Makedonka Mitreva, John Martin, Sahar Abubucker, and Karthik Kota for providing human reference and non-viral microbial reference databases, and Todd Wylie, John Martin, Eric Becker, and Matt Callaway for assistance with programming and parallelization of alignments.

Author Contributions

Conceived and designed the experiments: KMW KAM ES GMW GAS. Performed the experiments: KMW KAM. Analyzed the data: KMW KAM GAS. Wrote the paper: KMW GMW GAS. Contributed patient samples: GAS.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. doi: 10.1093/nar/25.17.3389.
Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, et al. (2006) The marine viromes of four oceanic regions. PLoS Biol 4: e368. doi: 10.1371/journal.pbio.0040368.
Baraff LJ (2000) Management of fever without source in infants and children. Ann Emerg Med 36: 602–614.
Breitbart M, Rohwer F (2005) Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. Biotechniques 39: 729–736. doi: 10.2144/000112019.
Briese T, Paweska JT, McMullan LK, Hutchison SK, Street C, et al. (2009) Genetic detection and characterization of Lujo virus, a new hemorrhagic fever-associated arenavirus from southern Africa. PLoS Pathog 5: e1000455. doi: 10.1371/journal.ppat.1000455.
Felix MA, Ashe A, Piffaretti J, Wu G, Nuez I, et al. (2011) Natural and experimental infection of Caenorhabditis nematodes by novel viruses related to nodaviruses. PLoS Biol 9: e1000586. doi: 10.1371/journal.pbio.1000586.
Finkbeiner SR, Allred AF, Tarr PI, Klein EJ, Kirkwood CD, et al. (2008) Metagenomic analysis of human diarrhea: viral detection and discovery. PLoS Pathog 4: e1000011. doi: 10.1371/journal.ppat.1000011.
Finkbeiner SR, Holtz LR, Jiang Y, Rajendran P, Franz CJ, et al. (2009) Human stool contains a previously unrecognized diversity of novel astroviruses. Virol J 6: 161. doi: 10.1186/1743-422X-6-161.
Green P (1994) Cross_match. Unpublished manuscript. http://www.phrap.org.
Holtz LR, Finkbeiner SR, Kirkwood CD, Wang D (2008) Identification of a novel picornavirus related to cosaviruses in a child with acute diarrhea. Virol J 5: 159. doi: 10.1186/1743-422X-5-159.
Holtz LR, Wylie KM, Weinstock GM, Sodergren E, Jiang Y, et al. (2011) Astrovirus MLB2 Viremia in a Febrile Child. Emerg Inf Dis. (in press).

Hormozdi DJ, Arens MQ, Le BM, Buller RS, Agapov E, et al. (2010) KI polyomavirus detected in respiratory tract specimens from patients in St. Louis, Missouri. Pediatr Infect Dis J 29: 329–333. doi: 10.1097/INF.0b013e3181c1795c.
Krauss BS, Harakal T, Fleisher GR (1991) The spectrum and frequency of illness presenting to a pediatric emergency department. Pediatr Emerg Care 7: 67–71. doi: 10.1097/00006565-199104000-00001.
Loh J, Zhao G, Presti RM, Holtz LR, Finkbeiner SR, et al. (2009) Detection of novel sequences related to African swine fever virus in human serum and sewage. J Virol 83: 13019–13025. doi: 10.1128/JVI.00638-09.
McErlean P, Shackelton LA, Lambert SB, Nissen MD, Sloots TP, et al. (2007) Characterisation of a newly identified human rhinovirus, HRV-QPM, discovered in infants with bronchiolitis. J Clin Virol 39: 67–75. doi: 10.1016/j.jcv.2007.03.012.
Morgulis A, Gertz EM, Schaffer AA, Agarwala R (2006) A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol 13: 1028–1040. doi: 10.1089/cmb.2006.13.1028.
Nakamura S, Yang CS, Sakon N, Ueda M, Tougan T, et al. (2009) Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLoS One 4: e4219. doi: 10.1371/journal.pone.0004219.
Ninomiya M, Takahashi M, Nishizawa T, Shimosegawa T, Okamoto H (2008) Development of PCR assays with nested primers specific for differential detection of three human anelloviruses and early acquisition of dual or triple infection during infancy. J Clin Microbiol 46: 507–514. doi: 10.1128/JCM.01703-07.
Reyes A, Haynes M, Hanson N, Angly FE, Heath AC, et al. (2010) Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466: 334–338. doi: 10.1038/nature09199.
Rudinsky SL, Carstairs KL, Reardon JM, Simon LV, Riffenburgh RH, et al. (2009) Serious bacterial infections in febrile infants in the post-pneumococcal conjugate vaccine era. Acad Emerg Med 16: 585–590. doi: 10.1111/j.1553-2712.2009.00444.x.
Sumino KC, Walter MJ, Mikols CL, Thompson SA, Gaudreault-Keener M, et al. (2010) Detection of respiratory viruses and the associated chemokine responses in serious acute respiratory illness. Thorax 65: 639–644. doi: 10.1136/thx.2009.132480.
Vasilyev EV, Trofimov DY, Tonevitsky AG, Ilinsky VV, Korostin DO, et al. (2009) Torque Teno Virus (TTV) distribution in healthy Russian population. Virol J 6: 134. doi: 10.1186/1743-422X-6-134.
Victoria JG, Kapoor A, Li L, Blinkova O, Slikas B, et al. (2009) Metagenomic analyses of viruses in stool samples from children with acute flaccid paralysis. J Virol 83: 4642–4651. doi: 10.1128/JVI.02301-08.
Waddle E, Jhaveri R (2009) Outcomes of febrile children without localising signs after pneumococcal conjugate vaccine. Arch Dis Child 94: 144–147. doi: 10.1136/adc.2007.130583.
Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA, et al. (2002) Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci U S A 99: 15687–15692. doi: 10.1073/pnas.242579699.
Wang D, Urisman A, Liu YT, Springer M, Ksiazek TG, et al. (2003) Viral discovery and sequence recovery using DNA microarrays. PLoS Biol 1: E2. doi: 10.1371/journal.pbio.0000002.
Watt K, Waddle E, Jhaveri R (2010) Changing epidemiology of serious bacterial infections in febrile infants without localizing signs. PLoS One 5: e12448. doi: 10.1371/journal.pone.0012448.
Wilkinson M, Bulloch B, Smith M (2009) Prevalence of occult bacteremia in children aged 3 to 36 months presenting to the emergency department with fever in the postpneumococcal conjugate vaccine era. Acad Emerg Med 16: 220–225. doi: 10.1111/j.1553-2712.2008.00328.x.
Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, et al. (2009) Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One 4: e7370. doi: 10.1371/journal.pone.0007370.
Willner D, Furlan M, Schmieder R, Grasis JA, Pride DT, et al. (2011) Metagenomic detection of phage-encoded platelet-binding factors in the human oral cavity. Proc Natl Acad Sci U S A 108: 4547–4553. doi: 10.1073/pnas.1000089107.