Twenty years ago, the Human Genome Project, and the nascent genomic sciences more generally, were highly controversial. Many biologists thought that investing resources in such “molecular natural history” was economically wasteful and intellectually suspect. Now, practically all biologists are genomicists. If not directly pursuing genome sequencing and the other “omic” methods, biologists nevertheless often ground their particular genetic, biochemical, physiological, behavioral, or ecological studies in the work of someone who is. Genomics has been transformative in the deepest sense, not only answering many questions about how organisms function, develop, and evolve, but also driving a radical reformulation of the terms in which such questions are asked. Although initially many of us thought of genomics mostly as a more economical and efficient way (because of economies of scale) to recover and study the behavior of individual genes, in fact it has shifted focus to the collective and integrated activities of genes functioning together, to the networks of interactions between them, and to how these are integrated (and have evolved) in the highly complex and coordinated business of living and reproducing at the level of cells and organisms. As noted earlier, genomics and the associated high-throughput “omic” technologies targeting gene expression, protein synthesis (and modification), protein interactions and protein structure are all becoming experimental subdisciplines of a new concept-driven computational science called systems biology.
What, then, will metagenomics have become in 20 years? We believe that it too will be a concept-driven computational science with subdisciplines that have evolved from the fusion of “omic” approaches and more traditional disciplines, such as environmental and clinical microbiology, biogeochemistry, biological oceanography, soil sciences, and theoretical ecology. It will indeed be the systems biology of the most inclusive biological system we know about: the biosphere of the planet. These disciplines will in the process be transformed and many questions redefined and refocused, most often at a level below (genes and genomes) or above (communities and ecosystems) the organism and species levels at which microbial ecologists have traditionally concentrated their efforts. Although individual microbial cells will always be suitable units of study, the “species,” because we have just begun to uncover the enormous genomic diversity within it, may no longer be a reliable or useful ecological unit. Instead, we will understand ecosystems in terms of the collective activities and interactions of the genes they contain, how these are distributed and expressed in space and time, and how they function together.
We can expect, in 20 years, enormous advances on three fronts— technical, computational, and biological—as well as a host of specific applications.
POTENTIAL TECHNICAL ADVANCES
Sequencing technology will have reduced the per-base price of finished sequence to fractions of a cent, and the cost of sequence-data acquisition will no longer be a serious consideration in studies of specific ecosystems. Sequencing methods now in use will have increased run lengths substantially but will themselves probably have been replaced with even more direct, and often also cloning-independent, approaches, perhaps single-molecule technologies now under development or others yet to be imagined. Single-cell genome sequencing will be routine, and cell-sorting methods that readily permit recovery of even unique individual cells will be well advanced. Complete genome sequences, some produced by “traditional” methods based on isolates (or single cells) but others acquired metagenomically, will number in the thousands, perhaps even tens of thousands. There will be many “species” for which hundreds of individual isolates will have been sequenced.
Transcriptomic and proteomic applications to community samples will be comparable in their reliability and efficiency with such methods as are used in human genomics today. Incremental improvements in microarray sensitivity, specificity, and reproducibility will make it possible to assess community membership and abundance down to the “species” level, however that concept is then understood. New normalization protocols will allow a census of even the rarest members of a community, and whole-community RNA amplification will access their transcriptomes. We will be able routinely to classify or type ecosystems and monitor changes in their compositions and activities with arrays (and their future equivalents, which may be microfluidics-based) that are inexpensive and readily available commercially. Such monitoring will indeed be routine practice in many environment-based business and regulatory activities and in epidemiology. New “omic” methods and sciences will have been developed for characterizing communities and their genetic, physiological, biochemical, and biogeochemical activities.
Many currently unculturable organisms and consortia will have been “domesticated,” by using knowledge of their individual needs and potentialities as derived from community metagenomics. As we come to appreciate the true extent of diversity (even within designated species) we will know that even such facilitated pure-culture or defined-culture studies will never be adequate for global understanding, but will provide excellent models of physiological interactions and the refinement of computational models for such interactions.
POTENTIAL COMPUTATIONAL ADVANCES
In 20 years, infrastructural accommodations will have been made for the almost unimaginable amount of metagenomic data that will have accumulated. For reasons elaborated in Chapter 5, the metagenomics databases are expected to dwarf genomic databases, no matter the predicted rate of growth of the latter. Although all sequences and trace data (or their future technological equivalents) will be available through GenBank or comparable public repositories, there will also be specialized (but fully public and interoperable) databases of all sorts. It will be possible to answer questions like those sketched in Box 5-1 by direct queries to the databases, which will also be rich in associated metadata. Just as much biological research is now conducted by computer scientists, much microbial ecology will be purely computational. Indeed, these downstream activities may be the dominant form of metagenomics employment; but metabioinformaticians will need even broader interdisciplinary training and collaborative links in geochemistry, oceanography, earth and atmospheric sciences, biochemistry, microbiology, ecology, genetics and genomics, statistics, and computer science.
Although traditional microbial classification practices (phenotypic characterization and identification at the level of species and genus) may remain useful, the basis on which we predict properties of isolates will be sequence- and computation-driven and probabilistic. Equally often, investigations of community activities of any magnitude (from the tiny but complex ecosystem of a termite’s gut to the Pacific Ocean) will be conducted at the level of genes and their interactions: understanding the “games being played,” with decreased emphasis on phylogenetic identification of the “players.”
POTENTIAL BIOLOGICAL ADVANCES
It is of course pure science fiction to predict what we will know about the biosphere 20 years from now and it is in the nature of a transformative science to be unpredictable. But it is of some value to guess at the kinds of breakthroughs in biological science that metagenomics will make possible.
There are many more viruses (and possibly more kinds of viruses) than there are cells (or kinds of cells). In many ecosystems, viruses are the principal regulators of organismal abundance and may well be the principal agents of genetic exchange between organisms. Their genomes collectively harbor a vast number of genes about which we know almost nothing and that can be exchanged between viruses and cells in a mix-and-match fashion. In 20 years, we hope to have some good idea of the depth of this enormous gene pool and (through comparative genomics, ab initio structural modeling, and extensive structural genomics) a vastly better understanding of what many of the genes do for their viral or cellular hosts and what they might do for us. We will understand and be able to monitor the exchange of information between viruses in the environment and those infecting us and the animals and plants that we use. Our ability to monitor and predict the emergence of viral diseases will be much enhanced.
Cells and Their Genes and Genomes
We will have come to an understanding of the diversity of gene content within species, of how many strain-specific genes are involved in strain-specific biology, and of how many are “just passing through.” We will have a vast inventory of gene sequences and, through structural genomics, a vast reservoir of genes with reasonably inferred functions even if the organisms of origin and the roles of the genes in their biology remain a mystery. We will be able to say whether adaptation to environmental change of any sort most often involves recruitment of preadapted lineages from elsewhere or cobbling together of novel lineages by exchange and assembly of genes already present.
We will have enough information on the diversity of environmental gene sequences to allow us to redefine the species concept as a more consistent, accurate, defensible, and enduring one that will have broad value across numerous disciplines and applications. We will have relegated so much of the task of identification of isolates and prediction of their properties to computers and sequence databases that it will be the predictions, not formal identification, that we care about. We will understand the various processes that might be termed “speciation” and have a good idea of their relative frequencies in nature. We will have redefined questions of diversity (“How many species are there in an environment or in the world?”) in terms of the sequences of genes and the composition of genomes.
We will have mapped an enormous number and diversity of genes and genome compositions in space and time and will be able to retrieve and reanalyze this information and associated physical, chemical, and biological metadata. We will have substantial gene-expression and metabolomic data on the same sites and can begin to look at Earth as though it were an organism-like spatiotemporally defined entity with an evolved and homeostasis-promoting global “metabolism.” Gene frequency and expression will make sense in that context even though Earth is not an organism. The question of whether “everything is everywhere” will be subsumed into this gene-level and genome-level analysis, which will be recast in terms of relative rates of divergence and dispersal of genes.
Community Structure and Function
Model-community projects undertaken in the next 5 years will have been completed and, in addition to a deep understanding of their target systems, will provide templates for other studies, smaller in scope but greater in number and ultimately interconnectable. Community structure will be understood and described (“profiled”) in terms more of gene presence and abundance than of species presence and abundance, and we will have developed a typology or catalog of communities that will allow us to infer what sort of biogeochemistry is happening at any place and time and to monitor changes. Such profiling is already done with ribosomal RNA and a few other markers, but comprehensive functional gene (and gene-function) assessment will be vastly more subtle and informative. One safe prediction is that such profiling will be extensively applied and prove of great value in disease diagnosis and determination of nutritional status of humans (individual and communities) and of animals and plants that they use or care about. Probiotic therapies and regimens will become evidence-based and increasingly valuable, as will microbiome profiling in the detection of diseases that originate in the host.
Interactions Within and Between Communities
Gene frequency and expression data will, in 2027, have long been the basis for constructing community “interactome” maps, comparable in character to, but vastly more complex than, maps now used by systems biologists to study individual organisms and their responses to perturbations. The combinations of genes and organisms that influence community robustness will have been identified, and predictive principles of community behavior will have been derived. The development and implementation of such analytical models will allow computational microbial ecologists to predict responses (at the level of gene frequency, expression, and exchange) to environmental challenges of all sorts. Testing such predictions will lead to better models. Such iterative approaches are already used, but models based on all genes rather than a few diagnostic markers will have immensely more explanatory and predictive power. The ultimate goal, perhaps in sight by 2027, would be a metacommunity model that seeks to explain and predict (and retrodict) the behavior of the biosphere as though it were a single superorganism. Such a “genomics of Gaia” would be the ultimate implementation of systems biology. The enormous challenge that creation of such a metamodel represents is matched by its importance for the future of the human species.
POTENTIAL ADVANCES IN EDUCATION AND PUBLIC UNDERSTANDING
By 2027, we will have many more mechanisms for communication than we have now, but all will be usable to teach the public about microbes through the excitement and “big science” appeal of metagenomics. Microbiology will be required in the K-12 curriculum and as a prerequisite for teaching certification, and metagenomics centers across the United States will have developed robust mechanisms for communication with diverse people, including those who do not have access to a university. The mechanisms might include distance-education courses, mobile microbiology units, press releases about milestones in projects, hosting of teachers in research laboratories, and teaching by metagenomics scientists in K-12 classrooms. Graduate students will be trained to teach microbiology in the classroom and in the larger community.
SOME POTENTIAL SPECIFIC APPLICATIONS
We see metagenomics as a new basic science with many eminently useful (and in tomorrow’s world essential) applications, some accomplishable over the short term and probably most on the drawing board by 2027.
The biological forcing of elemental cycles is key to understanding a wide variety of Earth-system processes. Large-scale, ecosystemwide fluxes of energy and matter, however, are difficult to model accurately or to study in the laboratory. By 2027, Earth-system processes will have been examined in much greater detail with metagenomics coupled with other synoptic physicochemical and biological measurements. Microbial-community genomics will provide information important for understanding energy fluxes and biogeochemical mechanisms in the deep subsurface, modeling biologically mediated rock weathering and surface chemistry, and defining the key genetic and biogeochemical drivers of processes that influence greenhouse-gas production and consumption. The oceans, which harbor millions of microbes in each teaspoonful of seawater, will be modeled more fully as we become able to visualize the rich biological systems they encompass. In a practical sense, such processes as uranium immobilization or acid mine drainage cleanup, which involve coupled biological-geochemical interactions, will be enhanced and improved with new community-genomic datasets. Microbe-enabled oil recovery, subsurface methane production and consumption, and carbon storage and turnover are other critical interfaces between the microbial world and the Earth system. The new “whole-Earth catalog” of microbial genes and genomes provided by metagenomics will propel a new understanding and new technologies for more appropriate resource use and sustenance of the living Earth system. Predictive models of many vital biogeochemical processes will inform enlightened policy makers. We will be able to say, for instance, why it might or might not be a good idea to seed oceans with iron to increase carbon sequestration. Similarly, we will be able to model (and predict the extent of) methanogenesis in the permafrost as it thaws. Metagenomics-based environmental monitoring will be a thriving industry.
Through a fine-scale and nuanced understanding of genetic and ecological processes, we will demolish many generalizations about microbes, replacing them with particularized knowledge. We anticipate that many basic concepts that have vexed biologists for decades (sometimes centuries), a few of which were alluded to earlier in this epilogue, will be recast in molecular terms. Taxonomy, the science of identification and naming organisms according to their relationships, will be radically transformed. The enormous combined genomic and metagenomics databases will enable us to predict the behavior of an isolate, a consortium, or a complex community on the basis of carefully targeted sequence or other molecular information.
Metagenomic methodology and concepts will have expanded well beyond the realm of viruses, bacteria, and archaea, to embrace the population biology and biogeography of microbial eukaryotes (protists, algae, and fungi). Indeed, the new research methodology and paradigm will have found uses even for macroscopic organisms, when it is population or ecological processes that are of interest. And with a proper appreciation of the roles of microbes in the balance of life, a new global systems ecology embracing all species, including humans, will have been born. This will mandate changes in how we teach biology at all levels. The teaching of microbiology, ecology, and evolutionary biology will all be profoundly affected by metagenomics, bringing the focus of a generation of students back “down to the ground,” where problems can be directly addressed.
The full extent of interindividual diversity within the human microbiome will be understood, and changes in microbial-community composition that contribute to or are responsible for a number of acute and chronic diseases will have been elucidated. Microbiome-based diagnosis will be an essential component in treatment for many diseases. Preventive medicine will be a major component of health care and health industries with the development of rational probiotic therapy as a means of maintaining a “healthy” human microbiome. By understanding how the human microbiome differs in health and disease, physicians will be on a much better footing to understand and predict the incidence of chronic inflammatory and infectious diseases, both viral and microbial. Therapeutic interventions (in addition to probiotics) will be based on comprehensive knowledge of the effects of treatment (such as with antibiotics) on the microbiota as a whole. New antibiotics from currently unknown natural (and generally microbial) sources will have come on line, and new strategies (such as those described below) for forestalling the development and spread of antibiotic resistance will have been devised.
Microbial communities will continue to affect productivity in agriculture, both plant-based and animal-based. Metagenomics studies of gut populations in poultry, pigs, and other food animals will increase our knowledge of gut-microbe interactions, which will help to formulate more effective probiotic mixtures in the future. We expect a comparable impact on plant-based agriculture. The function of the crenarchaeotes and other microbes that colonize plant roots and their importance to carbon and nitrogen cycles will be better understood. We will understand how plants and their beneficial microbial partners deal with antagonistic microbes. Lessons will have been learned from the food crops that have been successfully cultivated over the centuries. Using metagenomic approaches, we will exploit the interplay of microbes and plants more intelligently for human benefit.
Fossil fuels are a nonrenewable natural resource. It is projected that energy demand will increase by more than 50% by 2025 (US Department of Energy 2005). The US economy depends on oil imports, so there is an interest in augmenting domestic energy production. Corn serves as the major feedstock for ethanol production, and biofuel-producing companies are using specialist microbes to convert cornstarch to ethanol, a high-octane, environmentally friendly biofuel. Cellulosic ethanol (made from such agricultural wastes as corn fiber, corn stalks, and wheat straw and other biomass, such as switchgrass and miscanthus) uses as substrates products that are not usable by humans as food. Furthermore, cellulosic materials are inexpensive and renewable, and their efficient use will reduce the cost of ethanol production. Most of the known ethanol-producing microbes are incapable of using cellulose to produce ethanol, because they lack the enzymes required to break it down. In nature, however, several microbes are equipped with arrays of enzymes that act together to release glucose from cellulose. The glucose can then be fermented to ethanol. Metagenomics will enable discovery of new cellulolytic enzymes and novel microbial strategies for hydrolysis of biomass. These discoveries will lead to engineering of enzyme complexes and novel pathways for enzymatic hydrolysis of cellulose and a concomitant increase in production of biofuels from cellulosic materials.
Metagenomics will shape bioremediation in many interrelated ways. First, vastly increased understanding of how microbes form “bucket brigades” for the degradation of xenobiotic compounds will allow us to distinguish contaminated sites in which the native microbiota is competent to restore environmental health from sites in which intervention in the form of in situ bioaugmentation or intensive ex situ treatment at special facilities is needed. Second, metagenomics will facilitate sensitive monitoring of remediation activities of either sort. Third, it will identify key microbial processes and keystone species and indicate how community composition could best be complemented. Fourth, it will lead to the isolation of specific strains or consortia that could be used for such complementation. Fifth, a host of novel enzymes that might be useful in cell-free treatments of specific contaminants will be found. And sixth, where appropriate and permitted, the metagenomics database will provide a rich stock of genes for the construction of novel specialized strains for targeted use in bioremediation.
The biotechnology industry already employs hundreds of microbial enzymes and related products, and the global industrial enzyme market is currently in excess of $2 billion per year, primarily in technical (including scientific, pulp and paper), food, and agriculture and feed applications. The great majority of such enzymes are the result of traditional approaches: enrichment, culture, isolation, and enzyme purification. Collectively, the metagenomics database and the effort, now in full swing, to express, crystallize, and characterize structurally and functionally entire proteomes of many model organisms are likely to enhance the rate of discovery of such valuable catalysts by at least an order of magnitude—a revolution in green chemistry. Ironically, some of the key products of such activities to date have vital applications in the discovery process itself. For instance, the polymerase chain reaction—which is the basis of modern molecular environmental microbiology, DNA forensics, and molecular diagnosis—is based on genes cloned from thermophilic bacteria and archaea.
Biodefense and Microbial Forensics
The same methods that will allow us to assess community composition and activity will enable construction of biosensors for biodefense and microbial forensics. In 2027, the threat of terrorist or criminal use of pathogenic organisms and their toxins against human populations or agricultural (plant and animal) targets may still be of concern. However, society’s ability to anticipate and respond to these threats will be markedly enhanced through the continued application of new technologies that will allow us to assess microbial community composition and activity in various environments. This will permit precise, rapid, and sensitive monitoring of air, water, and food supplies for potential biothreat agents with novel biosensors. We will be better able to identify the presence of a natural or engineered biothreat agent against a large natural microbial background, and we will be able to predict virulence properties and sensitivity to antiviral or antimicrobial drugs. Another anticipated outcome of research in biodefense will be a strong forensic capability to carry out attribution for acts of bioterrorism that use animal, plant, and foodborne pathogens and toxins. Such capability will provide the law-enforcement, intelligence, agriculture, public-health, and homeland-security communities with information to assist in identifying perpetrators of biocrimes and bioterrorism and to serve as a deterrence factor.