The scope of metagenomics is vast. Defining the metagenomic characteristics of microbial communities in the biosphere is a critical first step in understanding their contributions to the health of the planet, their roles in the well-being of humans, and the environmental consequences of human activities. Because so little is known about microbial communities, the potential for discovery is great in any habitat chosen for study. The committee identified eight potential opportunities in different application areas that can be addressed with metagenomics:
Earth Sciences: The development of genome-based microbial ecosystem models to describe and predict global environmental processes, change, and sustainability.
Life Sciences: The advancement of new theory and predictive capabilities in community-based microbial biology, ecology, and evolution.
Biomedical Sciences: The definition, on a global scale, of the contributions of the human microbiome to health and disease in individuals and populations and the development of novel treatments based on this knowledge.
Energy: The development of microbial systems and processes for new bioenergy resources that will be more economical, environmentally sustainable, and resilient in the face of disruption by world events.
Environmental Remediation: The development of tools for monitoring environmental damage at all levels (from climate change to leaking gas-storage tanks) and microbially based (green) methods for restoring the health of an ecosystem.
Biotechnology: The identification and exploitation of the biosynthetic and biocatalytic capacities of microbial communities to generate beneficial industrial, food, and health products (pharmaceuticals, antibiotics, and probiotics).
Agriculture: The development of more effective and comprehensive methods for early detection of threats to food production (crop and animal diseases) and food safety (monitoring and early detection of dangerous microbial contaminants) and the development of management practices that maximize the benefit from microbial communities in and around domestic plants and animals.
Biodefense and Microbial Forensics: The development of more effective vaccines and therapeutics against potential bioterror agents, the deployment of genomic biosensors to monitor microbial ecosystems for known and potential pathogens, and the ability to precisely identify and characterize microbes that have played a role in war, terrorism, and crime events, thus contributing to discovering the source of the microbes and the party responsible for their use.
Meeting these challenges will require progress on several fronts. Technological, methodological, computational, and conceptual advances will be needed to develop the potential of metagenomics fully.
Furthermore, as microbiologists turn their attention to the study of microbes in their natural environments, it is likely that many of biology’s most basic organizing concepts will be affected by deeper understanding of life at the microbial level. Metagenomics will probably expand answers to questions like, What is a species?, What is the role of microbes in maintaining the health of their host?, How diverse is life?, and What ecological and evolutionary roles do viruses play? The metagenomics approach is uniquely well suited to gathering the information necessary to make progress on such basic conceptual questions.
The opportunity intrinsic to a new frontier of science is accompanied by new challenges that were not anticipated by prior research. Metagenomics is no exception. Current metagenomics researchers face several difficulties and obstacles. Early metagenomics studies have been able to survey the metagenomes of complex microbial communities, but have been able to characterize in depth only the simplest communities. Generating massive sequence databases is not the limiting step; using the databases to determine the complete genomes of community members and to understand a community’s metabolic capabilities or potential responses to environmental change is still beyond the field’s capabilities in even moderately complex environments.
A number of technological, methodological, and computational advances are needed for metagenomics to reach its full potential. Encouraged by the example of the human and other model organism genome projects, the committee believes that the best way to spur these advances is through a multi-scale approach that includes support for small, single-investigator projects; medium-sized, multi-investigator projects; and large-scale, multidisciplinary, multinational metagenomics projects.
The small-scale projects will ensure that creative contributions are solicited from a broad scientific community and engage many scientists in metagenomics. The medium-sized projects will provide centers of study that unite diverse techniques and disciplines to study numerous habitats encompassing diverse organisms, scientific questions, and technical challenges. The large-scale projects will characterize a few microbial habitats in great depth, using large multidisciplinary and multinational teams to address challenges in metagenomics that require massive datasets or highly diversified scientific approaches and engaging more investigators than would typically participate in a medium-sized center. The large-scale projects will cross national lines, facilitating study of many examples of a habitat worldwide, thereby generating sufficient data to develop generalizations about the communities that reside in that habitat.
The large-scale projects will also provide an excellent opportunity for young biologists to gain experience in participating in “big science” and global partnerships. And they offer unique opportunities for public outreach and stimulation of public interest in science because they will highlight the ability of metagenomics to explore a new biological frontier.
The medium-sized projects need to be funded through organizational models that recognize habitat differences, such as the National Science Foundation’s (NSF) Long Term Ecological Research (LTER) and National Ecological Observatory Network (NEON) programs. Similarly, the National Institutes of Health (NIH) recognizes very different microbial habitats in humans, as does the US Department of Energy (DOE) in its different missions of bioenergy, carbon sequestration, and bioremediation, and the US Department of Agriculture (USDA) in the different agricultural habitats of microbes. In addition to exploring a habitat in depth, medium- and large-scale projects would also likely develop different expertise, technology, and analytical methods to meet the challenges of their particular habitat type (e.g., one may take the lead in proteomics, another in chemical informatics, another in community signaling, and another in microhabitat sensors). In this way a suite of projects will provide more tools and knowledge to the metagenomics community than any single project could offer.
The committee recommends the establishment of a Global Metagenomics Initiative that includes a small number (perhaps three) of large-scale, comprehensive projects that use metagenomics to understand model microbial communities, a larger number of medium-sized projects, and many small projects. Large-scale projects will study microbial communities in great depth, exploring a habitat worldwide, with attention to variation, commonalities, and detailed characterization. Medium-sized projects will provide centers of excellence in metagenomics that can be somewhat more focused than the large-scale projects, but will include a multidisciplinary approach to the study of a community. The small-scale projects will be single-investigator initiated and will examine a slice of a community, a particular function in multiple communities, or a specific technical advance.
The communities chosen for the large-scale projects should have broad applicability and impact and represent a diversity of habitat types. The studies would establish methods, approaches, and conceptual insights that could be applied to ever more complex and dynamic systems. Large-scale projects would achieve a depth of analysis not possible with smaller-scale projects and provide a template for comprehensive system analysis. Large-scale projects would also provide a forum for developing and testing new experimental and analytical tools and for establishing standards of sampling and data quality. The large projects may also generate economies of scale, new mechanisms for data sharing or storage, and point to new models of collaboration among large research groups. Different communities will have different benefits, technical challenges, and conceptual frameworks. These differences necessitate studying more than one community in great depth, leading the committee to recommend that three large-scale projects be identified and developed.
To maximize the benefits and knowledge gained from the large-scale projects, they should represent a breadth of habitat type, including:
A community in a natural environment, to understand the interactions between microbial communities and geochemical processes or global nutrient cycles.
A host-associated community, to probe the interactions between a microbial community and the physiology and health of its host.
A “managed-environment” community, to learn to predict and manage the effects of environmental change or human activity on microbial communities.
The development of the large-scale projects should be carefully staged in three phases, as follows:
Phase I: Choosing the model communities. During Phase I, input should be solicited from a wide array of metagenomics and microbiology researchers to choose model communities of broad public interest with potential for immediate contribution to important environmental and public-health challenges. Clear goals or end points for each project should be defined during this phase. Phase I would conclude with a peer-reviewed competition for planning grants to be awarded to multidisciplinary, international teams. The committee anticipates that at least three model communities would be needed to cover the range of microbial community types, but Phase I may identify more than three projects with sufficient merit to proceed to Phase II.
Phase II: Planning. Each successful team would gather preliminary data and develop roadmaps for the completion of its project, including establishment of a data management and analysis group, development and testing of necessary methods and technologies, and launch of a Web site to provide access to data and analysis tools and to support public outreach.
Phase III: Implementation. Intensive sequencing, functional analysis, proteomics, and many other approaches would be applied to model communities that successfully complete Phase II.
The metagenomics approach is of potential value in fulfilling the missions of many federal agencies, including NSF, NIH, DOE, and others. Support for individual projects specifically tied to each agency’s mission has been and will continue to be productive, but communication and coordination across the interested agencies would be extremely useful. In particular, developing a consensus around which model communities to include in a Global Metagenomics Initiative should include the scientific constituencies of all these agencies. Scientific societies also can play a critical role in ensuring broad participation.
The committee recommends that an interagency working group like the Microbe Project take responsibility for ensuring open communication about the metagenomics portfolios of relevant agencies and for facilitating the organization of workshops and meetings to bring together metagenomics researchers who are working on different types of communities. The involvement of scientific societies is strongly encouraged. The Microbe Project would be an appropriate forum for planning and promoting a Global Metagenomics Initiative.
Metagenomics will draw on expertise from people in many disciplines:
Those with knowledge of microbiology, including microbial genetics, biochemistry, physiology, pathology, systematics, ecology, and evolution.
Other biologists, including molecular and cell biologists and those with knowledge of host organisms, including humans and other mammals, plants, insects, and other microbial hosts that have important roles in nature or that are of economic importance.
Those with knowledge of the environment, including soil and atmospheric scientists, geologists, oceanographers, hydrologists, and agriculture and ecosystem scientists.
Those who stimulate microbial communities to achieve specific end points, including biological, chemical, and environmental engineers.
Computational scientists, including those with knowledge of statistics, computer science, data mining and visualization, database development, modeling, and applied mathematics.
Those with expertise in scaling information to large ecosystem parameters, and in evaluating the impact of global change and its interface with policy.
Engineers, physical scientists, and chemists whose skills and insights are potentially field-transforming in their contribution to new methods, chemistries, devices and applications (within and beyond metagenomics) and the understanding of complexity, networks, and system structure.
The value of integrating experts from such a wide array of fields into metagenomics projects is very high. Both they and metagenomics researchers will require appropriate cross-disciplinary knowledge in order to gain the full benefit of their different expertise. To realize the potential of metagenomics, interdisciplinary projects will be necessary and they will be aided by new education and training programs.
The committee recommends establishing several types of training programs to encourage scientists to develop the skills needed for metagenomics research. The following mechanisms have been successful in providing cross-disciplinary training: interdisciplinary training to augment traditional graduate programs, summer courses patterned after the Cold Spring Harbor or Marine Biological Laboratories summer courses, and postdoctoral programs in which fellows undertake training in a new discipline. Support for faculty to attend metagenomics workshops or to spend sabbaticals in metagenomics research laboratories or facilities would also be beneficial in expanding appropriate training.
The value of the Human Genome Project was multiplied because the data that it generated were rapidly made available in a public database. GenBank and its collaborators in Europe and Japan serve as repositories for nucleic acid sequence data. They ensure that the data are accessible to all and can be obtained from a single site. Similar accessibility would multiply the value of metagenomic data.
The analysis of metagenomic data will require the establishment of new databases in addition to the sequence archives. It is essential that the databases use common data standards and agree on the metadata that will describe metagenomic sequences. This will ensure that data can be exchanged between researchers. It will also facilitate comparative analyses of data and the development of software. Community databases like those established for the Drosophila and Arabidopsis genome projects are excellent models for the type of databases metagenomics will require.
Information from metagenomics studies will be fully exploited only if appropriate data management and analysis methods are in place. Furthermore, metadata—for example, data on sampling method, sample treatment, and precise description of the sampled habitat—are essential for the analysis of metagenomic sequence data. If metagenomic data are to be used to their fullest advantage, metadata infrastructure is urgently needed. No one metadata standard will be appropriate for all samples, which will come from extremely diverse environments, but there should be close collaboration and coordination among the communities of scientists developing metadata standards.
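As a rough illustration of what such metadata might look like in practice, the sketch below defines a minimal sample record in Python. The three descriptive fields come from the examples named above (sampling method, sample treatment, habitat description); the class name, the `sample_id` field, the `habitat_specific` dictionary, and the `missing_fields` helper are hypothetical and are not drawn from any existing metadata standard. The free-form dictionary reflects the point that no single standard will fit every habitat.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class SampleMetadata:
    """Hypothetical minimal metadata record for one metagenomic sample."""
    sample_id: str
    sampling_method: str
    sample_treatment: str
    habitat_description: str
    # Habitat-specific extras (for example depth or pH); these vary by
    # environment, so they are kept outside the shared core fields.
    habitat_specific: Dict[str, str] = field(default_factory=dict)


# Core fields that this sketch assumes every community would agree to record.
REQUIRED = ("sample_id", "sampling_method", "sample_treatment",
            "habitat_description")


def missing_fields(record: SampleMetadata) -> List[str]:
    """Return the names of required core fields that are empty or blank."""
    data = asdict(record)
    return [name for name in REQUIRED if not data[name].strip()]
```

A shared core plus habitat-specific extensions is only one possible design; the actual fields would need to be agreed on by the research communities and database developers.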
One major challenge faced by metagenomics databases compared with “conventional” genomics databases will be the demand for community input into the annotation process. Annotation is the process of assigning functional, positional, and species-of-origin information to the genes in a database. In conventional genomics, primary responsibility for annotating data falls on the authors. In metagenomics projects, in which annotations will change as additional data (or metadata) are collected by other groups, an annotation database must be able to accept and integrate both individual and large-scale (computational) annotations of metagenomic data continually. Furthermore, the sources of and methods for modified annotations should be transparent to database users.
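One way to picture dynamic and traceable annotation is an append-only store in which every submitted annotation carries its source and method, and earlier annotations are never overwritten. The Python sketch below is an illustration under that assumption, not a description of any existing database; the names `Annotation`, `AnnotationStore`, `submit`, `current`, and `history` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Annotation:
    """One annotation of a gene, with its provenance (hypothetical schema)."""
    gene_id: str
    function: str
    source: str  # who submitted it: an individual, a group, or a pipeline
    method: str  # how it was derived: manual curation, similarity search, etc.


class AnnotationStore:
    """Append-only store: new annotations are added, old ones are retained."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Annotation]] = {}

    def submit(self, annotation: Annotation) -> None:
        """Accept an individual or computational annotation."""
        self._history.setdefault(annotation.gene_id, []).append(annotation)

    def current(self, gene_id: str) -> Optional[Annotation]:
        """Return the most recently submitted annotation, if any."""
        entries = self._history.get(gene_id)
        return entries[-1] if entries else None

    def history(self, gene_id: str) -> List[Annotation]:
        """Return every annotation for a gene, oldest first."""
        return list(self._history.get(gene_id, []))
```

Treating the latest submission as current is a simplification; a real system would need rules for resolving conflicting annotations.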
The committee recommends the establishment of new databases for metagenomic data and the development of tools for the storage, analysis, and visualization of these data. Early attention should be given to the challenge of providing dynamic and traceable annotation in metagenomics databases. Also warranting high priority is the development of a consensus—in a process that includes the research communities and the database developers—on the metadata that need to be collected and on the data standards to be used. Maintenance and curation of metagenomics databases will greatly add to their value, but are expensive and will require consistent support. Funding for databases requires a different approach than that for research projects: the committee recommends the development of mechanisms for long-term funding, coupled with community oversight and evaluation.
The enormous amounts of data generated by metagenomics should be made available as rapidly as possible, and deposition into the international sequence archives should be required. Some projects, like those of the proposed Global Metagenomics Initiative, would be undertaken specifically to create a community resource, and these should follow accepted standards, such as those of the Fort Lauderdale Agreement, in immediately releasing the data without constraints as to their use. Data from single-investigator projects should be released within a short time, for example, within 6 months of their collection.
The analysis of genomics data is absolutely dependent on computer software. In general, metagenomics projects will require an even higher percentage of funds for bioinformatics and statistical support than have genomics projects, or most other kinds of biological research. It is common for software developed for a particular project gradually to find wide use in the community. Providing a mechanism whereby such analytical tools that have proved their value to the community can be brought up to robust, engineered, documented form would be very worthwhile.
Funding agencies should consider the development of a mechanism for identifying analytical tools that are finding wide use in the community and for providing for their development up to robust standards.
Current metagenomics researchers face several difficulties, including inadequate characterization of many habitats and inadequate understanding of the scope and nature of variation in different microbial communities. Therefore, determining how best to sample and determining whether a sample is representative remain challenging. DNA extraction techniques to minimize contamination and to ensure that a community’s genome is adequately represented have yet to be optimized. And expression systems for functional metagenomics are not yet sufficiently robust and flexible to express most genes in most metagenomes.
The committee recommends investment in the following because improvements would enhance the productivity of many metagenomics projects: new or improved technologies for appropriate habitat sampling, macromolecule recovery, and habitat characterization, depending on habitat; new approaches to deal with the unevenness of population sizes in communities and to target populations of interest within complex communities; development of measures of community diversity to supplement 16S rRNA gene surveys, including arrays and additional phylogenetically informative genetic markers; and development of diverse host species and expression strategies for functional-expression analyses.
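To make the idea of unevenness in population sizes concrete, the Python sketch below computes two standard ecological summaries from a list of per-taxon counts: the Shannon index and Pielou's evenness. These particular measures are not named in the text above; they are used here only as familiar examples of community-diversity measures, and the function names and the convention of returning 0.0 for a single-taxon sample are choices made for this sketch.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def pielou_evenness(counts: Sequence[int]) -> float:
    """Pielou's evenness J = H / ln(S), where S is the number of observed taxa.

    Evenness is undefined when fewer than two taxa are observed; this sketch
    returns 0.0 in that case.
    """
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0
    return shannon_index(counts) / math.log(observed)
```

A community with four equally abundant taxa has an evenness of 1, whereas a community dominated by one taxon has an evenness close to 0, which is the situation in which rare populations of interest are hardest to sample.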
The more that is known about microbes, the greater the value that metagenomic data will have. It is extremely important for basic microbiology research not to be neglected but to be strengthened and deepened. Advances in the culturing of currently unculturable bacteria and archaea, in sequencing of their genomes, and in genetic and physiological studies are key reference points for interpreting a community’s metagenome. Active discussion involving metagenomics researchers and members of other subdisciplines of microbiology and their representatives in funding agencies will help to guide the fields in complementary directions.
The committee recommends that funding agencies consider the potential contributions of basic microbiology research to progress in metagenomics as they evaluate their overall research portfolios.
Because metagenomics constitutes a revolutionary advance in the ability of scientists to study a previously invisible biological realm, results of metagenomics studies have great potential interest not only for scientists but also for the general public. Metagenomics presents an important opportunity to engage the public in the excitement and value of basic and applied scientific research. Outreach efforts will help to train a new generation of scientists who are skilled in communicating science to the public.
The committee recommends that education and public outreach be components of all metagenomics projects. Both large and small projects can be used as catalysts for teaching microbiology. Each large project should have a budget for developing materials that explain its scientific basis and implications in accessible and interesting ways. Metagenomics researchers should be encouraged to teach about their science in their local communities and metagenomics projects should include training scientists in effective outreach teaching.