National Academies Press: OpenBook

The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (2007)

Chapter: 7 A Balanced Portfolio: Multi-Scale Projects in the "Global Metagenomics Initiative"

Suggested Citation:"7 A Balanced Portfolio: Multi-Scale Projects in the "Global Metagenomics Initiative"." National Research Council. 2007. The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. Washington, DC: The National Academies Press. doi: 10.17226/11902.

7
A Balanced Portfolio: Multi-Scale Projects in the “Global Metagenomics Initiative”

THE VISION

The opportunity afforded by metagenomics to study microbial communities in their natural state represents a vast frontier. Given the intense competition for science funding, some priority-setting is necessary to ensure that the greatest possible value is gained from early metagenomics investments. The diversity of habitats on Earth, the complexity of microbial communities, and the myriad functions governed by microbes suggest that highly productive metagenomics research will be possible in decentralized, small-project settings. However, no individual researcher is likely to have the capability and resources to achieve a comprehensive characterization of a complex microbial community. Therefore, there is also a substantial need for medium-sized, collaborative projects that involve multiple investigators. Small- and medium-sized projects are familiar to funding agencies and the scientific community in the form of single-investigator grants (National Institutes of Health [NIH] R01s, for example) and interdisciplinary collaborations (National Science Foundation [NSF] and the US Department of Agriculture [USDA] Microbial Observatories; NSF’s Long Term Ecological Research [LTER], Frontiers in Integrative Biological Research [FIBR] and National Ecological Observatory Network [NEON] programs; the US Department of Energy’s [DOE] GTL program, for example). Both mechanisms of funding are tested and proven effective in advancing new fields of science. The mixture of single- and multi-investigator projects maximizes the diversity of scientific approaches, assures that many avenues of research are pursued simultaneously, presents an opportunity to study many habitats, and engages a broad community, thereby utilizing the creativity of many investigators. All these benefits are essential for the advancement of the field.

Metagenomics, however, differs from much of the science that precedes it in its complexity, its multidisciplinarity, and the magnitude of its unknowns. Its very nature departs from each of the fields (microbiology, ecology, and genomics) that fuse to form this new science. Consequently, metagenomics presents a number of conceptual and technical obstacles that limit the productivity of all metagenomics researchers (detailed in Chapters 4 and 5). The committee believes that the needs of the metagenomics field are not entirely met by current funding mechanisms, and the most efficient way to boost the effectiveness of the field overall is to augment small- and medium-sized projects with a small number of large-scale projects.

The Global Metagenomics Initiative is envisioned to capture all three types of projects: small-, medium-, and large-scale. Familiar mechanisms are available for the first two, so this chapter will detail the characteristics of the large-scale projects; issues that should be considered in evaluating proposals for small and medium-sized projects have been discussed in the previous chapters, as have infrastructural needs that affect metagenomics research at all scales (the need for software development, database curation, and access to sequencing capacity, for example).

Much as the Human Genome Project drove advances in methods and technology, the large-scale projects will lead the development of broad principles and new technologies and methods that are more easily conceived and validated in the context of a multidimensional and highly replicated study than in traditional single-investigator projects. The large-scale projects will also offer special opportunities for public outreach and training of a new generation of scientists. There is excellent precedent in the genomics field to suggest that large-scale projects provide benefits far beyond the data gathered. Providing a community data resource was the initial motivation, but the Human Genome Project and other model-organism genome projects have also spurred technological advances and inspired the development of new tools, common standards, and shared software resources. This chapter will argue that the potential value of large-scale metagenomics projects is substantial.

CHARACTERISTICS OF SUCCESSFUL LARGE-SCALE PROJECTS

A recent Institute of Medicine-National Research Council report examining large-scale projects in biomedical science set forth the following reasons for undertaking a large-scale project (Nass et al. 2003):

  • “A major intent of such projects is to enable the progress of smaller projects.”

  • “Large-scale collaborative projects may also complement smaller projects by achieving an important, complex goal that could not be accomplished through the traditional model of single-investigator, small-scale research.”

  • “The objective of a large-scale project should be to produce a public good—an end project that is valuable for society and is useful to many or all investigators in the field.”

  • “Unconventional large-scale projects take advantage of economies of scale to produce relatively standardized data on entire classes or categories of biological questions … they may reveal novel areas of research for follow-up by smaller science projects, and they also provide essential tools and databases for subsequent research.”

The committee believes that, if carefully chosen and planned, large-scale metagenomics projects will have all of these characteristics.

WHY METAGENOMICS NEEDS A “BIG SCIENCE” COMPONENT

Metagenomics has great promise, but is challenged by the extreme complexity of microbial communities, by the lack of sufficient data on many aspects of microbial communities (such as diversity and conservation of structure or function across geographic location) to support valid generalizations and, because of these factors, by the lack of unifying ecological principles that enable predictive modeling. Put simply, it is hard to derive general principles from very few specific cases.

Table 7-1 lists a number of challenges, each of which would require substantial investment to address in depth. The knowledge needed can be obtained best in concerted, multi-investigator efforts. Although many individual-investigator-led and small-group collaborations in metagenomics have been successful, none has been able to generate sufficient data to allow comprehensive understanding of a complex microbial community or to invest the time and effort needed for the development of new tools and methods. For example, the assembly of individual genomes from metagenomic sequence information has been achieved only in the acid mine drainage project. A large-scale project could bring to bear a multipronged attack on the challenge of assembly in a complex community: redundant, deep sequencing; whole-genome sequencing of numerous community members as scaffolds; cell-sorting and single-cell analysis techniques; and analytical tool development and conceptual advances. The progress made would be available to individual researchers applying metagenomics in a plethora of environments.

TABLE 7-1 Challenges Facing Metagenomics

Challenge: Complexity and unknown structure of microbial communities

Questions to Be Answered:

  • How much sampling is enough?

  • What is a representative sample?

Possible Strategies:

  • Sample a complex community to completion, that is, until few or no new species are collected with further sampling

  • Develop new mathematical models that can predict species richness and community structure so that the representativeness of samples can be evaluated

Challenge: Methodological biases

Questions to Be Answered:

  • What taxonomic groups are not accessed with the methods used?

  • What habitats are not accessible with current technology?

Possible Strategies:

  • Apply multiple methods to the same samples to assess the biases of each

  • Systematically survey diverse habitats and assess access to microbes and their DNA

Challenge: Improved correlation of phylogenetic analysis and community function

Questions to Be Answered:

  • What roles does each taxon play in community structure and function?

  • Can generalizations be made about these roles?

  • Do communities always have definable functions?

Possible Strategies:

  • Develop mathematical tools to establish associations between phylogeny and function

  • Develop ecological methods to remove specific community members and study the effects on structure and function

  • Explore broader definitions of function

Challenge: Habitat variation and conservation

Questions to Be Answered:

  • On what scale should habitats be studied?

  • What are the limits of habitats?

  • In what ways is an example of a habitat representative of other examples of the same habitat?

  • Are there core characteristics associated with every member of a type of habitat (that is, is there a set of traits required to live in soil)?

  • Do all human guts share a core community?

  • Which is more highly conserved, the taxa making up a community, or the community function? Or is there coconservation?

Possible Strategies:

  • Conduct a worldwide sampling of many habitats of the same type and compare exhaustive descriptions of membership and function

  • Develop statistical methods that identify similarities at both taxonomic and functional levels

  • Compare variability between similar communities at different sites and in the same site at different times

  • Implement clustering methods for enumeration and identification of community types and representative diagnostic taxa

Challenge: Metagenome assembly

Questions to Be Answered:

  • What are the rules of metagenome assembly?

  • What “binning” techniques are most useful in assigning sequences to taxa?

  • How much assembly is necessary to make sense of a community?

  • How can microdiversity (many similar genomes at one site) be handled?

Possible Strategies:

  • Reassemble numerous metagenomes of various levels of complexity and extract common features and principles to construct a method and a set of rules for assembly

  • Select a few communities whose metagenomes have been assembled and study their structure and function in sufficient detail to determine how much and in which ways assembly contributes to ecological understanding (what organism is doing what?)

Challenge: Functional analysis

Questions to Be Answered:

  • Are there rules that guide the choice of expression system for function-based analysis?

  • Are there ways to increase the probability of finding a particular function (such as choice of habitat or expression system)?

Possible Strategies:

  • Conduct global studies to correlate frequency of expression of particular characteristics with analysis parameters

  • Develop new gene expression systems for all phyla of bacteria and archaea

  • Map functional diversity to community type and map both to phylogenetic diversity

  • Correlate functions with extensive physical and biological metadata
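
As an illustration of the table's first question ("How much sampling is enough?"), the following is a minimal sketch of one existing richness estimator, the bias-corrected Chao1 estimator, which extrapolates total species richness from the number of taxa seen exactly once or twice in a sample. It is offered only as an example of the kind of mathematical model the table calls for, not as a method proposed in this report; the function names and the sample counts are hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 estimate of total species richness.

    abundances: iterable of per-taxon counts observed in one sample.
    """
    counts = [c for c in abundances if c > 0]
    s_obs = len(counts)                     # taxa actually observed
    f1 = sum(1 for c in counts if c == 1)   # taxa seen exactly once
    f2 = sum(1 for c in counts if c == 2)   # taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def coverage_ratio(abundances):
    """Observed richness divided by estimated richness (1.0 = no unseen taxa predicted)."""
    counts = [c for c in abundances if c > 0]
    estimate = chao1(counts)
    return len(counts) / estimate if estimate else 0.0


if __name__ == "__main__":
    # Hypothetical counts for seven taxa in a single sample.
    sample = [10, 5, 2, 2, 1, 1, 1]
    print(chao1(sample))           # 8.0
    print(coverage_ratio(sample))  # 0.875
```

A ratio well below 1.0 would suggest that further sampling is likely to recover new taxa; a ratio near 1.0 is consistent with, but does not prove, sampling "to completion."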

Similarly, a large-scale project could advance the coupling of large sequencing databases with functional analysis. Massive sequencing has been conducted on samples from the Sargasso Sea and the Global Ocean Survey, but the metagenomic libraries from these environments have not been subjected to functional-expression assays. Conversely, a number of functional-expression studies have been conducted on soils for which there is not a rich base of sequence information. A global project might tackle one of these habitats from many angles: sequencing, functional-expression analysis, genome reassembly, deep phylogenetic analysis, hybridization-based screening, and much more. A large-scale project could involve investigators from many disciplines, such as genomicists, statisticians, geneticists, physicians, and sociologists. The genomicists would have different expertise: one might be an expert in the habitat itself who could establish the strategy for collecting relevant metadata, another might be skilled in handling sequence data, and another might be experienced in functional screening. Working together, such a team could make substantial progress in understanding the functional potential of the genetic repertoire of a microbial community, predicting function from sequence, and developing new tools for functional screening and database mining and management. The resulting rich store of new knowledge would greatly improve the yield of information from smaller studies.

WHAT KIND OF LARGE-SCALE PROJECTS IN THE GLOBAL METAGENOMICS INITIATIVE AND HOW MANY?

Careful consideration must be given to the choice of projects for the large-scale portion of the Global Metagenomics Initiative. The challenges posed by metagenomics depend on the habitat being studied. No large-scale project would be able to address all the challenges. In broad terms, there are three types of habitats on Earth: unmanaged landscape and aquatic environments (such as seawater, soil, and sediments), managed ecosystems with a directed function (such as sewage treatment, bioremediation, and bioreaction), and host-associated habitats (such as the human gut, plant roots, and insect symbionts). Because the scientific knowledge and practical benefits to be gained differ among environments, the committee believes that three very different communities should be chosen for in-depth analysis.

Sampling challenges differ among the habitats because the sources of variability are different. The challenges associated with DNA extraction also differ. Host DNA is the most important contaminant in host-associated communities, whereas tannins, humic acids, polysaccharides, and other compounds are the dominant contaminants in environmental samples. Different organism genomes will be needed as scaffolds to facilitate assembly and for functional and evolutionary interpretations. To some degree, statistical methods will apply to all habitats, but the differences in community membership, size, structure, and complexity create different needs for analysis. Perhaps the most important difference in studies of diverse habitats is the type of metadata needed to make sense of genomic sequence data. A global effort is needed to develop standards and methods for gathering metadata. In the human gut, for example, the host’s diet, genotype, and age will probably be critical; in an environmental sample, global positioning, meteorological, chemical, and physical data are likely to be needed. Information about habitat will also often need to include historical trends in these variables. Interoperable but separate model community databases would be the most efficient framework in which to develop the specific tools necessary to analyze data from the different environments and thereby maximize the utility of the data. Consequently, the committee believes that the greatest gains would ensue from including one example of each of the three types of habitats in the Global Metagenomics Initiative’s large-scale projects.
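
The contrast drawn above between host-associated and environmental metadata can be sketched as a small data structure. This is only an illustrative sketch under stated assumptions: the class names, field names, and habitat labels are hypothetical and do not correspond to any existing metadata standard; the fields simply mirror the examples named in the text (host diet, genotype, and age; positioning, meteorological, chemical, and physical data).

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional

# The three broad habitat types described in this chapter (labels are hypothetical).
HABITAT_TYPES = ("unmanaged", "managed", "host-associated")


@dataclass
class SampleMetadata:
    """Fields assumed to be shared by every sample, whatever its habitat."""
    sample_id: str
    habitat_type: str
    collected_on: str  # ISO 8601 date string, e.g. "2007-01-01"

    def __post_init__(self):
        if self.habitat_type not in HABITAT_TYPES:
            raise ValueError(f"unknown habitat type: {self.habitat_type!r}")


@dataclass
class HostAssociatedMetadata(SampleMetadata):
    """Host descriptors, e.g. for a human gut sample."""
    host_diet: Optional[str] = None
    host_genotype: Optional[str] = None
    host_age_years: Optional[float] = None


@dataclass
class EnvironmentalMetadata(SampleMetadata):
    """Site descriptors, e.g. for a seawater or soil sample."""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    meteorological: Dict[str, float] = field(default_factory=dict)
    chemical: Dict[str, float] = field(default_factory=dict)
    physical: Dict[str, float] = field(default_factory=dict)


def missing_fields(record: SampleMetadata) -> List[str]:
    """Names of fields that are still unset (None or an empty dict)."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None or value == {}:
            missing.append(f.name)
    return missing


if __name__ == "__main__":
    gut = HostAssociatedMetadata("s1", "host-associated", "2007-01-01",
                                 host_diet="omnivore")
    print(missing_fields(gut))  # ['host_genotype', 'host_age_years']
```

Keeping a shared base record with habitat-specific extensions is one way to reflect the "interoperable but separate" databases the committee describes; other designs are equally plausible.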

EXPECTED BENEFITS OF LARGE-SCALE METAGENOMICS PROJECTS

The large-scale projects will bring benefits to the field that cannot be achieved with small-scale research. The benefits can be described, broadly, as contributing to ecological theory and principles, understanding of specific habitats and functions, technical advancement of the field, and international collaboration and training.

Theory and Principles

Large-scale projects that engage researchers in many locations and disciplines could reveal the principles of microbial community ecology through intensive studies. For example, whereas a small-scale project might aim to study the distribution of cellulases in the rumen, a large-scale study might attempt to provide a nearly complete inventory of the members of the rumen, assemble some of the members’ genomes, identify cellulases and other traits important to that community’s function and the animal’s feed efficiency, and assess the variation of all these characteristics among many animals and perhaps among ruminant species.

Some community behaviors will be peculiar to each community, but some will be governed by universal principles that can be derived by studying a few communities in great detail. Once those principles are derived, they can be tested with more focused experiments in small-scale studies to assess the degree to which they can be generalized. The proposal to create large-scale projects in the Global Metagenomics Initiative is driven in part by the need for these principles. Just as studies of different microbial communities face different technical challenges, they also raise different theoretical issues:

  • Study of a community in a natural environment would act as “proof of concept” for using metagenomics to understand the interaction between microbial communities and geochemical processes, eventually helping to understand change in global elemental cycles.

  • Study of a host-associated community would probe the interaction between a microbial community and the physiology and health of its host.

  • Study of a managed-environment community would seek to understand the effects of environmental change or human activity on microbial communities and would have the potential to develop enough understanding to manage or mitigate environmental damage or maximize efficiency and sustainability of a bioreactor.

Each large-scale project would provide a comprehensive dataset about a particular habitat or function that could be the basis for building general theories and principles. The teams leading the large projects would need to communicate often because comparison among the three kinds of habitats could further illuminate global principles about the microbial world.

Understanding Specific Habitats

The committee anticipates that the large-scale projects will focus on habitats whose study has obvious and immediate benefits to society. In addition to contributing to broad theory, the large-scale projects would result in a comprehensive understanding of critical habitats at many levels. Full genome sequencing of organisms from a wide variety of phylogenetic groups represented in the three habitats should be an early focus of the large-scale projects; the resulting genomes would be an important resource for researchers in small and medium-sized projects. The chosen habitats should be of clear interest to the general public, and frequent public updates should be an integral part of each project. The funding agencies should encourage the development of strong outreach programs to the communities where the studies are being conducted. Due to the decentralized nature of the Global Metagenomics Initiative and its projects’ geographic diversity, this would have a broad impact on the public’s understanding of metagenomics and microbiology generally and would present an opportunity to train a new generation of scientists skilled in outreach and communication of science to the public.

Technical Advancement of the Field

Large-scale projects would unite scientists of multiple disciplines around the study of a particular habitat. These multidisciplinary groups would have the resources to develop new technical approaches useful to all metagenomics studies. The projects would also serve as incubators and evaluators of novel technologies, more precise and automated measures of conditions, and community databases and would equip smaller-scale projects with the knowledge to design efficient sampling schemes, make informed choices about habitats to study, and identify fruitful strategies for identifying specific functions.

The large-scale projects would offer an incomparable opportunity to lead the development of standards for data acquisition, management, and release. Few projects can focus on scientific questions while evaluating sampling methods, experimental design, and data analysis. Such an integration of biology and evaluation of the outcomes of various approaches would be a central mission of the large-scale projects.

The size of the large-scale projects would provide economies of scale for “omic” analyses and the development of computational tools and provide guidance for future movement toward or away from centralized facilities for sequencing and data analysis. Furthermore, the large-scale projects would provide an interdisciplinary community to lead novel downstream metagenomic analyses, perhaps including uses for structural biology, high-throughput “omics,” new modeling of the evolutionary history of the early biosphere, and assessment of the current patterns and rate of evolutionary change. No doubt, metagenomic data will yield major approaches and questions that we cannot envisage today; these breakthroughs are best stimulated by large-scale projects.

International Collaboration and Training

The large-scale projects would require and enable collaboration and coordination that are difficult to achieve with single-investigator projects. Because they would be international and involve many investigators, they would require carefully considered and executed management plans and funding dedicated to fostering communication and promoting successful collaboration through scientific discourse. The large-scale projects would provide a unique setting for training a new community of young scientists who are skilled in collaboration and the execution of large-scale science. The nature of modern biology necessitates that at least some students have the skills to provide future leadership to international and multi-investigator projects as these become more prominent in biological research.

Thus, the large-scale projects would provide the intellectual environment and resources for the training of a new cadre of scientists to populate the field of metagenomics. In training, just as in research, the field would benefit from a healthy balance of large-scale and small-scale projects.

LEARNING FROM PREVIOUS LARGE-SCALE GENOMICS PROJECTS

Several collaborative research projects comparable with the proposed global metagenomics projects, such as the human and Arabidopsis genome sequencing efforts, have yielded important, transformative science. An examination of the history of these projects reveals factors that proved to be crucial to their success.

The Human Genome Project

The Human Genome Project (HGP) provides an excellent window into the processes and pitfalls of “big science.” The HGP required the collaborative management of a large-scale, international, interdisciplinary research project involving input from several independent research teams. Two critical lessons of this highly visible, highly successful effort can be noted. First, there was a clear goal for the collaborative project that all collaborators could embrace—the sequencing of the whole genome. The goal was:

  • Specific in stating what would be done (sequence the human genome).

  • Publicly understandable in terms of the benefit to society (human health).

  • Time-bounded (within 15 years).

  • Finite and with a specific associated cost estimate ($200 million per year for 15 years; $3 billion total).

  • Wild and audacious (the goal was substantially beyond the technology that existed when the project was proposed).

Several intermediate end points were set, allowing the public and policy-makers to monitor progress of the project—such as completion of the physical map (two key maps in 1992 and 1994), completion of individual chromosomes (1999 and 2000), a draft genome (2001)—and then the “final” genome (2003). Effort could proceed in parallel at organizations around the world that contributed to the overall effort. Common data standards helped to enable this, and the discrete nature of chromosomes helped to organize the effort. Rapid data release and globally available databases ensured open sharing of information.

Second, the HGP devoted substantial resources to consensus building and coordination. It was an international collaboration involving 20 groups and funding from the United States, the United Kingdom, Japan, France, Germany, and China. Sequence data were contributed by many centers. The direction of the HGP was set by the major funders: the National Institutes of Health (NIH), the US Department of Energy (DOE), and the Wellcome Trust. They established mechanisms to assist with the coordination of research, in particular to avoid unnecessary competition or duplication of effort, and to coordinate research with parallel studies in model organisms; to coordinate and facilitate the exchange of data and biomaterials; and to encourage public debate and provide information and advice on the scientific, ethical, social, legal, and commercial implications of sequencing the human genome. The methods used by the funders to achieve collaboration and coordination included open “Bermuda” meetings, periodic international meetings, and regular telephone conferences; rapid and unrestricted data release (all genomic sequence data were made publicly available without restriction within 24 hours of assembly); and data integration using a common software platform.

The Arabidopsis Genome Project

The A. thaliana genome sequencing project provides a slightly different perspective on how to establish and maintain such extensive, international collaborative research efforts. The Multinational Coordinated Arabidopsis thaliana Genome Research Project was conceived in 1990 by a small group of investigators who believed that such a project would have a profound enabling effect on the field of plant biology.

The Arabidopsis Genome Initiative (AGI) was formed to establish standards for sequencing accuracy and guidelines for data release, and to allocate workloads for each participant. The AGI was made up of representatives of the six research groups involved in sequencing the A. thaliana genome and played a key role in the oversight of the project. The group communicated regularly to deal with issues as they arose. Some members of the AGI lobbied for immediate data release as was the practice in the HGP, but there was considerable disagreement among the participants and their funders on this point, and public availability of data ranged from deposition of a draft sequence in GenBank within 24 hours of its generation to data release only when a sequence was finished and annotated. Data release was a subject of continuing discussion throughout the project, and the participants finally agreed to disagree about it. The AGI also played an important role in the final stages of the project when it became necessary to reallocate genome regions to centers that had finished their initial assignments ahead of schedule. This helped to ensure a steady flow of data and in part contributed to the completion of the project nearly 4 years ahead of schedule.

The project benefited from the additional oversight provided by the Science Steering Committee, composed of members of the Arabidopsis research community and representatives of some sequencing centers. A US Steering Committee was also established to facilitate interactions between the participating US laboratories, to serve as an additional link to the international efforts, to provide guidance on database issues, and to generate annual progress reports to the Arabidopsis community.

One of the things that set the A. thaliana genome project apart from other sequencing projects was that the scientists on the steering committees, not the representatives of the funding agencies, were empowered to make decisions on the overall management of the project. Agency representatives were, however, invited to all the meetings as observers and helped to ensure that the sequencing groups met their obligations.

Another aspect of the project that required coordination was annotation and data analysis. Although each of the sequencing groups was involved in annotating its regions of the genome, different methods were used to generate the information. It was decided that the ultimate goal was to provide the scientific community with a unified set of genome annotations. Through the implementation of open communication and clear procedures, a plan for a joint annotation effort between the Institute for Genomic Research and the Munich Information Center for Protein Sequences was established.

LESSONS FOR METAGENOMICS

The HGP and AGI provide valuable lessons for implementing a successful Global Metagenomics Initiative. Both projects benefited from having a clear goal, broadly accepted scientific and public benefits, and continuing coordination and communication among scientists and funding agencies.

To succeed, the large-scale projects in the Global Metagenomics Initiative would need to replicate these qualities. The initial challenges will be to develop a consensus around the choice of three microbial communities for in-depth study, to set clear goals, and to map out a program that establishes priorities and intermediate milestones. It will be important to identify model communities whose understanding would be of immediate and obvious public benefit. The human microbiome (with its health implications) and ocean microbes (with their role in the global carbon cycle) are two examples. Fully characterizing these communities is as daunting a task as decoding the human genome appeared to be 25 years ago. Consensus-building, planning, and staging will be necessary.

A PRELIMINARY ROAD MAP

Phase I: Choosing Model Communities

The first challenge that the scientific community needs to meet is to develop a consensus around the desirability of launching a few large-scale metagenomics projects and then to delineate the principles for selecting and recommending model communities. The choices in metagenomics are more daunting than choosing which model organisms to sequence. The broad categories of microbial communities are quite diverse: natural environments from ocean to soil to extreme habitats, such human-made environments as bioreactors of many types, and a vast array of host-associated microbial communities, from insect symbionts to the human microbiome. Each of these categories of environments offers different opportunities and challenges for metagenomics study, and thus it would probably be desirable to draw an in-depth, large-scale project from each. The broad scope of potential metagenomics study projects shows that metagenomics has much to offer in furtherance of the missions of many funding agencies, including the NSF, NIH, DOE, and the USDA. The choice of habitats to explore should be the product of discussions among members of the scientific community in a process that we recommend be initiated and coordinated by the Microbe Project Working Group. One way to enable such efforts is to support cross-disciplinary, international workshops to debate the principles to apply in choosing model communities and to debate how to establish and maintain multidisciplinary and multinational research efforts.

Alternatively, a process could be modeled after the NIH director’s roadmap meetings in which five different groups of scientists with diverse perspectives each spent a day at NIH discussing topics for the NIH’s long-term planning, called the Roadmap Initiative. The consensus ideas were then posted on the web for public comment, comments were collected, and decisions on themes were made based on the collective input of the scientific community.

The large-scale projects in the Global Metagenomics Initiative will be chosen, in part, based on the rationale for the habitats of choice. Basing the choice of habitats on the following criteria will ensure that the desired outcomes of the Global Metagenomics Initiative are identified and satisfied by the mix of systems that are selected. Three large-scale model microbial-community projects will probably be appropriate.

In each of the three broad categories (natural, host-associated, and managed communities), a specific model community will need to be chosen. The following criteria characterize communities that will yield the most useful data:

  • A community in which there is some fundamental understanding of the major functions and roles of the microbes and in which there would be a distinct benefit in improving that understanding, such as the microbial community colonizing the human gut or oral cavity.

  • A community of moderate complexity that is well characterized by environmental or geological criteria and that can be systematically sampled over long durations, such as those found in seasonally variable, depth-stratified lakes, hypersaline ponds, and low-nutrient oceans.

  • A community whose members can be well characterized by current sequencing technologies so as to make it possible to address fundamental questions of how the community is organized and stabilized, that is, a community of an appropriate level of complexity in which eukaryotes play a minimal role in community dynamics.

  • A community whose variation based on physical/geo/chemical characteristics can be resolved by reproducible sampling, such as a soil community colonizing a winter wheat crop.

  • A community to which a particular treatment can be applied so that factors that shape community structure and function can be tested, such as intestinal tracts, bioreactors, streams, or soils.

Model metagenomics communities should be chosen to leverage past knowledge and current research. Several funding agencies target long-term investigation of particular communities or environments, including NSF’s LTER Network, NSF’s and USDA’s Microbial Observatories, and NSF’s NEON.

Phase I would include one or more workshops to develop a consensus on at least three and perhaps up to 10 communities as possible objects of a large-scale project. The workshops would define a clear goal and end point for each project and elucidate the expected public benefit of achieving the goals. Examples of possible projects are listed in Table 7-2, but the committee emphasizes that these are not prescriptive; the choice of projects would be best determined with a great deal of community input.

Phase II: Planning and Initial Data-Gathering

Once the communities have been chosen, Phase II would begin with a peer-reviewed, competitive process wherein groups of scientists submit interdisciplinary proposals for planning projects. The proposals would be evaluated according to the criteria presented in this report and any further criteria developed in the Phase I workshops. Planning proposals would be expected not only to address scientific issues but to outline project management, including coordination, milestones, oversight, data management and release, intellectual property, training, and public outreach.

Project-planning awards will support a year of meetings of the international group to hone hypotheses, approaches, and methods and to support the gathering of the baseline information needed to pursue the chosen hypotheses. Baseline information might include low-depth sequence of the habitat, phylogenetic analysis of the community, an assessment of variation among samples or locations, complete genome sequencing of culturable members of the community, or development of hybridization arrays, expression systems, or high-throughput assays. Establishment of a strong bioinformatics team would also take place during the 1-year planning phase, and the team would define the tools needed to test hypotheses (for example, the association of phylogeny and function, integration of medical records and genomic information, searches for rare motifs, or pattern-recognition algorithms).

TABLE 7-2 The Global Metagenomics Initiative: Examples of Large-Scale Projects

Habitat: Microbial community associated with the human body

Approaches:

  • Sample the intestinal microbiota of people in many locations who consume diverse diets and have diverse genetics and lifestyles

  • Characterize the metagenome of the community with phylogenetic techniques, functional analysis, and massive sequencing

Knowledge Derived:

  • Determine whether there is a “core metagenome” of the human gut, a core community that is found in every person

  • Describe the extent of variation in communities at various locations in the human gut and between individuals

  • Develop correlations between physiological conditions (health and disease, diet, and lifestyle) and microbial community structure and function in the human gut

Habitat: Microbial community in an unmanaged habitat (such as soil or seawater)

Approaches:

  • Conduct extensive sampling over space and time

  • Conduct an extensive analysis of 16S rRNA sequences (>200,000)

  • Produce extensive sequence information about the metagenome in soils under different regimes

  • Conduct a function-based analysis of the metagenome of soil under each regime

Knowledge Derived:

  • Establish relationships between seasonal and daily cycles, structure, and function of microbial community

  • Determine how environmental change affects substrate use, polymer degradation, and secondary metabolite production

  • Describe the distribution of characteristics in the community

Habitat: Microbial community associated with managed ecosystems that perform a service (such as bioremediation or sludge processing)

Approaches:

  • Conduct an extensive analysis of 16S rRNA sequences

  • Construct extensive metagenomic communities from habitat in various locations; characterize with sequencing and functional analysis

  • Sample over time, including when the community is fully operative and when it functionally collapses

Knowledge Derived:

  • Establish relationships between particular functions or members of the communities and community persistence and collapse

  • Identify organisms, traits, or chemical conditions that prevent or reverse community collapse

  • Develop and implement interventions for experiments and improved performance

Phase III: Implementation

Phase III proposals would be submitted after the planning period, once sufficient baseline information and preliminary development have been achieved. Phase III proposals would present a strategy that included evidence that all the methods needed are in hand, that the variation is known so that sampling strategies can be developed, and that the experimental design is carefully matched to the questions asked. Deep sequencing in a habitat would occur during Phase III, as would site-to-site comparisons, testing of hypotheses that are central to developing principles of microbial ecology, and potential new downstream uses of the metagenomics data in later years. The project Web site would provide up-to-date information about the project and direct viewers to the sequences and metadata that have been released. Phase III projects should be designed for a 10-year period with periodic review to achieve the larger-scale goals.

CONCLUSION

Undertaking three model, large-scale metagenomics projects in which the chosen environments can be characterized at great depth from a variety of perspectives would profoundly advance the field. No investigator can bring to bear all the different approaches that will be necessary to begin to understand the complex physical, chemical, genetic, metabolic, and environmental interactions that are taking place in even a moderately complex microbial community. The insights derived and the tools developed by a large, multidisciplinary group would be immediately useful to the wider community of investigators. If the model communities are carefully chosen, such large-scale projects would have obvious, major societal benefits. The Human Genome Project captured society’s imagination with the promise of a deeper understanding of the basis of human health. Well-chosen metagenomics initiatives could similarly inspire with the promise of understanding of the microbial communities that contribute not only to human health but to the health of the biosphere (see Box 7-1).

BOX 7-1

Key Outcomes of Large-Scale Projects in the Global Metagenomics Initiative

  • Broad principles and unifying theory for microbial-community ecology.

  • Large-scale, intensive studies of important habitats or questions.

  • Methods of broad applicability to metagenomics research, both basic and applied.

  • Massive contributions to metagenomics databases.

  • Standards for data acquisition, management, and release in the field of metagenomics.

  • Lessons about economies of scale in metagenomics research.

  • International cooperation and collaboration.

  • Training for young people in the conduct and management of international, collaborative “big science” projects.

  • An opportunity to share science with the public and train graduate students to do so effectively.

The projects outlined in Table 7-2 would furnish the statistical and biological power to support conclusions that cannot be drawn from smaller-scale projects that lack enough breadth of sampling to be representative or enough depth of analysis to assess the variation within and between sites with precision.

The large-scale projects would be “virtual” centers. They would include scientists at many locations in the world to maximize the scientific diversity of the project team. Communication would be achieved by frequent meetings in person and by videoconference or other technology that becomes available during the course of the projects. The projects would probably need to be sustained for 10 years, so changes in personnel and participating institutions during a project’s lifetime would be expected.
