7
A Balanced Portfolio: Multi-Scale Projects in the “Global Metagenomics Initiative”

THE VISION

The opportunity afforded by metagenomics to study microbial communities in their natural state represents a vast frontier. Given the intense competition for science funding, some priority-setting is necessary to ensure that the greatest possible value is gained from early metagenomics investments. The diversity of habitats on Earth, the complexity of microbial communities, and the myriad functions governed by microbes suggest that highly productive metagenomics research will be possible in decentralized, small-project settings. However, no individual researcher is likely to have the capability and resources to achieve a comprehensive characterization of a complex microbial community. Therefore, there is also a substantial need for medium-sized, collaborative projects that involve multiple investigators. Small- and medium-sized projects are familiar to funding agencies and the scientific community in the form of single-investigator grants (National Institutes of Health [NIH] R01s, for example) and interdisciplinary collaborations (National Science Foundation [NSF] and the US Department of Agriculture [USDA] Microbial Observatories; NSF’s Long Term Ecological Research [LTER], Frontiers in Integrative Biological Research [FIBR] and National Ecological Observatory Network [NEON] programs; the US Department of Energy’s [DOE] GTL program, for example). Both funding mechanisms are tested and have proven effective in advancing new fields of science. The mixture of single- and multi-investigator projects maximizes the diversity of scientific approaches, assures that many avenues of research are pursued simultaneously, presents an opportunity to study many



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

habitats, and engages a broad community, thereby utilizing the creativity of many investigators. All these benefits are essential for the advancement of the field.

Metagenomics, however, differs from much of the science that precedes it in its complexity, its multidisciplinarity, and the magnitude of its unknowns. Its very nature departs from each of the fields (microbiology, ecology, and genomics) that fuse to form this new science. Consequently, metagenomics presents a number of conceptual and technical obstacles that limit the productivity of all metagenomics researchers (detailed in Chapters 4 and 5). The committee believes that the needs of the metagenomics field are not entirely met by current funding mechanisms, and that the most efficient way to boost the effectiveness of the field overall is to augment small- and medium-sized projects with a small number of large-scale projects.

The Global Metagenomics Initiative is envisioned to capture all three types of projects: small-, medium-, and large-scale. Familiar mechanisms are available for the first two, so this chapter will detail the characteristics of the large-scale projects; issues that should be considered in evaluating proposals for small- and medium-sized projects have been discussed in the previous chapters, as have infrastructural needs that affect metagenomics research at all scales (the need for software development, database curation, and access to sequencing capacity, for example).

Much as the Human Genome Project drove advances in methods and technology, the large-scale projects will lead the development of broad principles and new technologies and methods that are more easily conceived and validated in the context of a multidimensional and highly replicated study than in traditional single-investigator projects. The large-scale projects will also offer special opportunities for public outreach and training of a new generation of scientists.

There is excellent precedent in the genomics field to suggest that large-scale projects provide benefits far beyond the data gathered. Providing a community data resource was the initial motivation, but the Human Genome Project and other model-organism genome projects have also spurred technological advances and inspired the development of new tools, common standards, and shared software resources. This chapter will argue that the potential value of large-scale metagenomics projects is substantial.

CHARACTERISTICS OF SUCCESSFUL LARGE-SCALE PROJECTS

A recent Institute of Medicine-National Research Council report examining large-scale projects in biomedical science set forth the following reasons for undertaking a large-scale project (Nass et al. 2003):

• “A major intent of such projects is to enable the progress of smaller projects.”
• “Large-scale collaborative projects may also complement smaller projects by achieving an important, complex goal that could not be accomplished through the traditional model of single-investigator, small-scale research.”
• “The objective of a large-scale project should be to produce a public good—an end project that is valuable for society and is useful to many or all investigators in the field.”
• “Unconventional large-scale projects take advantage of economies of scale to produce relatively standardized data on entire classes or categories of biological questions . . . they may reveal novel areas of research for follow-up by smaller science projects, and they also provide essential tools and databases for subsequent research.”

The committee believes that, if carefully chosen and planned, large-scale metagenomics projects will have all of these characteristics.

WHY METAGENOMICS NEEDS A “BIG SCIENCE” COMPONENT

Metagenomics has great promise, but is challenged by the extreme complexity of microbial communities, by the lack of sufficient data on many aspects of microbial communities (such as diversity and conservation of structure or function across geographic location) to support valid generalizations and, because of these factors, by the lack of unifying ecological principles that enable predictive modeling. Put simply, it is hard to derive general principles from very few specific cases. Table 7-1 lists a number of challenges, each of which would require substantial investment to address in depth. The knowledge needed can be obtained best in concerted, multi-investigator efforts.

Although many individual-investigator-led and small-group collaborations in metagenomics have been successful, none has been able to generate sufficient data to allow comprehensive understanding of a complex microbial community or to invest the time and effort needed for the development of new tools and methods. For example, the assembly of individual genomes from metagenomic sequence information has been achieved only in the acid mine drainage project. A large-scale project could bring to bear a multipronged attack on the challenge of assembly in a complex community: redundant, deep sequencing; whole-genome sequencing of numerous community members as scaffolds; cell-sorting and single-cell analysis techniques; and analytical tool development and conceptual advances. The progress made would be available to individual researchers applying metagenomics in a plethora of environments.
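One of the questions raised in Table 7-1, "How much sampling is enough?", can be made concrete with a small calculation. The sketch below is illustrative only and is not a method prescribed by the committee: it assumes that each sequence read has already been assigned to a taxon, uses invented toy data, and applies the bias-corrected Chao1 estimator, a widely used richness estimator chosen here as one possible example of the "mathematical models that can predict species richness" the table calls for. The function names chao1 and collectors_curve are hypothetical.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 estimate of total taxon richness.

    counts: iterable of per-taxon abundances (number of reads per taxon).
    Formula: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 = number of singletons and F2 = number of doubletons.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # taxa seen exactly once
    f2 = sum(1 for c in observed if c == 2)  # taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def collectors_curve(reads):
    """Cumulative number of distinct taxa observed after each read.

    A curve that flattens suggests that further sampling is yielding
    few or no new taxa ("sampling to completion").
    """
    seen = set()
    curve = []
    for taxon in reads:
        seen.add(taxon)
        curve.append(len(seen))
    return curve


# Hypothetical toy data: taxon labels assigned to ten reads.
reads = ["A", "B", "A", "C", "A", "B", "D", "A", "E", "B"]
counts = list(Counter(reads).values())

print("observed taxa:", len(counts))
print("Chao1 estimate:", chao1(counts))
print("collector's curve:", collectors_curve(reads))
```

In this toy sample three of the five observed taxa are singletons, so the estimate is well above the observed count and the curve is still rising at the last read, both of which would indicate that sampling is incomplete. Real communities would need far larger samples and a careful choice of estimator.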

TABLE 7-1 Challenges Facing Metagenomics

Challenge: Complexity and unknown structure of microbial communities
Questions to Be Answered:
• How much sampling is enough?
• What is a representative sample?
Possible Strategies:
• Sample a complex community to completion, that is, until few or no new species are collected with further sampling
• Develop new mathematical models that can predict species richness and community structure so that the representativeness of samples can be evaluated

Challenge: Methodological biases
Questions to Be Answered:
• What taxonomic groups are not accessed with the methods used?
• What habitats are not accessible with current technology?
Possible Strategies:
• Apply multiple methods to the same samples to assess the biases of each
• Systematically survey diverse habitats and assess access to microbes and their DNA

Challenge: Improved correlation of phylogenetic analysis and community function
Questions to Be Answered:
• What roles does each taxon play in community structure and function?
• Can generalizations be made about these roles?
• Do communities always have definable functions?
Possible Strategies:
• Develop mathematical tools to establish associations between phylogeny and function
• Develop ecological methods to remove specific community members and study the effects on structure and function
• Explore broader definitions of function

Challenge: Habitat variation and conservation
Questions to Be Answered:
• On what scale should habitats be studied?
• What are the limits of habitats?
• In what ways is an example of a habitat representative of other examples of the same habitat?
• Are there core characteristics associated with every member of a type of habitat (that is, is there a set of traits required to live in soil)?
• Do all human guts share a core community?
• Which is more highly conserved, the taxa making up a community, or the community function? Or is there coconservation?
Possible Strategies:
• Conduct a worldwide sampling of many habitats of the same type and compare exhaustive descriptions of membership and function
• Develop statistical methods that identify similarities at both taxonomic and functional levels
• Compare variability between similar communities at different sites and in the same site at different times
• Implement clustering methods for enumeration and identification of community types and representative diagnostic taxa

Challenge: Metagenome assembly
Questions to Be Answered:
• What are the rules of metagenome assembly?
• What “binning” techniques are most useful in assigning sequences to taxa?
• How much assembly is necessary to make sense of a community?
• How can microdiversity (many similar genomes at one site) be handled?
Possible Strategies:
• Reassemble numerous metagenomes of various levels of complexity and extract common features and principles to construct a method and a set of rules for assembly
• Select a few communities whose metagenomes have been assembled and study their structure and function in sufficient detail to determine how much and in which ways assembly contributes to ecological understanding (what organism is doing what?)

Challenge: Functional analysis
Questions to Be Answered:
• Are there rules that guide the choice of expression system for function-based analysis?
• Are there ways to increase the probability of finding a particular function (such as choice of habitat or expression system)?
Possible Strategies:
• Conduct global studies to correlate frequency of expression of particular characteristics with analysis parameters
• Develop new gene expression systems for all phyla of bacteria and archaea
• Map functional diversity to community type and map both to phylogenetic diversity
• Correlate functions with extensive physical and biological metadata

Similarly, a large-scale project could advance the coupling of large sequencing databases with functional analysis. Massive sequencing has been conducted on samples from the Sargasso Sea and the Global Ocean Survey, but the metagenomic libraries from these environments have not been subjected to functional-expression assays. Conversely, a number of functional-expression studies have been conducted on soils for which there is not a rich base of sequence information.
A global project might tackle one of these habitats from many angles: sequencing, functional-expression analysis, genome reassembly, deep phylogenetic analysis, hybridization-based screening, and much more. A large-scale project could involve investigators in many disciplines, such as genomicists, statisticians, geneticists, physicians, and sociologists. The genomicists would have different expertise: one might be an expert in the habitat itself who could establish the strategy for collecting relevant metadata, another might be skilled in handling sequence data, and another might be experienced in functional screening. Working together, such a team could make substantial progress in understanding the functional potential of the genetic repertoire of a microbial community, predicting function from sequence and developing new tools for functional screening and database mining and management. The resulting rich store of new knowledge would greatly improve the yield of information from smaller studies.

WHAT KIND OF LARGE-SCALE PROJECTS IN THE GLOBAL METAGENOMICS INITIATIVE AND HOW MANY?

Careful consideration must be given to the choice of projects for the large-scale portion of the Global Metagenomics Initiative. The challenges posed by metagenomics depend on the habitat being studied. No large-scale project would be able to address all the challenges. In broad terms, there are three types of habitats on Earth: unmanaged landscape and aquatic environments (such as seawater, soil, and sediments), managed ecosystems with a directed function (such as sewage treatment, bioremediation, and bioreaction), and host-associated habitats (such as the human gut, plant roots, and insect symbionts). Because the scientific knowledge and practical benefits to be gained differ among environments, the committee believes that three very different communities should be chosen for in-depth analysis.

Sampling challenges differ among the habitats because the sources of variability are different. The challenges associated with DNA extraction also differ. Host DNA is the most important contaminant in host-associated communities, whereas tannins, humic acids, polysaccharides, and other compounds are the dominant contaminants in environmental samples. Different organism genomes will be needed as scaffolds to facilitate assembly and for functional and evolutionary interpretations.
To some degree, statistical methods will apply to all habitats, but the differences in community membership, size, structure, and complexity create different needs for analysis. Perhaps the most important difference in studies of diverse habitats is the type of metadata needed to make sense of genomic sequence data. A global effort is needed to develop standards and methods for gathering metadata. In the human gut, for example, the host’s diet, genotype, and age will probably be critical; in an environmental sample, global positioning, meteorological, chemical, and physical data are likely to be needed. Information about habitat will also often need to include historical trends in these variables. Interoperable but separate model community databases would be the most efficient framework in which to develop the specific tools necessary to analyze data from the different environments and thereby maximize the utility of the data. Consequently, the committee believes that the greatest gains would ensue from including one example of each of the three types of habitats in the Global Metagenomics Initiative’s large-scale projects.

EXPECTED BENEFITS OF LARGE-SCALE METAGENOMICS PROJECTS

The large-scale projects will bring benefits to the field that cannot be achieved with small-scale research. The benefits can be described, broadly, as contributing to ecological theory and principles, understanding of specific habitats and functions, technical advancement of the field, and international collaboration and training.

Theory and Principles

Large-scale projects that engage researchers in many locations and disciplines could reveal the principles of microbial community ecology through intensive studies. For example, whereas a small-scale project might aim to study the distribution of cellulases in the rumen, a large-scale study might attempt to provide a nearly complete inventory of the members of the rumen, assemble some of the members’ genomes, identify cellulases and other traits important to that community’s function and the animal’s feed efficiency, and assess the variation of all these characteristics among many animals and perhaps among ruminant species.

Some community behaviors will be peculiar to each community, but some will be governed by universal principles that can be derived by studying a few communities in great detail. Once those principles are derived, they can be tested with more focused experiments in small-scale studies to assess the degree to which they can be generalized. The proposal to create large-scale projects in the Global Metagenomics Initiative is driven in part by the need for these principles.

Just as studies of different microbial communities face different technical challenges, they also raise different theoretical issues:

• Study of a community in a natural environment would act as “proof of concept” for using metagenomics to understand the interaction between microbial communities and geochemical processes, eventually helping to understand change in global elemental cycles.
• Study of a host-associated community would probe the interaction between a microbial community and the physiology and health of its host.
• Study of a managed-environment community would seek to understand the effects of environmental change or human activity on microbial communities and would have the potential to develop enough understanding to manage or mitigate environmental damage or maximize efficiency and sustainability of a bioreactor.

Each large-scale project would provide a comprehensive dataset about a particular habitat or function that could be the basis for building general theories and principles. The teams leading the large projects would need to communicate often because comparison among the three kinds of habitats could further illuminate global principles about the microbial world.

Understanding Specific Habitats

The committee anticipates that the large-scale projects will focus on habitats whose study has obvious and immediate benefits to society. In addition to contributing to broad theory, the large-scale projects would result in a comprehensive understanding of critical habitats at many levels. Full genome sequencing of organisms from a wide variety of phylogenetic groups represented in the three habitats should be an early focus of the large-scale projects; the resulting genomes would be an important resource for researchers in small- and medium-sized projects.

The chosen habitats should be of clear interest to the general public, and frequent public updates should be an integral part of each project. The funding agencies should encourage the development of strong outreach programs to the communities where the studies are being conducted. Because the Global Metagenomics Initiative is decentralized and its projects are geographically diverse, such outreach would have a broad impact on the public’s understanding of metagenomics and microbiology generally and would present an opportunity to train a new generation of scientists skilled in outreach and communication of science to the public.

Technical Advancement of the Field

Large-scale projects would unite scientists of multiple disciplines around the study of a particular habitat.
These multidisciplinary groups would have the resources to develop new technical approaches useful to all metagenomics studies. The projects would also serve as incubators and evaluators of novel technologies, more precise and automated measures of conditions, and community databases and would equip smaller-scale projects with the knowledge to design efficient sampling schemes, make informed choices about habitats to study, and identify fruitful strategies for identifying specific functions. The large-scale projects would offer an incomparable opportunity to lead the development of standards for data acquisition, management, and release. Few projects can focus on scientific questions while evaluating sampling methods, experimental design, and data analysis. Such an integration of biology and evaluation of the outcomes of various approaches would be a central mission of the large-scale projects.

The size of the large-scale projects would provide economies of scale for “omic” analyses and the development of computational tools and provide guidance for future movement toward or away from centralized facilities for sequencing and data analysis. Furthermore, the large-scale projects would provide an interdisciplinary community to lead novel downstream metagenomic analyses, perhaps including uses for structural biology, high-throughput “omics,” new modeling of the evolutionary history of the early biosphere, and assessment of the current patterns and rate of evolutionary change. No doubt, metagenomic data will yield major approaches and questions that we cannot envisage today; these breakthroughs are best stimulated by large-scale projects.

International Collaboration and Training

The large-scale projects would require and enable collaboration and coordination that are difficult to achieve with single-investigator projects. Because they would be international and involve many investigators, they will require carefully considered and executed management plans and funding dedicated to fostering communication and promoting successful collaboration through scientific discourse. The large-scale projects would provide a unique setting for training a new community of young scientists who are skilled in collaboration and the execution of large-scale science. The nature of modern biology necessitates that at least some students have the skills to provide future leadership to international and multi-investigator projects as these become more prominent in biological research.

Thus, the large-scale projects would provide the intellectual environment and resources for the training of a new cadre of scientists to populate the field of metagenomics. In training, just as in research, the field would benefit from a healthy balance of large-scale and small-scale projects.

LEARNING FROM PREVIOUS LARGE-SCALE GENOMICS PROJECTS

Several collaborative research projects comparable with the proposed global metagenomics projects, such as the human and Arabidopsis genome sequencing efforts, have yielded important transformative science. An examination of the history of these projects reveals factors that proved to be crucial to their success.

The Human Genome Project

The Human Genome Project (HGP) provides an excellent window into the processes and pitfalls of “big science.” The HGP required the collaborative management of a large-scale, international, interdisciplinary research project involving input from several independent research teams. Two critical lessons of this highly visible, highly successful effort can be noted.

First, there was a clear goal for the collaborative project that all collaborators could embrace: the sequencing of the whole genome. The goal was:

• Specific in stating what would be done (sequence the human genome).
• Publicly understandable in terms of the benefit to society (human health).
• Time-bounded (within 15 years).
• Finite and with a specific associated cost estimate ($200 million per year for 15 years; $3 billion total).
• Wild and audacious (the goal was substantially beyond the technology that existed when the project was proposed).

Several intermediate end points were set, allowing the public and policy-makers to monitor progress of the project: completion of the physical map (two key maps in 1992 and 1994), completion of individual chromosomes (1999 and 2000), a draft genome (2001), and then the “final” genome (2003). Effort could proceed in parallel at organizations around the world that contributed to the overall effort. Common data standards helped to enable this, and the discrete nature of chromosomes helped to organize the effort. Rapid data release and globally available databases ensured open sharing of information.

Second, the HGP devoted substantial resources to consensus building and coordination. It was an international collaboration involving 20 groups and funding from the United States, the United Kingdom, Japan, France, Germany, and China. Sequence data were contributed by many centers.
The direction of the HGP was set by the major funders: the National Institutes of Health (NIH), the US Department of Energy (DOE), and the Wellcome Trust. They established mechanisms to assist with the coordination of research, in particular to avoid unnecessary competition or duplication of effort, and to coordinate research with parallel studies in model organisms; to coordinate and facilitate the exchange of data and biomaterials; and to encourage public debate and provide information and advice on the scientific, ethical, social, legal, and commercial implications of sequencing the human genome. The methods used by the funders to achieve collaboration and coordination included open “Bermuda” meetings, periodic international meetings and regular telephone conferences; rapid and unrestricted data release (all genomic sequence data were made publicly available without restriction within 24 hours of assembly); and data integration using a common software platform.

The Arabidopsis Genome Project

The A. thaliana genome sequencing project provides a slightly different perspective on how to establish and maintain such extensive, international collaborative research efforts. The Multinational Coordinated Arabidopsis thaliana Genome Research Project was conceived in 1990 by a small group of investigators who believed that such a project would have a profound enabling effect on the field of plant biology.

The Arabidopsis Genome Initiative (AGI) was formed to establish standards for sequencing accuracy and guidelines for data release, and to allocate workloads for each participant. The AGI was made up of representatives of the six research groups involved in sequencing the A. thaliana genome and played a key role in the oversight of the project. The group communicated regularly to deal with issues as they arose. Some members of the AGI lobbied for immediate data release as was the practice in the HGP, but there was considerable disagreement among the participants and their funders on this point, and public availability of data ranged from deposition of a draft sequence in GenBank within 24 hours of its generation to data release only when a sequence was finished and annotated. Data release was a subject of continuing discussion throughout the project, and the participants finally agreed to disagree about it. The AGI also played an important role in the final stages of the project when it became necessary to reallocate genome regions to centers that had finished their initial assignments ahead of schedule. This helped to ensure a steady flow of data and in part contributed to the completion of the project nearly 4 years ahead of schedule.

The project benefited from the additional oversight provided by the Science Steering Committee, composed of members of the Arabidopsis research community and representatives of some sequencing centers. A US Steering Committee was also established to facilitate interactions between the participating US laboratories, to serve as an additional link to the international efforts, to provide guidance on database issues, and to generate annual progress reports to the Arabidopsis community.

One of the things that set the A. thaliana genome project apart from other sequencing projects was that the scientists on the steering committees, not the representatives of the funding agencies, were empowered to make decisions on the overall management of the project. Agency representatives were, however, invited to all the meetings as observers and helped to ensure that the sequencing groups met their obligations.

Another aspect of the project that required coordination was annotation and data analysis. Although each of the sequencing groups was involved in annotating its regions of the genome, different methods were used to generate the information. It was decided that the ultimate goal was to provide the scientific community with a unified set of genome annotations. Through the implementation of open communication and clear procedures, a plan for a joint annotation effort between the Institute for Genomic Research and the Munich Information Center for Protein Sequences was established.

LESSONS FOR METAGENOMICS

The HGP and AGI provide valuable lessons for implementing a successful Global Metagenomics Initiative. Both projects benefited from having a clear goal, broadly accepted scientific and public benefits, and continuing coordination and communication among scientists and funding agencies.

To succeed, the large-scale projects in the Global Metagenomics Initiative would need to replicate these qualities. The initial challenges will be to develop a consensus around the choice of three microbial communities for in-depth study, to set clear goals, and to map out a program that establishes priorities and intermediate milestones. It will be important to identify model communities whose understanding would be of immediate and obvious public benefit. The human microbiome (its health implications) and ocean microbes (their role in the global carbon cycle) are two examples. Fully characterizing these communities is as daunting a task as decoding the human genome appeared to be 25 years ago. Consensus-building, planning, and staging will be necessary.

A PRELIMINARY ROAD MAP

Phase I: Choosing Model Communities

The first challenge that the scientific community needs to meet is to develop a consensus around the desirability of launching a few large-scale metagenomics projects and then to delineate the principles for selecting and recommending model communities. The choices in metagenomics are more daunting than choosing which model organisms to sequence. The broad categories of microbial communities are quite diverse: natural environments from ocean to soil to extreme habitats, such human-made environments as bioreactors of many types, and a vast array of host-associated microbial communities, from insect symbionts to the human microbiome. Each of these categories of environments offers different opportunities and challenges for metagenomics study, and thus it would probably be desirable to draw an in-depth, large-scale project from each. The broad scope of potential metagenomics study projects shows that metagenomics has much to offer in furtherance of the missions of many funding agencies, including the NSF, NIH, DOE, and the USDA.

The choice of habitats to explore should be the product of discussions among members of the scientific community in a process that we recommend be initiated and coordinated by the Microbe Project Working Group. One way to enable such efforts is to support cross-disciplinary, international workshops to debate the principles to apply in choosing model communities and to debate how to establish and maintain multidisciplinary and multinational research efforts. Alternatively, a process could be modeled after the NIH director’s roadmap meetings in which five different groups of scientists with diverse perspectives each spent a day at NIH discussing topics for the NIH’s long-term planning, called the Roadmap Initiative. The consensus ideas were then posted on the web for public comment, comments were collected, and decisions on themes were made based on the collective input of the scientific community.

The large-scale projects in the Global Metagenomics Initiative will be chosen, in part, based on the rationale for the habitats of choice. Basing choice of the habitats on the following criteria will ensure that the desired outcomes of the Global Metagenomics Initiative are identified and satisfied by the mix of systems that are selected. Three large-scale model microbial-community projects will probably be appropriate. In each of the three broad categories (natural, host-associated, and managed communities), a specific model community will need to be chosen.
The following criteria characterize communities that will yield the most useful data:

• A community in which there is some fundamental understanding of the major functions and roles of the microbes and in which there would be a distinct benefit in improving that understanding, such as the microbial community colonizing the human gut or oral cavity.
• A community of moderate complexity that is well characterized by environmental or geological criteria and that can be systematically sampled over long durations, such as those found in seasonally variable, depth-stratified lakes, hypersaline ponds, and low-nutrient oceans.
• A community whose members can be well characterized by current sequencing technologies so as to make it possible to address fundamental questions of how the community is organized and stabilized, that is, of an appropriate level of complexity and where eukaryotes play a minimal role in community dynamics.
• A community whose variation based on physical/geo/chemical characteristics can be resolved by reproducible sampling, such as a soil community colonizing a winter wheat crop.
• A community to which a particular treatment can be applied so that factors that shape community structure and function can be tested, such as intestinal tracts, bioreactors, streams, or soils.

Model metagenomics communities should be chosen to leverage past knowledge and current research. Several funding agencies target long-term investigation of particular communities or environments, including NSF’s LTER Network,1 NSF’s and USDA’s Microbial Observatories,2 and NSF’s NEON.3

Phase I would include one or more workshops to develop a consensus on at least three and perhaps up to 10 communities as possible objects of a large-scale project. The workshops would define a clear goal and end point for each project and elucidate the expected public benefit of achieving the goals. Examples of possible projects are listed in Table 7-2, but the committee emphasizes that these are not prescriptive; the choice of projects would be best determined with a great deal of community input.

Phase II: Planning and Initial Data-Gathering

Once the communities have been chosen, Phase II would begin with a peer-reviewed, competitive process wherein groups of scientists submit interdisciplinary proposals for planning projects. The proposals would be evaluated according to the criteria presented in this report and any further criteria developed in the Phase I workshops. Planning proposals would be expected not only to address scientific issues but to outline project management, including coordination, milestones, oversight, data management and release, intellectual property, training, and public outreach.

Project-planning awards will support a year of meetings of the international group to hone hypotheses, approaches, and methods and to support the gathering of the baseline information needed to pursue the chosen hypotheses.
Baseline information might include low-depth sequence of the habitat, phylogenetic analysis of the community, an assessment of variation among samples or locations, complete genome sequencing of culturable members of the community, or development of hybridization arrays, expression systems, or high-throughput assays. Establishment of a strong bioinformatics team would also take place during the 1-year planning phase, and the team would define the tools needed to test hypotheses (for example, the association of phylogeny and function, integration of medical records and genomic information, searches for rare motifs, or pattern-recognition algorithms).

1 http://www.lternet.edu/.
2 http://www.csrees.usda.gov/fo/fundview.cfm?fonum=0; http://www.nsf.gov/pubs/00/nsf000/nsf000.htm.
3 http://www.neoninc.org/.

TABLE 7-2 The Global Metagenomics Initiative: Examples of Large-Scale Projects

Habitat: Microbial community associated with the human body
Approaches:
• Sample the intestinal microbiota of people in many locations who consume diverse diets and have diverse genetics and lifestyles
• Characterize the metagenome of the community with phylogenetic techniques, functional analysis, and massive sequencing
Knowledge Derived:
• Determine whether there is a “core metagenome” of the human gut, a core community that is found in every person
• Describe the extent of variation in communities at various locations in the human gut and between individuals
• Develop correlations between physiological conditions (health and disease, diet, and lifestyle) and microbial community structure and function in the human gut

Habitat: Microbial community in an unmanaged habitat (such as soil or seawater)
Approaches:
• Conduct extensive sampling over space and time
• Conduct an extensive analysis of 16S rRNA sequences (>200,000)
• Produce extensive sequence information about the metagenome in soils under different regimes
• Conduct a function-based analysis of the metagenome of soil under each regime
Knowledge Derived:
• Establish relationships between seasonal and daily cycles, structure, and function of microbial community
• Determine how environmental change affects substrate use, polymer degradation, and secondary metabolite production
• Describe the distribution of characteristics in the community

Habitat: Microbial community associated with managed ecosystems that perform a service (such as bioremediation or sludge processing)
Approaches:
• Conduct an extensive analysis of 16S rRNA sequences
• Construct extensive metagenomic communities from habitat in various locations; characterize with sequencing and functional analysis
• Sample over time, including when the community is fully operative and when it functionally collapses
Knowledge Derived:
• Establish relationships between particular functions or members of the communities and community persistence and collapse
• Identify organisms, traits, or chemical conditions that prevent or reverse community collapse
• Develop and implement interventions for experiments and improved performance

Phase III: Implementation

Phase III proposals would be submitted after the planning period, when sufficient baseline information and preliminary development were achieved. Phase III proposals would present a strategy that included evidence that all the methods needed are in hand, that the variation is known so that sampling strategies can be developed, and that the experimental design is carefully matched to the questions asked. Deep sequencing in a habitat would occur during Phase III, as would site-to-site comparisons, testing of hypotheses that are central to developing principles of microbial ecology, and potential new downstream uses of the metagenomics data in later years. The project Web site would provide up-to-date information about the project and direct viewers to the sequences and metadata that have been released. Phase III projects should be designed for a 10-year period with periodic review to achieve the larger-scale goals.

CONCLUSION

Undertaking three model, large-scale metagenomics projects in which the chosen environments can be characterized at great depth from a variety of perspectives would profoundly advance the field. No investigator can bring to bear all the different approaches that will be necessary to begin to understand the complex physical, chemical, genetic, metabolic, and environmental interactions that are taking place in even a moderately complex microbial community. The insights derived and the tools developed by a large, multidisciplinary group would be immediately useful to the wider community of investigators. If the model communities are carefully chosen, such large-scale projects would have obvious, major societal benefits.
The Human Genome Project captured society’s imagination with the promise of a deeper understanding of the basis of human health. Well-chosen metagenomics initiatives could similarly inspire with the promise of understanding the microbial communities that contribute not only to human health but to the health of the biosphere (see Box 7-1).

The projects outlined in Table 7-2 would furnish the statistical and biological power to support conclusions that cannot be drawn from smaller-scale projects that lack enough breadth of sampling to be representative or enough depth of analysis to assess the variation within and between sites with precision.

BOX 7-1 Key Outcomes of Large-Scale Projects in the Global Metagenomics Initiative

• Broad principles and unifying theory for microbial-community ecology.
• Large-scale, intensive studies of important habitats or questions.
• Methods of broad applicability to metagenomics research, both basic and applied.
• Massive contributions to metagenomics databases.
• Standards for data acquisition, management, and release in the field of metagenomics.
• Lessons about economies of scale in metagenomics research.
• International cooperation and collaboration.
• Training for young people in the conduct and management of international, collaborative “big science” projects.
• An opportunity to share science with the public and train graduate students to do so effectively.

The large-scale projects would be “virtual” centers. They would include scientists at many locations in the world to maximize the scientific diversity of the project team. Communication would be achieved by frequent meetings in person and by videoconference or other technology that becomes available during the course of the projects. The projects would probably need to be sustained for 10 years, so changes in personnel and participating institutions during a project’s lifetime would be expected.