3
The Multinational Coordinated Arabidopsis Thaliana Genome Research Project

Arabidopsis thaliana is a noncommercial member of the mustard family that has become widely used as a model plant because it develops, reproduces, and responds to stress and disease in much the same way as many crop plants. The plant has a number of features that make it ideal for research purposes—it is easy and inexpensive to grow and produces many seeds, which is useful for genetic experiments. One especially attractive feature is its small genome (100 megabases), which simplifies and facilitates genetic analysis.

Project Elements

The Multinational Coordinated Arabidopsis Thaliana Genome Research Project is an international scientific effort that began in 1990. Its stated goal is to understand the physiology, biochemistry, growth, and development of a flowering plant at the molecular level. The project developed when several program managers at the National Science Foundation (NSF), recognizing that research on Arabidopsis was accelerating, convened a series of international workshops of leading scientists to devise a long-range plan. The resulting project plan called for genetic and physiologic experiments to identify, isolate, sequence, and understand genes; the establishment of worldwide electronic communication among laboratories; the establishment of resource centers for collection and dissemination of genetic stocks, genes, and related materials; and the creation of databases so that new knowledge would be shared. The project plan also contained mechanisms for formal, annual progress reviews and periodic establishment of new goals by a multinational steering committee
of leading Arabidopsis researchers. In this multi-institution project, or collection of related projects, NSF supported the early collaboration and planning efforts, but the U.S. scientific community now is also supported by the National Institutes of Health (NIH), the Department of Agriculture, and the Department of Energy. Ongoing communications among scientific administrators, the scientific community, and the national and international steering committees facilitate the identification of needs, rationalization and prioritization, and negotiations with agencies around resource requirements. The remarkable collaborative spirit of the participants has made it a successful model for scientific cooperation among several thousand participating scientists and scientific administrators in Asia, Australia, Europe, the Middle East, and the Americas. Thus, it seemed an especially appropriate case with which to examine the ingredients that facilitate the sharing of research resources.

Perhaps most central to the issue of sharing research resources are the biological resource centers and the informatics that facilitate exchange of information and materials. The Arabidopsis stock centers were established in 1991 to preserve and distribute biological materials supporting the large Arabidopsis research community. There are two such centers: the Arabidopsis Biological Resource Center (ABRC) at Ohio State University in Columbus, Ohio, and the Nottingham Arabidopsis Stock Centre (NASC) at the University of Nottingham, United Kingdom. Both of these stock centers have a comprehensive collection of seeds and clones as well as other research tools such as T-DNA lines and transposable element-transformed lines, transposon lines, promoter trap lines, recombinant inbred populations, and yeast artificial chromosome (YAC) and phage libraries, all of which they distribute worldwide.
The number of stocks sent has increased significantly in the last three years, from 15,000 total seed stocks distributed in 1992 by ABRC and NASC combined to about 45,000 seed stocks distributed in 1994. As for DNA, 1,000 clones and 6 YAC libraries were sent in 1991; just two years later, about 3,100 clones and 166 libraries were distributed, according to the NSF's Multinational Coordinated Arabidopsis Thaliana Genome Research Project Progress Report for Year Four. Centers are now providing considerable technical services such as multiplexed libraries to facilitate screening for specific genes.

Three major databases are key resources for sharing information. The first is the Stanford-based Arabidopsis thaliana Database (AtDB), previously at Massachusetts General Hospital, where it was called An Arabidopsis thaliana Database (AAtDB). This is a comprehensive collection of many types of information, including genetic map information obtained directly from investigators or from publicly available collections and databases. The second, the Arabidopsis Information Management System (AIMS), is an on-line database system running on a central machine at Michigan State University. It is devoted primarily to stock center operations, but like the other information systems, it is readily accessible to anyone with a connection to the Internet. The third major database, devoted to cDNA sequences and expressed sequence tags (ESTs), is maintained at the University of Minnesota, which periodically sends these data to the National Center for Biotechnology Information at the National Library of Medicine. In addition, several new databases have recently been developed for managing information on EST contigs (the Institute for Genomic Research) or on YAC contigs (University of Pennsylvania and John Innes Center). The relative ease with which a World Wide Web (WWW) server can be established is leading to rapid proliferation of specialty databases.

Ownership and Access Issues

At this time, the U.S. stock center and databases do not accept deposits that place restrictions on materials, a policy that has, in a few instances, impeded the acceptance of some important collections. However, NASC has accepted a collection of insertional mutants for which users are required to sign a material transfer agreement that cedes commercial rights to the investigators who produced the collection.

Curators aggressively solicit materials. Whenever a paper is published, authors are sent a note requesting the materials in the paper (in the future, because obtaining deposits is such a time-consuming but important process, members of the research community, rather than members of the stock center, will solicit deposits). Quality control is also conducted by the curators. Peer pressure, the example of prominent scientists, and recognition for contributing stocks all help foster continued contributions to the centers and their associated databases.
The national steering committees, originally ad hoc but now elected (the six-member American committee has two members replaced each year through e-mail balloting), wield considerable influence in this respect, as do the heads of the major laboratories, who have encouraged openness and sharing by clear public acknowledgments to depositors of data and materials. Although at this time the Arabidopsis community requires that genome sequences be deposited in the public database three months after they are publicly available, the multinational steering committee is considering requesting that journals publishing in this area require an accession number from the stock centers indicating that experimental materials have been deposited. There is also no continuing ownership of materials in these stock centers (i.e., once there, they are owned by the stock center).

The stock centers and databases are extremely successful because resources and information are so freely and willingly shared. As soon as raw sequence data are obtained from the Michigan State University cDNA sequencing project laboratory, for example, they are sent directly to the University of Minnesota, where the initial analysis takes place and the results are deposited in the public database. At the same time, the clones are deposited in the stock center and thereby made available to anyone interested.

An interesting feature of the U.S. stock center database is that all requests to the stock center are logged on the database, which is available on-line to anybody in the world. This way anyone can find out the names of the people and the labs requesting a specific seed or clone and the date of the request. Although at first there was considerable concern that large laboratories might gain an edge over smaller ones through such information sharing, the mechanism has instead been found extremely useful in developing collaborations rather than stimulating competition.

Products of Arabidopsis sometimes stimulate commercial interest, and patenting is both common and encouraged, although there seems to be a strong feeling in the Arabidopsis community that nobody should patent genes in this organism (as opposed to a novel use of them). One consequence of this view has been a strong pressure to get sequence data, especially ESTs, into the public domain quickly, so that patenting based merely on sequence information becomes difficult or impossible. In other cases (for example, novel applications), relevant materials and information are not published or deposited until after the patent application is filed, but once this is accomplished there has been a general commitment to sharing the resource. For example, the project's "rule" is that the sequences appear in a public database three months after they are available, and although undoubtedly the odd patent may be written on these sequences before the three months is up, this is the rule that the community itself wrote at a workshop sponsored by NSF.
Another aspect of the enforcement question is a second rule, this one requiring sequencing groups to have a member of the national steering committee on the executive committee overseeing their sequencing operation, thus making it difficult for a sequencing lab to keep results secret for very long. There have been no real tests of the consequences for breaking either of these rules, but the assumption is that NSF will discontinue funding if there is a complaint from the community. Enforcement is more complicated at the international level. The international steering committee is trying to negotiate the contribution of a collection of mutants made by a consortium in Europe, where strong pressure is being put on scientists by their funding agencies to limit distribution to those willing to cede or share future commercial benefits.

Cost Issues

In the United States, the Department of Agriculture, the Department of Energy, the National Institutes of Health, and the National Science Foundation collectively supplied $7.5 million for Arabidopsis research in 1990 and $22 million in 1993. Of that total, the amount devoted to the Multinational Coordinated Arabidopsis Thaliana Genome Research Project over the last five years comes to about $4.2 million: $2.2 million for establishing and maintaining the various databases and $1.9 million for the stock center at Ohio State (ABRC). Universities are unquestionably subsidizing the enterprise, but the extent and cost-effectiveness of this approach are not known. International components receive support from their own governments and the European Community.

The relatively modest amounts required by this project appear to the committee as money well spent, and unlike some of the other case studies examined, current and projected funding appears adequate. One reason for this seems to be that the services provided appear to be viewed by both the NSF and the Arabidopsis research community as legitimate objects of research support. That is, essential support for researchers as a group is seen as no less deserving of research dollars than the projects of individual scientists.

Other Issues and Problems

In his presentation at the workshop, Chris Somerville of Stanford and the Carnegie Institution, one of the project's original organizers, identified factors that are instrumental in the sharing of resources.
These include the leadership of program managers in government agencies that support the research; the leadership and example of senior scientists and prominent laboratories; an oversight committee with broad representation of countries and scientists that sets policy, adjudicates problems, and can make proposals to funding agencies based on the needs of the community; investment in infrastructure such as stock centers and information databases; support for workshops and other scientific meetings; and a process for annual updating of a plan. Peer pressure to share information and materials and aggressive solicitation of stocks for the centers are also important.

Among the problems identified by Chris Somerville is the requirement for U.S. funding agencies to set up stock centers and databases via a competitive process even when the steering committee could locate only one interested and capable bidder in the community. Other problems include providing stocks and services to an international community with limited funds from U.S. agencies, as well as the ongoing administrative load imposed by the need for an active process of soliciting deposits, a time-consuming activity given the pace of research on Arabidopsis.

More information on this case study is available from NSF in the project's progress report for year four.