Study Overview and Background
Community standards and policies for the deposition and sharing of materials and data associated with published research findings play an important role in the life-sciences community. Concern that adherence to the community’s standard practices had been eroding in recent years came to a head in February 2001, when Science published a paper describing the draft sequence of the human genome by researchers at a company, Celera Genomics. Another paper on a draft version of the human genome, assembled by the publicly funded International Human Genome Sequencing Consortium, was published at the same time in Nature, and its sequence data were deposited in GenBank, an annotated collection of all publicly available DNA sequences. Although Science usually requires authors to deposit the DNA sequences cited in a paper in GenBank or one of the affiliated public databases, in this case it allowed Celera to post its sequence data on the company’s own Web site, where they were made available to academic researchers, but with restrictions on the amount of data downloadable from the Web site at any one time. The data were also made available to for-profit companies on different terms.
The decision by Science (Kennedy and Jasny, 2001) provoked considerable debate in the life-sciences community.
The debate stimulated interest in revisiting the core principles that underlie community standards, the accepted practices for sharing data, software, and materials that are specific to different disciplines of the life sciences. One might presume that community standards were established long ago and are therefore widely recognized and agreed on, given that scientific publication has existed for more than 3 centuries. This is true in some, but not all, areas of biology. For example, in systematic and evolutionary biology, there are certain widely accepted standards that are routinely observed. In many more recent, rapidly expanding fields, this is not the case. Rapid changes in the life sciences in recent years have led to:
Disagreement and uncertainty about the responsibilities of authors to share data and materials.
A sense that, in practice, publication-related materials and data are not always readily available to researchers who desire access to them.
Suggestions that standards for sharing are not being enforced.
Controversy over seemingly different application of journal policies to different authors.
Questions about how standards and policies apply to various types of data and materials, such as large databases and software.
Suggestions that standards for sharing may be in conflict with federal legislation that encourages commercialization of the results of federally funded research.
The prospect that new legal protections for databases, particularly in Europe, will complicate the development of comprehensive and consistent standards.
Uncertainty as to whether academic investigators should be treated differently from industry investigators with regard to the provision of access to their publication-related data or materials.
To address these concerns, the National Research Council created, in October 2001, the Committee on Responsibilities of Authorship in the Biological Sciences, whose members were chosen from academe and the commercial sector for their expertise in the life sciences and medicine and their experience with issues related to intellectual property rights, scientific publishing, data, software, technology transfer, and the structure of the scientific enterprise. The committee was given the following charge:
To conduct a study to evaluate the responsibilities of authors of scientific papers in the life sciences to share data and materials referenced in their publications. The study will examine requirements imposed on authors by journals, identify common practices in the community, and explore whether a single set of accepted standards for sharing exists. The study will also explore whether more appropriate standards should be developed, including what principles should underlie them and what rationale there might be for allowing exceptions to them.
To meet its charge and to obtain input from the breadth of the life-sciences community, the committee organized a workshop, “Community Standards for Sharing Publication-Related Data and Materials,” which was held on February 25, 2002, at the National Academy of Sciences in Washington, DC. The workshop was organized around five hypothetical scenarios (see Appendix B) that served as the basis for examining the wide array of complex issues related to authors’ responsibilities for sharing data and materials. More than 70 workshop participants—the keynote speaker, invited panelists, and audience members—discussed the issues in plenary sessions and smaller working groups. The participants comprised distinguished members of the life-sciences community, including researchers and administrators from universities, federal agencies, and private industry; scientific-journal editors; and members of the legal and university technology-transfer communities.
Scope of the Study and Structure of the Report
This report presents a synthesis of the discussions at the workshop and the issues considered by the committee in its deliberations. The report puts forward the committee’s findings and recommendations on the key issues facing the life-sciences community with regard to sharing of publication-related data and materials. The rest of this chapter provides background on reasons for addressing these issues. Chapter 2 examines the value of publishing scientific findings and the principles related to the publication of scientific findings. Sharing of data and software and of materials related to publication are treated in Chapters 3 and 4, respectively, and Chapter 5 reviews the various arguments advanced regarding the differing interpretations and consequences of existing standards. Chapter 6 addresses compliance with appropriate community standards, including ways to encourage compliance and to handle cases of noncompliance; it also addresses the challenge of forging community standards that have the robustness and flexibility needed to accommodate the rapid change in life-sciences research that is expected to continue.
The scope of the committee’s study was restricted to issues that begin to arise when a paper is submitted for publication. The report therefore does not address questions about the sharing of data that are not being published or of unrefereed preliminary or raw data posted on public Web sites before formal peer-reviewed publication. The report also does not address the requirements of the Shelby Amendment of the Freedom of Information Act. As emphasized at the workshop by committee chair Thomas Cech, president of the Howard Hughes Medical Institute, the purpose of the workshop was to address “the responsibility of authors with respect to sharing publication-related data and materials.”
While the principles and standards identified in this report have broad applicability to various disciplines within the life sciences, the committee did not conduct a comprehensive examination of practices for sharing of data and materials specific to every discipline. Such practices are shaped by the types of data and material in use and by the unique circumstances of the research. For example, in systematic and evolutionary biology, the long-established and only acceptable practice for sharing publication-related voucher specimens is to deposit them in public or accessible repositories, often museums, where they are available to everyone. In microbiology, on the other hand, the use of national repositories to share cultures is not uniform; many scientists maintain and distribute cultures from in-house collections. To the extent that there are multiple communities of life scientists rather than a single community, those disciplines have the ultimate responsibility to develop and implement specific standards that extend from the general principles and standards identified in this report. Although the focus of this report is on the life sciences, the principles and standards considered in the committee’s deliberations are of a fundamental nature, and the committee hopes that its recommendations will be of interest to scientists in general.
BACKGROUND: WHY IS THERE A PERCEIVED PROBLEM?
The sharing of experimental results and research materials has long been important for the advancement of science and technology. For many years, a spirit of free and open sharing seemingly prevailed among life scientists. However, today’s rapidly evolving research environment is producing some growing pains in the life-sciences community.
Among the common perceived problems are the ignoring or denial of requests for materials or data associated with a publication and long delays in honoring such requests. Increasingly, data and materials that are shared come with restrictions, such as material transfer agreements (MTAs) that limit how the resources may be used. Although in some fields of biology, such as x-ray crystallography, more data are shared than ever before, in other life-science fields the unrestricted, unfettered sharing of data and materials, including those related to published research, is thought to be less common than it was some 20 years ago. Although quantitative evidence is difficult to obtain, a recent survey of geneticists and other life scientists at 100 U.S. universities (Campbell et al., 2002) reported that the ideal of free and open sharing is not always being met. Of geneticists who had asked other academic faculty for additional information, data, or materials regarding published research, 47% reported that at least one of their requests had been denied in the preceding 3 years, and 12% of geneticists acknowledged denying a request from another academic researcher themselves. The phenomenon is not peculiar to academic genetics, according to the survey. There are no a priori reasons to suggest that geneticists were more likely than other university life scientists to report having their requests denied or having denied others’ requests. Supporting the notion that sharing is becoming less common, 35% of the geneticists said that sharing of data and materials had decreased during the preceding decade, while only 14% said that it had increased.
Commercial Considerations and Other Concerns
Various factors are believed to contribute to the reduction in unrestricted sharing of publication-related data and materials and to new concerns about sharing in the life-sciences research community. One is the growing role of the for-profit sector—such as pharmaceutical, biotechnology, research-tool, and bioinformatics companies—in basic and applied research over the past 2 decades, and the resulting circumstance that increasing amounts of material and data are in private hands. Although scientists who work for the companies typically want to share reagents and information and many companies see value in sharing, the primary responsibility of a company is to its investors. The prospect of giving away valuable data and materials without securing some type of intellectual property protection, and without any promise of financial return, can, depending on costs and competitive implications, make companies reluctant to share widely.
Biotechnology and pharmaceutical companies are not only conducting basic research in their own laboratories but also are funding the work of researchers in the not-for-profit sector (universities and private not-for-profit research institutions). This intermingling of for-profit private-sector activities with public or not-for-profit interests increases the prospect of potentially conflicting missions that can impede unrestricted sharing as researchers become involved in commercial activities.
Another major contributor to the current climate is the increasing concern among universities and academic life scientists about intellectual property rights and commercialization. In the United States it stems in large part from the Patent and Trademark Law Amendments Act, commonly known as the Bayh-Dole Act (PL 96–517, 1980), passed by Congress in 1980. The Bayh-Dole Act encourages universities and other not-for-profit research institutions to promote the use, commercialization, and public availability of inventions developed through federally funded research by allowing them to own the rights to patents they obtain on these inventions. That has contributed to more overlap in the interests of the for-profit and not-for-profit research sectors, and in some cases impeded the unrestricted sharing of publication-related data and material as universities and other not-for-profit research institutions have sought the commercial development of and economic returns from their intellectual property (see Box 1–1). According to workshop panelist Michael Hayden, a professor of medical genetics at the University of British Columbia and chief science officer for Xenon Genetics, Inc., “the blurring between the university and the biotechnology companies has become more and more apparent” in Canada as well—in this case as a result of “an implicit understanding” that, in return for increased government funding, “the universities are going to play a bigger role in commercializing that intellectual property and making sure there is economic benefit.”
BOX 1–1 Technology Commercialization and Sharing Research Tools
Many researchers believe that the increased use of license agreements and material transfer agreements interferes with the free exchange of publication-related research resources. One school of thought holds that university technology-transfer offices tend to overestimate the potential commercial value of their own researchers’ materials, particularly research tools that are unlikely to be commercialized, as opposed to materials that could be used directly as products in their own right (such as diagnostics, drugs, and vaccines) or used as the basis for new services (such as software databases).
In response to such concerns, the National Institutes of Health (NIH) established a Working Group on Research Tools (www.nih.gov/news/researchtools), and in 1999 issued a set of principles and guidelines for sharing biomedical research resources developed with NIH funding (NIH, 1999). For cases in which an invention supported in whole or in part with federal funds is useful primarily as a research tool, NIH says that “inappropriate licensing practices are likely to thwart rather than promote utilization, commercialization and public availability of the invention.”
Commercial interests are not the only reason for withholding data and materials. The geneticists surveyed by Campbell and his colleagues cited additional reasons for intentionally withholding information, data, or materials related to their own published research. They include the financial cost of providing the materials or information to others; the need to preserve patient confidentiality; the need to protect the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish follow-up papers; the need to protect one’s own ability to publish follow-up papers; and the likelihood that the recipient will never reciprocate. It is not surprising that a reluctance to share is more common in fields in which scientific competitiveness is high.
The Changing Nature of Data
New types of data and materials are also complicating publication-related sharing practices. One factor that has added a new dimension to the scientific landscape and is an increasing source of concern about community standards for sharing is the growing role of large databases and other large datasets in life-sciences research. The rise of “big science” projects, such as the Human Genome Project, and the ever-increasing pace of technology have enabled researchers to collect vast quantities of data faster and faster. The large databases being compiled include genomic databases, microarray-based gene-expression databases, proteomics databases, large-scale databases for comparative genomics, human population-genetics datasets, ecological datasets, and databases resulting from the use of imaging technologies. Because many of the databases can be productively “mined” for a long time and yield many papers, some authors view relinquishing control of them as a stiff penalty in light of the time, cost, and effort needed to create the first publication.
Although the genomics, structural-biology, and clinical-trials communities have established public databases that facilitate the free sharing of data in standardized formats, researchers in other disciplines that are generating large datasets, such as those resulting from brain imaging or gene and protein expression studies, have yet to agree on standards for when and how to share, format, annotate, and curate data.
The time, effort, and expense involved in generating large datasets, databases, and some research materials have been cited as arguments for restricting access to them. In the case of databases, unrestricted sharing is considered especially problematic because U.S. law does not provide intellectual property protection for databases (see Box 3–1). Any enterprise that produces large databases may be reluctant to share them without restrictions on initial publication, inasmuch as doing so may mean giving up a substantial commercial advantage and could enable the wholesale copying of databases by others for commercial purposes.
Other emerging challenges in publication-related sharing arise from practices related to software and algorithms. These are becoming more common as the subject of publications in the life sciences. Software developers have long disagreed about whether the source code needed for a published program or algorithm should be made available to everyone, and life scientists who develop software are no exception. One reason for the debate is that although software can be copyrighted, it can be difficult in practice to prevent someone else from copying and quickly modifying the source code and taking the lead in commercializing it. And some have argued that mandatory sharing of source code prevents universities from exercising their legal right to develop commercial products from federally funded research.
In the workshop’s keynote presentation, Eric Lander, director of the Whitehead Institute Center for Genome Research, reviewed some of the many contentions that are shaping the debate over sharing of data and materials associated with publications and the related topic of public-domain resources. “These are hard arguments to weigh in the absence of an intellectual framework for evaluating them,” he said. “I think our goal is to step back and ask, ‘What is the intellectual framework in which we can parse these arguments?’” The following chapter examines such a framework.