
Suggested Citation: “21. Strengthening Public-Domain Mechanisms in the Federal Government: A Perspective From Biological and Environmental Research.” National Research Council. 2003. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Washington, DC: The National Academies Press. doi: 10.17226/10785.


21
Strengthening Public-Domain Mechanisms in the Federal Government: A Perspective from Biological and Environmental Research

Ari Patrinos and Daniel Drell¹

I want to begin by noting that these remarks are based on a commentary that we wrote for the June 6, 2002, issue of Nature and reflect our views, not those of the U.S. Department of Energy (DOE).²

The office in which we serve, the Office of Biological and Environmental Research at the DOE, is the steward of about half a billion dollars. Our responsibility is to oversee and manage its spending for the greatest good of the U.S. science effort, in service to DOE missions and, by extension, to our citizenry. Our office supports research in various scientific areas, among them environmental sciences, global climate research, medical technologies and imaging, and genomics, including the Human Genome Project, which we started in 1986. Among other things, we work very hard to adhere to practices and policies that promote open access to data, because we believe experience has demonstrated that, paraphrasing Ivan Boesky, openness is good, openness works, and openness maximizes the benefits to us all.

But we have to recognize and adapt to new realities. U.S. government expenditures for fundamental research total in the neighborhood of $45 billion per year. Recent figures for genomics in fiscal year 2000 indicate that the U.S. government invests about $1.8 billion in this area, while the private sector invests close to $3 billion. Glossing over the imprecision of these numbers, it is clear that private-sector expenditures on “postfundamental,” or exploitation-focused, research are roughly comparable, at least in order of magnitude, to those of the federal government. What the private sector chooses to do with its research results and data is up to it. As more private-sector funding goes toward “upstream” (more fundamental) science and the distinctions blur, the challenge we face is not to decry this situation any further but to work out accommodations that promote science. We are optimistic that this can be done, and we are particularly encouraged by Professor Berry’s presentation.³ We want to emphasize his point that, before we move toward more restrictive data policies, we should experiment, collect data, and see what happens.

In February 2001, two significant papers were published, one in Nature⁴ and the other in Science,⁵ reporting on the “draft” sequence of the 3.2-billion-base-pair human genome. The Nature paper derived from the multiyear effort of the International Human Genome Sequencing Consortium, led by the U.S. National Human Genome Research Institute but also including the DOE, the Wellcome Trust, and many other partners, whereas the Science paper resulted from the more recent effort of Celera Genomics Corporation, a private company based in Maryland. Although the consortium had, since 1996, practiced a policy of unrestricted rapid release of its sequencing data (with deposition into GenBank) as the work proceeded, Celera posted its data on its Web site (www.celera.com) only upon publication in Science, under a limited-access arrangement worked out between Celera and Science. On April 5, 2002, Science published two papers⁶,⁷ reporting draft genome sequences for two subspecies of rice, Oryza sativa; one, from Syngenta International Inc.’s Torrey Mesa Research Institute, was published with limitations on data access essentially identical to those associated with Celera’s human genome sequence publication. These policies have generated considerable controversy, but they also present a challenge: to make constructive suggestions for ways to move forward that might break the impasse and perhaps promote greater data sharing.

¹	The authors acknowledge with gratitude the helpful comments of colleagues, as well as Robert Cook-Deegan (Duke University). The views expressed herein are those of the authors and do not reflect policy of either the U.S. Department of Energy or the U.S. government.
²	See A. Patrinos and D. Drell. 2002. “The times they are a-changin’,” Nature 417:589-590 (6 June).
³	See Chapter 17 of these Proceedings, “Potential Effects of a Diminishing Public Domain on Fundamental Research and Education,” by R. Stephen Berry.
⁴	International Human Genome Sequencing Consortium. 2001. “Initial sequencing and analysis of the human genome,” Nature 409:860-921.
⁵	J. C. Venter et al. 2001. “The Sequence of the Human Genome,” Science 291:1304-1351.

One possibility that has been proposed in various contexts⁸ is to start a clock on the deposition of certain data, whereby a journal or other repository agrees to restrict access to the source data underlying a paper for a specified duration; other data could be housed with a trustee who ensures that the data were indeed deposited at the agreed time. Careful provisions would be required for both how long the clock is set to run and precisely when it starts, but the idea is to permit a set period for commercial exploitation (including the filing of patent applications) of inventions derived from the data. The U.S. Patent and Trademark Office allows up to one year before a provisional patent application is converted to a utility patent application, giving an applicant time to perform additional research toward developing an invention while retaining the early priority date; thus one year might be a reasonable time for such a clock to run, although this would be subject to negotiation. This approach is similar to past practices with databases such as the Protein Data Bank.

The responsibility for implementing this scheme could rest with the journal or with a respected nonprofit foundation (e.g., the Institute for Scientific Information or the Federation of American Societies for Experimental Biology). In consultation with GenBank (or a relevant public repository), the journal (or foundation) could provide access to the necessary files when the clock expires. Studies of the consequences of varying clock “periods,” and of just how much privately held data actually contribute to a company’s commercial viability, would provide valuable insights.

As time goes by, data lose value, both as new discoveries are made (in particular, new technologies for reacquiring the same data) and as science, as is its wont, moves unpredictably into new areas. This clock mechanism would allow a company to publish valuable data that would otherwise remain private while giving it a limited period of exclusive use. This role might be uncomfortable for journals and trustees, so it is important to explore fully mechanisms in which all sides would have confidence. An added concern could arise if implications for national or international security (for example, potential detection signatures in a pathogen’s genomic sequence) emerged while the data were held on deposit before their public release.

There is an urgent need to find incentives for the private sector, which now controls vast amounts of data that have no obvious short-term commercial value but could be of great research value. Most of the human genome sequence (about 98 percent of it) is noncoding; allowing greater access to this portion would not seem to threaten Celera’s stated goal of discovering candidate drugs based on the portions of the genome that encode expressed proteins. In addition, as is becoming ever clearer, the sheer volume of data from high-throughput sequencing centers (such as the Sanger Center in Great Britain or DOE’s Joint Genome Institute) challenges even the most advanced and sophisticated labs to mine it for value within a reasonable time frame.

Can incentives be defined that would induce Celera, Syngenta, and similar companies to relax the access restrictions on some of their data? That question deserves to be explored because the benefits move in both directions, with academic expertise becoming more available to private-sector companies and the science carried out in the private sector becoming more accessible to academic scientists. Ideally, this becomes a “win-win” for both sectors. Echoing Professor Berry’s earlier comment, such arrangements could be subjected to exploration and evaluation.

⁶	J. Yu et al. 2002. “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica),” Science 296:79-92.
⁷	S. A. Goff et al. 2002. “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica),” Science 296:92-100.
⁸	G. Petsko. 2002. “Grain of Truth,” Genome Biology 3(5):1007.1-1007.2.

There are several precedents for successful public-private collaboration. The Single Nucleotide Polymorphism (SNP) Consortium⁹ and the IMAGE Consortium¹⁰ both involved partnerships that placed valuable genomic sequence information into the public domain (the first on single-base-pair variants useful for trait mapping, the second on complementary DNA [cDNA] sequences representing expressed human genes); the commercial partners became valued contributors. They judged that the value of restricting such data was outweighed by the benefits of making these data and resources freely available as tools for the intelligence and creativity of the widest possible base of researchers.

Another instructive example, representative but not unique, comes from the Keck Graduate Institute in California. Industry-sponsored research carried out at the Keck Institute by Keck faculty is built on a carefully negotiated contract. Both parties work out who does what, when, and where, and who will own which results. Of particular concern is the role of students, whose educational needs must take precedence; this means carefully negotiating disclosure issues and publication rights so that students’ attainment of their degrees is not restricted. Fundamentally, this requires the parties to define a work plan, benchmarks and milestones, and the terms of a mutually acceptable contract. Although this may not be easy, and the exact conditions must be tailored to the specific parties and their needs, the Keck Institute has succeeded in using negotiated contracts with private-sector companies to attract research support, advance both the scientific work of its faculty and the education of its students, and contribute to the commercialization of scientific knowledge. In fact, such practices are not uncommon, although they do not garner the publicity that the disputes do.

Are some constraints on data access preferable to not seeing the data at all? We believe they are. Is the academic scientific community willing to forgo the science being done outside the groves of academe? If that is to be the policy, then an increasing fraction of the 60 percent or so of genomics research conducted in the labs of private firms will remain unavailable to academic and government scientists. That is, in our view, too high a price to pay.

Our case is reinforced by the actual practices of most genomics firms: they do not publish their data. The many firms sequencing cDNAs and identifying SNPs, for example, have information that would be immensely valuable to academic researchers if it were publicly available. The firms have, however, chosen not to publish those data, preferring instead to patent genes as they are characterized and to sell access to their databases under agreements that protect the data as trade secrets. That is their right. Rather than clamoring for policies that will push firms toward nondisclosure, we should be creating incentives for companies to publish data when they choose to, and facilitating such publication within proprietary constraints.

Sir Isaac Newton is widely credited with the observation that “If I have seen farther than others, it is because I was standing on the shoulders of giants.” The steady progress of science is founded on the traditional concept that individual scientists assemble knowledge “brick by brick.” We believe that full and unrestricted access to fundamental research data should remain a guide star of science because centuries of experience suggest that it is the most efficient approach to promoting scientific progress and realizing its many benefits. However, we must also accept the current realities.

At no time has science ever been the exclusive province of academia; today, however, the share of high-quality science taking place in the private sector (e.g., the invention of polymerase chain reaction technology and the development of cre-lox recombinase gene knockout technology) is unprecedented. The potential for private-sector scientists to collaborate productively with academic and government scientists is likewise greater than ever. We should not bemoan this development but welcome it. Private-sector science has its legitimate interests too. The burden is on the academic sector to make the case for greater openness on the part of private-sector science and to state clearly what the benefits to the private sector can be.

⁹	See the SNP Consortium Web site at http://snp.cshl.org/, as well as Chapter 28 of these Proceedings, “The Single Nucleotide Polymorphism Consortium,” by Michael Morgan, for additional information.
¹⁰	See the IMAGE Web site at http://image.llnl.gov/ for additional information.


We offer the following conclusions for consideration by the National Academies as they explore the role of scientific and technical data and information in the public domain:

Openness, by and large and as a guide for the public funding of fundamental research, is a very successful policy because it generates data that lead in unpredictable ways to exciting insights into nature’s workings.
It is the appropriate role of the private sector to exploit open basic research to develop and commercialize valuable products; this is what the private sector is good at. Its investment is large (and in certain areas may exceed that of the public sector), and the quality of the resulting discoveries is very high.
We need to aggressively explore compromises and quid pro quos that would encourage private-sector companies to loosen their hold on the portion of their data that could benefit fundamental research, in ways that do not threaten their intellectual property interests. By working together in creative ways, everybody can benefit.
We suggest some mechanisms, none particularly novel, that could be used to increase collaboration between the private and public sectors. Importantly, we think this is an area of real opportunity.
Different schemes (e.g., timers, impartial trustees, incentives, bilateral contracts, public-sector–private-sector consortia) can be put to experimental test to learn which work better, with what partners, and under what circumstances. We do not pretend to have all the answers, but we do assert that the exploration is worth undertaking.
If we succeed, the scientific and financial benefits can be enormous; if we fail, so too could be the costs. Science will continue to advance regardless; but for self-evident reasons, we all would like to see it advance as rapidly as possible in the United States.
