effort of the International Human Genome Sequencing Consortium, led by the U.S. National Human Genome Research Institute but also including the DOE, the Wellcome Trust, and many other partners, whereas the Science paper resulted from the more recent effort of Celera Genomics Corporation, a private company based in Maryland. Although the consortium had practiced, since 1996, a policy of unrestricted rapid release (and deposition into GenBank) of its sequencing data as it proceeded, Celera put its data up on its Web site (www.celera.com) only upon publication in Science. Celera and Science worked out a limited-access arrangement. On April 5, 2002, Science published two papers 6,7 reporting the draft genome sequences for two subspecies of rice, Oryza sativa; one is from Syngenta International Inc.'s Torrey Mesa Research Institute and was published with limitations on data access essentially identical to those associated with Celera's human genome sequence publication. Considerable controversy has resulted from these policies. The challenge now is to make constructive suggestions for ways to move forward that might break the impasse and perhaps promote greater data sharing.
One possibility that has been proposed in various contexts 8 is to start a clock on the deposition of certain data whereby a journal or other depository agrees to restrict access to the source data underlying a paper for a specified duration; other data could be housed with a trustee who ensures that the data were indeed deposited at the agreed time. Careful provisions would be required both for how long the clock is set to run and for precisely when it starts, but the idea is to permit a set duration for commercial exploitation (including the filing of patent applications) on inventions derived from the data. The U.S. Patent and Trademark Office allows up to one year before a provisional patent application is converted to a utility patent application, giving an applicant time to perform additional research toward developing an invention while retaining the early priority date; thus one year might be a reasonable time for such a clock to run, but this would be subject to negotiation. Such a clock is similar to past practices with databases such as the Protein Data Bank.
The responsibility for implementing this scheme could rest with the journal or with a respected nonprofit foundation (e.g., the Institute for Scientific Information or the Federation of American Societies for Experimental Biology). In consultation with GenBank (or a relevant public repository), the journal (or foundation) could provide access to the necessary files upon the expiration of the clock. It would be very useful to know the consequences of varying clock “periods,” as well as just how much privately held data actually contribute to the commercial viability of a company; such studies would provide valuable insights.
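The bookkeeping behind such a clock can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing system: the `Deposit` class, its field names, and the `add_months` helper are hypothetical, and the 12-month default simply mirrors the one-year provisional patent window mentioned above, which the text notes would be subject to negotiation.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month,
    so Feb 29 plus 12 months becomes Feb 28.
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class Deposit:
    """Hypothetical record a journal or trustee might keep for embargoed data."""
    label: str
    clock_start: date         # assumed here to be the publication date
    embargo_months: int = 12  # assumed default; negotiable in practice

    @property
    def release_date(self) -> date:
        """Date on which the clock expires and access would be opened."""
        return add_months(self.clock_start, self.embargo_months)

    def is_public(self, today: date) -> bool:
        """True once the clock has run out."""
        return today >= self.release_date


if __name__ == "__main__":
    # Compare a few hypothetical clock periods for a paper published April 5, 2002.
    for months in (6, 12, 24):
        d = Deposit("example deposit", date(2002, 4, 5), months)
        print(months, "months ->", d.release_date.isoformat())
```

Keeping the start date and the duration as separate fields reflects the point that both when the clock starts and how long it runs would need to be agreed on independently.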
As time goes by, data lose value both as new discoveries (and, in particular, new technologies for reacquisition of the same data) are made and as science, as is its wont, proceeds unpredictably into new areas. This clock mechanism would allow a company to publish valuable data that would otherwise remain private while giving the company a limited period in which to use the data exclusively. Holding restricted data in this way might be uncomfortable for journals and trustees, so it is important to explore fully a mechanism in which all sides would have confidence. An added concern could arise if implications for national or international security (for example, potential detection signatures in a pathogen's genomic sequence) emerged while the data were held on deposit before publication.
There is an urgent need to find ways of giving incentives to the private sector, which now controls vast amounts of valuable data that have no obvious short-term commercial value but could be of great potential research value. Most of the human genome sequence (about 98 percent of it) is noncoding; allowing greater access to this part of it would not seem to threaten Celera's stated goal of discovering candidate drugs based on those portions of the genome that encode expressed proteins. In addition, as is becoming ever clearer, the sheer volume of data from high-throughput sequencing centers (such as the Sanger Center in Great Britain or DOE's Joint Genome Institute) challenges even the most advanced and sophisticated labs to mine it for value within a reasonable time frame.
Can incentives be defined that would induce Celera, Syngenta, and other similar companies to relax the access restrictions for some of their data? That question deserves to be explored because the benefits move in both directions, with academic expertise becoming more available to private-sector companies and the science carried out in the
6 J. Yu et al. 2002. “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica),” Science 296: 79-92.
7 S. A. Goff et al. 2002. “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica),” Science 296: 92-100.
8 G. Petsko. 2002. “Grain of Truth,” Genome Biology 3(5): 1007.1-1007.2.