New Paradigms in Industry: The Single Nucleotide Polymorphism Consortium
I was asked to comment on my understanding of the European Union legislation protecting databases and what effect it has had. From a personal perspective, as an academic user in the biomedical sciences, my answer is: none. I think it is fair to say that until it has some effect, we are not going to take much notice of it. European legislation is often a mystery, and I am not sure how much the United Kingdom had to do with drafting that legislation. I believe the database directive has now been incorporated into U.K. law.
I want to discuss both the Single Nucleotide Polymorphism (SNP) Consortium and the Human Genome Project. I am afraid most of my presentation will be thin on law and possibly too heavy on rhetoric. Having been engaged personally and directly with these issues as a trained scientist, I find it quite difficult always to be as objective as I ought to be. To paraphrase Winston Churchill, I have always thought that lawyers should be on tap, not on top.
The Human Genome Project is a consortium involving laboratories and funding agencies around the world. The major funding organizations in the United States are the National Institutes of Health and the Department of Energy, and in the United Kingdom a private organization, the Wellcome Trust. The U.K. government did not put very much resource into this initiative. One of the first things we discussed as we began to think about how to organize the Human Genome Project was what to do about the DNA: whom it belonged to, and whether there was something special about the fact that we were dealing with human DNA rather than mouse or rat DNA.
As you have already heard, we developed a series of principles that have come to be called the Bermuda Rules or Bermuda Principles. In essence, in return for the enormous largesse given to a few selected sequencing groups, the sequence data would be deposited in the public databases every 24 hours.[1] The raw information would be provided for people to use as best they could. That was a grassroots movement; it was not imposed by the funding agencies. This is remarkable, given how dependent we in the scientific business usually are on these agencies to act as champions and leaders. This policy was very much due to two scientists, John Sulston in the United Kingdom and Bob Waterston at Washington University in St. Louis. They had to persuade the scientific community, including the leaders of all the major research groups capable of taking on this task, and eventually there was agreement on these principles, which were ratified at subsequent meetings held in Bermuda.
[1] For additional information on the Bermuda Principles, see www.wellcome.ac.uk.
The Human Genome Project was progressing rather well until the announcement in May 1998 of the establishment of Celera. In principle, of course, there is no reason why a private entity should not sequence the human or any other genome and use that information as it chooses. The real issue that set the scientific world alight was the fact that efforts were made to close down the public-domain activity. There was a meeting at Cold Spring Harbor, during which Dr. Craig Venter told the National Institutes of Health to give up their human sequencing program and focus on the mouse.
By pure chance, the Wellcome Trust was considering a proposal to double the financing available to the Sanger Center, now the Sanger Institute, to sequence the human genome. The governors approved the award, and John Sulston and I flew to Cold Spring Harbor to announce the news to the assembled scientists and to talk to the funding agencies. As we all know, the Human Genome Project survived and there was one project in the public domain and one in the private domain. But you need to ask yourselves, what would have happened if the Wellcome Trust had not by that time become reasonably wealthy and if there had not been a public-domain Human Genome Project. Would those data now be available to scientists around the world?
There were negotiations to try to bring the private and public programs together, and there were exchanges of letters. Finally, because time was pressing and we wanted to move forward, the Wellcome Trust released a letter that in essence ended the negotiations. Celera did not react very well to this, and a cartoon was published in Nature purporting to show that the gene for human aggression had at last been isolated. The following week, the White House and 10 Downing Street issued a joint statement that included a declaration supporting the release of data. Unfortunately, it was misinterpreted, particularly by the financial press, and it had an unforeseen effect: it was read as suggesting that it was inappropriate to use the patent system in any way to capture intellectual property (IP) coming out of genome research, which was not the intent. Later, both camps simultaneously announced completion of a working draft. I want to make it clear that the Wellcome Trust is not opposed to the appropriate patenting of IP. Whereas we regard the human sequence itself as something that should not be patented, the proteins derived from it and therapies that interfere with them are entirely appropriately protected by the various IP rules.
Each of us contains three billion nucleotides in our DNA. Each of us is extremely different, yet our DNA is 99.9 percent the same. That still leaves three million differences, distributed throughout the three billion nucleotides. If you do a bit of mathematics on the possible combinations, you can see that the number is truly astronomical, which explains why we are all so different. As a species, we are more closely related to one another through our DNA than the members of any other species on the planet are, and we have at least a reasonable understanding of why that is so.
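The arithmetic behind "truly astronomical" can be sketched in a few lines. This is a back-of-envelope illustration, not a calculation from the talk: it assumes, as a deliberate simplification, that each of the three million differing positions is independent and has exactly two possible variants, so the number of combinations is 2 raised to the three millionth power.

```python
import math

# Figures as stated in the talk.
GENOME_LENGTH = 3_000_000_000  # nucleotides
# 99.9 percent identical means 0.1 percent, i.e. 1 in 1,000, positions differ.
differences = GENOME_LENGTH // 1000

# Simplifying assumption: independent sites with two variants each, giving
# 2 ** differences combinations. That integer is far too large to print
# usefully, so report its size as a power of ten instead.
exponent = differences * math.log10(2)

print(f"differing sites: {differences:,}")
print(f"combinations: about 10^{exponent:,.0f}")
```

Running it reports 3,000,000 differing sites and roughly 10^903,090 combinations, a number with over 900,000 digits. Real variation is not this tidy, but the order of magnitude makes the point.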
These are some of the reasons why the pharmaceutical industry saw SNPs as important. In some way, they will reflect our different susceptibilities to disease and to the action of drugs. If you can understand who is susceptible, for example, to diabetes or hypertension, you will be able to segment the population, advise on appropriate lifestyle changes, and develop appropriate drug therapies. There are many other uses. Industry therefore recognized the need to understand the variation in the human genome, the SNPs, and could see that mapping them, finding out where they were, and correlating them with disease would be very powerful. GlaxoSmithKline estimated that it would cost approximately $250 million to get a reasonable first map of SNPs. Even for a large company like Glaxo, $250 million is a large portion of the research budget. Multiply that by the fact that each pharmaceutical company would need its own SNP map, and you can see why, for purely economic reasons, industry was interested in somehow mitigating those expenses. So the companies got together to explore whether a combined effort would enable them to get a SNP map more cheaply, or to share the cost. It was only later in the process that funding from the Wellcome Trust became a possibility.
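The cost argument can be made concrete with a short sketch. The $250 million figure is the estimate quoted above; the number of companies, `N_COMPANIES = 10`, is a hypothetical value chosen only for illustration, and the sketch assumes a joint map would cost the same as one company's and that the cost is split evenly.

```python
# The $250 million estimate is from the talk; the company count is hypothetical.
MAP_COST_MILLIONS = 250.0
N_COMPANIES = 10  # hypothetical number of participating companies


def separate_total(n: int, cost: float) -> float:
    """Industry-wide spend if every company builds its own SNP map."""
    return n * cost


def shared_per_company(n: int, cost: float) -> float:
    """Each company's bill if one map is built jointly and the cost split evenly."""
    return cost / n


print(separate_total(N_COMPANIES, MAP_COST_MILLIONS))      # 2500.0
print(shared_per_company(N_COMPANIES, MAP_COST_MILLIONS))  # 25.0
```

Under these assumptions, duplicated effort costs the industry ten times what a single shared map would, while each company pays a tenth of the bill.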
One reason that putting the data into the public domain was appealing is that if a cabal of companies gets together to produce information that they will share only among themselves, they can run afoul of antitrust legislation, particularly in the United States. So we developed a model for a not-for-profit company, a 501(c)(3) in the United States, which the partners would join; they would agree to the workings of the organization, work out a work plan, and fund it.
The companies involved were not only the major pharmaceutical companies in the United States and Europe, but also IBM and Motorola. Motorola wanted to put SNPs on chips, which is exactly what is now happening. The mission of the consortium was to gather these data to serve the medical community, the life sciences community, and the membership. Industry needs proper quality control. If this resource is to be used in any way in the drug industry, it will no doubt come under the rules of the Food and Drug Administration in the United States and the equivalent agencies around the world, so it needed to meet an industry standard. Quality issues that do not normally worry academic groups therefore had to be built into the business plan.
Very early on, it was agreed that we needed an IP legal task force to work out how these data should be made available and in what form. It became clear that simply releasing the information carried a risk: other entities might download the data, subject them to some form of IP protection, and then sell the data back to the member companies. The senior executives did not want to go back and explain to their boards that they had spent a lot of money and now had to buy the information. So a series of data policies was put in place. By January, six months after the formation of the SNP Consortium, the information was being released on a website.
By the end of the project, we had mapped 1.25 million of these SNPs, and all were released into the public domain. I want you to remember that number, 1.25 million. Glaxo’s original program was going to cost $250 million and was going to end up with 150,000 mapped SNPs.
A number of scientific groups around the world were commissioned, under contractual conditions, to determine how the data were to be put together and what quality assurances would apply, and a data center was set up to capture all of the information. None of the data would be released to the public or to the companies until they had been validated and mapped; they would then be released to the companies and the public at exactly the same time. Nobody saw the data ahead of anybody else. The release policy made the data available at approximately quarterly intervals, which allowed time for the SNPs to be mapped. It also allowed patent applications to be filed on bundles of SNPs to establish prior art, which was their only purpose: a protective patenting policy.
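The release rules above amount to a simple hold-and-release protocol. The sketch below is hypothetical: the names `Snp`, `DataCenter`, and `quarterly_release`, and the release label and recipient strings, are invented for illustration and do not describe the consortium's actual systems. It captures two properties: nothing leaves the data center until it is both validated and mapped, and every recipient, member or public, receives an identical batch in the same step.

```python
from dataclasses import dataclass, field


@dataclass
class Snp:
    snp_id: str
    validated: bool = False  # passed quality control
    mapped: bool = False     # assigned a genome position


@dataclass
class DataCenter:
    """Toy model of the hold-and-release rules (hypothetical names)."""
    pending: list[Snp] = field(default_factory=list)
    release_log: list[tuple[str, list[str]]] = field(default_factory=list)

    def submit(self, snp: Snp) -> None:
        # Incoming SNPs are held; neither members nor the public see them yet.
        self.pending.append(snp)

    def quarterly_release(self, label: str, recipients: list[str]) -> dict[str, list[str]]:
        ready = [s for s in self.pending if s.validated and s.mapped]
        # Anything not yet validated and mapped waits for a later release.
        self.pending = [s for s in self.pending if not (s.validated and s.mapped)]
        batch = [s.snp_id for s in ready]
        if batch:
            # Each logged batch stands in for one bundle filed to fix a priority date.
            self.release_log.append((label, batch))
        # Every recipient receives the same batch at once: no early access.
        return {r: list(batch) for r in recipients}


center = DataCenter()
center.submit(Snp("snp-001", validated=True, mapped=True))
center.submit(Snp("snp-002", validated=True))
delivered = center.quarterly_release("release-1", ["member-A", "member-B", "public"])
```

In this example, `snp-001` goes to both members and the public together, while `snp-002`, validated but not yet mapped, stays in the data center for a later release.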
The IP policy was to place the maximum number of SNPs in the public domain at the earliest possible date and to ensure that they remained free of third-party encumbrances, so that the map could be used by all without financial or other IP obligations: no charge to access the data, no licensing fees, nothing. The patent plan was simply to establish a priority date for the SNPs, which would then prevent anybody else from capturing them. That has worked, but the legal task force, drawing on lawyers from each of the companies and the Wellcome Trust as well as very expensive external lawyers, took a long time to come up with a deal that we all thought would work. Again, it was extremely important that we reach an agreement that nobody would have prior access to the information, and that remains the case.
As I mentioned earlier, from scientific and economic perspectives this program was spectacularly successful: with a budget of $44 million rather than $250 million, we mapped 1.25 million SNPs rather than the target of 150,000. Industry was very pleased, and, of course, we thought this model might be applied elsewhere.
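Dividing each budget by its SNP count shows the scale of the difference. This sketch uses the 1.25 million figure given earlier in the talk together with the $250 million and 150,000 figures for Glaxo's original plan; cost per SNP is a simple ratio and ignores any differences in scope between the two programs.

```python
# Figures quoted in the talk (US dollars, numbers of mapped SNPs).
planned_cost, planned_snps = 250_000_000, 150_000   # Glaxo's original plan
actual_cost, actual_snps = 44_000_000, 1_250_000    # SNP Consortium outcome

planned_per_snp = planned_cost / planned_snps
actual_per_snp = actual_cost / actual_snps

print(f"planned: ${planned_per_snp:,.2f} per SNP")          # planned: $1,666.67 per SNP
print(f"actual:  ${actual_per_snp:,.2f} per SNP")           # actual:  $35.20 per SNP
print(f"ratio: about {planned_per_snp / actual_per_snp:.0f}x")  # ratio: about 47x
```

On these figures, each mapped SNP cost roughly one forty-seventh of what the single-company plan implied.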
We are in the process of setting up a structural genomics consortium that will provide a high-throughput resource for determining the structures of human proteins. The data release principles have been agreed on, although the details have yet to be fully worked out as part of the contract between the Wellcome Trust and the companies. Our IP experience tells us that the raw structural data should not be patented and should be made freely available to researchers everywhere. Again, all the coordinates, once validated, will be released into the public domain on the Web without restrictions, and they will be provided to the public and the consortium members at the same time.
At the moment, we are working with the U.K. Department of Health and the U.K. Medical Research Council on a national population collection. We will try to recruit 500,000 volunteers, aged between 40 and 65, and ask them to donate blood, from which a DNA database will be established. Access rules for the database are still under development, but it will be freely accessible. Much remains to be done, including further public consultation, because there are significant privacy issues to resolve. But the principles of access have been agreed: the aim is to make the information available to anybody who wants to use it for health care and public health benefits.
One distinctive feature of this database is that people who use the data will generate new data. As a quid pro quo, they will have to redeposit those new data in the database, so that it continues to grow. And, of course, it is very important for this database to ensure that the samples and the data are used in a way that is consonant with the consent obtained from each volunteer.
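The two access conditions, consent-bound use and mandatory redeposit, can be sketched as a toy model. Everything here is hypothetical: the names `Sample`, `PopulationDatabase`, `permits`, and `redeposit`, and the free-text purpose labels, are invented for illustration, and a real consent framework would be far richer than a set of strings.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    volunteer_id: str
    consented_uses: set[str]  # hypothetical purpose labels the volunteer agreed to


@dataclass
class PopulationDatabase:
    """Toy model of the access conditions described above (hypothetical names)."""
    samples: dict[str, Sample] = field(default_factory=dict)
    derived_data: list[tuple[str, str, str]] = field(default_factory=list)

    def permits(self, volunteer_id: str, purpose: str) -> bool:
        # A use is allowed only if it falls within that volunteer's consent.
        return purpose in self.samples[volunteer_id].consented_uses

    def request(self, volunteer_id: str, purpose: str) -> Sample:
        if not self.permits(volunteer_id, purpose):
            raise PermissionError(f"{purpose!r} is outside the consent given by {volunteer_id}")
        return self.samples[volunteer_id]

    def redeposit(self, user: str, volunteer_id: str, result: str) -> None:
        # Quid pro quo: data generated from the resource are added back to it,
        # so the database keeps growing.
        self.derived_data.append((user, volunteer_id, result))


db = PopulationDatabase({"v1": Sample("v1", {"health research"})})
sample = db.request("v1", "health research")
db.redeposit("lab-A", "v1", "genotype calls")
```

A request for a purpose outside the recorded consent raises `PermissionError` instead of returning the sample, and each redeposit adds a record back to the shared resource.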
The Wellcome Trust is committed to advancing health care through support of biomedical research. We fund research on the basis of scientific merit, not for direct commercial benefit, although we do need income in order to fund research. By the same token, we support the protection of IP in an appropriate way, at an appropriate point in the value chain, and with appropriate licensing terms. Under the rules of the Charity Commission in the United Kingdom, we are obliged to ensure that useful results of research are applied for the public good. You cannot have exploitation of research results without protection of the IP, so striking this balance is a challenge.
The release of this basic information, especially genome sequences and protein structures, is in my opinion definitely in the public interest. Access to this information furthers research progress rather than hindering it.