Handling Genetic Data in the Laboratory
In the summer of 1996, just as laboratories around the United States were gearing up to begin sequencing the human genome in earnest, a sudden realization threatened to bring things to a halt. For years, since the start of the Human Genome Project, researchers had assumed that the DNA they would be studying would come from a large number of sources—so large that it would be practically impossible to identify any given bit of DNA as coming from a particular person. But as the time for serious sequencing approached, genome researchers noticed that almost all the DNA they would be working with was copied from the DNA of just four donors, three men and a woman, and the identities of at least a couple of them were known because they worked at two of the labs providing the DNA for the sequencing effort. To make matters worse, at least two of the four were apparently not told that their genetic sequences were to be made public, so they had never given their consent to this use of their genes. Of these two, one had since died, so scientists could not go to him and ask for belated consent.
This posed a dilemma. Researchers had worked for years to prepare large libraries of clones—identical copies of short stretches of DNA—from the genomes of these donors, and it would be time-consuming and expensive to do it over with other donors. On the other hand, the four donors might some day face unpleasant consequences. "Their redundant DNA would be out there for anybody to look at and draw conclusions from," noted Shirley Tilghman of Princeton University. "If an insurance company finds out it is this guy's DNA that is largely in GenBank, looks up this guy's DNA and finds he has 27 recessive alleles and is going to be a big-time problem with early onset Alzheimer's," the donor could find it impossible to get health insurance.
At the same time, some genome workers worried about the political correctness of the DNA libraries. Because the DNA came mainly from staffers at genome laboratories, might the selection be seen as elitist? Would some people wonder why three men and only one woman had contributed DNA? Was the set of donors diverse enough? "There was sensitivity to how it would be publicly perceived, who was selected," said Raymond White from the University of Utah.
So the choice was made to work out an agreement between DOE and NIH on new libraries and to phase out the use of those with known donors. "It was painful," recalls Tilghman, who served on the council that made the decision. But it seemed necessary to obtain fully informed consent from the donors and to make sure that their identities remained secret.
Pieter de Jong of the Roswell Park Cancer Institute in Buffalo, New York, described the process for collecting a new set of DNA. Advertising in Buffalo newspapers and on the radio, he and his colleagues attracted five or six hundred willing donors. They took the first ten male and ten female volunteers who were approved by genetic counselors. Each of the twenty gave a blood sample and was paid a small amount of money, in cash, so that there would be no paper trail leading to the donor. "The blood samples entered into my laboratory with a number on them rather than a name," de Jong said. "Numbers were taken off and replaced by our own lab numbers. No records were kept about this correlation, so there was no way for us to go back to knowing which twenty people they were." Then two donors, one male and one female, were selected at random to supply the DNA for the libraries of DNA clones. The only record of who took part in the study is twenty sealed envelopes, each containing a consent form signed by one of the donors. Because of all this secrecy, it would be practically impossible for anyone, even the researchers, to learn whose DNA was analyzed for the Human Genome Project. "The best chance of revealing the identity of the donors is through the donors themselves," de Jong said. "They know they have a 10 percent chance that they eventually were the people who delivered the blueprint which is part of the genetic database."
As this tale illustrates, the field of genome research is still so new that researchers sometimes find themselves making up the rules as they go along. Many of the issues that genome researchers face are either not covered at all under existing policies or else are regulated by policies that were intended for very different sorts of research and are not appropriate for genome work. More specific standards will need to be developed to deal with dilemmas created by the new genetic technologies.
Consider, for example, the difficulties that genome researchers face in trying to comply with the Privacy Act of 1974. The Act forbids government agencies (and their researchers) from maintaining secret files of any type on individuals, noted Sherri Bale, a genetic researcher at the National Institute of Arthritis and Musculoskeletal and Skin Diseases, but genetic research demands a certain amount of discretion. "We are supposed to allow people to see the records kept on them," she said, "and that is an issue because there are certain pieces of information that in my consent document I tell people they will not see." For example, genetic testing may reveal that a child was not fathered by the man who believes he is the father. Bale, along with many other researchers, does not reveal this information, and she tells her subjects before the experiment that she will not reveal it. "I think it is a good decision," she said, but it is "a little bit in conflict with the Privacy Act because the misattributed paternity information is in my research record, and I don't open that research record to the individuals who I am supposed to allow to see all their records."
In general, a literal reading of the Privacy Act would seem to imply that it is illegal for any government researcher to withhold from subjects any of the information in their records. Yet because these records often contain sensitive details that are not relevant to the patient's medical care and whose release could actually harm the patient, researchers like Bale take a less-than-literal reading of the Act.
The Clinical Laboratory Improvement Amendments of 1988, or CLIA, pose a different set of problems for genetic researchers. To ensure that clinical laboratory work around the country is clinically useful and meets minimum standards of precision and accuracy, the Act imposes standards on all clinical laboratories, including mandatory testing and periodic on-site inspections. Because research labs are not intended to provide information for use in clinical care, few of them worry about, or are even aware of, the CLIA regulations. But because they are not CLIA certified, they cannot by law provide patients with the genetic information generated by research testing.
"This is a very big issue where I am from," Bale said, "whether or not we can release this kind of information even to the patients themselves. People in the research labs are ignorant of or just tend to ignore the fact that the CLIA regulations are there, and in some cases information just goes out. In some cases it goes into the medical chart, in others it goes to the research chart. In some cases it goes directly into the hands of the referring physician and in others the hands of the participants themselves."
Not releasing the results of genetic tests to the subjects is not an attractive option, Bale noted. "A lot of people come into these studies because they want to know what their mutation is." If research results were kept secret, many people would not participate. Furthermore, there is a tension between the Privacy Act and CLIA, the former demanding that nothing be kept from the subjects and the latter demanding that certain things not be released. And research labs generally cannot become CLIA certified because of the expense and difficulty of meeting an array of requirements intended for clinical laboratories.
Yet another challenge for genetic researchers is the consent form that subjects must sign in order to take part in an experiment. The guidelines and requirements for consent forms were not designed with genetic experiments in mind, and researchers find themselves scrambling to meet these mandates. "We are required," Bale noted, "to tell everybody that their participation is voluntary and they can withdraw at any time," but what does "withdrawal" mean when the researcher may have already sequenced the person's DNA and put it into a database? "Do you destroy the sample? Do you destroy the information that you have already gleaned from the sample?" As Susan Rose of the Department of Energy commented, the option of withdrawing from an experiment makes no sense after researchers have isolated or even published the subject's genetic sequence. "That is another example of something that was written for a biomedical situation where somebody is on a therapeutic drug or something like that. The idea of checking in and out when the DNA sequence taken from your sample has been published is an example of something that doesn't fit the genetics world."
One particularly sticky genetic privacy problem concerns what to do about archived data. "The national bank of archived human tissue is vast," said David Korn of the Association of American Medical Colleges, "and institutions that have a long history—Yale, the Massachusetts General Hospital, Hopkins—may have over 1 million archived cases. Not samples, but cases with multiple samples that are all basically accessible like an archaeological dig, if you will." Each of these cases is a potential source of information for genetic researchers trying to understand a particular disease.
Unfortunately, almost none of these samples were obtained with a consent form that would allow such genetic research to be done on them. How, then, can this vast bank of information be put to use?
The case of the National Health and Nutrition Examination Survey (NHANES) offers one answer. "The NHANES III story," said Sherri Bale of the National Institute of Arthritis and Musculoskeletal and Skin Diseases, "is that between 1988 and 1991, 7900-plus subjects ages twelve and up had white blood cells stored on them, and between 1991 and 1994, another 9500 subjects had white cells frozen on them, and 8200 cell lines have been established already and are stored." These cell lines can be used for, among other things, extracting DNA and doing genetic research, so the researchers who collected the data and samples wished to make them available for genetic studies. Unfortunately, when the study began, no one had thought to ask the donors to consent to such genetic testing.
The Centers for Disease Control and Prevention (CDC) decided that the samples could still be made available for genetic research if they were "anonymized." So, Bale said, "we have gotten to the point of trying to define what this new verb 'to anonymize' means. The definition that the CDC staff has come up with is that anonymized samples are those where no one, including the staff of the CDC, is ever able to link the results of a genetic test or any kind of test done on DNA back to the survey participant." To this end, the samples will be made freely available but identified only by race, sex, and ten-year age group. These data will be useful to researchers who wish to study how common various alleles, or versions of a particular gene, are in the population. Researchers who want further information, such as disease status or exposure to possible carcinogens, will have to come up with a research design that guarantees the anonymity of the data will be kept intact. Detailed information of the kind that would appear in a medical record will not be provided at all, since in theory it could be used to identify the donors.
In general, anonymization seems to be the only way to make archived material available to genetic researchers short of contacting the original donors and asking them to sign new consent forms, a process that, in most situations, would be prohibitively expensive. It is not an ideal solution by any means, as it limits researchers to a small percentage of the information they could glean from these archives, but no one has come up with an acceptable alternative. "There has been a tremendous amount of discussion on this, as you would expect," Bale said, "and this was the only solution that [NIH's National Center for Human Genome Research] and [the National Cancer Institute] would allow."
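The CDC's working definition of anonymization, under which no one can link a test result back to a survey participant and each sample carries only race, sex, and ten-year age group, can be illustrated with a short Python sketch. The record layout and the names `SurveyRecord`, `ReleasedSample`, `anonymize`, and `ten_year_group` are hypothetical, and this is not the CDC's actual procedure. It only shows why such a release is one-way: the release codes are random rather than derived from any identifier, no table mapping them back to participants is kept, and the output order is shuffled so that position cannot serve as a link.

```python
import random
import secrets
from dataclasses import dataclass
from typing import List


@dataclass
class SurveyRecord:
    """Hypothetical internal record; none of these identifiers is released."""
    participant_id: str
    name: str
    age: int
    sex: str
    race: str


@dataclass(frozen=True)
class ReleasedSample:
    """What an outside researcher would see: only race, sex, and age group."""
    release_code: str
    race: str
    sex: str
    age_group: str


def ten_year_group(age: int) -> str:
    """Collapse an exact age into a ten-year band, e.g. 12 -> '10-19'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(records: List[SurveyRecord]) -> List[ReleasedSample]:
    released = [
        ReleasedSample(
            # Random code, not derived from participant_id, so it cannot be recomputed.
            release_code=secrets.token_hex(8),
            race=r.race,
            sex=r.sex,
            age_group=ten_year_group(r.age),
        )
        for r in records
    ]
    # Shuffle so the output order cannot be matched against the input order.
    random.SystemRandom().shuffle(released)
    # No mapping from release_code back to participant_id is kept anywhere.
    return released


if __name__ == "__main__":
    # Made-up example records.
    survey = [
        SurveyRecord("P-0001", "Jane Doe", 12, "F", "White"),
        SurveyRecord("P-0002", "John Roe", 47, "M", "Black"),
    ]
    for sample in anonymize(survey):
        print(sample.race, sample.sex, sample.age_group)
```

Because nothing in the output can be recomputed from a participant's identity, the link cannot be restored later. That protects the donors, but it also rules out follow-up studies that need the donors' medical histories.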
Perhaps the most vexing issue facing genetic researchers is how to protect the privacy of individuals, such as those whose DNA is being used in the Human Genome Project, without compromising the research itself. The obvious solution might seem to be an approach like the one de Jong took in assembling the samples for the genome project's new clone libraries, when he made it practically impossible for anyone, even the researchers involved, to identify who provided a particular genetic sample. Such "anonymization," however, is not suited to most genetic research. Workshop participants agreed on this point more than on perhaps any other, and several of them explained in detail the value of maintaining a link to the medical records of the donors of genetic information.
"In our research now it has become very important to be able to go back to the individual who gave the original tissue sample," said Vicky Whittemore of the National Tuberous Sclerosis Association. Working from DNA provided by a number of people with tuberous sclerosis, researchers have recently identified two genes that cause the disease. Now, Whittemore said, scientists want to go back to the medical records of the individuals who supplied the original samples and correlate each person's symptoms with his or her mutations. In this way the researchers can learn more about how particular mutations produce the disease and, eventually, come up with ways to treat it. But if the original samples had been stripped of all identifying information, this would be impossible.
In the same vein, David Korn of the Association of American Medical Colleges argued that for many types of studies it is vital to have longitudinal data (information accumulated over time, usually years or even decades) and that such data cannot be gathered if the samples are made anonymous by destroying their links to the subjects' identities and medical records. Suppose, for instance, that a researcher studying a collection of tissue samples from breast tumors has discovered a particular genetic marker (a stretch of DNA used to identify genes) that seems to be associated with the tumor. If the researcher knows the identity of the donors of the samples, which may be a decade or two old, he or she can follow the progression of these patients' tumors over time and look for correlations between the marker and the development of the disease, perhaps discovering a way to predict its course ahead of time. Without identification, the archived tissue samples have much less value.
"Anonymization decisions mean irreversible disruption of linkage," Korn noted. "No one will ever be able to restore that link." In some studies, such as the Human Genome Project, this may not be a problem, but in others "it really means that you are destroying the utility of the material. Without having the ability to get the additional correlative or follow-on information that may, in fact, already exist, you are severely truncating the significance, the interpretability, the impact of whatever it is that you are measuring. People have to realize what the trade-offs are on these kinds of decisions."
An even more important reason not to anonymize genetic data may lie in the emerging field of predictive medicine. As Barbara Handelin, a genome consultant for private industry, explained it, predictive or prognostic medicine will tailor treatments for a particular patient according to that patient's genetic makeup. To that end, she said, researchers are now trying to understand such things as "why people respond or do not respond to certain drugs or to certain kinds of therapy, why people have adverse reactions to drugs, and how we can better define cancers." So doctors of the future may speak to their patients like this: "I want to give you this drug but before I do we are going to have to do some kind of genetic profiling on you because we know that there are three groups in the population and if you are in Group X you are going to have a very bad reaction to it so clearly I am not going to give it to you. Furthermore, I am not going to give it to you if you are in Category A because we know that people of the A profile simply don't get much therapeutic benefit from this drug."
The promise of this sort of medicine is hard to overstate, Handelin said. "We could produce a rational delivery of diagnostics and therapeutics. We would overall save medical dollars spent. We would give medicine to people who really benefit from it and not give it to people who would be harmed by it, and, also, we would develop new drugs that would address the reasons why some people don't respond to the drugs that we already have. We would also be able to stratify patients into groups according to how much time and money we need to spend monitoring them, for example, for a certain future disease."
But developing such a genome-based medicine will demand a tremendous amount of information correlating genes and diseases. "We would need to study large populations of people. We would need to access a lot of the archival material that Dr. Korn just referenced in order to study lots of individuals, to understand the variability that impacts on all these aspects of how we develop disease. Then we would need to correlate those genetic profiles with exhaustive clinical information. It is of no use just to have the tissue. We have to be able to connect the tissue with as much information as we can find out about that person, with the exception of things like your name and address, telephone number, e-mail address. Everything else about you, we want to know. We would need outcome data. We would need drug history. We would need to be able to put all that information together into some really large informatics capability to be able to make complex analyses, to be able to draw conclusions and associations between genotypes and a whole bunch of clinical information."
"It is not just the good of the researcher in this case," added the University of Utah's Ray White, "but it is really the public good that is at stake."
For these reasons, many workshop participants agreed, wherever possible genetic data should be maintained not anonymously but with identification that will allow researchers to get further information on the donors. Until the extent of the threat of genetic discrimination is known, however, several participants suggested that modern encryption techniques might offer a way to keep such links while still protecting the donors. Carol Dahl of the National Cancer Institute summed it up this way: "Are there technologies out there that will enable us to encrypt information to allow us to use it in a prospective way for studies in research while protecting that information from incorporation into medical records and from insurance companies gaining access? Clearly encryption is not perfect, but there are industries out there in defense and banking that have spent a lot of money trying to make it as secure as possible." If such encryption technologies were put to work in genetic research, she said, "we might be able to actually protect patients in research studies rather than looking for legislative ways of solving our problems."
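Dahl's question does not point to a specific method. One commonly used technique that keeps a linkage without putting identities into the research database is keyed pseudonymization, sketched here in Python with the standard library's HMAC-SHA256. This is an illustration under stated assumptions, not a method proposed at the workshop: the `LinkageCustodian` class, the idea of a custodian who alone holds the key, and the medical-record-number format are all hypothetical. The research database stores only pseudonyms; the custodian can compute the same pseudonym when new clinical or follow-up data arrive, so longitudinal records still line up, while anyone without the key cannot derive a pseudonym from a name or record number.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional


class LinkageCustodian:
    """Hypothetical trusted party that alone holds the secret linkage key."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # A 256-bit random key; it would be stored apart from the research database.
        self._key = key if key is not None else secrets.token_bytes(32)

    def pseudonym(self, identifier: str) -> str:
        # Normalize so "mrn-0042 " and "MRN-0042" map to the same pseudonym.
        normalized = identifier.strip().upper().encode("utf-8")
        # HMAC-SHA256: deterministic for a given key, infeasible to compute without it.
        return hmac.new(self._key, normalized, hashlib.sha256).hexdigest()


def add_to_research_db(
    db: Dict[str, List[dict]],
    custodian: LinkageCustodian,
    identifier: str,
    data: dict,
) -> str:
    """Store data under a pseudonym; the raw identifier never enters db."""
    code = custodian.pseudonym(identifier)
    db.setdefault(code, []).append(data)
    return code


if __name__ == "__main__":
    custodian = LinkageCustodian()
    research_db: Dict[str, List[dict]] = {}
    # A tissue result now and a clinical outcome years later, for the same made-up patient.
    first = add_to_research_db(research_db, custodian, "MRN-0042", {"marker": "present"})
    later = add_to_research_db(research_db, custodian, "mrn-0042 ", {"outcome": "recurrence"})
    print(first == later, len(research_db[first]))  # True 2
```

This is pseudonymization rather than anonymization: whoever holds the key and a list of identifiers can re-link the data, so controlling access to the key matters as much as the cryptography itself.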