2

The Benefits of Data Sharing

Key Messages Identified by Individual Speakers

•   Data sharing can enhance understanding of the results of an individual clinical trial and enable the pooling of data from multiple trials to extend scientific discoveries beyond those derivable from any single study.

•   The moral and ethical arguments for data sharing center on fulfilling obligations to research participants, minimizing safety risks, and honoring the nature of medical research as a public good.

•   The practical and scientific arguments for data sharing include improving the accuracy of research, informing risk/benefit analysis of treatment options, strengthening collaborations, accelerating biomedical research, and restoring trust in the clinical research enterprise.

•   A cultural shift has already begun as leaders in industry, academia, and regulatory agencies recognize the value in increased transparency and data sharing and are focusing on how—instead of why—data should be shared.

•   Participant-level data are particularly useful when shared, but care must be taken to avoid drawing inaccurate conclusions from reanalysis of such data.

Clinical data come in a variety of formats (see Box 2-1), from the raw data collected in case report forms during trials to the coded data stored in computerized databases to the summary data made available through journals and registries like ClinicalTrials.gov. Data sharing can also occur at many levels. Several of the presenters at the workshop described these data-sharing continuums and discussed the benefits and risks of data sharing, based on the degree to which participant-level data are made available to researchers and the public.

In some trials, data are not even made available to individual researchers participating in a multicenter trial. Sometimes, data are released to researchers not associated with the study only if they show a genuine research interest in the question and a track record of research capability. In some cases, data are shared with everyone.

BOX 2-1
What Is Participant-Level Data?

Terms such as “participant-level data,” “individual patient data,” and “raw data” are not well defined, noted Elizabeth Loder of the BMJ. A mutual understanding of the way these data are generated and shared can help alleviate ambiguities in nomenclature. In a typical multicenter clinical trial, data originate with case report forms, which can be handwritten or electronic. Study monitors audit the data, either at individual sites or electronically, to ensure accuracy. When a form contains an entry that is difficult to interpret or obviously mistaken, the monitors send a query back to the investigator or study staff to resolve the problem. Each query has to be explained and resolved before the data are entered into the coordinating center database (Kirwan et al., 2008). At several points in this process, a portion of the data is coded or categorized, and additional checks are performed to make sure the data entry is correct. Sometimes in the process of data entry, additional queries about the data are generated that must be addressed by the original investigator and the study staff.

The term “participant-level data” generally refers to the de-identified records of individual patients generated through this process. De-identification is the process by which personal information that can be used to identify an individual is removed. However, even participant-level data may not capture all relevant information recorded in the raw dataset. For example, Loder described several challenges involved in coding adverse events. Misclassification of adverse events in clinical trials can have serious consequences, as when adverse events like suicidal behavior are coded only as emotional lability, so systems have evolved to minimize this possibility. Adverse events usually are categorized using a predefined hierarchy or organizational system. But the symptoms reported by patients do not necessarily fall into this hierarchy or system. As a result, such symptoms can be interpreted in different ways. Because of this ambiguity, some have argued for access to raw data as reported by patients or researchers on the case report forms before any coding has taken place (Gøtzsche, 2011).

THE USES OF SHARED PARTICIPANT-LEVEL DATA

De-identified patient data have two major uses, observed Deborah Zarin, director of ClinicalTrials.gov at the National Library of Medicine. They can improve transparency, helping to understand the results of an individual clinical trial, including what happened to individuals in the trial, and they can be pooled to discover new things not identified in the individual trials.

Data Sharing to Enable Independent Reanalysis

Steven Goodman, associate dean for clinical and translational research and professor of medicine and health policy and research at the Stanford University School of Medicine, discussed the former use case in the context of ensuring that a study was correctly analyzed and interpreted. Independent reanalysis of data is the basis of reproducible research and can be an extremely difficult task. An example he mentioned was a study of childhood asthma that had 72 different study forms, 109 form revisions, and almost 300,000 records in the database. The original manuscript started with 73 tables and 9 figures and underwent 40 revisions. The published manuscript contained three tables and two figures. “How do we begin from this tiny little slice that we see to begin to work backward and figure out is what they did right?” he asked. While the top tier of journals may have methodologists who can begin to check the chain of scientific custody from protocol to conduct to data to analysis to results, other journals have to rely on peer reviewers to detect problems. The authors of published studies can put additional information on the Web in the form of supplementary material and appendixes, but in reality, checking the accuracy of the results for a study like this is extremely difficult.

In talking about the tools that are needed to ensure that published findings are based on sound data and analyses, Goodman referenced a paper titled “Reproducible Epidemiologic Research” that proposes a standard for reproducibility (Peng et al., 2006). The premise behind that paper is that independent replication of research findings is the fundamental mechanism by which scientific evidence accumulates to support a hypothesis. The authors, therefore, argue that datasets and software should be made available to allow other researchers to conduct their own analyses and verify the published results.

Peter Doshi, a postdoctoral fellow at the Johns Hopkins University School of Medicine, also discussed the application of shared data to credible assessment of clinical trial results. Doshi, however, argued for a broader view of what should be considered clinical trial data. He proposed that detailed records of measurements and analyses, as well as narratives (including descriptions of patient dispositions, study protocols, and even correspondence), are needed to evaluate the quality of published trial results.

Data Sharing for Discovery

Participant-level data from multiple trials also can be combined to learn more than can be derived from the results of a single trial. Elizabeth Loder, clinical epidemiology editor at BMJ, observed that although meta-analyses historically have been done using summary-level data, the number of meta-analyses of individual participant data has been growing substantially. Furthermore, meta-analyses done with individual patient data are typically more likely to be able to detect treatment effects that differ across subgroups than meta-analyses done with aggregate data (Riley et al., 2010). These subgroup effects are frequently of great interest to clinical investigators. As Loder said, drawing from the title of an essay by Stephen Jay Gould, “the median is not the message.”

THE RATIONALE FOR DATA SHARING

The arguments in favor of sharing can be divided into two broad and overlapping categories, Loder explained. The first category consists of moral and ethical arguments. These arguments point to the necessity of fulfilling obligations to research participants, minimizing known risks
and potential harm from unnecessary exposure to previously tested interventions, and honoring the nature of medical research as a public good. Patients participate in clinical trials based at least in part on the understanding that their data may benefit others, and these benefits are more likely to occur if the data are widely available. Also, unpublished information might in some cases prevent the occurrence of adverse events (Chalmers, 2006). Data sharing may take different forms, from simply publishing the results of research to publicly sharing detailed patient-level datasets. Finally, taxpayers provide a large amount of money to support publicly funded research and expect to have access to the benefits of that research.

The second category consists of practical and scientific arguments. These include detecting and deterring selective or inaccurate reporting of research; enabling the replication of results and potential resolution of apparently conflicting results; informing risk/benefit analyses for treatment options; facilitating application of previously generated data to new study questions; accelerating research; enhancing collaboration; and building trust in the clinical research enterprise.

Rob Califf, director of the Duke Translational Medicine Institute, professor of medicine, and vice chancellor for clinical and translational research at Duke University Medical Center, who also spoke during the first session, pointed to the need to resolve results that appear conflicting. Clinicians are not able to interpret conflicting clinical trials data based on looking at the data abstractly without any kind of expert synthesis of information. Only through replication can one sort out whether conflicting results are due to chance or true differences.

Califf went on to describe a “cycle of quality” that can generate evidence to inform patient care (see Figure 2-1). Clinical trials generate knowledge, which is then applied in clinical practice. The measurement of patient outcomes then leads both to clinical practice guidelines that define standard of care and to further clinical trials. At the core of the cycle is measurement and education, which in turn depend on access to data. Box 2-2 describes how this paradigm of cumulatively building and sharing datasets has worked to reduce deaths due to heart attacks by 40 percent.

As an example of the kinds of advances that may be possible, Loder cited the case of a high school student who won $75,000 at the Intel International Science and Engineering Fair. The student cited searchable databases and free online science papers as the tools that allowed him to create his prize-winning entry. “How many collaborators are out there,
who we cannot even imagine at this point, who might make use of the data?” said Loder.

Loder also called attention to the need to build trust in the clinical research enterprise. This trust is at “an all-time low,” she said, which is causing a crisis in recruitment for clinical trials (Williams et al., 2008). The lack of trust extends even to physicians, who tend to discount studies of superior methodological rigor when they perceive that the studies have been funded by industry (Kesselheim et al., 2012). “If doctors do not believe the evidence, what hope is there for evidence-based medicine?” Loder asked.

Sharing data may generate problems that cannot be anticipated today, but it will also generate unanticipated benefits. “We are engaged in one of the great struggles of human knowledge: the struggle to liberate clinical trial information and make sure it is put to its best and highest use now and in the future,” Loder concluded. “It is a thrill to be part of this historic meeting.”

FIGURE 2-1 A “cycle of quality” from discovery science to the measurement of outcomes can generate evidence to inform policy.
SOURCE: Califf et al., 2007.

BOX 2-2
Treatment Advances for Cardiovascular Disease: A Success Story for Data Sharing

The risk of death after a heart attack is now 40 percent lower than it was before the development of medical therapies designed to reduce such deaths (Krumholz et al., 2009), and the development of these therapies relied extensively on clinical trials, said Rob Califf, director of the Duke Translational Medicine Institute, professor of medicine, and vice chancellor for clinical and translational research at Duke University Medical Center. As an example, he pointed to the Antithrombotic Trialists’ Collaboration (2002), which involved 135,000 patients and 287 randomized controlled trials. This study provided compelling evidence that the use of aspirin can reduce deaths from heart attacks. Replication of results from multiple trials has also demonstrated the benefits of fibrinolytics, beta blockers, angiotensin-converting enzyme inhibitors, and other treatments. These studies also showed that particular therapies were more or less useful in different groups of patients and at different times following presentation of symptoms, providing information that then shaped clinical practice guidelines.

Another example involves the effects of statins. By pooling data from multiple trials, it has been possible to show that statins confer benefits regardless of their effects on cholesterol levels (Baigent et al., 2005). In contrast, when data were not released and combined regarding the use of erythropoietin in renal patients who are anemic, the harmful effects of high-dose erythropoietin were overlooked (McCullough and Lepor, 2005). “This could have been detected much earlier if the right trials had been done and the data had been combined,” Califf asserted.

Commitment to Open Science

Every day, many people face difficult questions about health care, observed Harlan Krumholz, Harold H. Hines, Jr., Professor of Medicine at the Yale University School of Medicine. They need all of the information that is relevant to the options they are considering. If data are
missing, their ability to make informed decisions will be impaired. This is the central argument in favor of open science, Krumholz said.

Krumholz’s experience has been that whenever data are shared, whether voluntarily or not, new and important things are learned. In particular, the release of participant-level data has generated vital new information about the risks and benefits of drugs and devices. In some cases, access to this information leads to conclusions that contrast with the prevailing knowledge and changes the use of a drug or device. In other cases, it provides “nuance and understanding.” For instance, Krumholz described a study (also described by Loder) which found that unreleased data are about as likely to strengthen evidence for the use of a product as to weaken such evidence (Hart et al., 2012). “What is important is that we support the idea that data are a social good and the best science takes place in the light,” he said.

Krumholz shared his vision of a future where data sharing is widely accepted as being in everyone’s best interest and will be the cultural norm. “Data sharing [will be] an essential characteristic of being a good scientist and a good citizen,” he said. With the full release of data, companies would compete on the basis of science, not marketing. Academic researchers could get credit not only for the papers they publish, but for the knowledge generated from the databases they create.

Industry has the opportunity to demonstrate leadership, restore trust, and reclaim its position of integrity through meaningful actions to share data, Krumholz continued. “You have a meaningful motivation,” he said. “The [medical] profession has less trust in your science than in [National Institutes of Health]-sponsored studies and is less likely to act on the results of the trials you sponsored, not just the ones you conduct. The pharmaceutical and device industries no longer have the respect they once held. . . . The result is a situation that does a disservice to the public, the medical profession, and the vast majority of professionals in industry who have extraordinarily high integrity and are in that industry for the right reasons.”

Krumholz noted that an important cultural shift is already taking place. Some industry leaders have already taken steps to support data sharing and have contributed to major scientific advances as a result. For example, Medtronic’s decision to release the company’s data on a product that has nearly a billion dollars in annual sales was a powerful statement that the company was seeking the truth. The individuals who have made these decisions “realize that studies are only possible due to the generosity of people who consented to participate, and that we have an
obligation to ensure that the efforts of those subjects contribute as much as possible to knowledge generation.” Such transparency will also be essential to ensure the continuing flow of individuals who are willing to participate in trials, Krumholz added.

In return for the privilege of selling a medical product to the public, industry bears a responsibility to ensure that all the data concerning the risks and benefits are available to everyone, said Krumholz. The current challenge is not to decide whether data should be released, but how to do so while being attentive to the needs and concerns of all stakeholders. In addition, the publication of summary results is not enough, according to Krumholz. Rather, individual patient-level data need to be broadly and freely available for investigators. “We need the protocols and case report forms. We need full sharing of the source data. . . . With the talent in this room, and with those listening on the webinar and those who are interested, I know solutions can be found. If we are committed to the path, we can figure out how to do it.”

CAUTIONS ON DATA SHARING

Jesse Berlin, vice president of epidemiology for Janssen Research & Development, LLC, provided a countervailing view by asking whether participant-level data are always needed. Complications can arise when the data are reexamined, he said. Decisions may have been made during a clinical trial that cannot be replicated. Published studies may not always incorporate the appropriate intent-to-treat analysis. Endpoints may be defined differently in different trials. Study designs, patient populations, and treatments can vary from trial to trial. As a result of these and other potential problems, such analyses can go “seriously wrong,” Berlin warned. “It is not just a matter of feeling more comfortable having the individual-level data. You can actually get wrong answers.”

Although there is a common belief that participant-level data can enable verification and reproduction of trial results, that premise is reliant on the trustworthiness of the shared data, warned Peter Doshi. Even participant-level data can lead investigators astray. For example, a computerized database of participant-level data may not reflect what is actually recorded on a case report form. In some cases, it may be necessary to look beyond what people typically consider data (i.e., numbers) into more narrative forms of documentation depending on the intended use of the shared data.
