The third panel to consider the current landscape for clinical trial data sharing and reuse provided stakeholder perspectives on balancing the value and benefits of sharing data with the risks and costs. Or, as expressed by speakers throughout the workshop, “is the juice worth the squeeze?” David DeMets, professor emeritus in the Department of Biostatistics and Biomedical Informatics at the University of Wisconsin, described four case examples of the value that can be derived from data sharing. Jeffrey M. Drazen, New England Journal of Medicine group editor, discussed the Systolic Blood Pressure Intervention Trial (SPRINT) Data Analysis Challenge as an example of how clinical trial data can be used to identify new findings of medical importance. Deborah Peel, founder and president of Patient Privacy Rights, discussed public attitudes toward health data privacy. The panel discussion was moderated by Bernard Lo.
David DeMets, Professor Emeritus, Department of Biostatistics and Biomedical Informatics, University of Wisconsin
DeMets shared four case examples of the value that can be derived from secondary analysis of shared data. Three demonstrate how data sharing can uncover issues and provide lessons (e.g., by elucidating the impact of methodology on interpretation or by identifying errors or fraud), and the fourth led to new insights, additional publications, and collaborations.
Data Sharing Can Uncover Issues
Anturane Reinfarction Trial
The Anturane Reinfarction Trial was a randomized controlled trial comparing Anturane (sulfinpyrazone) with placebo for preventing mortality in patients who had experienced a myocardial infarction. At the request of the U.S. Food and Drug Administration, DeMets conducted an independent secondary analysis of the trial data, which revealed that not all mortality events had been reported in a 1980 publication on sudden death events during the trial (ART Research Group, 1980; Temple and Pledger, 1980). Some participants had been removed from the analysis because it was determined that they did not meet the preestablished trial entry criteria (i.e., they should not have been enrolled in the trial). Analyzing the full dataset, including the excluded patients, DeMets found additional mortality events. Taking these additional events into account did not change the overall trends observed with the treatment, DeMets said, but did eliminate the statistical significance. He suggested that this trial “contributed to the concept of intention to treat (ITT) as the primary analysis method.” (In an ITT analysis, all participants who are randomized into the trial are included in the analysis of the group to which they were randomized.) This secondary analysis shows how the choice of analysis method can change the interpretation of trial results.
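To make the ITT distinction concrete, the following is a minimal Python sketch using invented toy counts, not data from the Anturane trial. It assumes a hypothetical per-participant record holding the arm assigned at randomization, an eligibility flag judged after enrollment, and a death indicator; the names `Participant`, `make_arm`, and `mortality_by_arm` are illustrative only. With `include_ineligible=True`, every randomized participant is counted in the assigned arm, as ITT requires; with `False`, participants later judged ineligible are dropped.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    arm: str        # arm assigned at randomization (hypothetical field)
    eligible: bool  # eligibility as judged after enrollment (hypothetical field)
    died: bool      # mortality event during follow-up

def make_arm(arm, n_eligible, deaths_eligible, n_ineligible, deaths_ineligible):
    """Build a toy arm with the given counts of eligible/ineligible participants and deaths."""
    people = [Participant(arm, True, i < deaths_eligible) for i in range(n_eligible)]
    people += [Participant(arm, False, i < deaths_ineligible) for i in range(n_ineligible)]
    return people

def mortality_by_arm(participants, include_ineligible=True):
    """Return {arm: (deaths, n)}.

    include_ineligible=True is an intention-to-treat analysis: everyone randomized
    is analyzed in the arm to which they were randomized. False mimics excluding
    participants judged ineligible after randomization.
    """
    counts = {}
    for p in participants:
        if not include_ineligible and not p.eligible:
            continue
        deaths, n = counts.get(p.arm, (0, 0))
        counts[p.arm] = (deaths + int(p.died), n + 1)
    return counts

# Invented toy trial: ineligible participants happen to die more often in the treatment arm.
trial = make_arm("treatment", 90, 6, 10, 4) + make_arm("placebo", 92, 12, 8, 1)

itt = mortality_by_arm(trial)                              # {'treatment': (10, 100), 'placebo': (13, 100)}
excluded = mortality_by_arm(trial, include_ineligible=False)  # {'treatment': (6, 90), 'placebo': (12, 92)}

for label, counts in [("ITT", itt), ("Excluding ineligible", excluded)]:
    rates = {arm: f"{d}/{n} = {d / n:.3f}" for arm, (d, n) in counts.items()}
    print(label, rates)
```

In this made-up example, dropping the ineligible participants roughly doubles the apparent gap between arms (about 6.7 versus 13.0 percent mortality, compared with 10 versus 13 percent under ITT). Any post-randomization exclusion that is not balanced across arms can shift a comparison this way, which is the rationale for analyzing groups as randomized.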
The APPROVe Trial
The Adenomatous Polyp Prevention on Vioxx (APPROVe) Trial was a randomized controlled trial of Vioxx (rofecoxib) versus placebo for the prevention of recurrent colorectal adenomas, which are precursors of colon cancer.1 The trial was terminated early, and an analysis published in 2005 showed that Vioxx was associated with an increased risk of cardiovascular events (Bresalier et al., 2005). Per the protocol, participants were followed for only 14 days after discontinuation of therapy. After the trial was terminated, DeMets and colleagues analyzed cardiovascular outcomes over an additional year of follow-up after treatment ended and found that participants in the Vioxx group had an increased rate of select cardiovascular events (Baron et al., 2008). This secondary analysis showed the impact of informative censoring.2 In other words, “off drug does not mean off study,” DeMets said.
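The effect of a fixed post-treatment follow-up window can be sketched in a few lines of Python. The event days below are invented for illustration and are not APPROVe data; `count_events` is a hypothetical helper. Each number is the day, counted from discontinuation of study drug, on which a participant had a cardiovascular event.

```python
# Invented days (after stopping study drug) on which cardiovascular events
# occurred in each arm; illustrative only, not trial data.
event_days = {
    "drug": [3, 10, 40, 95, 200, 330],
    "placebo": [5, 12, 300],
}

def count_events(days, window_days):
    """Count events that fall within window_days after discontinuation."""
    return sum(1 for d in days if 0 <= d <= window_days)

# Protocol-defined follow-up (14 days) versus an extended 1-year follow-up.
protocol_window = {arm: count_events(d, 14) for arm, d in event_days.items()}
extended_window = {arm: count_events(d, 365) for arm, d in event_days.items()}

print("14-day window:", protocol_window)  # {'drug': 2, 'placebo': 2}
print("1-year window:", extended_window)  # {'drug': 6, 'placebo': 3}
```

In this toy example, the 14-day window shows no difference between arms, while a year of follow-up reveals an excess of events in the drug arm: events after day 14 are not absent, only unobserved.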
Genomic Predictors of Cancer: The Duke Translational Omics Case
In the third example described by DeMets, secondary analysis of shared data uncovered fraud and led to the retraction of publications. Duke University researchers Anil Potti and Joseph Nevins published several high-profile papers describing genomic predictors of cancer risk and therapeutic response, and these genomic predictors were subsequently used in Duke clinical trials. However, statisticians at MD Anderson Cancer Center were unable to reproduce the Duke researchers’ results. Further analysis of data shared by Duke revealed numerous concerns with both the data and the analysis (Baggerly and Coombes, 2009). Ultimately, DeMets said, the work by Potti and Nevins was shown to be fraudulent, and many of the papers were retracted. He referred participants to the 2012 Institute of Medicine (IOM) consensus study report Evolution of Translational Omics: Lessons Learned and the Path Forward for more detailed information about the case (IOM, 2012b).
2 Censored data are observations that fall outside an established boundary or threshold and are therefore not fully observed (in this case, the censored data are the cardiovascular events that occurred beyond the 14-day follow-up period). Informative censoring occurs when clinical trial participants are lost to follow-up for study-related reasons (in this case, the lack of data collection beyond 14 days after discontinuation of treatment is a result of the predefined study protocol).
Data Sharing Can Lead to Additional Publications and Collaborations
The COMPANION Trial
The Comparison of Medical Therapy, Pacing, and Defibrillation in Heart Failure (COMPANION) Trial was a randomized controlled clinical trial of a pacemaker, with or without defibrillator, versus optimal pharmacologic treatment in patients with heart failure.3 The study found that a pacemaker, alone and with a defibrillator, provided significant survival benefit over medical therapy (Bristow et al., 2004).
DeMets, the lead statistician for the study, had the opportunity to revisit the data a decade after the primary and secondary papers were published and develop additional manuscripts for publication. DeMets set out to reproduce the original 2004 study prior to conducting additional analyses and faced several barriers. Finding the data from a decade ago was difficult (data are often migrated to a different system), and the dataset he first retrieved and used was not the final file that had been used for the published paper. When the correct documentation was ultimately retrieved, it lacked key details needed to reproduce the study, and the software used a decade prior had changed, he said. After finally reproducing the original paper, DeMets was able to publish several additional analyses and said that several more are in progress.
DeMets and Lo have previously published on the need to promote collaboration between the original clinical investigators and the researchers conducting secondary analyses (Lo and DeMets, 2016), and DeMets said he engaged members of the COMPANION Trial steering committee. He observed that being offered co-authorship seemed to help reduce their resistance to data sharing and secondary analysis. DeMets also engaged new cardiovascular investigators (i.e., medical fellows, younger faculty). Together, this approach allowed the authors to gain new insights into the use of pacemakers and defibrillators in heart failure patients, he said.
Reflections on the Benefits and Costs of Data Sharing
Speaking from his perspective as a data analyst, DeMets reflected on the potential benefits and costs associated with data sharing. Archiving a complete clinical trial data file in a form and level of detail sufficient for future use requires significant effort and cost. Despite one’s best efforts and intentions, some details needed to reproduce the findings will likely not be documented sufficiently, he said. Secondary analyses can catch errors and fraud, which, although infrequent, are important. Secondary analyses also allow for the investigation of alternative analysis approaches. Importantly, “data sharing can produce further research and maximize the benefit of the trial,” DeMets said. In closing, he said, “Phase 3 randomized controlled trials collect more data than are ever typically used,” which creates challenges for documentation and storage. He advocated for designing simpler trials that minimize the amount of data documented and stored, and he suggested concentrating data-sharing resources on pivotal Phase 3 clinical trials rather than earlier-phase studies.
Jeffrey M. Drazen, New England Journal of Medicine Group Editor
Drazen discussed some of the concerns about data sharing and secondary analysis raised by three key stakeholder groups: clinical trialists, participants, and data scientists; he further presented the SPRINT Data Analysis Challenge as an example of how clinical trial data can be used to identify new findings of medical importance. He also took the opportunity to respond to the comment by DeMets (see above) about the vast amount of data collected in Phase 3 clinical trials. Drazen said that investigators tend to collect as much information as possible because they fear reaching the end of the study and realizing an important data element was not collected.
Historically, clinical trialists have essentially owned their trial data by default and have collaborated almost exclusively with other clinical groups studying the same disease area, Drazen said. Around the mid-2000s, the emergence of data science and the ability to store large volumes of data led to data scientists requesting access to clinical trial data to conduct independent analyses. Sophisticated statistical packages are now available, and early-career scientists are adept at writing code for data analysis.
Drazen explained that clinical trialists are invested in formulating a sound clinical question, collecting data, analyzing the data according to a preestablished statistical analysis plan, and drawing a conclusion to answer the clinical question. In general, any deviation from the established clinical protocol is avoided, as it can introduce bias. “When you hunt for signals, you find them,” Drazen said, and when health and lives are at stake, it is essential to be sure that the signals identified, whether positive or negative, are real. Therefore, clinical trialists are wary of drawing clinically directive conclusions from post hoc analyses; instead, post hoc analyses are used to inform the design of the next clinical trial. Drazen noted that there are many examples in which a strong signal apparent in a dataset was not borne out in a subsequent clinical trial. Another concern is that conducting a trial takes a great deal of time and effort, and trialists want to reap the academic rewards for their work, one form of which is publication in journals.
A primary concern for trial participants is privacy. As discussed by Moses Taylor, Jr. (see Chapter 2), patients want their data to be widely but responsibly used, Drazen said. The data users, he said, need to define how the data will be used and to ensure responsible use. One question to consider is who should be allowed to use the data. For example, Drazen said, if participants could potentially be identified from the data, should the study be subject to institutional review board review to ensure that data are not misused?
Data scientists believe transparency and data sharing can promote public confidence in clinical research, Drazen said (e.g., through the reproduction or replication of studies). When data are not made public, there can be mistrust of the research and study outcomes. Importantly, secondary analysis of shared data by data scientists can reveal new information and ideas that can advance human health, he said.
SPRINT Data Analysis Challenge
Drazen elaborated on SPRINT, introduced by Taylor (see Chapter 2). Planning for SPRINT began in 2009, and the trial was anticipated to run until 2017. The study compared intensive blood pressure control (a systolic target of less than 120 mm Hg) with standard blood pressure control (a systolic target of less than 140 mm Hg) in nearly 10,000 patients age 50 years or older with increased cardiovascular risk and systolic blood pressure between 130 and 180 mm Hg.
In 2015, SPRINT was stopped early when the Data and Safety Monitoring Board (DSMB) observed significantly better outcomes with the intensive intervention (SPRINT Research Group, 2015). Drazen relayed that the rate of high blood pressure complications (e.g., heart attack, stroke) was significantly lower in the intensive treatment group than in the standard treatment group (16 cases per 1,000 versus 21 cases per 1,000, respectively).
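As a back-of-the-envelope illustration using only the rounded figures quoted above (16 versus 21 cases per 1,000), the absolute and relative differences work out as follows. These are not the trial's formal effect estimate, which SPRINT reported as a hazard ratio from a time-to-event analysis.

```latex
\begin{aligned}
\text{Absolute difference} &= \frac{21}{1000} - \frac{16}{1000} = \frac{5}{1000} = 0.5 \text{ percentage points},\\
\text{Rate ratio} &= \frac{16}{21} \approx 0.76,\\
\text{Relative reduction} &= 1 - \frac{16}{21} = \frac{5}{21} \approx 0.24 \quad (\text{about } 24\%).
\end{aligned}
```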
Following the announcement by the National Institutes of Health (NIH) that SPRINT was being stopped early because of the important positive findings, Harlan Krumholz and Eric Topol argued in The New York Times that the data should be made public immediately, before the peer-reviewed publication of the final results.4 Drazen said that, in his opinion, the clinical investigators know the data best and deserve the opportunity to analyze and report on the data, on behalf of the study participants who put themselves at risk, before the dataset is made public. He noted that the study results were published just 2 months later.
The dataset underlying the findings presented in the 2015 SPRINT publication was released 1 year later. To explore how clinical trial data can be used to identify new findings of medical importance, the New England Journal of Medicine sponsored the SPRINT Data Analysis Challenge.5 Of the 279 requests for access to the data received from around the world, 218 advanced to a qualifying round designed to demonstrate that the requester had the necessary statistical skills to use the data. Two hundred research groups advanced to the challenge round, and 143 entered the final round. Drazen reported that, among the three finalists, one group of mid-career researchers said they spent several weeks deriving their response to the challenge round (and required a hint from the Challenge organizers), while a group of second-year medical students said they obtained the correct answer in about 20 minutes. The difference, Drazen observed, was that the younger researchers had developed the skills to code their own statistical analysis packages as needed. The new generation entering medicine has expertise in data science, he said, and can use shared clinical trial data to identify novel findings. Drazen also noted that after the dataset was made public, others found errors in it; although none affected the outcome, they were corrected. He concluded that there are many positive aspects of having many different people analyze clinical trial data, but there are also challenges, such as the cost and effort required to facilitate sharing. To encourage both data sharing and secondary analysis, it is important to demonstrate the value that can be obtained; examples are needed of secondary analyses that have led to behavioral changes whose health impact was confirmed by another clinical trial.
4 See https://www.nytimes.com/2015/09/18/opinion/dont-delay-news-of-medical-breakthroughs.html (accessed February 10, 2020).
Deborah Peel, Founder and President, Patient Privacy Rights
Peel, a practicing physician and Freudian psychoanalyst, spoke from her perspective as a patient privacy rights advocate. She discussed public attitudes toward the privacy of health information in general (i.e., not specific to clinical trials). She emphasized that there is a universal human right to privacy.6 In Europe, she said, “privacy means the right to self-determination, autonomy, respect, and control over personal information.” She noted that the United States has no established definition of privacy in this sense. Furthermore, she referred participants to a section of the 2002 amendments to the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule that, she said, actually strips individuals of their right to control their own health data (67 Federal Register 53,183). Specifically, Peel said that “covered entities can use and disclose information for treatment, payment, and health care operations.” She expressed concern that most uses could be deemed to fall into those categories and that health data are being treated as a “corporate asset.” She also pointed out that the flow of personal data (i.e., where and with whom the data are shared) is not mapped, making the environment difficult to understand.
Peel mentioned two current examples of sharing of real-world health data. Through Project Nightingale, she said, Google is collecting patient health data under an agreement with a large health system, without the knowledge of patients or providers.7 Apple is now involved in medical research, for example, by enabling clinical research recruitment through a phone app and by using new technologies, such as the Apple Watch, which can collect health data, to conduct research in new ways.
Patient Attitudes About Privacy
The research on the public’s attitudes about privacy and the use of their health data is very limited, Peel said. Although there have been surveys of clinical trial participants (e.g., Mello et al., 2018), there are few, if any, published surveys of the general public. She mentioned one survey conducted in 2007 by Alan Westin at the request of an IOM consensus study committee to inform its work. Peel highlighted Westin’s finding that “only 1 percent of the public would ever agree to unfettered access to their records without consent.” In a 2016 survey of U.S. consumer attitudes toward health information technology, 89 percent of respondents said they had withheld health information from their health care providers.8 Public distrust of the system affects the quality of data in the health system, which in turn can affect health outcomes, she continued.
6 The right to privacy is set out in Article 12 of the Universal Declaration of Human Rights. See https://www.un.org/en/universal-declaration-human-rights (accessed March 2, 2020).
7 Peel listed several recent news articles as background on Project Nightingale, including https://www.wsj.com/articles/google-s-secret-project-nightingale-gathers-personal-health-data-on-millions-of-americans-11573496790 (accessed February 10, 2020), and on the Apple initiative, https://www.nytimes.com/2019/11/14/technology/apple-harvardhealth-studies.html (accessed February 10, 2020).
Peel noted that there is some movement toward ensuring that the public’s health data remain private. As mentioned, the Apple initiative is enabling researchers to reach out to members of the public who are interested in participating in research. Apple has assured Apple Watch users that their health data will never be shared by Apple or its partners without their consent. Peel suggested the need for something similar to a “nutrition label” for smartphone and device apps that would provide users with information on the level of privacy protection afforded them.9 Smartphones and wearables can now collect large volumes of data about an individual’s health as well as many other relevant parameters, and Peel said the first users of those data should be the individuals themselves. She referred participants to the work of Alex “Sandy” Pentland of the Massachusetts Institute of Technology and his “New Deal on Data” for further discussion of an individual’s ownership of their own data and control of the data flow.10
Metrics of Value
Colin Baigent asked about the metrics available to determine whether the value created by sharing clinical trial data is worth the costs. He said that quantitative metrics of the impacts and outputs of data sharing are needed to make the business case for allocating resources to support sharing. He reported that, in a search he had conducted, the publications arising directly from the work of the three SPRINT Data Analysis Challenge winners had been cited only 14 times, which in his view did not demonstrate sufficient value for the effort.11
8 Peel referred participants to https://blackbookmarketresearch.newswire.com/news/healthcares-digital-divide-widens-black-book-consumer-survey-18432252 (accessed February 10, 2020).
9 Further information about the proposed label is available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3439701 (accessed February 10, 2020).
11 See https://www.nature.com/articles/s41746-019-0156-3 (accessed March 31, 2020); https://www.ncbi.nlm.nih.gov/pubmed/31067189 (accessed March 31, 2020); and https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002410 (accessed March 31, 2020).
Drazen recalled that, in the mid-1990s, NIH and the Wellcome Trust required their grantees involved in the Human Genome Project to deposit DNA sequence data in a public database. He noted that opinion in the genetics community was divided, with some supporting the sharing of sequences to advance the field and others opposing the requirement to give away the results of their hard work. Today, he continued, it is standard practice to deposit sequences into the National Center for Biotechnology Information database of Genotypes and Phenotypes (dbGaP)12 and other databases. Researchers worldwide can access the shared sequence data to inform the development of experimental questions and study design.
With this in mind, Drazen suggested that the number of peer-reviewed publications associated with shared clinical trial data might not be the ideal metric of the value of clinical trial data sharing. Like sequence data, clinical trial data might be accessed to inform study design or help conceptualize questions, which he said could reduce the need to put patients at risk in exploratory studies. Baigent agreed that citation metrics are not the only option. Both agreed that the challenge is how to measure uses of shared data that do not directly result in a publication. Lo added that there might be other unpublished uses of trial data and emphasized the importance of data-sharing platforms asking users who they are, what they are using the data for, and whether the data were helpful for their task.
Matthew Sydes, professor of clinical trials and methodology at University College London, observed that SPRINT was a large trial of nearly 10,000 participants, resulting in a vast amount of data. He asked whether there is a minimum enrollment size at which trials become more useful for data sharing and for analysis challenges such as the one conducted with the SPRINT data. DeMets agreed that, in general, it is most efficient to target resources and efforts toward shared data from well-powered studies. Drazen cautioned against defining which trials are important to share based on enrollment size. A large trial can have no relevant findings or impact, and a small trial can change clinical practice; it is not possible to know until the end. Data from trials that could potentially change clinical thinking should be shared, Drazen noted.
Combining Data Across Trials and Platforms
Alex Sherman highlighted the importance of being able to merge and harmonize data from multiple clinical trials.13 He mentioned two existing models of such data sharing. The Critical Path Institute, for example, has been facilitating data sharing for more than a decade, merging data from multiple clinical trials in a disease area or across multiple diseases.14 The other is an amyotrophic lateral sclerosis (ALS or Lou Gehrig’s disease) dataset that merges harmonized data from 23 ALS clinical trials. He said that 1,500 researchers from 64 countries have used the data, leading to multiple publications. He also mentioned an ALS crowdsourcing challenge in which 1,100 researchers participated. Lo suggested that perhaps the time has come for platforms that facilitate sharing of single studies to pool their resources and enable researchers to combine data and conduct analyses across platforms.
Informed Consent for Data Sharing
Sherman noted the need for standardized data-sharing practices and standardized legal documents. He said the informed consent form for all clinical trials his institution conducts asks participants to consent to the use of their data for any medical research purpose. Peel raised concern that a choice between not sharing one’s data and giving blanket consent to share one’s data “forever in the future without asking the person again” is not a meaningful choice and contributes to patient distrust in the health system. Taylor said that, as a trial participant, “as long as I am aware that the data are going to be shared responsibly and I agree to be in the trial, I have given up the right to control those data. I am doing it voluntarily because I can see the benefit of giving up that right for my future health.” He said he understood that other trial participants might not want to sign away their privacy rights for the future, but he said people can also choose not to participate in the trial.
Sherman observed that participant willingness to share can depend on the disease and that those with rare diseases are often very willing to share their data to facilitate medical advances. Drazen agreed that patients with rare or life-threatening conditions are often willing to take on greater risk for incremental progress. However, he suggested that people might not be willing to put their privacy at risk for a more common and relatively mild condition (e.g., hypertension). Sherman noted that there is greater risk of reidentification of data in trials for rare conditions, especially if the participants have been enrolled in multiple trials.
DeMets supported participants’ rights but added his concern that secondary analyses can be subject to bias depending on which participants consent or decline to share their data. The primary publication would include the full trial population; however, downstream analyses are affected by whether, and to what extent, participants have consented to sharing of their data.