The second panel presented four current examples of data-sharing repositories. Rebecca Li, executive director of Vivli, shared lessons learned from the Vivli platform since its launch in 2018. Joseph Ross, professor of medicine and public health at the Yale School of Medicine, discussed the operations and impact of the Yale University Open Data Access Project. Frank Rockhold, professor of biostatistics and bioinformatics at the Duke Clinical Research Institute (DCRI), described the multicomponent data-sharing process used by the Supporting Open Access for Researchers (SOAR) platform. Scott Shaunessy, co-founder and chair of ClinicalStudyDataRequest.com (CSDR), discussed challenges and solutions associated with launching CSDR, one of the first data-sharing platforms. The panel discussion was moderated by Deborah Zarin.
Additional details about each of the platforms were provided on posters that were available in the meeting room throughout the workshop.1
Rebecca Li, Executive Director, Vivli
Vivli is a nonprofit organization focused on clinical research data sharing that provides a convening function for stakeholders in biomedical research, including industry, academia, nonprofit funders and foundations, government, and patient advocates.2 Launched in July 2018, Vivli houses a clinical trial data-sharing platform also called Vivli, Li explained.
1 The posters presented are available on the workshop webpage, http://nationalacademies.org/hmd/Activities/Research/DrugForum/2019-Nov-18.aspx (accessed February 10, 2020), under the Attachments menu.
Since its inception, more than 4,600 clinical trials have been contributed to the Vivli platform by Vivli members.3 The trials include data from more than 2 million participants from more than 100 countries. Vivli has met with strong and sustained enthusiasm, Li said. During the repository’s first year, more than 80 data request proposals were submitted, and data from more than 80 trials involving approximately 65,000 participants were accessed. Li added that the platform is anticipated to achieve financial sustainability in the first quarter of 2020. Searches of the repository span a broad range of outcomes, diseases, and products. Zarin asked about the extent to which trials contributed to Vivli link back to their ClinicalTrials.gov registry entries. Li said that some members are starting to carry out such linking, but some have expressed concern about the need to update the entire ClinicalTrials.gov record retrospectively.
The Vivli Model
The 2015 Institute of Medicine (IOM) consensus study report Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk (IOM, 2015) was used as a blueprint in the development of the Vivli platform, Li said. Vivli strives to uphold the report’s general principles, especially “conducting the sharing of trial data in a fair manner.” Vivli uses the flowchart for what data to share and when from the 2015 IOM report when training new members,4 and employs a harmonized data use agreement (DUA). As mentioned by Kochhar (see Chapter 2), the Wellcome Trust is the secretariat for the Vivli independent review panel (IRP), although some Vivli members choose to use their own IRP. In achieving sustainability, Li said that Vivli is a membership-driven model and operates under the principle expressed in the IOM report that those who benefit from data sharing should also be responsible for contributing financially (IOM, 2015). Vivli is also working toward becoming more searchable and interoperable, she said.
3 At the time of the workshop there were 21 members of Vivli, including industry, academia, nonprofit funders and foundations, and patient/disease advocacy organizations. See https://vivli.org/members/ourmembers (accessed March 2, 2020).
4 A printable brochure of the 2015 IOM report recommendation for when to share specific types of data is available at http://nationalacademies.org/hmd/~/media/Files/Report%20Files/2015/SharingData/RAAG_ShareData_Print.pdf (accessed February 10, 2020).
Li described some of the challenges for the Vivli data contributors and data users. The anonymization of contributed datasets can be challenging, especially for rare disease datasets. Sharing large datasets, often including tens of thousands of patients, also presents challenges, especially when the data include imaging. In response to a question, Li said that Vivli can offer solutions to researchers facing governance hurdles, such as when researchers want to share, but their institutions are reluctant.
One challenge for data users is the siloing of data sources. Some, but not all, of the data might be submitted to Vivli, with some data remaining elsewhere (e.g., within the institution). Li said this challenge is frustrating for users and limits analysis. Another challenge is the heterogeneity of data formats. Li added that data supply and demand are not balanced, and Vivli is working to increase awareness of this rich data resource and to raise the number of users.
Li concluded with three lessons learned from the Vivli experience:
- Balancing flexibility and harmonization. A data-sharing platform, Li noted, must be both flexible in meeting the needs of its individual users (e.g., in respecting the data contributor’s governance, review process, data types, and tools) and harmonized in providing the best user experience (e.g., by developing master DUAs to improve efficiency).
- Addressing the complexities of sharing individual participant data. The technical and governance aspects of data sharing are complicated, Li said. Platforms need to engage stakeholders to effectively address these challenges. She observed that in many cases, potential data contributors are not deciding which platform to use to share but are deciding whether to share or not. The challenge for platforms is to present sharing as the compelling option.
- Enabling data sharing. The role of a nonprofit organization that manages a platform is to enable data sharing by, for example, providing training and resources, lowering cost barriers, and providing digital object identifiers. A nonprofit platform cannot influence or incentivize data sharing. “The broader research ecosystem of funders, academia, and publishers holds the levers to reward investigators for sharing data for the public good,” Li said.
Joseph Ross, Professor of Medicine and Public Health, Yale School of Medicine
The Yale University Open Data Access (YODA) Project was launched in 2011 with the intent of making research data available to the broader scientific community.5 In 2014, the YODA Project formed a partnership with Johnson & Johnson to facilitate sharing of clinical trial data for the company’s pharmaceutical products (including data from legacy trials), as well as devices and diagnostics, Ross said. Policies and procedures for data access were developed with input from the YODA Project steering committee, stakeholders, and the public.
Ross outlined the guiding principles of the YODA Project:
- Promote the sharing of clinical research data to advance science and improve public health and health care.
- Promote the responsible conduct of research.
- Ensure good stewardship of clinical research data by external investigators.
- Protect the rights of research participants.
The YODA Model
Researchers can search for trials by National Clinical Trial number or can filter the repository entries by criteria such as product or generic name, therapeutic area, condition, or enrollment parameters. Information provided for each trial includes general information about the product and trial and (when available) links to the Clinical Study Report Summary, the ClinicalTrials.gov trial record, the primary citation, study data specifications, and the annotated case report form.
When secondary data users identify a trial of interest, Ross explained, they must apply for access to the data, listing all investigators along with their affiliations and funding, and including a narrative summary, a public abstract, and a detailed proposal for the intended research. Ross said that the detailed proposal facilitates accountability and integrity because the broader research community can later compare the submitted request with the subsequent findings or publication. Data requesters must also submit a timeline and a dissemination plan and complete DUA training, Ross added.
5 Further information about the YODA project, including the data request process and current metrics, can be found at https://yoda.yale.edu (accessed February 10, 2020) and in Ross et al. (2018). The YODA Project is funded by a research grant through Yale from Johnson & Johnson and was formerly funded by Medtronic.
The YODA Project completes the process of reviewing applications for scientific merit within about 2 weeks. Proposals are reviewed to ensure that the scientific purpose of the work is clearly described and that it is possible to conduct the proposed study using the requested data. In addition, the review verifies that “the data requested will be used to generate or materially enhance generalizable scientific and/or medical knowledge to inform science and public health,” Ross said. The data generator partner reviews the proposal simultaneously to determine whether they can make the data available. The data generator partner’s due diligence includes assessment of the appropriateness of the requested data for the proposed research, and also considers whether patient privacy will be protected and if similar research studies have used the same data being requested.
After the data request is approved and the DUA is signed, the investigators can access the data on a secure platform via a virtual private network, which Ross said protects patient privacy and helps prevent further distribution of the data. He added that DUA negotiations add the most time to the request process, often taking 3 months or more to come to an agreement.
YODA Project Metrics
Since forming its partnership with Johnson & Johnson during the last quarter of 2014, YODA has made data available from 350 trials for request, 75 percent of which have been requested for use, Ross said. About 90 percent of the investigators making data requests are from academia, he said, and requests have come from around the world. The median number of trials requested per data request is three. Of 134 data request applications submitted, 87 percent (117) were approved, 3 percent are under review, and 10 percent have been withdrawn or closed because the data are not in the repository or cannot be made available for use (e.g., unable to adequately deidentify). He added that no request has yet been rejected. About 80 percent of the requests required administrative revisions, and about 30 percent required scientific revision prior to approval. Ross illustrated the status of the data requests over the previous 5 years (see Figure 3-1). He clarified that the studies indicated in orange have not pursued publication, but the results are reported on the YODA Project website. All requests for data are publicly posted on the YODA Project website after the DUA is signed, regardless of status, Ross said. It is not expected that all requests will lead to a publication, he said, but added that 31 manuscripts have been submitted for publication thus far, of which 26 have been published, and there have also been 26 posters and conference presentations. Ross highlighted several of the key publications that used data made available through the YODA Project (see Corbett et al., 2017; Mbuagbaw et al., 2019). More than half of the data requests indicate that the purpose of the analysis is meta-analysis or to answer a new question. Other responses include clinical prediction, validation, statistical methods, clinical trial methods, pilot research, comparison group, or other. Ross noted that investigators can indicate more than one purpose of the analysis.
In conclusion, Ross said that as a result of the YODA Project, “there have now been numerous studies that might not otherwise have been feasible to pursue, some of which have impacted health policy and guidelines.” The platform has also enabled researchers to directly collaborate with the original investigators at Johnson & Johnson. Importantly, Ross said, Johnson & Johnson now conducts clinical trials with the intent that the data will be shared (i.e., planning in advance for data sharing).
With regard to concerns about potential unintended consequences, Ross said that “replication studies have supported, not undermined, the original study, there have been no instances of patient privacy breaches, no publications of spurious safety findings that received unwarranted attention or disrupted patient care, and no data have been used for commercial or litigious purposes.”
Ross highlighted several areas where attention is needed to better reap the rewards of data sharing. He reiterated the point by Li that there is a need to raise researcher awareness of the available data resources. He also noted that many researchers requesting data do not have expertise in using clinical data. It would be helpful for analyses if older trial data could be made available in a contemporary format and if sponsors adopted uniform data standards for current trials. Ross additionally noted a need to address sustainability and the costs of data sharing.
Frank Rockhold, Professor of Biostatistics and Bioinformatics, Duke Clinical Research Institute
The SOAR platform is a collaboration among the DCRI, academia, and industry that is intended to facilitate open and transparent sharing of clinical research data among investigators, data scientists, and statisticians to inform and accelerate science for the benefit of human health.6
6 Further information about the SOAR platform is available at https://dcri.org/our-work/analytics-and-data-science/data-sharing (accessed February 10, 2020).
Rockhold explained that one goal of the platform is to facilitate progress on data sharing in the academic world, as much of the clinical trial data sharing thus far has been done by industry. He outlined the Duke University School of Medicine’s principles for open science and data sharing (see Box 3-1) and emphasized that there are active discussions regarding the final principles about how to provide proper academic credit to researchers who share original data.
The SOAR Model
Although SOAR also serves as an independent review panel for clinical trial datasets being shared by industry, specifically Bristol-Myers Squibb (BMS), it contains additional components that make it different from other platforms. For example, SOAR shares the DCRI datasets, including the Duke Cardiac Catheterization Research Dataset and the Duke Cardiac Catheterization Educational Dataset. SOAR also functions as a portal to other datasets, including the Aggregate Analysis of ClinicalTrials.gov database,7 and datasets that are shared by the National Institutes of Health, the American Heart Association, and others. Rockhold referred participants to his poster for full details about the SOAR platform.8
Rockhold briefly described the process for requesting access to BMS clinical trial datasets, which is comparable to that of the other platforms and includes IRP administrative review of proposals, internal review by BMS for availability of the data for sharing, and scientific review by content experts from within Duke. Rockhold described the scientific review as interactive, involving two-way communication between the IRP and the researchers to address concerns and resubmit the proposal.9 He recalled only one proposal that had been rejected by the IRP thus far, and it was rejected because the proposed study had already been done and published by another group. One of the primary reasons the IRP requests additional information or revision is that the proposing team lacks the expertise needed to conduct the proposed study (e.g., in informatics, data science, or statistics). In most cases, he said, the investigators add the necessary subject matter experts.
Requests for BMS or Duke data are primarily from academia, with only a small number of requests coming from industry, Rockhold said, noting that sources of requests are similar to those experienced by the YODA Project discussed by Ross (see above). He added that there are also about 100 pending requests that will be reviewed after the data become available for sharing, as noted above, 2 years from “database lock,” a step in clinical research that limits unauthorized modifications to clinical trial data.
There seems to be little overlap between those who contribute data and those who seek to access it, Rockhold observed. He reiterated that most of the data sharing is being done by industry while most of the data requesters are from academia, which he noted are two scientific communities that do not necessarily speak the same scientific language. Persistent barriers to data sharing include patient privacy concerns, costs, lack of interoperability and data standards, and data integrity. Rockhold said the challenge is to demonstrate that the benefits of data sharing outweigh the risks (see Figure 3-2).
8 Available on the workshop webpage, http://nationalacademies.org/hmd/Activities/Research/DrugForum/2019-Nov-18.aspx (accessed February 10, 2020), under the Attachments menu.
9 The review criteria are available at https://dcri.org/our-work/analytics-and-datascience/data-sharing/bms-studies (accessed February 10, 2020).
Rockhold suggested that the process of data sharing will become a bit easier when there is greater overlap between the communities (i.e., when more data contributors are also data users and vice versa). With a better understanding of each other’s perspective, data generators might collect and steward data in a way that better facilitates sharing, and data users might be able to propose secondary analyses that make the best use of the available data.
Scott Shaunessy, Co-Founder and Chair, ClinicalStudyDataRequest.com
CSDR was launched in 2013 as a clinical trial data-sharing initiative of GlaxoSmithKline, and then relaunched as a multi-sponsor site in 2014 with the addition of Roche, Novartis, and Boehringer Ingelheim. As one of the first data-sharing platforms, CSDR initially focused on protecting data from misuse, Shaunessy said. The platform is constantly evolving, he said, and in recent years CSDR has worked with technology partners, including SAS,10 to provide analytics tools and to make the platform experience more “researcher centric.” Shaunessy focused his panel remarks on challenges and solutions because, he said, CSDR essentially works the same way as the other three platforms presented (see above), and the metrics for CSDR were already discussed by Kochhar (see Chapter 2).11
Challenges and Solutions
Establishing a Truly Independent IRP
An early challenge, Shaunessy said, was that the IRP was not seen as truly independent because CSDR and the sponsor companies selected the IRP. To remedy this, Wellcome Trust was approached in 2015 about serving as the secretariat for the IRP (as discussed by Kochhar in Chapter 2). The IRP is now a rotating group of five members, with additional therapeutic area experts engaged by Wellcome Trust as needed.
Transparency and Metrics
Another challenge is ensuring transparency, and Shaunessy said that CSDR began to publish metrics on the website as soon as possible after launch and updates the numbers quarterly. He referred participants to his poster and the CSDR website for more information.12
Speed to Research
Shaunessy reiterated that a current focus is on making the platform more useful for researchers, and one element of doing so is reducing the time from request to data access. A range of factors can cause delays, but he said that “the single biggest obstacle has been the data-sharing agreement.” CSDR worked with all member companies to develop a single data-sharing agreement that he said is favorable to the researchers’ institutions. However, he said that institutions still seek to negotiate aspects of the agreement. In an ongoing effort to increase transparency, Shaunessy noted that one of the metrics now being posted on the website is the “cycle time” from initial proposal to start of research.
12 The poster is available on the workshop webpage, http://nationalacademies.org/hmd/Activities/Research/DrugForum/2019-Nov-18.aspx (accessed February 10, 2020), under the Attachments menu.
Sharing Data Across Multi-Sponsor and Single-Sponsor Platforms
Another challenge for researchers is attempting to conduct a study across multiple clinical trials and across multiple companies when the data are housed in different data-sharing platforms, company systems, and/or academic institutions. In many cases, researchers must resort to making multiple separate data requests and then conducting parallel research projects, whose results they must then interpret and attempt to combine.
The next step for data sharing is for platforms and clinical trial sponsors to work together to identify solutions that allow secondary analyses to combine data across platforms. “The result will be better research, better outcomes, and the advancement of human health,” Shaunessy concluded.
Another challenge, raised by Colin Baigent, deputy director of the Clinical Trial Service Unit and Epidemiological Studies Unit at Oxford University, is that data-sharing platforms, in their current form, do not readily enable sophisticated statistical meta-analysis, which requires holding a local copy of the data from multiple trials and sponsors. His approach, therefore, has been to try to obtain data directly from the originator (i.e., to keep a local copy of the data with which to work). Shaunessy agreed, saying that the mindset about directly sharing data is shifting and that he was aware of one CSDR member company that is planning to share data directly.