Cloud computing challenges relating to clinical trial and research data largely center on data sharing and related topics such as data curation and harmonization, said Lee Lancashire. This requires aligning datasets to accommodate various outcome measures, standardizing data formats, and harmonizing analysis workflows and pipelines, he said.
Data sharing requires investment both in dollars and time, but may provide benefits in terms of transparency and enabling meta-analysis, triangulation, and distribution, said Lara Mangravite, president of Sage Bionetworks. Each research group may value these benefits differently, she said. Therefore, Sage has been exploring a series of models that pair governance and policy with infrastructure to support their individual goals. These models may range from the most open of open resource sharing to models that restrict who can see and use the data, she said. Rebecca Li added that to incentivize data sharing, the individual goals of each researcher or research group need to be addressed, whether those goals include publications, increased funding, fulfilling an ethical commitment to participants, or driving new science and discoveries.
Michael Egan, vice president of clinical research for neuroscience at Merck, said there has been a movement in industry to share more data. Some companies will give academic researchers access to datasets for analysis, while other companies use a web portal where researchers can submit data requests or specific analysis plans. However, Li noted that on many data-sharing platforms, usage has lagged behind the contribution of data. Academic participation has been particularly slow, she said, although the culture in academia is changing with regard to data sharing. Educating researchers about the availability of data and different platforms may help boost data-sharing efforts, said Li.
According to Ramona Hicks, director of science and technology at One Mind, NIH’s approach to data collection in clinical trials has flipped from a sense that clinical trials were collecting too much unnecessary data and causing unnecessary expense to the attitude that broader data collection may be necessary, because one does not necessarily know what data will prove to be important for subtyping and other types of analysis. Mangravite agreed that deciding how much data to collect, manage, and share is a challenge for data-sharing platforms and depends on the goals of the data collectors. Funders certainly want to see data sharing to improve transparency and reproducibility of the primary analyses, she said, while other stakeholders may be more interested in building a common resource for many people to use.
Harmonizing research data from the AMP and other non-clinical research projects presents additional challenges, said Mangravite. She noted that the AMP-Alzheimer’s disease (AD) is actually a cluster of five different projects with different sample sets, collection methodologies, and data types. While recognizing that harmonizing these data would increase their value, she said this benefit has to be balanced against the amount of work that would be required to do so. She said the AMP investigators have taken a middle ground, harmonizing the major human datasets and leaving some of the other data less harmonized, but making them available for reuse.
Li added that big, bloated clinical trials often result when researchers try to piggyback adjunct studies onto the trial, for example, by requesting collection of additional samples for a pharmacogenomics study. Simpler, more streamlined trials are currently more common, she said. However, Hicks argued that one reason so many clinical trials have failed is insufficient understanding of the disorder, which can only be gained by conducting large observational, natural history studies that incorporate biomarkers, subtyping, and comparative effectiveness analyses. This could be achieved, she said, by piggybacking population-based data studies onto clinical trials. Li agreed, noting that there have been efforts to compare control arms from many Alzheimer’s trials. She added that clinical trials should plan up front for data sharing and for the reuse of data by better understanding what similar trials are being conducted and leveraging data that have already been collected.
Li described Vivli, a nonprofit, global, data-sharing platform built on the Microsoft Azure cloud. Vivli hosts a diverse group of pharmaceutical and biotech companies and academic centers conducting clinical trials in AD, Parkinson’s disease (PD), and other neurological disorders, she said. Each member sets its own boundaries in terms of what data it will share and when those data will be shared, said Li. For example, most stipulate that only anonymized data will be shared, and some will share completed Phase 1 through Phase 4 clinical trial data only after a regulatory decision has been made. She noted, however, that the diversity of stakeholders complicates efforts to come up with a single harmonized data use agreement.
Users coming to the Vivli site can search the platform for studies of interest and then request individual participant-level data (IPD). Vivli reviews the request against the data contributors’ specifications. If approved, the user can access and analyze the data in a secure research environment in the cloud using specific analytical tools provided by Vivli. In some cases users may be given permission to download data, said Li. Completed research results are assigned a DOI. Users may use the Vivli platform to meet publication and funder requirements, she said.
With regard to standardization of data, Li noted that FDA requires submitted data to adhere to CDISC standards.1 She said Vivli recommends this as well, but recognizes that many valuable datasets collected by academic researchers do not conform to CDISC standards. Vivli made the strategic decision to allow such data to be accepted. She suggested that in the future there may be machine-learning approaches that will enable standardization of such data.
When executing good, large, prospective trials, investigators have a duty to curate those data for future analyses, according to Lisa Merck. She identified two challenges: First, these data need to be curated and formatted in a way that enables other investigators to use them; and second, resources are needed to identify biomarkers and other covariates that may be buried as by-products of the first analysis. NIH and possibly other funders have designated funding for this use, said Hicks, although these funds may be underused. Providing investigators with some good examples of learnings derived from secondary analyses might encourage them to take advantage of these resources, she said.
Hicks and Mangravite both advocated building an inventory of data and platforms, adding that cross-community collaboration will be essential to making this a reality. Lancashire said Cohen Veterans Bioscience evaluated more than 100 different platforms prior to identifying the BRAIN Commons, and is currently writing a white paper to make this information widely available. Hicks noted that The Kavli Foundation has also addressed this as part of the International BRAIN Initiative.
With many data aggregation and sharing platforms existing or in development, Lancashire noted the potential value of sharing data across platforms. He suggested that identifying and aligning a core set of metadata would allow integration of cohort data across platforms. Recognizing that there would be barriers to doing this, Sean Horgan suggested picking a few different projects and convening investigators to start working on this. Mangravite agreed that a scientific use case approach is probably the best way to tackle this problem. With AMP-AD and AMP-PD, for example, one was initially a target discovery project while the other was a biomarker discovery project. Each has since morphed to include a little bit of both. The opportunity to share data could prove highly productive, she said.
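As a rough illustration of what aligning a core set of metadata across platforms might look like in practice, the following minimal Python sketch maps records from two platforms onto a shared set of fields. The platform names, field names, and sample records are all hypothetical and are not drawn from any platform discussed at the workshop; a real effort would also need agreed definitions, units, and governance for each field.

```python
# Hypothetical core metadata fields agreed on across platforms (an assumption).
CORE_FIELDS = ("participant_id", "age_years", "diagnosis")

# Hypothetical mapping from each platform's local field names to the core set.
FIELD_MAPS = {
    "platform_a": {"subj": "participant_id", "age": "age_years", "dx": "diagnosis"},
    "platform_b": {"pid": "participant_id", "age_at_visit": "age_years",
                   "condition": "diagnosis"},
}


def to_core(record, platform):
    """Translate one platform-specific record into the core metadata schema.

    Fields with no core equivalent are dropped; core fields the record
    lacks are kept as None so that gaps stay visible after integration.
    """
    mapping = FIELD_MAPS[platform]
    core = {field: None for field in CORE_FIELDS}
    for source_name, value in record.items():
        target = mapping.get(source_name)
        if target is not None:
            core[target] = value
    core["source_platform"] = platform  # keep provenance for each record
    return core


def integrate(cohorts):
    """Combine records from several platforms into one list of core records."""
    return [
        to_core(record, platform)
        for platform, records in cohorts.items()
        for record in records
    ]


# Made-up example records, for illustration only.
cohorts = {
    "platform_a": [{"subj": "A-001", "age": 71, "dx": "AD", "site": "X"}],
    "platform_b": [{"pid": "B-104", "condition": "PD"}],
}
combined = integrate(cohorts)
```

In this sketch the second record has no age field, so its `age_years` value stays `None` rather than being guessed, which reflects the point that integration exposes differences in what each cohort collected.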
1 CDISC develops and advances data standards for clinical research, with a goal of making the data more interoperable and reusable. For more information, see https://www.cdisc.org (accessed December 12, 2019).
Egan added that FDA has a huge repository of raw data from clinical trials. He envisioned a future where academics could submit queries; then, if an FDA statistician approves and funding is obtained, analysis could be run internally, with the results given back to the researchers. Silvana Borges noted, however, that while FDA has access to a wealth of clinical trial data, most of it is proprietary data that will require consent from the companies who acquired the data. More than FDA willingness will be required, she said, to engage in that conversation with sponsors and others in the scientific community.
Harmonizing and sharing preclinical data represents another and possibly more difficult challenge because preclinical research is even more siloed than clinical research, said Hicks. However, she said that One Mind tried to do this in the traumatic brain injury field by identifying some common data elements. One factor that limits the interoperability of platforms focused on discovery research is a lack of incentives for funders to initiate projects that go beyond what their organization or their country is doing. Horgan noted that technology companies such as Google, Apple, and Microsoft have begun to invest heavily in these areas because they see a business case for it. Moreover, he said, they have the best incentive and the best personnel to think through some of the data sharing, data processing, discoverability, and interoperability challenges.
Horgan added, however, that an additional challenge is the “first mover disadvantage” whereby the developers of the earliest tools may have trouble sustaining their leading edge, and users may be reluctant to choose a new tool or platform because of the likelihood that something better will shortly become available. Consequently, said Horgan, although technology has advanced quickly, investigators have found it difficult to navigate the tool ecosystem and this has put a damper on scientific discovery. “We’re left with hard choices about spending today’s money on something that may not be sustainable, and we’re not budgeting for the transition from whatever we write into our budgets today and something better down the road,” he said.