
Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop (2020)

Chapter: 6 Governing, Funding, and Sustaining Cloud-Based Platforms

Suggested Citation:"6 Governing, Funding, and Sustaining Cloud-Based Platforms." National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25653.

6

Governing, Funding, and Sustaining Cloud-Based Platforms

As data migrate to a cloud-based environment, the questions of who owns the data, how the data will be used for scientific discovery, and who has access to them become uncoupled, making clear governance and oversight plans essential, said Anthony Philippakis, chief data officer at the Broad Institute of MIT and Harvard. Indeed, said Sean Horgan, lead project manager of the biomedical research platform at Verily Life Sciences, data access policies inherited from large existing datasets have failed to keep up with what scientists now see as the need for cross-dataset analysis. New policies need to be drafted for the next generation of data, he said, which will require coordination across new datasets such as those being generated by the various AMP initiatives, the All of Us research program,¹ Sage Bionetworks, and others.

Institutional policies around cloud and data governance are set primarily by chief information officers (CIOs), lawyers, privacy officers, and information security officers, with little engagement of scientists themselves, said Ruth Marinshaw, chief technology officer for research computing at Stanford University. Scientists need to advocate more strongly for a seat at the table where governance decisions are made, she said. Perhaps a new position needs to be defined that brings the researcher’s perspective to these deliberations, said Adam Ferguson, associate professor of neurosurgery at the University of California, San Francisco. From the researcher’s perspective, institutional restrictions on data sharing can be viewed as restrictions on academic freedom, and completely at odds with the NIH mandate, added Ferguson. “These are freight trains going at a head-on trajectory toward each other, and should be sorted out with transparency,” he said. Research participants should also be involved in this decision-making process, added Gregory Farber, director of the Office of Technology Development and Coordination at NIMH.

At NIH, dbGaP has provided the voice of the government and served as an honest broker in bringing groups together to decide who can access genomic data and for what research purposes, said Philippakis. As dbGaP data move to the cloud, NIH plans to continue playing that role, said Farber. Among the issues to be addressed are whether data use aligns with existing informed consent policies, or whether current policies reflect the world of 20 years ago and need to be updated.

CURRENT PROMISING PRACTICES FOR DATA GOVERNANCE IN THE CLOUD

The Office of Data Science Strategy has as one of its tenets sustainability around data, said Nick Weber. The office is currently piloting a program with Figshare² in which NIH provides funding up front for anyone with a dataset of a certain size that will be put into a general-purpose repository for long-term sustainability, said Weber. He added that NIH is encouraging researchers to use STRIDES to manage very large datasets in the cloud, in part so that NIH can gather reporting insights, information on costs, and information on funding to help make long-term sustainability decisions.

___________________

¹ For more information, see https://allofus.nih.gov (accessed November 11, 2019).

The All of Us research program has been innovative on two fronts related to the research participant and the dynamic between the research participant and researcher, said Philippakis. First, all data collected on a research participant are returned to the participant, and second, when a researcher gains access to data, he or she is required to provide information about the research team and how they intend to use the data. “Researcher privacy isn’t really a thing or maybe it shouldn’t be,” said Philippakis. Rather, letting research participants be involved in policing oversight is innovative, he said. Horgan noted that when Verily wanted to create a data user agreement, they started by looking at the All of Us agreement.

Leveraging technology to remove some of the human-specific tasks involved in data use oversight could also make the process more efficient and consistent, said Philippakis. His team showed that a simple machine-readable ontology could be created for about 95 percent of use cases, and then ran an experiment comparing an automated data use oversight approach with a traditional one. Not only did the automated approach reach the same decision as the traditional approach in most cases, but when the two disagreed, the automated approach provided more consistent answers.
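The idea of machine-readable data use terms can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the three restriction categories, and the `agreement_rate` helper are hypothetical and are not the ontology or the experimental method used by Philippakis's team. The sketch shows how consent-derived terms and a declared research purpose might be compared automatically, and how automated decisions might be scored against decisions made by a human committee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUseTerms:
    """Consent-derived restrictions on a dataset (hypothetical fields)."""
    general_research_use: bool = False
    allowed_disease_areas: frozenset = frozenset()
    commercial_use_allowed: bool = False


@dataclass(frozen=True)
class ResearchPurpose:
    """What a requesting researcher declares (hypothetical fields)."""
    disease_area: str
    is_commercial: bool = False


def automated_decision(terms: DataUseTerms, purpose: ResearchPurpose) -> bool:
    """Return True if the declared purpose is compatible with the terms."""
    # Commercial requests are refused unless the consent allows them.
    if purpose.is_commercial and not terms.commercial_use_allowed:
        return False
    # General research use (GRU) permits any research topic.
    if terms.general_research_use:
        return True
    # Otherwise the topic must be one the consent explicitly covers.
    return purpose.disease_area in terms.allowed_disease_areas


def agreement_rate(automated: list, manual: list) -> float:
    """Fraction of requests on which automated and manual review agree."""
    if not automated or len(automated) != len(manual):
        raise ValueError("need two non-empty decision lists of equal length")
    matches = sum(a == m for a, m in zip(automated, manual))
    return matches / len(automated)


# Illustrative requests only; none of these values come from the workshop.
gru = DataUseTerms(general_research_use=True)
alz_only = DataUseTerms(allowed_disease_areas=frozenset({"alzheimers"}))
requests = [
    (gru, ResearchPurpose("epilepsy")),
    (alz_only, ResearchPurpose("alzheimers")),
    (alz_only, ResearchPurpose("epilepsy")),
    (gru, ResearchPurpose("epilepsy", is_commercial=True)),
]
automated = [automated_decision(t, p) for t, p in requests]
manual = [True, True, False, True]  # hypothetical committee decisions
rate = agreement_rate(automated, manual)
```

Because the rule is a pure function of the terms and the purpose, identical requests always receive identical answers, which is one plausible reading of why an automated approach would be more consistent than case-by-case human review.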

ISSUES TO BE RESOLVED REGARDING DATA USE AND ACCESS, ANALYSIS, USER TRAINING, AND PLATFORM SUSTAINABILITY

Each institution sets its own rules, which hinders collaboration and efficiency, said Rosa Canet-Avilés, director of neuroscience research partnerships at FNIH. For example, one of the biggest obstacles to data sharing is that every institution requires researchers to obtain IRB and ethics approval even for data generated elsewhere, said Jane Roskams. Thus, even data that are openly accessible can take months or even years to obtain. Marinshaw suggested that institutions might be able to avoid creating these regulations in a vacuum if information were available on the governance rules and data use agreements established by other institutions such as NIH, Harvard, and the Broad Institute. Creating standard templates for data user agreements may also be helpful, added Horgan. Canet-Avilés added that harmonizing such templates across different types of data and cohorts could also be valuable.

___________________

² For more information, see https://figshare.com (accessed November 11, 2019).

Determining when restrictive access policies are needed presents another governance dilemma, said Farber. The world would be a simpler place and data would be much more useful if general research use (GRU) consents were widely adopted, he said. However, while GRU consent may be applicable to bigger datasets, Farber suggested that smaller and more specialized “edge” cases may need more restrictive policies. Philippakis added that while nearly everyone agrees that individual-level data should not be put into open access domains, aggregated data may be fine to put in the public domain. However, there is no cut point that defines when data are aggregated enough for sharing, he said.

Philippakis suggested that as new cohorts are generated, GRU provides many benefits. He noted, however, that existing cohorts are also extremely valuable even though the consents obtained in setting them up may not allow generalized use. Another challenge with integrating data from older studies is that those data may not exist in digital form, said Silvana Borges, associate director for regulatory science in the Office of Drug Evaluation II at FDA’s Center for Drug Evaluation and Research (CDER).

Canet-Avilés said it would be helpful if there were a single clearinghouse where investigators could access information about various aspects of governance, such as data use agreements for different types of data and different levels of access. Valerie Virta, American Association for the Advancement of Science Science & Technology Policy Fellow at NIH, concurred, noting that NIH is poised to provide guidance that could be helpful to the community and help propagate best practices. Bringing a larger group of investigators and organizations together to share learnings on governance problems and solutions could be valuable, said Alyssa Picchini Schaffer, senior scientist at the Simons Foundation. Marinshaw agreed about the need to engage a broader group of participant institutions, possibly by issuing requests for information on various issues related to governance practices.

A system that defines the required qualifications of researchers to access controlled data, and to track researchers when they move from one cloud to another, is also needed, said Philippakis. The technology exists to build such a system, he said, but the organizational structure does not exist.

Governance committees may also address when cloud storage is appropriate, considering factors such as cost, safety, and the amount of data involved, said Farber. The cost of cloud storage is low at first glance, said Marinshaw, but the data management, movement, and curation can be expensive. Generally, when data are stored in the cloud there are more resources and technologies that can be employed in cost-effective ways, but researchers need to be educated on costs and benefits, said Horgan.

For example, Lisa Merck, associate professor of emergency medicine and vice chair of research at the University of Florida, said that for the BOOST3 clinical trial, which is looking at cerebral oxygenation-driven therapy after severe traumatic brain injury, continuous brain oxygenation and multiparametric data are being collected from 45 centers and stored in the cloud. She suggested that an alternative might be to publish datasets that have been curated and analyzed in a large national library that would be publicly accessible rather than relying on cloud-based services. Farber said there are some efforts to move in this direction, but added that this approach raises other governance issues such as how long to keep the data in storage.

Philippakis added that while data storage on the cloud versus on an institution’s own infrastructure may be somewhat cheaper, it can be painful simply because it is a change. But he suggested that cloud storage also incentivizes other good outcomes such as data sharing. Whether data are stored in the cloud or “on prem” (i.e., on the premises of a research organization), Philippakis said another important concern for investigators is getting locked into a technology that could disappear if the company behind it goes out of business, or that could become obsolete as technology improves. He suggested that investing in open-source technologies that can be built and maintained in the community offers the best defense against that problem. Horgan added that open source is valuable not just for software, but for configurations of datasets and best practices associated with sharing code as well.

One of the main impediments to the goal of using the cloud to accelerate science is a lack of knowledge among researchers about how to work with different cloud-native data models and tools, said Marinshaw. Increasing training and providing researchers with information from a variety of demonstration cases could help address this problem, she said. Horgan added that there are also gaps and disparities with the tools that exist in the cloud and how these tools provide different user experiences in different cloud environments. Dedicated experts investing time with a user research team to understand the specific tasks a researcher wants to accomplish, rather than forcing the researcher to learn how to write their own queries to accomplish that task, could support making cloud use more efficient, he said. Roskams added that the user journey is further complicated by the fact that most platforms have failed to provide users with roadmaps that will guide them in how to manage, store, and wrangle their data. Developing training modules, possibly through INCF, or conducting hands-on training workshops could alleviate this problem, said Roskams.

Governance policies may also address training. Most institutions currently require animal ethics and/or human ethics certification for researchers working with animals or humans, noted Roskams. She suggested that it might also be helpful to require data ethics and data understanding certification. Huerta said his office is also looking at staff training, so that program officers who do not have portfolios dedicated to computational biology will better understand these concepts when they are evaluating budgets and proposals.

Finally, an important consideration related to governance is how to ensure the sustainability of cloud-based platforms. Magali Haas noted that many platforms are funded for a limited time period through grant mechanisms with no mechanism for renewal. Canet-Avilés noted, however, that for AMP, a public–private partnership between NIH and private organizations, the model they are developing is that data platforms eventually will be sustainable through government funding. Funding is not the only factor that affects sustainability, however. Sustaining the kind of cloud support engineering talent needed to support research projects has also proved challenging, according to Russell Poldrack and Weber. One approach taken by the Office of Data Science Strategy, according to Weber, is to develop programs that recruit people from outside government for one or two years to work on projects they might find very interesting, enabling them to train the rest of the research staff internally and raise its knowledge level.

The cloud model of data sharing has led to a vast increase in the quantity and complexity of data and expanded access to these data, which has attracted many more researchers, enabled multi-national neuroscience collaborations, and facilitated the development of many new tools. Yet, the cloud model has also produced new challenges related to data storage, organization, and protection. Merely switching the technical infrastructure from local repositories to cloud repositories is not enough to optimize data use.

To explore the burgeoning use of cloud computing in neuroscience, the National Academies Forum on Neuroscience and Nervous System Disorders hosted a workshop on September 24, 2019. A broad range of stakeholders involved in cloud-based neuroscience initiatives and research explored the use of cloud technology to advance neuroscience research and shared approaches to address current barriers. This publication summarizes the presentation and discussion of the workshop.
