National Academies Press: OpenBook

Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop (2020)

Chapter: 6 Governing, Funding, and Sustaining Cloud-Based Platforms

Suggested Citation: "6 Governing, Funding, and Sustaining Cloud-Based Platforms." National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25653.


6 Governing, Funding, and Sustaining Cloud-Based Platforms

Highlights [a]

• New data access policies and a clearinghouse of information about governance are needed to manage access to the next generation of data (Canet-Avilés, Farber, Horgan, Philippakis, Picchini Schaffer, Virta).
• Scientists and research participants should be included in making decisions about governance policies (Farber, Marinshaw).
• Making governance rules and data use agreements available from institutions such as the National Institutes of Health, Harvard, and the Broad Institute; creating standard templates for data use agreements; or creating a clearinghouse of information about governance and data use agreements could enable institutions to establish more harmonized rules to enable data sharing (Canet-Avilés, Marinshaw).
• Providing researchers with use cases that demonstrate successful use of the cloud could inform them about available tools and accelerate science (Horgan, Marinshaw).
• Increased training is needed for researchers to learn to work with data models and tools in the cloud (Horgan, Marinshaw, Roskams).

[a] These points were made by the individual workshop participants identified above. They are not intended to reflect a consensus among workshop participants.

As data migrate to a cloud-based environment, issues of data ownership, how the data will be used for scientific discovery, and who has access to the data become uncoupled, making the need for clear governance and oversight plans essential, said Anthony Philippakis, chief data officer at the Broad Institute of MIT and Harvard. Indeed, said Sean Horgan, lead project manager at Verily Life Sciences of the company’s biomedical research platform, data access policies inherited through large existing datasets have failed to keep up with what scientists now see as the need for cross-dataset analysis. New policies need to be drafted for the next generation of data, he said, which will require coordination across new datasets such as those being generated by the various AMP initiatives, the All of Us research program [1], Sage Bionetworks, and others.

Institutional policies around cloud and data governance are set primarily by chief information officers (CIOs), lawyers, privacy officers, and information security officers, with little engagement of scientists themselves, said Ruth Marinshaw, chief technology officer for research computing at Stanford University. Scientists need to advocate more strongly for a seat at the table where governance decisions are made, she said. Perhaps a new position needs to be defined that brings the researcher’s perspective to these deliberations, said Adam Ferguson, associate professor of neurosurgery at the University of California, San Francisco. From the researcher’s perspective, institutional restrictions on data sharing can be viewed as restrictions on academic freedom, and completely at odds with the NIH mandate, added Ferguson. “These are freight trains going at a head-on trajectory toward each other, and should be sorted out with transparency,” he said. Research participants should also be involved in this decision-making process, added Gregory Farber, director of the Office of Technology Development and Coordination at NIMH.

At NIH, dbGaP has provided the voice of the government and served as an honest broker in bringing groups together to decide who can access genomic data and for what research purposes, said Philippakis. As dbGaP data move to the cloud, NIH plans to continue playing that role, said Farber. Among the issues to be addressed is whether data use aligns with existing informed consent policies, or whether current policies reflect the world of 20 years ago and need to be updated.

[1] For more information, see https://allofus.nih.gov (accessed November 11, 2019).

CURRENT PROMISING PRACTICES FOR DATA GOVERNANCE IN THE CLOUD

The Office of Data Science Strategy has as one of its tenets sustainability around data, said Nick Weber. They are currently piloting a program with Figshare [2] where NIH is providing funding up front for anyone with a dataset of a certain size that will be put into a general purpose repository for long-term sustainability, said Weber. He added that NIH is encouraging researchers to use STRIDES to manage very large datasets in the cloud in part so that NIH can gather reporting insights, information on costs, and information on funding to help make long-term sustainability decisions.

The All of Us research program has been innovative on two fronts related to the research participant and the dynamic between the research participant and researcher, said Philippakis. First, all data collected on a research participant are returned to the participant, and second, when a researcher gains access to data, he or she is required to provide information about the research team and how they intend to use the data. “Researcher privacy isn’t really a thing or maybe it shouldn’t be,” said Philippakis. Rather, letting research participants be involved in policing oversight is innovative, he said. Horgan noted that when Verily wanted to create a data user agreement, they started by looking at the All of Us agreement.

Leveraging technology to remove some of the human-specific tasks involved in data use oversight could also make the process more efficient and consistent, said Philippakis. His team showed that a simple machine-readable ontology could be created for about 95 percent of use cases, and then ran an experiment comparing an automated versus traditional data use oversight approach. Not only was the automated approach identical to the traditional approach in most cases, but when there were disagreements, the automated approach provided more consistent answers.
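To make the idea of machine-readable data use oversight concrete, the following is a minimal sketch in Python. It is not the ontology or the experiment Philippakis described, whose details the proceedings do not give. The field names (`general_research_use`, `allowed_diseases`, `commercial_use_allowed`), the matching rule, and the helper `agreement_rate` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DatasetConsent:
    """What a dataset's consent allows, in coded form (hypothetical fields)."""
    general_research_use: bool = False
    allowed_diseases: frozenset = frozenset()
    commercial_use_allowed: bool = True


@dataclass(frozen=True)
class ResearchPurpose:
    """What a researcher asks to do with the data (hypothetical fields)."""
    disease: Optional[str] = None
    commercial: bool = False


def is_permitted(consent: DatasetConsent, purpose: ResearchPurpose) -> bool:
    """Apply one fixed rule set so identical requests always get identical answers."""
    if purpose.commercial and not consent.commercial_use_allowed:
        return False
    if consent.general_research_use:
        return True
    # Without general research use, the request must name an allowed disease.
    return purpose.disease is not None and purpose.disease in consent.allowed_diseases


def agreement_rate(automated: Sequence[bool], manual: Sequence[bool]) -> float:
    """Fraction of requests on which automated and manual decisions match."""
    if len(automated) != len(manual) or not automated:
        raise ValueError("decision lists must be non-empty and of equal length")
    matches = sum(a == m for a, m in zip(automated, manual))
    return matches / len(automated)
```

Under these assumptions, a call such as `is_permitted(DatasetConsent(general_research_use=True), ResearchPurpose(disease="epilepsy"))` returns `True`, while a commercial request against a consent with `commercial_use_allowed=False` returns `False`. Because the rule is a pure function of coded inputs, repeated evaluation cannot drift the way case-by-case human review can, which is one plausible reading of the consistency result reported above.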
ISSUES TO BE RESOLVED REGARDING DATA USE AND ACCESS, ANALYSIS, USER TRAINING, AND PLATFORMS SUSTAINABILITY

Each institution sets its own rules, which hinders collaboration and efficiency, said Rosa Canet-Avilés, director of neuroscience research partnerships at FNIH. For example, one of the biggest obstacles to data sharing is that every institution requires researchers to obtain IRB and ethics approval even for data generated elsewhere, said Jane Roskams. Thus, even data that are openly accessible can take months and years to obtain. Marinshaw suggested that institutions might be able to avoid creating these regulations in a vacuum if information was available on the governance rules and data use agreements established by other institutions such as NIH, Harvard, and the Broad Institute. Creating standard templates for data user agreements may also be helpful, added Horgan. Canet-Avilés added that harmonizing such templates across different types of data and cohorts could also be valuable.

[2] For more information, see https://figshare.com (accessed November 11, 2019).

Determining when restrictive access policies are needed presents another governance dilemma, said Farber. The world would be a simpler place and data would be much more useful if general research use (GRU) consents were widely adopted, he said. However, while GRU consent may be applicable to bigger datasets, Farber suggested that smaller and more specialized “edge” cases may need more restrictive policies. Philippakis added that while nearly everyone agrees that individual-level data should not be put into open access domains, aggregated data may be fine to put in the public domain. However, there is no cut point that defines when data are aggregated enough for sharing, he said. Philippakis suggested that as new cohorts are generated, GRU provides many benefits. He noted, however, that existing cohorts are also extremely valuable even though the consents obtained in setting them up may not allow generalized use. Another challenge with integrating data from older studies is that those data may not exist in digital form, said Silvana Borges, associate director for regulatory science in the Office of Drug Evaluation II at the FDA’s Center for Drug Evaluation and Research (CDER).

Canet-Avilés said it would be helpful if there was a single clearinghouse where investigators could access information about various aspects of governance, such as data use agreements for different types of data and different levels of access. Valerie Virta, American Association for the Advancement of Science Science & Technology Policy Fellow at NIH, concurred, noting that NIH is poised to provide guidance that could be helpful to the community and help propagate best practices. Bringing a larger group of investigators and organizations together to share learnings on governance problems and solutions could be valuable, said Alyssa Picchini Schaffer, senior scientist at the Simons Foundation.
Marinshaw agreed about the need to engage a broader group of participant institutions, possibly by issuing requests for information on various issues related to governance practices.

A system that defines the required qualifications of researchers to access controlled data, and to track researchers when they move from one cloud to another, is also needed, said Philippakis. The technology exists to build such a system, he said, but the organizational structure does not exist.

Governance committees may also address when cloud storage is appropriate, considering factors such as cost, safety, and the amount of data involved, said Farber. The cost of cloud storage is low at first glance, said Marinshaw, but the data management, movement, and curation can be expensive. Generally, when data are stored in the cloud there are more resources and technologies that can be employed in cost-effective ways, but researchers need to be educated on costs and benefits, said Horgan. For example, Lisa Merck, associate professor of emergency medicine and vice chair of research at the University of Florida, said that for the BOOST3 clinical trial, which is looking at cerebral oxygenation-driven therapy after severe traumatic brain injury, continuous brain oxygenation and multiparametric data are being collected and stored in the cloud from 45 centers. She suggested that an alternative might be to publish datasets that have been curated and analyzed in a large national library that would be publicly accessible rather than relying on cloud-based services. Farber said there are some efforts to move in this direction, but added that this approach raises other governance issues such as how long to keep the data in storage.

Philippakis added that while data storage on the cloud versus on an institution’s own infrastructure may be somewhat cheaper, it can be painful simply because it is a change. But he suggested that cloud storage also incentivizes other good outcomes such as data sharing. Whether data are stored in the cloud or “on prem” (i.e., on the premises of a research organization), Philippakis said another important concern for investigators is getting locked into a certain technology that could disappear if the company goes out of business or becomes obsolete as technology improves. He suggested that investing in open-source technologies that can be built and maintained in the community offers the best defense against that problem. Horgan added that open source is valuable not just for software, but for configurations of datasets and best practices associated with sharing code as well.

One of the main impediments to the goal of using the cloud to accelerate science is a lack of knowledge among researchers about how to work with different cloud-native data models and tools, said Marinshaw. Increased training and providing researchers with information from a variety of demonstration cases could help address this problem, she said.
Horgan added that there are also gaps and disparities with the tools that exist in the cloud and how these tools provide different user experiences in different cloud environments. Dedicated experts investing time with a user research team to understand the specific tasks a researcher wants to accomplish, rather than forcing the researcher to learn how to write their own queries to accomplish that task, could support making cloud use more efficient, he said. Roskams added that the user journey is further complicated by the fact that most platforms have failed to provide users with roadmaps that will guide them in how to manage, store, and wrangle their data. Developing training modules, possibly through INCF, or conducting hands-on training workshops could alleviate this problem, said Roskams.

Governance policies may also address training. Most institutions currently require animal ethics and/or human ethics certification for researchers working with animals or humans, noted Roskams. She suggested that it might also be helpful to require data ethics and data understanding certification. Huerta said his office is also looking at staff training, so that program officers who do not have portfolios dedicated to computational biology will better understand these concepts when they are evaluating budgets and proposals.

Finally, an important consideration related to governance is how to ensure the sustainability of cloud-based platforms. Magali Haas noted that many platforms are funded for a limited time period through grant mechanisms with no mechanism for renewal. Canet-Avilés noted, however, that for AMP, a public–private partnership between NIH and private organizations, the model they are developing is that data platforms eventually will be sustainable through government funding. Funding is not the only factor that affects sustainability, however. Sustaining the kind of cloud support engineer talent needed to support research projects has also proved challenging, according to Russell Poldrack and Weber. One approach taken by the Office of Data Science Strategy, according to Weber, is to develop programs that recruit people from outside government for one year or two for projects they might find very interesting, enabling them to internally train and raise the knowledge level of the rest of the research staff.


The cloud model of data sharing has led to a vast increase in the quantity and complexity of data and expanded access to these data, which has attracted many more researchers, enabled multi-national neuroscience collaborations, and facilitated the development of many new tools. Yet, the cloud model has also produced new challenges related to data storage, organization, and protection. Merely switching the technical infrastructure from local repositories to cloud repositories is not enough to optimize data use.

To explore the burgeoning use of cloud computing in neuroscience, the National Academies Forum on Neuroscience and Nervous System Disorders hosted a workshop on September 24, 2019. A broad range of stakeholders involved in cloud-based neuroscience initiatives and research explored the use of cloud technology to advance neuroscience research and shared approaches to address current barriers. This publication summarizes the presentation and discussion of the workshop.
