6 Governing, Funding, and Sustaining Cloud-Based Platforms

Highlightsᵃ

• New data access policies and a clearinghouse of information about governance are needed to manage access to the next generation of data (Canet-Avilés, Farber, Horgan, Philippakis, Picchini Schaffer, Virta).
• Scientists and research participants should be included in making decisions about governance policies (Farber, Marinshaw).
• Making governance rules and data use agreements available from institutions such as the National Institutes of Health, Harvard, and the Broad Institute; creating standard templates for data use agreements; or creating a clearinghouse of information about governance and data use agreements could enable institutions to establish more harmonized rules to enable data sharing (Canet-Avilés, Marinshaw).
• Providing researchers with use cases that demonstrate successful use of the cloud could inform them about available tools and accelerate science (Horgan, Marinshaw).
• Increased training is needed for researchers to learn to work with data models and tools in the cloud (Horgan, Marinshaw, Roskams).

ᵃ These points were made by the individual workshop participants identified above. They are not intended to reflect a consensus among workshop participants.

PREPUBLICATION COPY - Uncorrected Proofs
NEUROSCIENCE DATA IN THE CLOUD

As data migrate to a cloud-based environment, issues of data ownership, how the data will be used for scientific discovery, and who has access to the data become uncoupled, making clear governance and oversight plans essential, said Anthony Philippakis, chief data officer at the Broad Institute of MIT and Harvard. Indeed, said Sean Horgan, lead project manager of the biomedical research platform at Verily Life Sciences, data access policies inherited through large existing datasets have failed to keep up with what scientists now see as the need for cross-dataset analysis. New policies need to be drafted for the next generation of data, he said, which will require coordination across new datasets such as those being generated by the various AMP initiatives, the All of Us research program,¹ Sage Bionetworks, and others.

Institutional policies around cloud and data governance are set primarily by chief information officers (CIOs), lawyers, privacy officers, and information security officers, with little engagement of scientists themselves, said Ruth Marinshaw, chief technology officer for research computing at Stanford University. Scientists need to advocate more strongly for a seat at the table where governance decisions are made, she said. Perhaps a new position needs to be defined that brings the researcher's perspective to these deliberations, said Adam Ferguson, associate professor of neurosurgery at the University of California, San Francisco. From the researcher's perspective, institutional restrictions on data sharing can be viewed as restrictions on academic freedom and as completely at odds with the NIH mandate, added Ferguson. "These are freight trains going at a head-on trajectory toward each other, and should be sorted out with transparency," he said.
Research participants should also be involved in this decision-making process, added Gregory Farber, director of the Office of Technology Development and Coordination at NIMH.

At NIH, dbGaP has provided the voice of the government and served as an honest broker in bringing groups together to decide who can access genomic data and for what research purposes, said Philippakis. As dbGaP data move to the cloud, NIH plans to continue playing that role, said Farber. Among the issues to be addressed is whether data use aligns with existing informed consent policies, or whether current policies reflect the world of 20 years ago and need to be updated.

¹ For more information, see https://allofus.nih.gov (accessed November 11, 2019).

CURRENT PROMISING PRACTICES FOR DATA GOVERNANCE IN THE CLOUD

One of the tenets of the Office of Data Science Strategy is sustainability around data, said Nick Weber. The office is currently piloting a program with
Figshare,² in which NIH is providing up-front funding for anyone with a dataset of a certain size that will be put into a general-purpose repository for long-term sustainability, said Weber. He added that NIH is encouraging researchers to use STRIDES to manage very large datasets in the cloud, in part so that NIH can gather reporting insights, information on costs, and information on funding to help make long-term sustainability decisions.

The All of Us research program has been innovative on two fronts related to the research participant and the dynamic between the research participant and researcher, said Philippakis. First, all data collected on a research participant are returned to the participant, and second, when a researcher gains access to data, he or she is required to provide information about the research team and how they intend to use the data. "Researcher privacy isn't really a thing or maybe it shouldn't be," said Philippakis. Rather, letting research participants be involved in policing oversight is innovative, he said. Horgan noted that when Verily wanted to create a data use agreement, they started by looking at the All of Us agreement.

Leveraging technology to remove some of the human-specific tasks involved in data use oversight could also make the process more efficient and consistent, said Philippakis. His team showed that a simple machine-readable ontology could be created for about 95 percent of use cases, and then ran an experiment comparing an automated versus traditional data use oversight approach. Not only did the automated approach reach the same decision as the traditional approach in most cases, but when the two disagreed, the automated approach provided more consistent answers.
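The idea of automated data use oversight can be pictured with a small sketch. The Python below is illustrative only and is not the system Philippakis's team built: apart from GRU (general research use), which workshop participants discussed, the consent codes, class names, and the `decide_access` function are all assumptions introduced here. It shows how a machine-readable encoding of consent terms could be matched against a stated research purpose, with anything the encoding does not cover routed to human review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified consent codes. Only GRU is named in the text;
# HMB and DS are assumptions added for illustration.
GRU = "GRU"  # general research use
HMB = "HMB"  # health/medical/biomedical research only (assumed)
DS = "DS"    # research on one specific disease only (assumed)


@dataclass(frozen=True)
class ConsentTerms:
    """Machine-readable consent attached to a dataset (hypothetical)."""
    code: str
    disease: Optional[str] = None  # expected when code == DS


@dataclass(frozen=True)
class ResearchPurpose:
    """Machine-readable statement of what a researcher plans to do."""
    is_health_research: bool
    disease: Optional[str] = None  # disease under study, if any


def decide_access(consent: ConsentTerms, purpose: ResearchPurpose) -> str:
    """Return "approve", "deny", or "manual_review" for one request."""
    if consent.code == GRU:
        return "approve"
    if consent.code == HMB:
        return "approve" if purpose.is_health_research else "deny"
    if consent.code == DS:
        if consent.disease is None:
            # Incomplete encoding: leave the decision to a committee.
            return "manual_review"
        return "approve" if purpose.disease == consent.disease else "deny"
    # Consent terms outside the encoding fall back to human review.
    return "manual_review"


if __name__ == "__main__":
    requests = [
        (ConsentTerms(GRU), ResearchPurpose(is_health_research=False)),
        (ConsentTerms(DS, "epilepsy"), ResearchPurpose(True, "stroke")),
        (ConsentTerms("CUSTOM"), ResearchPurpose(True)),
    ]
    for consent, purpose in requests:
        print(consent.code, decide_access(consent, purpose))
```

Because the same rules are applied to every request, identical inputs always yield identical decisions, which is one plausible reason an automated approach could be more consistent than case-by-case committee review.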
ISSUES TO BE RESOLVED REGARDING DATA USE AND ACCESS, ANALYSIS, USER TRAINING, AND PLATFORM SUSTAINABILITY

Each institution sets its own rules, which hinders collaboration and efficiency, said Rosa Canet-Avilés, director of neuroscience research partnerships at FNIH. For example, one of the biggest obstacles to data sharing is that every institution requires researchers to obtain IRB and ethics approval even for data generated elsewhere, said Jane Roskams. Thus, even data that are openly accessible can take months or years to obtain. Marinshaw suggested that institutions might be able to avoid creating these regulations in a vacuum if information were available on the governance rules and data use agreements established by other institutions such as NIH, Harvard, and the Broad Institute. Creating standard templates for data use agreements may also be helpful, added Horgan. Canet-Avilés added that harmonizing such templates across different types of data and cohorts could also be valuable.

² For more information, see https://figshare.com (accessed November 11, 2019).
Determining when restrictive access policies are needed presents another governance dilemma, said Farber. The world would be a simpler place and data would be much more useful if general research use (GRU) consents were widely adopted, he said. However, while GRU consent may be applicable to bigger datasets, Farber suggested that smaller and more specialized "edge" cases may need more restrictive policies. Philippakis added that while nearly everyone agrees that individual-level data should not be put into open access domains, aggregated data may be fine to put in the public domain. However, there is no cut point that defines when data are aggregated enough for sharing, he said. Philippakis suggested that GRU consent provides many benefits as new cohorts are generated. He noted, however, that existing cohorts are also extremely valuable even though the consents obtained in setting them up may not allow generalized use. Another challenge with integrating data from older studies is that those data may not exist in digital form, said Silvana Borges, associate director for regulatory science in the Office of Drug Evaluation II at the FDA's Center for Drug Evaluation and Research (CDER).

Canet-Avilés said it would be helpful if there were a single clearinghouse where investigators could access information about various aspects of governance, such as data use agreements for different types of data and different levels of access. Valerie Virta, an American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellow at NIH, concurred, noting that NIH is poised to provide guidance that could be helpful to the community and help propagate best practices. Bringing a larger group of investigators and organizations together to share what they have learned about governance problems and solutions could be valuable, said Alyssa Picchini Schaffer, senior scientist at the Simons Foundation.
Marinshaw agreed about the need to engage a broader group of participant institutions, possibly by issuing requests for information on various issues related to governance practices.

A system that defines the qualifications researchers need to access controlled data, and that tracks researchers when they move from one cloud to another, is also needed, said Philippakis. The technology exists to build such a system, he said, but the organizational structure does not exist.

Governance committees may also address when cloud storage is appropriate, considering factors such as cost, safety, and the amount of data involved, said Farber. The cost of cloud storage is low at first glance, said Marinshaw, but data management, movement, and curation can be expensive. Generally, when data are stored in the cloud there are more resources and technologies that can be employed in cost-effective ways, but researchers need to be educated on costs and benefits, said Horgan.

For example, Lisa Merck, associate professor of emergency medicine and vice chair of research at the University of Florida, said that for the
BOOST3 clinical trial, which is examining cerebral oxygenation-driven therapy after severe traumatic brain injury, continuous brain oxygenation and multiparametric data are being collected from 45 centers and stored in the cloud. She suggested that an alternative might be to publish curated and analyzed datasets in a large, publicly accessible national library rather than relying on cloud-based services. Farber said there are some efforts to move in this direction, but added that this approach raises other governance issues such as how long to keep the data in storage.

Philippakis added that while storing data in the cloud may be somewhat cheaper than storing them on an institution's own infrastructure, the move can be painful simply because it is a change. But he suggested that cloud storage also incentivizes other good outcomes such as data sharing. Whether data are stored in the cloud or "on prem" (i.e., on the premises of a research organization), Philippakis said another important concern for investigators is getting locked into a certain technology that could disappear if the company goes out of business or that could become obsolete as technology improves. He suggested that investing in open-source technologies that can be built and maintained in the community offers the best defense against that problem. Horgan added that open source is valuable not just for software, but for configurations of datasets and best practices associated with sharing code as well.

One of the main impediments to the goal of using the cloud to accelerate science is a lack of knowledge among researchers about how to work with different cloud-native data models and tools, said Marinshaw. Increased training and providing researchers with information from a variety of demonstration cases could help address this problem, she said.
Horgan added that there are also gaps and disparities among the tools available in the cloud, and that these tools provide different user experiences in different cloud environments. Cloud use could become more efficient if dedicated experts spent time with a user research team to understand the specific tasks a researcher wants to accomplish, rather than forcing the researcher to learn to write queries for those tasks, he said. Roskams added that the user journey is further complicated by the fact that most platforms have failed to provide users with roadmaps that will guide them in how to manage, store, and wrangle their data. Developing training modules, possibly through INCF, or conducting hands-on training workshops could alleviate this problem, said Roskams.

Governance policies may also address training. Most institutions currently require animal ethics and/or human ethics certification for researchers working with animals or humans, noted Roskams. She suggested that it might also be helpful to require data ethics and data understanding certification. Huerta said his office is also looking at staff training, so that
program officers who do not have portfolios dedicated to computational biology will better understand these concepts when they are evaluating budgets and proposals.

Finally, an important consideration related to governance is how to ensure the sustainability of cloud-based platforms. Magali Haas noted that many platforms are funded for a limited time period through grant mechanisms with no mechanism for renewal. Canet-Avilés noted, however, that for AMP, a public-private partnership between NIH and private organizations, the model being developed is for data platforms eventually to be sustained through government funding. Funding is not the only factor that affects sustainability, however. Retaining the cloud support engineering talent needed for research projects has also proved challenging, according to Russell Poldrack and Weber. One approach taken by the Office of Data Science Strategy, according to Weber, is to develop programs that recruit people from outside government for one or two years to work on projects they might find very interesting, enabling them to train the rest of the research staff internally and raise its knowledge level.