National Academies Press: OpenBook

Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop (2020)

Chapter: 4 Managing Data and Promoting Interoperability in the Cloud

Suggested Citation:"4 Managing Data and Promoting Interoperability in the Cloud." National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25653.

4

Managing Data and Promoting Interoperability in the Cloud


Many of the issues related to data management and integration are not cloud specific, said Alan Evans, James McGill Professor of Neurology and Psychiatry at McGill University. Indeed, he said, getting the major platforms to develop interoperability definitions to enable data sharing transcends the cloud. But without that cooperation, there will continue to be islands and communities that are unable to communicate.

The web of regulations referred to in the section on privacy (see Chapter 3) further complicates efforts to integrate data across geographic boundaries, noted Eline Applemans, scientific program manager in neuroscience at the Foundation for the National Institutes of Health (FNIH). Benjamin Neale, associate professor in the Analytic and Translational Genetics Unit at Massachusetts General Hospital and the Broad Institute of MIT and Harvard, concurred that the General Data Protection Regulation (GDPR) requires cloud environments to be set up in each country, allowing investigators to analyze data within national boundaries. He suggested that although research could proceed more rapidly if data were housed in a single place, the community should be open to federated models and to storing different levels of data in different ways. For example, summary-level information might be shared in a highly interoperable environment, while individual-level data might be housed in a more restricted environment.
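The federated arrangement Neale described can be illustrated with a minimal sketch. It was not presented at the workshop, and the class, function names, and measurements below are hypothetical. Each site computes summary-level statistics inside its own environment, and only those summaries are pooled; individual-level values never leave the site.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class SiteSummary:
    """Summary-level statistics that a site is willing to share."""
    n: int            # number of participants
    total: float      # sum of the measurement
    total_sq: float   # sum of the squared measurements


def summarize_locally(values):
    """Run inside a site's own environment; individual values stay there."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries):
    """Combine per-site summaries into a pooled mean and sample standard deviation."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, sqrt(max(variance, 0.0))


# Hypothetical measurements held at two sites in different countries.
site_a = summarize_locally([2.0, 4.0, 6.0])
site_b = summarize_locally([8.0, 10.0])
pooled_mean, pooled_sd = pool([site_a, site_b])
```

Pooling the two summaries gives the same mean and standard deviation as analyzing all five values together. Sharing summaries is not by itself a privacy guarantee, because very small sites can still reveal individual values, and the sum-of-squares formula can lose precision on large values; a production system would need to address both.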

Interoperability is facilitated by standards, but developing widely accepted data standards requires cooperation and is itself challenging. Data standards could provide the opportunity for large cloud-based neuroscience resources to work together; however, in a dynamic field like neuroscience with changing data modalities and technologies, it can be difficult to corral standards, said Michael Huerta. Daniel Marcus, professor of imaging neuroscience at the Washington University School of Medicine in St. Louis, suggested that inadequate cooperation arises not from a lack of interest, but a lack of incentives. Governments can play a role in creating such incentives, as well as in coordinating collaborative efforts, said Rebecca Li.

Maryann Martone added that researchers themselves may not be the ideal people to develop standards because they may lack expertise in informatics and coding. However, the standards developed should map onto what researchers actually do, so that they can understand how these constructs represent their experimental paradigms, she said. Thus, she said, it is probably helpful to start by asking researchers what they need from the data and what they are willing or unwilling to do to achieve their goals.

Data management can be costly and time consuming, said Huerta. Researchers should think about data integration and data sharing from the beginning, as they are developing and designing their projects, and should balance the costs versus benefits (value assessment) in deciding what level of data management is needed, he said. Martone suggested that it may be helpful to ask researchers to fill out templates of metadata schemes. These templates, she said, should be simple and not overly prescriptive. Huerta added that NIH staff need to understand the complexities of data management; for example, data cleaning is essential and can be expensive.
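As a rough illustration of what a simple, non-prescriptive metadata template might look like, the sketch below checks a record against a short list of fields. The field names and the `check_metadata` helper are hypothetical and were not proposed at the workshop. Extra fields are allowed, and optional fields may be omitted.

```python
# Hypothetical template: field name -> (expected type, required?)
TEMPLATE = {
    "dataset_title": (str, True),
    "species": (str, True),
    "modality": (str, True),
    "n_subjects": (int, True),
    "acquisition_notes": (str, False),
}


def check_metadata(record, template=TEMPLATE):
    """Return a list of readable problems; an empty list means the record fits."""
    problems = []
    for field, (expected_type, required) in template.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(
                f"{field} should be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


# Hypothetical records as a researcher might fill them in.
complete = {
    "dataset_title": "Example resting-state study",
    "species": "Homo sapiens",
    "modality": "fMRI",
    "n_subjects": 24,
    "lab_specific_field": "fields outside the template are permitted",
}
incomplete = {"dataset_title": "Pilot recordings", "n_subjects": "twelve"}
```

Calling `check_metadata(complete)` returns an empty list, while `check_metadata(incomplete)` reports the two missing fields and the wrongly typed subject count. Returning messages rather than raising an error keeps the burden on the contributor low, in line with Martone's suggestion.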

CURRENT PROMISING PRACTICES REGARDING STANDARDS DEVELOPMENT AND INTEROPERABILITY

Huerta recalled that about 20 years ago, the major neuroimaging labs and software developers came together in workshops to develop the Neuroimaging Informatics Technology Initiative (NIFTI) imaging standard, which is still widely used. Now, he said, his office is working to accelerate the promotion and adoption of Fast Healthcare Interoperability Resources (FHIR) standards to promote health care information exchange across NIH. He added that NIH is preparing to release for public comment a data management and sharing policy, which will require NIH-funded researchers to include a data management and sharing plan in their grant proposals.1

Neale suggested that genetics is one domain within the field of neuroscience that has already made progress in sharing data. The Psychiatric Genomics Consortium (PGC) was launched in 2007 with the goal of conducting huge genome-wide analyses of psychiatric disorders by bringing researchers together from around the world to work collaboratively (Psychiatric GWAS Consortium Steering Committee, 2009). The more than 800 investigators from 38 countries who have joined this consortium share data on a research compute cluster in the Netherlands that functions in a manner similar to a cloud, enabling many different groups to share data and work together using standardized processing and analysis, said Neale.

The UK Biobank has created a different kind of data model in which data are made available for downloading, said Neale. He suggested that it may be possible to set up cloud-based methods that would enable investigators to point to and analyze those data without downloading them.

Meanwhile, the National Center for Biotechnology Information (NCBI) has developed a database of genotypes and phenotypes (dbGaP)2 to archive and distribute data and results from genotype/phenotype studies conducted in humans, said Neale. He added that the National Human Genome Research Institute (NHGRI) and the National Heart, Lung, and Blood Institute (NHLBI) are trying to move toward a centralized dataset model where researchers can apply for access and then work with the data in a centralized environment. Many pieces are on the table, but they are not all linked and interoperable, he said, suggesting that opportunities remain to improve the data management approach. However, Lyn Jakeman, director of the division of neuroscience at the National Institute of Neurological Disorders and Stroke (NINDS), suggested that there may not be one model that works for all areas within neuroscience.

___________________

1 This policy has been released since the date of the workshop. For more information, see https://osp.od.nih.gov/scientific-sharing/nih-data-management-and-sharing-activities-related-to-public-access-and-open-science (accessed November 24, 2019).

2 For more information, see https://www.ncbi.nlm.nih.gov/gap (accessed November 11, 2019).

Interoperability among a multiplicity of data management platforms can also be a problem, said Evans. For example, the Canadian Open Neurosciences Platform uses LORIS (Longitudinal Online Research and Imaging System)3 as its main data management platform, he said, but other institutions across Canada use other systems. Users see only a common application programming interface (API) that sits on top of these platforms, he said.
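The idea of a common API sitting on top of different platforms can be sketched as follows. This is an illustrative assumption rather than the actual interface of LORIS or the Canadian Open Neurosciences Platform: the class names, the `list_sessions` method, and the identifiers are all hypothetical. Each back end stores its data differently, but callers see only the shared interface.

```python
from abc import ABC, abstractmethod


class DataPlatform(ABC):
    """Hypothetical common interface that every back end must implement."""

    @abstractmethod
    def list_sessions(self, subject_id):
        """Return the session identifiers for one subject as a sorted list."""


class PlatformA(DataPlatform):
    """Stand-in for one system; stores sessions keyed by subject."""

    def __init__(self, sessions_by_subject):
        self._sessions = sessions_by_subject

    def list_sessions(self, subject_id):
        return sorted(self._sessions.get(subject_id, []))


class PlatformB(DataPlatform):
    """Stand-in for another system; stores flat (subject, session) rows."""

    def __init__(self, rows):
        self._rows = rows

    def list_sessions(self, subject_id):
        return sorted(
            session for subject, session in self._rows if subject == subject_id
        )


def sessions_across(platforms, subject_id):
    """Query every platform through the shared interface and merge the results."""
    found = set()
    for platform in platforms:
        found.update(platform.list_sessions(subject_id))
    return sorted(found)


# Hypothetical contents of two institutions' systems.
platforms = [
    PlatformA({"sub-01": ["ses-02", "ses-01"]}),
    PlatformB([("sub-01", "ses-03"), ("sub-02", "ses-01")]),
]
```

Here `sessions_across(platforms, "sub-01")` returns the sessions from both systems in one sorted list, without the caller needing to know how either system stores them. Adding another institution's system would mean writing one more adapter class rather than changing the calling code.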

DATA MANAGEMENT ISSUES TO BE RESOLVED

Transforming data from a raw state into a standardized format capable of being analyzed and/or shared (a process called “data munging” or “data wrangling”) is costly and time consuming, sometimes accounting for as much as 70 percent of a project’s budget, said Michael Nalls, founder and CEO of Data Tecnica International and a consultant for the National Institute on Aging. Rachel Ramoni, chief research and development officer for the Department of Veterans Affairs (VA), suggested that funding agencies might be able to come together to support the development of harmonized approaches for data munging and then incentivize the use of these harmonized approaches by funding projects that use them. Heather Snyder, vice president of medical science relations at the Alzheimer’s Association, agreed that funders could make data cleaning (i.e., correcting or removing inaccurate or irrelevant data) a condition of funding, adding that the Alzheimer’s Association sometimes pays to have datasets cleaned, believing that there is tremendous value in those data being available and shared. Huerta added that there is a trans-NIH effort to train program officers about good data management practices, with tools that will enable even those who do not have computational biology in their portfolios to better understand the costs of data management, including the amount of principal investigator and technician time required for data cleaning and munging.
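To make the notion of data munging concrete, the sketch below maps records from two hypothetical sites onto one shared vocabulary. The column aliases, the sex codes, and the `harmonize` function are assumptions made for illustration; they do not represent a harmonized approach endorsed by any workshop participant.

```python
# Hypothetical mapping from site-specific column names to a shared vocabulary.
COLUMN_ALIASES = {
    "subj": "subject_id",
    "SubjectID": "subject_id",
    "age_years": "age",
    "Age": "age",
    "sex": "sex",
    "Gender": "sex",
}

# Hypothetical recoding of free-text sex values onto one convention.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def harmonize(raw):
    """Rename known columns, drop unknown ones, and normalize value formats."""
    clean = {}
    for key, value in raw.items():
        target = COLUMN_ALIASES.get(key)
        if target is None:
            continue  # columns outside the shared vocabulary are dropped
        clean[target] = value
    if "subject_id" in clean:
        clean["subject_id"] = str(clean["subject_id"]).strip()
    if "age" in clean:
        clean["age"] = float(clean["age"])
    if "sex" in clean:
        key = str(clean["sex"]).strip().lower()
        clean["sex"] = SEX_CODES.get(key, "unknown")
    return clean


# Hypothetical raw records as two different sites might export them.
site_one = {"SubjectID": " 007 ", "Age": "34", "Gender": "Female", "scanner": "X"}
site_two = {"subj": 12, "age_years": 61.5, "sex": "m"}
```

Both records come out with the same three keys and consistent value formats, which is the precondition for pooling them. Even this toy version embeds several decisions (which columns to drop, how to code unrecognized values), which hints at why the real process consumes so much project time.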

Derek Merck, director of medical informatics at the University of Florida, advocated moving as much of the burden of data cleaning as possible from the producers of the data to repositories, while requiring data producers to meet only minimal requirements in order to contribute data. Investigators who adhere to that format would be incentivized by having access to multiple automated processes, he said. Huerta added that in academic environments it can be difficult to recruit and retain people who have the skills necessary for data standardization. Martone said that researchers tend to have little interest in standards, often leaving data management to graduate students and postdoctoral fellows.

___________________

3 For more information, see http://loris.ca (accessed November 11, 2019).

Michael Hawrylycz, senior director for informatics at the Allen Institute for Brain Science, said that because large datasets are often generated in a systematic way, standardization is less of an issue. However, standardization becomes more problematic with smaller datasets. Incentivizing researchers to standardize these datasets is especially important, he said. Benchmarking datasets and software that will be used in the cloud is also important when designing experiments, said Nalls.

Harmonizing and federating similar types of data residing in multiple repositories and platforms is only one challenge, said Marcus. When systems hold different types of data, the challenges are magnified. For example, he said, if neuroimaging data identify a potentially interesting region of interest, one might want to examine gene expression in that region. Making such connections is currently a manual process. Merging these data types would require defining a common coordinate frame, said Evans.
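One way to picture a common coordinate frame is as a shared space into which each dataset's own grid indices are mapped. The sketch below assumes, purely for illustration, that each dataset carries a 4x4 affine matrix from its voxel indices to shared-space millimeters; the matrices and the `apply_affine` helper are hypothetical and are not drawn from any atlas discussed at the workshop.

```python
def apply_affine(affine, point):
    """Map a 3D point through a 4x4 affine given as row-major nested lists."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    return tuple(
        sum(affine[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )


# Hypothetical imaging dataset: 2 mm voxels, origin offset in millimeters.
IMAGING_AFFINE = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical gene expression dataset: 1 mm grid with its own origin offset.
EXPRESSION_AFFINE = [
    [1.0, 0.0, 0.0, -90.0],
    [0.0, 1.0, 0.0, -126.0],
    [0.0, 0.0, 1.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]

# The same anatomical location has different indices in each dataset's grid.
imaging_location = apply_affine(IMAGING_AFFINE, (45, 63, 36))
expression_location = apply_affine(EXPRESSION_AFFINE, (90, 126, 72))
```

Both calls return the same shared-space point, so a region of interest found in the imaging data can be looked up in the expression data automatically rather than by hand. Real data would also require nonlinear registration between individual brains and the reference space, which this linear sketch does not attempt.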

Similarly, analyzing genetics data across datasets without being able to integrate phenotypic data and other information limits what can be learned from those data, added Snyder. Neale said that in the genetics field, a strategic decision early on to go broad in sample, but shallow in phenotype, has slowly shifted, especially with projects such as the UK Biobank, where links with electronic health records and other data sources are providing deeper and richer phenotypic information.

The cloud model of data sharing has led to a vast increase in the quantity and complexity of data and expanded access to these data, which has attracted many more researchers, enabled multi-national neuroscience collaborations, and facilitated the development of many new tools. Yet, the cloud model has also produced new challenges related to data storage, organization, and protection. Merely switching the technical infrastructure from local repositories to cloud repositories is not enough to optimize data use.

To explore the burgeoning use of cloud computing in neuroscience, the National Academies Forum on Neuroscience and Nervous System Disorders hosted a workshop on September 24, 2019. A broad range of stakeholders involved in cloud-based neuroscience initiatives and research explored the use of cloud technology to advance neuroscience research and shared approaches to address current barriers. This publication summarizes the presentation and discussion of the workshop.
