National Academies Press: OpenBook
Suggested Citation:"8 Genetic Data." National Academies of Sciences, Engineering, and Medicine. 2020. Neuroscience Data in the Cloud: Opportunities and Challenges: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25653.


8 Genetic Data

Highlights [a]

• Cloud usage has become common in the genetics landscape in part because of the size of the datasets; however, new methods and tools are needed to make full use of cloud infrastructure (Neale).
• The General Data Protection Regulation (GDPR) is a particular challenge for genomic data because DNA provides an identifiable footprint or label to the data and there is no way to process away individual-level genetic variation (Agrawal, Neale, Rosati).
• Hybrid models that combine cloud computing with high-performance computing on local clusters are useful for managing genetic data in the cloud because they allow investigators to prototype software before applying it systematically to large datasets (Neale).
• Genetic studies in vulnerable populations, such as underserved communities or those with mental health disorders, present challenges related to participant engagement and the risk of stigmatization (Jakeman, Nalls, Neale, Rosati).
• Best practices need to be developed for disclosure of genetic results to research participants (Cohen, Hanson, Neale, Rosati).

[a] These points were made by the individual workshop participants identified above. They are not intended to reflect a consensus among workshop participants.

In the genetic landscape, cloud usage is becoming more common, in large part because of the size of the datasets, said Benjamin Neale. With more than a petabyte of raw data and additional processed data, the cloud is the only computational environment capable of managing these datasets, said Neale. The challenge, he said, is balancing large centralized infrastructural resources with systems that enable scientists to perform analyses and explore data locally. Neale added that making full use of cloud infrastructure to process these ever-growing genetic datasets requires novel methods and software tools.

The need for a lot of computing power becomes particularly onerous when trying to integrate genomic data with other types of data, such as cognitive data, added Michael Nalls. On top of that, as investigators try to navigate GDPR and other regulatory environments, new methods for working in federated systems and switching from local to cloud computing will be increasingly important, said Nalls.

GDPR is a particular challenge for genomic data, which require special protections because some forms of DNA can provide an identifiable footprint, said Arpana Agrawal, professor of psychiatry at the Washington University School of Medicine. As was discussed in the privacy breakout sessions (see Chapter 3), GDPR treats genetic information as personal data while HIPAA treats it as completely de-identified data, said Kristen Rosati.

CURRENT PROMISING PRACTICES FOR MANAGING GENETIC DATA IN THE CLOUD

Neale said there are large datasets already available in the cloud and a clear NIH investment in building infrastructure and supporting upload and access with a variety of different approaches. Nalls, for example, is working with hybrid models that combine cloud computing with high-performance systems such as the high-performance computing Biowulf cluster at NIH.[1] This sandbox approach, said Nalls, allows researchers to test their software locally or on a small local cluster before going to production scale in the cloud, thus maximizing resources and reducing costs. He said his group externally audits all datasets before they are pushed to the local cluster to ensure identifiable data are not inadvertently uploaded, and also checks the code to ensure privacy is maintained since links in laboratory notebooks could potentially cause inadvertent breaches. Neale added that for genomic data, this approach of prototyping software with small data before applying it systematically to large datasets is beneficial because on a 5-year horizon, there will probably be whole-genome sequences on millions of individuals.

[1] For more information, see (accessed November 11, 2019).
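The audit step Nalls describes, checking datasets for identifiable fields and checking code for links before anything is pushed to a shared cluster, could be sketched roughly as follows. This is only an illustrative sketch and not the workflow used by his group: the blocked column list, the function names, and the example URL are all hypothetical, and a check like this catches only obvious identifiers, not the genetic variation itself.

```python
import re

# Hypothetical list of column names treated as direct identifiers (an assumption).
BLOCKED_COLUMNS = {"name", "date_of_birth", "address", "email", "phone", "mrn"}

# Matches http(s) links, stopping at whitespace, quotes, or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s'\")]+")


def flag_identifier_columns(header):
    """Return column names that match the blocked list (case-insensitive)."""
    return [col for col in header if col.strip().lower() in BLOCKED_COLUMNS]


def find_links(text):
    """Return every http(s) link found in notebook or script text."""
    return URL_PATTERN.findall(text)


def audit(header, code_text):
    """Collect findings; empty lists mean nothing was flagged."""
    return {
        "identifier_columns": flag_identifier_columns(header),
        "links": find_links(code_text),
    }


if __name__ == "__main__":
    # Hypothetical dataset header and notebook cell used only for illustration.
    header = ["sample_id", "Email", "genotype_call", "phenotype_score"]
    code_text = "df = load('https://example.org/raw/cohort.csv')  # review before sharing"
    print(audit(header, code_text))
```

In practice, a check of this kind would be one gate among several, run on a small local copy before data or notebooks move to a production-scale cloud environment.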

The Psychiatric Genomics Consortium takes a different approach, with data storage and analysis conducted not on the cloud, but on a dedicated and highly protected server in the Netherlands, said Agrawal. This allows EU researchers to conduct studies without the data leaving the European Union, and U.S. researchers also deposit data there. Nalls added that working in federated learning scenarios in local clusters before switching to the cloud is increasingly important as a means of adhering to GDPR regulations, although he acknowledged that exactly how GDPR will be interpreted and implemented has yet to be determined.

ISSUES TO BE RESOLVED REGARDING GENETIC DATA IN THE CLOUD

While the GDPR does not regulate "anonymized" data, it is unclear whether genetic data can be anonymized, said Kristen Rosati. She also noted that anticipated guidance under the Common Rule regarding the identifiability of genetic data is likely to present challenges for researchers in the United States, because nearly all U.S. institutions have been treating genetic information not accompanied by identifiers as de-identified information with no regulatory controls on data sharing.

An additional problem related to identification and de-identification is that there is no way to process away the individual-level genetic variation that in and of itself may be identifiable, said Neale. Researchers use this and other sensitive information all the time and sometimes in a non-anonymized form, he said, but do so with the explicit intention of not doing nefarious things with those data. He added that engaging in individual re-identification should have serious consequences. The language used in consent forms related to the risks of re-identification varies considerably, said Rosati, and there is also a whole landscape of genetic information gathered for treatment purposes where there is no consent at all.
People assume that if their genetic data are moved to shared repositories, those data are de-identified, noted Jonathan Cohen. He suggested that the community might want to work to ensure that this is clarified in consent forms. Rosati agreed that genetic information and other rich, clinical, formerly de-identified data are going to require some protections. A question that has been debated, she said, is whether consent can provide adequate protection or whether much stronger federal laws are needed to protect against the re-identification of individuals. Neale noted, however, that as more barriers to access are introduced, it will become hard to realize the potential of these data to improve lives.

Engaging study participants for genetic research studies of vulnerable populations raises significant challenges, said Neale. Partnering with different ancestral groups from study inception is advisable although not required, he said. Educating potential participants about the benefits and risks of the study needs to be managed with transparency and openness, he said. However, he acknowledged that there are potential risks for group characterization that can emerge from studies of vulnerable populations, which can cause distress. Rosati added that mental health disorders, addiction, and some other conditions that could be revealed by genetic information are associated with a substantial amount of stigma.

Indeed, said Lyn Jakeman, neuroethicists and others are beginning to question whether genetic data become a code with which one can compare brain function. For example, if the genetic code is linked to functional imaging or other phenotypic data, it could become code for the individual, she said. Nalls said attempts have already been made to harmonize data from many different sources using unsupervised learning methods. These require a lot of computing power and cloud-based technologies, he said.

Neale raised another potential challenge: Suppose a new class of mutation is identified that enables a new interpretation of genetic data. Should a reanalysis of older data be conducted, and if so, who is responsible for doing such studies? Moreover, this scenario raises questions about whether individuals consented to be contacted again after the initial study, said William Hanson. These questions are being grappled with not only in the research arena, but in clinical practice as well, he said.

Disclosure of genetic results to research participants raises other thorny issues, said Rosati. Cohen asserted that withholding genetic information that may raise medical concerns is wrong. How to disclose this information is not clear, however, said Cohen; for example, the medical risks of non-disclosure must be weighed against the risks (in anxiety and unnecessary tests) of disclosing what amount to non-consequential or false-positive findings.
Neale added that institutions will need to identify the infrastructure needed to deliver these kinds of results. Rosati suggested that best practices need to be identified regarding whether an institution has a duty to inform research participants when something concerning arises in a research study. Hanson added that he and his colleagues are exploring expanded consent for genetic testing in the clinical environment that would clarify the responsibility. An additional complication, Rosati said, is that the Centers for Medicare & Medicaid Services (CMS) have taken the position that any results reported back to individuals or their care providers for treatment purposes must be generated in a lab certified in accordance with the Clinical Laboratory Improvement Amendments.


The cloud model of data sharing has led to a vast increase in the quantity and complexity of data and expanded access to these data, which has attracted many more researchers, enabled multi-national neuroscience collaborations, and facilitated the development of many new tools. Yet, the cloud model has also produced new challenges related to data storage, organization, and protection. Merely switching the technical infrastructure from local repositories to cloud repositories is not enough to optimize data use.

To explore the burgeoning use of cloud computing in neuroscience, the National Academies Forum on Neuroscience and Nervous System Disorders hosted a workshop on September 24, 2019. A broad range of stakeholders involved in cloud-based neuroscience initiatives and research explored the use of cloud technology to advance neuroscience research and shared approaches to address current barriers. This publication summarizes the presentation and discussion of the workshop.
