PREPUBLICATION COPY: Uncorrected Proofs

2
Harnessing Cloud-Based Technologies to Advance Neuroscience Research: Select Current Initiatives

Highlights[a]

• The International Neuroinformatics Coordinating Facility coordinates the development and endorsement of findable, accessible, interoperable, and reusable (FAIR) community standards and best practices (Martone).
• The cloud alone is not a replacement for depositing data into a proper archive. To have maximal impact on neuroscience research, data need to be maintained in a manner that makes them FAIR (Martone).
• To enable data sharing, data standards are needed at all levels of datasets and become more specialized according to user needs (Martone).
• The platform OpenNeuro enables free and open sharing of multiple data types using a community standard called the Brain Imaging Data Structure (BIDS) (Poldrack).
• BIDS apps, containerized applications of diverse neuroimaging software packages, allow users to analyze data easily and reproducibly, which provides users with additional benefits and incentivizes adherence to the BIDS standards (Poldrack).
• The Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative is designed to make it easier for researchers to use cloud-based services such as Google Cloud and Amazon Web Services (Weber).
NEUROSCIENCE DATA IN THE CLOUD

• STRIDES has facilitated the building of several large and high-value datasets managed by the National Institutes of Health and enabled the transfer of more than 30 petabytes of data to the cloud, where they are accessible to the research community (Weber).

[a] These points were made by the individual workshop participants identified above. They are not intended to reflect a consensus among workshop participants.

Neuroscience is well on its way to moving to the cloud, if it is not there already, said Maryann Martone, professor emerita at the University of California, San Diego, and chair of the governing board of the International Neuroinformatics Coordinating Facility (INCF). The cloud is ubiquitous across global neuroscience projects, including those pictured in Figure 2-1, because it is recognized as necessary both for compiling large amounts of data and for taking algorithms to the data, said Martone. She added that many other projects beyond those included in the figure are using cloud infrastructure.

INTERNATIONAL NEUROINFORMATICS COORDINATING FACILITY

Neuroscience is never going to be served by a single platform or a single infrastructure because there are too many different types of data and too much technological flux, Martone said, and the cloud alone should not be seen as a replacement for contributing data to a proper data repository or archive. To seriously impact the field and move neuroscience forward, efforts are needed to ensure that data are maintained in a manner that makes them accessible to both humans and machines, she said.

INCF[1] was initially established to create global infrastructure and data standards with the goal of facilitating the organization and usability of neuroscience data. INCF has grown into a membership organization with members from 18 countries across 4 continents.
It aims to coordinate the development and endorsement of open and FAIR (findable, accessible, interoperable, and reusable) community standards and best practices that will enable data to be shared in a maximally useful way for both humans and machines. INCF also focuses on developing and providing training and educational resources, and serves as an interface among international large-scale brain projects, said Martone.

[1] For more information, see https://www.incf.org (accessed November 12, 2019).

FIGURE 2-1 An exciting time for global neuroscience. Global neuroscience projects such as those pictured here have been enabled by the cloud.
SOURCE: Presented by Maryann Martone, September 24, 2019.

Martone noted that developing and implementing FAIR standards requires a partnership among researchers, repositories, indexers, and aggregators. Moreover, for any given dataset there may be dozens of standards and best practices that need to be brought together in a way that a range of users can navigate. Martone illustrated how these various standards relate to one another using what she calls the FAIR onion, shown in Figure 2-2. A host of organizations, societies, and others act as convening authorities to bring experts together to establish standards at the different layers of the onion. At the outer layer of the onion, where data are more specialized, problems can only be solved by the neuroscientists generating those data rather than by general organizations, said Martone. Organizations like INCF play a critical role in bringing these researchers together at the outer layers of the onion, she said.

To facilitate the development of standards, INCF is developing a standards portal and will institute a review and endorsement process with a consistent set of criteria and clear governance procedures, said Martone. The portal will also house "training space," a collection of neuroinformatics courses given by worldwide experts, said Martone.

Although standards are important to enable data sharing and many types of standards have been created, Martone noted that few have been piloted, tested, and validated. Moreover, standards and best practices will always be in flux as technologies evolve, she said. Neuroscientists, other researchers, and infrastructure providers have little experience working with standards. The first thing they want to do is change and adapt the standard to their needs. However, Martone said research requires a delicate touch when it comes to revising standards. She suggested that the research community will need to learn and monitor how far one can deviate from a standard before it becomes meaningless. Nearly all standards have a core that will work for most use cases, she said. The edges are where modifications are needed.
By contrast, more rigid standards may be appropriate for purely clinical or purely industrial users, she said.

FIGURE 2-2 The FAIR onion. Standards are needed at all levels of datasets, as illustrated by the FAIR onion. At the core, standards for basic data descriptors are needed. As data become more complex and specialized, additional layers of standards are required: first, standardized community vocabularies and data types; followed by domain-specific vocabularies, minimal information models, and common data elements. At the outer layers of the onion, specialized vocabularies and information models as well as customized standards and formats are needed for specific applications.
NOTE: CDE = common data elements; FAIR = findable, accessible, interoperable, reusable.
SOURCE: Presented by Maryann Martone, September 24, 2019.

OPENNEURO

OpenNeuro[2] is a platform that enables free and open sharing of magnetic resonance imaging (MRI), magnetoencephalography (MEG), electroencephalography (EEG), invasive EEG (iEEG), and electrocorticography (ECoG) data. According to Russell Poldrack, director of the Stanford Center for Reproducible Neuroscience, OpenNeuro was built upon an earlier project called OpenfMRI, a resource that was developed to enable open sharing of data from task-based functional MRI (fMRI) studies (Poldrack et al., 2013). In creating OpenfMRI, Poldrack and colleagues developed a data organization scheme that was specific to the type of data that would be submitted and that would allow automatic analysis of these data. There was no way to validate a dataset other than to run it through the pipeline. If the pipeline crashed, manual curators at Stanford had to figure out what went wrong. The process was very labor intensive, said Poldrack.

[2] For more information, see https://openneuro.org (accessed November 12, 2019).

Realizing the need for a less costly data-sharing approach, Poldrack and his colleague Krzysztof Gorgolewski created a community standard called the Brain Imaging Data Structure (BIDS), with funding from the Arnold Foundation and the National Institutes of Health (NIH). BIDS specifies file naming and organization as well as a metadata structure, said Poldrack. By using relatively simple directory and file naming templates and common formats, BIDS reduces the learning curve for users. BIDS also has an automated validator built in JavaScript, so users can run the validator in the browser before uploading their data. It can check a huge dataset in just a few seconds and quickly provides feedback about whether a particular dataset has met the standard.

Because they wanted users to accrue benefits from moving to the standards, they also created BIDS apps, containerized applications for more than 30 diverse neuroimaging software packages, said Poldrack. BIDS is flexible enough that by sticking to the core elements, users realize how they can work with standards and still achieve the specificity they need, noted Martone. The BIDS apps allow users to run large data analysis packages easily and reproducibly without having to reformat their data for those specific packages (Gorgolewski et al., 2017). Each version of a shared dataset is given a digital object identifier (DOI) that can be cited in published papers, said Poldrack. To facilitate data sharing, OpenNeuro also creates a discussion page for each dataset through which questions can be submitted to the dataset owner.
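The directory and file naming templates that Poldrack described can be illustrated with a short sketch. The Python below is not the BIDS validator and covers only a small, simplified subset of the naming scheme for functional MRI files; the helper names `bold_path` and `looks_valid` are hypothetical, and the pattern assumes only the subject, session, task, and run entities.

```python
import re
from typing import Optional

# Simplified pattern for a functional MRI file name in a BIDS-style layout.
# Assumption: only the sub, ses, task, and run entities are handled here;
# the actual specification defines more entities and file types.
BOLD_NAME = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<ses>[A-Za-z0-9]+))?"
    r"_task-(?P<task>[A-Za-z0-9]+)"
    r"(?:_run-(?P<run>[0-9]+))?"
    r"_bold\.nii\.gz$"
)


def bold_path(sub: str, task: str, ses: Optional[str] = None,
              run: Optional[int] = None) -> str:
    """Build a relative path for one functional run (hypothetical helper)."""
    parts = [f"sub-{sub}"]
    if ses is not None:
        parts.append(f"ses-{ses}")
    parts.append(f"task-{task}")
    if run is not None:
        parts.append(f"run-{run:02d}")
    name = "_".join(parts) + "_bold.nii.gz"
    folders = [f"sub-{sub}"]
    if ses is not None:
        folders.append(f"ses-{ses}")
    folders.append("func")
    return "/".join(folders + [name])


def looks_valid(path: str) -> bool:
    """Check the file name, and that the folders agree with the name."""
    *folders, name = path.split("/")
    match = BOLD_NAME.match(name)
    if match is None:
        return False
    expected = [f"sub-{match['sub']}"]
    if match["ses"] is not None:
        expected.append(f"ses-{match['ses']}")
    expected.append("func")
    return folders == expected


if __name__ == "__main__":
    example = bold_path("01", "rest")
    print(example, looks_valid(example))
```

Because the subject and session labels appear both in the folder names and in the file name, a simple check like this can catch a misplaced or misnamed file without running any analysis pipeline, which is the kind of quick feedback the validator is meant to give.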
The project has been funded by the BRAIN Initiative through 2023 and is growing at a rate of about 10 to 20 new datasets and a total of 5,000 to 6,000 users per month, said Poldrack. He added that OpenNeuro enables anyone to download de-identified data with no restrictions and no data use agreements, thus providing an unparalleled degree of openness. The goal, he said, is to maximize the value of these data.

STRIDES

The NIH Center for Information Technology has also made a major commitment to adopting and developing best practices related to cloud technologies as a means of supporting the research community, said Nick Weber, program manager for Cloud Services at the NIH Center for Information Technology. The Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, launched in July 2018, now has partnerships with Google Cloud and Amazon Web Services (AWS) and is working on additional partnerships with other commercial providers, said Weber. STRIDES aims to make it easier for researchers to use these services, access data, and employ the latest tools and technologies while protecting the security and privacy of data, he said. Other important elements of the STRIDES Initiative include a training component across the full range of users, including technical staff, bench researchers, data scientists, and informaticians, and providing insight into sustainability by gathering data on data usage to inform funding decisions, said Weber.

STRIDES has facilitated building the operational environment for several large and high-value NIH-managed datasets such as those generated by Common Fund programs, the Trans-Omics for Precision Medicine (TOPMed) program sponsored by the National Heart, Lung, and Blood Institute (NHLBI), and the Accelerating Medicines Partnership-Parkinson's Disease (AMP-PD) program, said Weber. Already, STRIDES investments have provided benefits to these research programs in terms of cost savings and improved access to professional services and enterprise support, he said. He added that STRIDES has enabled the transfer of more than 30 petabytes of data into the cloud, making them more widely accessible to the research community.

Ultimately, Weber predicts that STRIDES will facilitate improved interconnections among datasets that otherwise would not have been connected. To achieve this, he said, STRIDES has initiated efforts to make sure funding agencies and partners understand how to leverage STRIDES resources, for example, by including information about STRIDES in funding opportunity announcements.