Neuroscience is well on its way to moving to the cloud if it is not there already, said Maryann Martone, professor emerita at the University of California, San Diego, and chair of the governing board of the International Neuroinformatics Coordinating Facility (INCF). The cloud is ubiquitous across global neuroscience projects, including those pictured in Figure 2-1, due to the recognition that the cloud is necessary both for compiling large amounts of data and for taking algorithms to the data, said Martone. She added that many other projects beyond those included in the figure are using cloud infrastructure.
Neuroscience is never going to be served by a single platform or a single infrastructure because there are too many different types of data and too much technological flux, Martone said, and the cloud alone should not be seen as a replacement for contributing data to a proper data repository or archive. To seriously impact the field and move neuroscience forward, efforts are needed to ensure that data are maintained in a manner that makes them accessible to both humans and machines, she said. INCF was initially established to create global infrastructure and data standards with a goal of facilitating organization and usability of neuroscience data. INCF has grown into a membership organization with members from 18 countries across 4 continents. It aims to coordinate the development and endorsement of open and FAIR (findable, accessible, interoperable, and reusable) community standards and best practices that will enable data to be shared in a maximally useful way for both humans and machines. INCF also focuses on developing and providing training and educational resources, and serves as an interface among international large-scale brain projects, said Martone.
Martone noted that developing and implementing FAIR standards requires a partnership among researchers, repositories, indexers, and aggregators. Moreover, for any given dataset there may be dozens of standards and best practices that need to be brought together in a way that a range of users can navigate. Martone illustrated how these various standards relate to one another using what she calls the FAIR onion, shown in Figure 2-2. A host of organizations, societies, and others act as convening authorities to bring experts together to establish standards at the different layers of the onion. At the outer layer of the onion where data are more specialized, problems can only be solved by the neuroscientists generating those data rather than by general organizations, said Martone. Organizations like INCF play a critical role in bringing these researchers together at the outer layers of the onion, she said.
To facilitate the development of standards, INCF is developing a standards portal and will institute a review and endorsement process with a consistent set of criteria and clear governance procedures, said Martone. The portal will also house “TrainingSpace”—a collection of neuroinformatics courses given by worldwide experts, said Martone.
Although standards are important to enable data sharing and many types of standards have been created, Martone noted that few have been piloted, tested, and validated. Moreover, standards and best practices will always be in flux as technologies evolve, she said. Neuroscientists, other researchers, and infrastructure providers have little experience working with standards. The first thing they want to do is change and adapt the standard to their needs. However, Martone said research requires a delicate touch when it comes to revising standards. She suggested that the research community will need to learn and monitor how far one can deviate from a standard before it becomes meaningless. Nearly all standards have a core that will work for most use cases, she said. The edges are where modifications are needed. By contrast, more rigid standards may be appropriate for purely clinical or purely industrial users, she said.
OpenNeuro is a platform that enables free and open sharing of magnetic resonance imaging (MRI), magnetoencephalography (MEG), electroencephalography (EEG), invasive EEG (iEEG), and electrocorticography (ECoG) data. According to Russell Poldrack, director of the Stanford Center for Reproducible Neuroscience, OpenNeuro was built on an earlier project called OpenfMRI, a resource that was developed to enable open sharing of data from task-based functional MRI (fMRI) studies (Poldrack et al., 2013). In creating OpenfMRI, Poldrack and colleagues developed a data organization scheme that was specific to the type of data that would be submitted and that would allow automatic analysis of these data. There was no way to validate a dataset other than to run it through the pipeline. If the pipeline crashed, manual curators at Stanford had to figure out what went wrong. The process was very labor intensive, said Poldrack.
Because they wanted users to accrue benefits from moving to the standards, they also created Brain Imaging Data Structure (BIDS) apps, containerized applications for more than 30 diverse neuroimaging software packages, said Poldrack. BIDS is flexible enough that by sticking to the core elements, users realize how they can work with standards and still achieve the specificity they need, noted Martone. The BIDS apps allow users to run large data analysis packages easily and reproducibly without having to reformat their data for those specific packages (Gorgolewski et al., 2017). Each version of a shared dataset is given a digital object identifier (DOI) that can be cited in published papers, said Poldrack.
To facilitate data sharing, OpenNeuro also creates a discussion page for each dataset through which questions can be submitted to the dataset owner. The project has been funded by the BRAIN Initiative through 2023; it is adding about 10 to 20 new datasets and serving a total of 5,000 to 6,000 users per month, said Poldrack. He added that OpenNeuro enables anyone to download de-identified data with no restrictions and no data use agreements, thus providing an unparalleled degree of openness. The goal, he said, is to maximize the value of these data.
The NIH Center for Information Technology has also made a major commitment to adopting and developing best practices related to cloud technologies as a means of supporting the research community, said Nick Weber, program manager for Cloud Services at the NIH Center for Information Technology. The Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, launched in July 2018, now has partnerships with Google Cloud and Amazon Web Services (AWS) and is working on additional partnerships with other commercial providers, said Weber. STRIDES aims to make it easier for researchers to use these services, access data, and employ the latest tools and technologies while protecting the security and privacy of data, he said. Other important elements of the STRIDES Initiative include a training component that spans the full range of users, including technical staff, bench researchers, data scientists, and informaticians, and the gathering of data on data usage to provide insight into sustainability and inform funding decisions, said Weber.
STRIDES has facilitated building the operational environment for several large and high-value NIH-managed datasets such as those generated by Common Fund programs, the Trans-Omics for Precision Medicine (TOPMed) program sponsored by the National Heart, Lung, and Blood Institute (NHLBI), and the Accelerating Medicines Partnership-Parkinson’s Disease (AMP-PD) program, said Weber. Already, STRIDES investments have provided benefits to these research programs in terms of cost savings and improved access to professional services and enterprise support, he said. He added that STRIDES has enabled the transfer of more than 30 petabytes of data into the cloud, making them more widely accessible to the research community. Ultimately, Weber predicted that STRIDES will facilitate interconnections among datasets that otherwise would not have been connected. To achieve this, he said, STRIDES has initiated efforts to make sure funding agencies and partners understand how to leverage STRIDES resources, for example, by including information about STRIDES in funding opportunity announcements.