Key Highlights Discussed by Individual Participants
- Basic research is the foundation of the neuroscience enterprise (Landis).
- Exposing trainees to the interplay among basic, translational, and clinical research can help ensure that a proper balance of the three is maintained through the next generation of neuroscientists (Landis).
- Trainees need to understand the fundamental principles that underlie the tools they use in order to understand the limitations of those tools and the situations in which they can be appropriately deployed (Marder).
- As the complexity of new tools increases, novel mechanisms for teaching trainees and other scientists to use them can facilitate widespread adoption (Landis).
- Handling and analyzing large amounts of data will be a major challenge in the next era of neuroscience (Sejnowski).
- There are three major aspects to working with big data: data literacy, data management, and data sharing (Martone).
NOTE: The items in this list were addressed by individual participants and were identified and summarized for this report by the rapporteurs. This is not intended to reflect a consensus among workshop participants.
Basic research is the fuel that powers advances in neuroscience, noted Story Landis. A solid understanding of how neurons function, form neural circuits, and ultimately influence behavior underlies every effort to develop clinical treatments for neurological diseases (Koroshetz and
Landis, 2014). Several speakers and workshop participants discussed how to structure graduate programs to instill in trainees the best practices for conducting basic research. In addition, many participants highlighted the need for trainees to have a fundamental knowledge of the new tools and technology that are used to make basic research discoveries, as well as the ability to properly handle and analyze the big data that are generated from them.
THE NEED FOR INCREASED TRAINING IN BASIC RESEARCH
In her presentation, Landis called attention to the important role that basic research plays in neuroscience. Without basic science discoveries, she said, there would be nothing to translate into clinical treatments and the whole neuroscience enterprise would collapse. Yet, an analysis of NINDS’s funding portfolio,1 overseen by Landis during her tenure as director, revealed that the institute’s funding of basic science has decreased over the years (see Figure 2-1). In 1997, basic research accounted for 52 percent of NINDS’s overall budget. By 2012, that proportion had dropped to 27 percent, while funding of clinical and translational science increased by a corresponding amount (Landis, 2014). Looking at data from 2 years, 2008 and 2011, requests for funding of so-called basic-basic research dropped by 21 percent, while disease-focused basic requests increased 23 percent, applied-translational requests increased 42 percent, and applied-clinical requests increased 38 percent (Landis, 2014). The success rate for basic science grants, however, remained unchanged over that time period (and was actually higher than that of all other categories in both 2008 and 2011). While similar trends were not seen at the National Institute of Mental Health or NSF, Landis and several workshop participants expressed the need for training programs to emphasize to graduate students and postdoctoral researchers the importance of basic research, its relationship to translational and clinical research, and the need for balance among these three areas (Yamaner, 2014).
1See http://blog.ninds.nih.gov/2014/03/27/back-to-basics (accessed October 28, 2014).
FIGURE 2-1 Percentage of the competing budget spent on unsolicited, investigator-initiated grants in the four subcategories at the National Institute of Neurological Disorders and Stroke.
SOURCE: Landis, 2014.
In particular, Landis and Eve Marder, professor of biology at Brandeis University, stated that trainees need to hear the message that not everyone has to conduct translational research to get jobs and funding. Exposure to this message about the critical role of basic science could occur in core courses, as well as in nano-courses or seminars that use successful neurological treatments as case examples, tracing the through lines from basic science discoveries to their translation into drugs, devices, or treatments, and finally to clinical testing. One participant noted that there is also scope for specialization: training programs can become centers of excellence for basic, clinical, or translational science.
TRAINING IN TOOL AND TECHNOLOGY DEVELOPMENT
Increasingly, basic research discoveries have become dependent on the development of new tools and technologies, as well as the ability to handle, manage, and analyze the large quantities of data being collected with those tools. One participant recalled the deep reluctance that many students in the past had toward working to develop probes or assays, or otherwise pushing the technological aspects of neuroscience forward. Work on such projects was not highly valued, the participant noted; instead, students were more excited to use the new tools to make discoveries. While making important discoveries is still a priority, much of the current excitement in neuroscience stems from the development of tools and technologies, for example, optogenetics,2 CLARITY,3 and CRISPR.4 Many workshop participants noted that along with this excitement come a number of challenges, not only in training students to develop powerful tools but also in training them to deploy those tools well while thinking deeply about their limits. As technologies are applied to advance discoveries in basic neuroscience, there is also a growing realization that those same or similar technologies can be used to provide therapeutic functions, noted Douglas Weber, program manager of the Biological Technologies Office at the Defense Advanced Research Projects Agency (DARPA).
Enabling Tool Development Through Transdisciplinary Collaboration
Using DARPA’s Revolutionizing Prosthetics Program as an example, Weber discussed the myriad skill sets needed to develop the next generation of tools and technology. With the rapid growth and diversification of the field of neuroscience, he said, there has been a tendency for disparate groups to work in silos. He noted that groups work across different scales—from molecules to cells to networks—and study different systems—from autonomic and sensory systems to cognitive functions.
2The use of genetically encoded light-sensitive proteins to control neural activity with flashes of light.
3Clear, Lipid-exchanged, Acrylamide-hybridized Rigid, Imaging/immunostaining compatible, Tissue hYdrogel. A process that replaces the lipids in brain tissue with a hydrogel to make the brain transparent in order to visualize neural ensembles.
4Clustered Regularly Interspaced Short Palindromic Repeats. An RNA-guided gene-editing platform that allows scientists to engineer any part of the human genome with great precision.
Integrating information across these many scales and systems can be challenging. However, Weber expressed hope that these challenges can be overcome through programs such as the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative,5 which incorporates a strong focus on finding ways to synthesize information across these many scales to yield a more holistic understanding of how the brain works.
The goal of DARPA’s program is to modernize the design and function of prosthetic hands and arms, which have lagged far behind lower-limb prosthetics, he noted. Until recently, artificial hands consisted of a hook system attached to a cable wrapped around the user’s shoulder that was controlled using simple shoulder-shrug maneuvers. This basic design had remained relatively untouched since the days of the Civil War. In thinking about its redesign, DARPA used as its inspiration the prosthetic hand given to Luke Skywalker after his hand had been severed at the wrist. That is, they sought to build a realistic-looking articulated hand with several degrees of freedom for the wrist and each digit, all integrated into the user’s nervous system and controlled directly by the brain. Weber mentioned several skill sets that the 400-member team charged with creating the integral pieces of DARPA’s revolutionary prosthetic hand needed:
- Neuroscientists with expertise in sensory feedback and haptics, neural motor decoding, and neural stimulation
- Materials science: Materials for every physical piece of the hand—from the lifelike cosmetic covering that needs to be flexible, durable, and waterproof to the biocompatible electrodes that interface with the user’s nerves—need to be carefully selected, designed, and tested
- Systems engineering
- Mechanical engineering
- Software engineering
- Wireless communications
- Signal processing
- Modeling: Models of how information to control specific motor movements (e.g., reaching and grasping) is encoded in patterns of neural activity in the brain
- Human factors
- Data analysis
- Behavioral analysis
- Physical therapy
- Occupational therapy
- Human subjects research
- Project and program management
Several workshop participants discussed strategies for enabling this type of transdisciplinary collaboration at the level of graduate training programs to encourage tool development. One strategy would be to develop courses with other departments that offer hands-on labs in which students examine specific topics, which might encourage collaboration among disciplines. An example of this approach is the University of Pennsylvania’s course on “Brain-Computer Interfaces,” in which neuroscientists work collaboratively with engineers and physical scientists on programming projects.6 A few workshop participants noted that another method for encouraging transdisciplinary approaches is the NSF Research Traineeship (NRT) grant program (formerly the IGERT [Integrative Graduate Education and Research Traineeship] program),7 which provides training funds for a group of graduate students from different departments within a university to work together on a single project. NIH can facilitate cross-discipline approaches by issuing awards similar to NRT and through the creation of centers of excellence, such as the Morris K. Udall Centers for Parkinson’s Disease Research, which are hosted at nine universities. NINDS has created a novel type of center, the Epilepsy Centers without Walls, which bring together dozens of scientists to work on a single aspect of epilepsy regardless of their physical location. One center is focused on the investigation of sudden death in epilepsy and includes expertise in neuroscience, genetics, anatomy, clinical research, imaging, pathology, stem cells, informatics, molecular biology, and data analytics. John Morrison, professor of neuroscience at the Icahn School of Medicine at Mount Sinai, also emphasized the importance of neuroengineering and suggested that neuroscience departments establish links with schools of engineering. He also suggested that more universities develop Ph.D. programs in neuroengineering to create more expertise in this area.
6See Chapter 3 for further discussion about this course.
7See Chapter 3 for further discussion about NSF Research Traineeship grants.
Demystifying Neuroscience Tools
Each new neuroscience tool and technique has its own idiosyncrasies and drawbacks, as well as unique demands related to analyzing the data it produces, said Landis. Therefore, she cautioned, it will not be enough simply to know how to use a tool; it is important for trainees to know the fundamentals of the tool, its function(s), and its shortcomings, which in turn helps them to troubleshoot problems that arise. Marder agreed, stating that trainees need to demystify all of the tools they are using and not be mere consumers. She highlighted that this is particularly true when it comes to optics and microscopy. For fluorescence microscopy, how the microscope works is intuitive as one manually focuses and changes the objective. But for 2-photon microscopy and other less intuitive tools, students’ lack of understanding of the technology can be a detriment because they are less likely to recognize when a problem is occurring. Marder is equally concerned that next-generation microscopes may be too complicated for most students to learn to use proficiently. The expense of these new microscopes, which can run into the millions of dollars, means that only students enrolled in a few well-endowed programs will have the opportunity to learn to use them. Marder identified this issue as a potential major gap in training. One step Marder has taken to close the gap in understanding the fundamentals of optics is encouraging her students to take a microscope and optics lab course at Brandeis University (which is open to neuroscientists), in which students build their own microscopes.
Marder suggested a number of steps that graduate programs can take to enhance students’ understanding of the tools they use. Programs can develop more tool-based lab courses and they can also look to outside sources of training. For example, programs can encourage students to attend courses at Cold Spring Harbor Laboratory and the Marine Biological Laboratory at Woods Hole8 that focus on teaching the fundamentals of a variety of lab tools and techniques. Programs can also fund student enrollment in mini-courses devoted to single techniques that teach trainees the practicalities and specific details of new tools and techniques.
8Further discussion about these courses is provided later in this chapter.
Dissemination of Tools
To close the gap in student understanding of new tools, several workshop participants asserted that novel mechanisms for tool dissemination are needed. Landis suggested that plans for dissemination of any new tool could be part of the grant applications seeking funding to build the tool. While the BRAIN Initiative has no requirement for such plans in the grants it issues, the BRAIN 2025: A Scientific Vision9 report clearly values the widespread dissemination of the new tools that it is funding (NIH, 2014). Accordingly, the NIH BRAIN Initiative is funding a short course in the use of new tools and another in the analysis of large datasets.10
Some neuroscientists have taken the initiative to set up training opportunities to ensure the spread of the technology they have developed, rather than restricting access to it. Optogenetics has been successful in part because its creator, Karl Deisseroth of Stanford University, used a research supplement from NINDS to organize free 3-day workshops to train faculty and students from around the world in the required surgeries and techniques. These are held both in university settings and in course modules at Cold Spring Harbor Laboratory and the Marine Biological Laboratory at Woods Hole. Furthermore, when Deisseroth discovered that scientists were struggling to use one of his more recent technologies, CLARITY, he published a highly detailed methods paper to explain some of the more complex aspects of the technique (Tomer et al., 2014). He has also organized free 3-day workshops on CLARITY throughout the year at Stanford. As is the case for the optogenetics workshops, the CLARITY workshops have a dedicated expert in the technique who acts as education manager.
Mark Schnitzer, a scientist at Stanford University, has taken a different approach to disseminating his state-of-the-art invention. Along with several colleagues, Schnitzer founded a company called Inscopix to produce the nVista HD—a miniaturized, head-mounted microscope to visualize large-scale neural circuit dynamics in freely behaving animals. To encourage scientists to use the device, Inscopix has set up a competitive grant program that will offer the use of one to four nVista HD microscopes as well as extensive training in their operation.
9See http://www.braininitiative.nih.gov/2025/BRAIN2025.pdf (accessed October 29, 2014).
10These courses are described in more detail in Chapter 4.
Another enterprising neuroscientist, Rafael Yuste of Columbia University, has recently founded the NeuroTechnology Center along with a chemist, a bioengineer, and a statistician. The goal of the center is to develop advanced optical, electrical, and computational technologies to study the nervous system. In addition, the center plans to use funds from the Kavli Foundation to offer training in these new technologies to neuroscientists at all levels.
Several workshop participants also discussed opportunities for graduate students and postdoctoral researchers to engage in intensive summer courses in the use of cutting-edge tools, including courses offered by two well-established training facilities:
Marine Biological Laboratory Summer Courses11
- Neural Systems and Behavior
Cold Spring Harbor Laboratory Summer Courses12
- Advanced Techniques in Molecular Neuroscience
- Imaging Structure and Function in the Nervous System
TRAINING IN BIG DATA
Until recently, the primary challenge in neuroscience has been collecting useful information about the brain, said Sejnowski. In the first half of the 20th century, neuroscientists exploited principles of physics to record electrical signals from neurons and develop optical methods to visualize anatomy and morphology. In the latter half of the century, molecular biology techniques further expanded the repertoire of data that could be collected. The next era of neuroscience will be dominated by challenges in the ability to handle, manage, and analyze all of the data that are now becoming readily available, noted Sejnowski. Not only will there be challenges in how to manage this large amount of data, but entirely new methods will be needed for integrating different data types and analyzing enormous, multidimensional datasets, he added.
Neuroscience is not the first discipline to be faced with big data issues. For decades, physicists have had to manage large amounts of data;
however, many of the datasets in physics are collected in ways that have standardized data structures and annotation. Neuroscience data collection is less standardized, and its scale and organization more closely resemble those of the field of genetics, which has been deluged by servers full of genetic data generated by increasingly powerful sequencing machines since the first genome was sequenced more than 20 years ago (Choudhury et al., 2014). Walter Koroshetz, acting director of NINDS, suggested that neuroscience would benefit from considering lessons learned by geneticists regarding their strategies for managing data. Maryann Martone, co-director of the National Center for Microscopy and Imaging Research at the University of California, San Diego, discussed the critical need for training future scientists to work with big data, focusing on data literacy, data management, and data sharing.
Defining the Gaps in Handling Big Data
In discussing the big data challenges facing neuroscience trainees, Martone quoted Michael Nielsen, author of Reinventing Discovery, “An unaided human’s ability to process large datasets is comparable to a dog’s ability to do arithmetic, and not much more valuable” (Nielsen, 2012, pp. 112–113). She went on to discuss three highly interrelated aspects of data handling that all trainees need to be educated about: data literacy, data management, and data sharing.
Martone noted that although not all neuroscientists need to be data scientists, they will be required to use platforms to share and analyze data and will need to understand the fundamentals of large datasets. She made the analogy of taking a class on auto mechanics in high school, not because she ever intended to fix her car, but because she wanted to be able to talk to the people who were going to fix it. Likewise, attaining a minimum level of data literacy will require some specialized training in areas such as data types, structured data, databases, metadata, query languages, and data formats. In addition, an important aspect of data literacy, said Martone, is being able to navigate the “web of data” to find the right dataset. Knowledge of Web services, application programming interfaces,13 and Web scraping14 can also help trainees find and retrieve shared data.
Martone pointed out that most of the data that scientists encounter are not actionable. Instead, data get locked away within journals as static figures due to the current publication process. Some journals, such as Nature Scientific Data, are already providing open access databases for all of the data presented in an article’s figures and tables. Martone added that the more that trainees are taught to understand the difference between static and actionable data, the more pressure will be put on all journals to adopt similar practices. Open access to data will also enhance a culture of sharing and make the scientific enterprise more transparent, she noted. Several participants stated that both of these developments might help to address the crisis of irreproducible data that the scientific community is now beginning to face.
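The distinction between static and actionable data can be made concrete with a toy example: the numbers behind a published figure, once deposited as a structured file, can be reloaded and reanalyzed directly. The sketch below is illustrative only; the table contents and column names are hypothetical, not from any real dataset.

```python
import csv
import io

# A figure panel might report three group means; the "actionable" version is
# the underlying table deposited alongside the paper (hypothetical data).
deposited = """group,mean_firing_rate_hz
control,4.2
treated,7.9
sham,4.5
"""

# Parse the deposited table into records keyed by column name.
rows = list(csv.DictReader(io.StringIO(deposited)))

# Anyone can now recompute quantities the static figure never showed.
rates = {row["group"]: float(row["mean_firing_rate_hz"]) for row in rows}
effect = rates["treated"] - rates["control"]  # treatment effect in Hz
```

A static figure supports only visual inspection; a deposited table like this one supports arbitrary reanalysis, which is what makes the data actionable in Martone's sense.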
Another aspect of data literacy, according to Martone, pertains to knowing one’s data rights. Trainees need to know what rights they have to their data when making them public. They also need to know the rules concerning the use of publicly available data. Specifically, Martone said that trainees need the skills to evaluate which datasets are relevant to their own projects and have been collected with the proper vigilance and rigor.
For data to be useful, they need to be properly managed, noted Martone. That is, they need to be collected in an appropriate standardized format, made readily accessible and interoperable on standardized platforms, annotated, and securely stored. She added that part of the challenge of sharing data is properly annotating them in order for others to understand the context in which they were collected. Having annotation standards in place ensures that each lab that collects a certain type of data can effectively use data shared by another lab. Standards take the guesswork out of what information to collect during the experiment. Many participants stated that standard data formats are also critical to sharing data. According to Brian Litt, director of the Center for Neuroengineering and Therapeutics at the University of Pennsylvania, standard data platforms, rather than a proliferation of individual databases, are helpful for groups and individuals to keep track of their data, share data with others, and find relevant data that others have shared. Data platforms are central Web-based hubs that can be used to integrate and validate multidimensional, heterogeneous data from multiple sources and present them in a clean, standardized manner. Data platforms can also be used to share experimental procedures, analytics programs, and models.
13Snippets of computer code that allow Web-based applications to share information with one another.
14The automatic extraction of useful data from websites.
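One way a data platform can enforce an annotation standard is to validate each submitted record against a list of required fields before accepting it. The sketch below illustrates the idea; the field names are hypothetical and not drawn from any established neuroscience standard.

```python
# Hypothetical annotation standard for a shared electrophysiology dataset.
# The required fields below are illustrative, not from a real standard.
REQUIRED_FIELDS = {
    "species", "cell_type", "stereotaxic_coordinates",
    "stimulation_parameters", "recording_date", "lab",
}

def validate_record(record: dict) -> list:
    """Return a sorted list of required annotation fields missing from a record."""
    return sorted(REQUIRED_FIELDS - record.keys())

record = {
    "species": "Mus musculus",
    "cell_type": "pyramidal",
    "stereotaxic_coordinates": {"AP": -1.8, "ML": 1.5, "DV": -1.2},
    "stimulation_parameters": {"frequency_hz": 20, "duration_ms": 500},
    "recording_date": "2014-06-12",
    "lab": "example-lab",
}

missing = validate_record(record)  # empty when the record meets the standard
```

A check like this is what "takes the guesswork out" of annotation: a lab knows at collection time exactly which contextual information a shared dataset must carry.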
For high-output labs, which can produce more than a petabyte per year (see Box 2-1), and even many smaller labs, backing up data has become more complicated than simply saving everything to a series of external hard drives or DVDs. Without a well-considered data management strategy in place, data are at risk of being lost, and older data can be difficult to trace. Martone noted that she has heard senior scientists lament the fact that they feel like they have lost control of their own lab because they no longer know where their data are stored.
Some funding agencies, such as NSF, have mandated data management plans to ensure that data generated via agency grants are secure and easily shared. However, because the plans are not enforced, sharing has been stymied and a significant number of labs are still at risk of potential data loss, noted Martone. A change in the overall culture, starting with trainees, regarding data management will be the only effective means of ensuring widespread sharing and prevention of potential data loss, she added.
Martone mentioned several opportunities to improve data management. For example, some labs manage data with electronic laboratory notebooks to keep track of their data and to maintain digital records of experiment notes. Martone also noted that while most universities do not have centralized data repositories or support networks in place, many libraries have been serving as curators of the digital assets that the labs at their universities produce. Data curators will be essential to the neuroscience enterprise; however, there are currently few training programs and no defined career paths for them. Until the field of data curation becomes more formalized and more valued by universities, many data scientists will likely occupy a status in labs similar to that of research technicians employed directly by investigators, said Martone.
How Big Are Big Data?
How much neuroscience data are currently being collected is difficult to quantify, but some high-profile projects have estimated their output:
- Jeff Lichtman at Harvard University estimates that his connectomics projects can generate 1 terabyte (TB) per day (or 365 TB/year), with a 1 cc brain tissue sample containing roughly 2,000 TB of data.a
- The Human Connectome Project, which plans to collect diffusion tensor imaging and resting-state functional magnetic resonance imaging data from 1,200 human subjects, is expected to generate more than 30 TB of data.b
- The Kavli Foundation estimates that a single advanced brain laboratory could produce 3,000 TB of data annually—roughly as much data as the world's largest and most complex science projects currently produce.c
- Calcium imaging studies in mice produce approximately 1 gigabit per second of data; anatomical datasets will readily grow to the approximately 10-petabyte scale and beyond.d
ahttp://www.quantamagazine.org/20131007-our-bodies-our-data (accessed October 29, 2014).
bhttp://www.humanconnectome.org/documentation/Q1/data-sizes.html (accessed October 29, 2014).
chttp://www.kavlifoundation.org/science-spotlights/brain-initiative-survivingdata-deluge (accessed October 29, 2014).
dhttp://www.braininitiative.nih.gov/2025/BRAIN2025.pdf (accessed October 29, 2014).
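The scales quoted in Box 2-1 can be sanity-checked with a few lines of arithmetic; this is a rough sketch using the box's own figures and standard unit conversions.

```python
# Back-of-the-envelope check of the data volumes quoted in Box 2-1.
TB = 1e12  # bytes in a terabyte

# Connectomics imaging at 1 TB/day:
tb_per_year = 1 * 365  # matches the 365 TB/year figure in the box

# Calcium imaging at ~1 gigabit per second:
bytes_per_second = 1e9 / 8            # 1 gigabit = 0.125 gigabyte
tb_per_day = bytes_per_second * 86_400 / TB  # 86,400 seconds per day
# Roughly 10.8 TB/day of raw signal, so continuous recording reaches
# petabyte-scale archives within months.
```

Even these conservative conversions show why individual external drives cannot serve as a backup strategy for high-output labs, as noted above.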
Of all the aspects of big data, data sharing was the most frequently discussed by the workshop speakers and participants. See Box 2-2 for recommendations and key points for academic institutions noted in Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk, a report by the IOM’s Committee on Strategies for Responsible Sharing of Clinical Trial Data (IOM, 2015). Although most scientists would agree that making data public helps push science forward, at the individual level there are reservations about sharing that revolve around control, trust, and fear. Several workshop participants noted that educating trainees
about the benefits and risks of sharing data can help to alleviate these emotional concerns and facilitate a shift in culture around sharing. Control over one’s data has always been sacrosanct in science. But as Landis pointed out, data sharing is “the wave of the future” and scientists will no longer be able to take their data to their graves. Trainees, she said, need to embrace the idea of making their data public. Akil brought up a potentially common anxiety among trainees, and scientists in general, about immediately making public the data they spend months or even years of their lives collecting, only to watch their colleagues publish the initial articles related to those data (Soranno et al., 2014).
Litt suggested two potential mechanisms for creating a system that respects the rights of data collectors while maximizing the community’s access to important or hard-to-acquire data. First is the idea of using data licenses to share data in stages or layers. Perhaps data can be initially shared among collaborators or a smaller group of scientists after a set period of time, and then later shared with the whole scientific community, noted Litt. The second idea is a sharing index, or S-index, akin to the well-known impact factor or the H-index.15 The S-index, which would need support from universities, funding agencies, and publishers, could reward prolific sharing by playing a role in hiring and promotion decisions as well as in grant review.
Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk
The Institute of Medicine convened an ad hoc committee to develop guiding principles and a framework for the responsible sharing of clinical trial data. Related recommendations and key points for academic institutions are listed below.
- Recommendation 1: Stakeholders in clinical trials should foster a culture in which data sharing is the expected norm, and should commit to responsible strategies aimed at maximizing the benefits, minimizing the risks, and overcoming the challenges of sharing clinical trial data for all parties.
- Recommendation 2: Sponsors and investigators should share the various types of clinical trial data no later than the times specified in this (IOM, 2015) report (e.g., the full analyzable dataset with metadata no later than 18 months after study completion—with specified exceptions for trials intended to support a regulatory application—and the analytic dataset supporting publication results no later than 6 months after publication).
15Index used to measure the impact and scientific importance of a researcher’s publications.
- Recommendation 3: Holders of clinical trial data should mitigate the risks and enhance the benefits of sharing sensitive data by implementing operational strategies that include employing data use agreements, designating an independent review panel, including members of the lay public in governance, and making access to clinical trial data transparent.
Research Institutes and Universities
- Infrastructure Support: “High-quality data curation and management are required to prepare for data sharing, so that investigators must both recognize this need and have appropriately skilled personnel available to them…. Better overall support of the clinical trials enterprise within most institutions is needed to support the kinds of data structuring and documentation that will be needed for data sharing” (p. 62).
- Incentives: “Appropriate recognition of data sharing activities in the promotion process would provide incentives for sharing data and obtaining maximal value from completed trials. Other promotion-related incentives for data sharing would exist if promotion committees took into account secondary publications by others based on clinical trial data produced and shared by their faculty” (p. 63).
- Training: “Most of the workforce that would be involved in activities related to the sharing of clinical trial data are trained in universities. Currently, there is little or no training within traditional clinical research education in the procedures and structures needed to share data. The development of such modules, either online or in classroom settings, could be instrumental in helping to move the field of data sharing forward” (p. 63).
SOURCE: IOM, 2015.
Similar to the idea of an S-index, Martone proposed the notion of separate acknowledgment in papers for those scientists who originally collected the data upon which the paper was based. Headed by Martone, FORCE11,16 a community of scholars, librarians, archivists, publishers, and research funders seeking to improve data sharing, is actively trying to create a mechanism to issue such data citations. Several participants noted that such incentives might help to reduce the anxiety among trainees and investigators and encourage data sharing.
A few participants noted that trust is another challenge in making data public. Several workshop participants noted that, much as with clinical trial data, there is a moral imperative to share data to offer the greatest return to the public (see IOM, 2015). In addition, trainees need to learn how to effectively evaluate the trustworthiness of public data, as well as how to engender trust in the data they themselves make public. Several participants stressed that without mechanisms in place to create trust, scientists will be reluctant to devote large amounts of time to analyzing shared data or to put their reputations at risk by publishing papers about those analyses. Fear of scrutiny and criticism are additional concerns that Martone speculated might make some scientists reluctant to share data. Scientists may be afraid that errors found in the raw data they make public could lead to embarrassment or more serious repercussions.17 One way to alleviate such fears, she offered, is for a certain level of data etiquette to develop around sharing so that unintentional errors found in data are dealt with in a non-punitive fashion.
Setting aside the various reservations scientists have about making their data public, Koroshetz, as well as several other participants, said that annotations, or metadata, are the most expensive and time-consuming part of sharing data. Most experimental data have several pieces of metadata associated with them, including stereotaxic coordinates, cell type, stimulation parameters, and other experimental conditions. Even seemingly innocuous factors, such as the sex of the experimenter or the source of the food, have been shown to significantly alter results in experiments with rodents. According to a few participants, tagging each set of electrophysiology traces or fluorescent images with the appropriate annotations is not trivial, but this information needs to be integrated into the experimental workflow to maximize the utility of any shared data. Another challenge with metadata, noted by several workshop participants, is determining which parameters need to be included and which can reasonably be excluded.
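As a minimal illustration of how such annotations can be integrated into a workflow, the metadata for a single recording can be captured as a small machine-readable record stored alongside the raw traces. The field names below are hypothetical examples, not a community standard:

```python
import json

# Hypothetical metadata record for one electrophysiology recording.
# All field names and values are illustrative, not a community standard.
metadata = {
    "subject": {"species": "Mus musculus", "sex": "female", "strain": "C57BL/6J"},
    "recording": {
        "cell_type": "pyramidal neuron",
        "stereotaxic_coordinates_mm": {"AP": -2.0, "ML": 1.5, "DV": -1.3},
        "sampling_rate_hz": 20000,
    },
    "stimulation": {"type": "current step", "amplitude_pa": 150, "duration_ms": 500},
    # Seemingly innocuous factors that have been shown to affect rodent results.
    "environment": {"experimenter_sex": "male", "food_source": "standard chow"},
}

# Serializing to JSON makes the annotations portable and searchable
# by anyone who later downloads the shared traces.
record = json.dumps(metadata, indent=2)
```

Writing the record at collection time, rather than reconstructing it at publication, is what keeps the annotation burden manageable.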
Several participants agreed that not all data are worthy of being shared, particularly given the potential cost; for example, Koroshetz noted
17The Research Council of Norway: Norwegian Researchers Want to Share Data but Fear Jeopardizing Their Career. See http://erc.europa.eu/sites/default/files/content/pages/pdf/2.4%20Roar%20Skalin.pdf (accessed October 29, 2014).
Survey data in that workshop summary show that approximately one-quarter of scientists say data sharing will negatively impact their careers for at least one of the following reasons: making data available takes time away from research; lack of technical infrastructure; open access would reduce opportunities for scientific publications; concerns about misinterpretation of data; and/or inability to give access because of data sensitivity. Scientists with less than 3 years' experience reported fewer concerns about sharing data.
the cost of NINDS's databases for traumatic brain injury ($2 million/year), autism ($2 million/year), and Alzheimer's disease ($1.5 million/year). Some data, such as the human subject data from the Framingham Heart Study,18 are rare, while other data will become obsolete as technology continues to improve. In addition, Litt stated that public data deemed more valuable are also more likely to be annotated by users until they eventually become a gold standard. Sejnowski recounted an example from astrophysics of creating high-quality public data. Grants for the Hubble Space Telescope are issued in two tracks: (1) a typical R01-style study in which data collection leads to individual publications; and (2) the collection of archival datasets that require significant effort to calibrate but are used extensively as standards against which to compare new data. Sejnowski noted that neuroscience would benefit if NIH funded similar types of calibrated datasets.
As Landis mentioned, it is not enough to hope that trainees will pick up enough knowledge about data handling through informal means; trainers need to take an active role in structuring programs so that these competencies are developed. Trainees can be exposed to data-handling issues in lab courses, seminar series, or webinar series (see example skills in Box 2-3). According to a few workshop participants, training programs can require that students write data-management plans for their projects to accompany their thesis proposals or their Ruth L. Kirschstein National Research Service Award or NSF grant applications, a requirement many programs already include as part of students' qualifying exams. In addition, training programs can consider having an expert on data handling on staff, or sharing such a person with one or more departments, to act as a resource for students and faculty.
Example Data Handling Skills and Knowledge Presented by Individual Participants
- Data management plans (and funding agency requirements)
- Data-sharing platforms
- Incentives for sharing (data citation, S-index)
- Evaluation of data trustworthiness
- Evaluation of data worth
- Data licenses
18See https://www.framinghamheartstudy.org/about-fhs/history.php (accessed October 29, 2014).
- Data rights
- Data standardization
- Data formats
- Data annotation
- Open-access journals
- Actionable versus static data
- Application program interfaces
- Web scraping
- Web services
- Online databases
- Cloud computing
- Data storage
NOTE: The items in this list were addressed by individual participants and were identified and summarized for this report by the rapporteurs. This is not intended to reflect a consensus among workshop participants.
Defining the Gaps in Data Analysis
Although all neuroscience trainees would benefit from training in best practices for data literacy, management, and sharing, a number of specialized skills are required to analyze large, complex datasets. Litt enumerated those skills and identified disciplines outside of neuroscience with which neuroscientists can build collaborations to address gaps in data analysis (see Box 2-4). Litt also described two projects he is involved with that aim to enhance training in data analysis among graduate students:
- American Epilepsy Society Seizure Prediction Competition: Competitors are invited to download large datasets of intracranial electroencephalogram (EEG) recordings from dogs with epilepsy and develop algorithms to optimally predict seizure onset.
- www.ieeg.org: Litt engineered a model data-handling platform—found at ieeg.org—that is hosted on Amazon's S3 cloud storage service and accessed through a web browser. The platform, currently used by more than 500 people, enables sharing and annotation of computer code and EEG data from epilepsy patients; it also provides tools for large-scale analyses. Litt noted that trainees at the University of Pennsylvania's Center for Neuroengineering and Therapeutics are required to use the platform. They learn how to version their code, share data structured in a common format, and use the cloud.
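The kind of analysis the seizure prediction competition asks for typically begins with extracting simple features from raw EEG. As a toy sketch (the synthetic signal and band edges below are invented for illustration, not drawn from the competition data), spectral band power can be computed from a recording with a few lines of NumPy:

```python
import numpy as np

def band_power(signal, fs, low, high):
    """Average power of `signal` in the [low, high] Hz band, via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs <= high)
    return spectrum[mask].mean()

fs = 256  # sampling rate in Hz (illustrative)
t = np.arange(0, 2, 1.0 / fs)

# Synthetic "EEG": a 10 Hz alpha rhythm plus low-amplitude noise.
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(len(t))

alpha = band_power(eeg, fs, 8, 12)   # band containing the 10 Hz rhythm
gamma = band_power(eeg, fs, 30, 45)  # band containing only noise
print(alpha > gamma)  # → True
```

Features like these, computed over sliding windows, are what a seizure-prediction algorithm would then feed into a classifier.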
Data Analysis: Relevant Skills for Neuroscientists and Disciplines to Build Collaborations
Data Analysis Skills Relevant to Neuroscientists
- Visualization software
- Multivariate statistical analysis
- Competence in cloud computing (storage, retrieval, and distributed processing)
- Version control for computer code and script files
- Digital signal processing (aliasing, the Nyquist criterion, analog-to-digital conversion, filtering)
- Feature extraction (time, frequency, wavelet, chaotic)
- Data classifiers (supervised and unsupervised)
- K-nearest neighbor algorithmf
- Support vector machines
- Data clustering
- Data basics (storage, databasing, integration, search, provenance)
Disciplines with Which to Collaborate on Data Analyses
- Computer science
- Machine learning
- Signal processing
- Materials science
fSee http://www.statsoft.com/textbook/k-nearest-neighbors (accessed October 29, 2014).
SOURCE: Brian Litt presentation, University of Pennsylvania, October 28, 2014.
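Two of the skills in the box above—supervised classifiers and the K-nearest neighbor algorithm—can be made concrete in a few lines. The following is a minimal sketch with invented toy features (for example, signal power in two frequency bands), not an implementation from the workshop:

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_X - query, axis=1)      # Euclidean distances
    nearest = train_y[np.argsort(dists)[:k]]             # labels of k closest points
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                     # majority label

# Toy feature vectors; labels 0 = baseline, 1 = abnormal activity (illustrative).
train_X = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
                    [0.90, 0.80], [0.80, 0.90], [0.85, 0.75]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_X, train_y, np.array([0.12, 0.18])))  # → 0
print(knn_predict(train_X, train_y, np.array([0.88, 0.82])))  # → 1
```

K-nearest neighbors is often the first classifier taught precisely because it has no training step: the "model" is the labeled data itself, which makes it a natural bridge between data management and data analysis.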
Institute Example: Allen Institute for Brain Science
Jane Roskams, executive director of strategy and alliances at the Allen Institute for Brain Science (AIBS), described the goal of AIBS as making neuroscience tools, data, and knowledge readily and freely available to the scientific community. AIBS employs multidisciplinary teams with experts in neuroscience, cell biology, modeling, data analysis, theory, engineering, and genetics. Over the past 10 years, AIBS has produced more than 30 brain atlases (mouse, non-human primate, and human) and other large neuroscience-related databases. These atlases and databases, which together contain more than three terabytes of data, are freely available to the public via an online portal. AIBS offers numerous opportunities for collaboration and training related to data management and analysis through traditional classroom training sessions, summer workshops, hackathons, and online webinars.19
19See http://alleninstitute.org/news-events/events-training (accessed October 29, 2014).