Many questions have arisen in developing these and other databases. Which digital data and which data on film need to be stored? Do calibrations (i.e., the characterization of an instrument’s response to a known stimulus) need to be stored, and if so which ones? Should proprietary tools be stored so that users can see how the primary data were processed? For now, there is reason to err on the side of depositing too much data, because no one knows what subsequent researchers will need. However, it is likely that only a small percentage of databases will find widespread use, which complicates, rather than simplifies, the task of storage.
Complex databases always include errors. Obvious errors, such as coordinates that lie outside the brain, can be found more easily when data are shared. However, policing data before they are added to a database can be so time-intensive that it discourages database building. Fortunately, advances in areas such as pattern recognition and learning theory, combined with rapid advances in data processing and storage, are providing new, automated methods for testing the quality of data.
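As an illustration of the simplest kind of automated check mentioned above, the following minimal sketch flags coordinates that fall outside a bounding volume. The function name, the type alias, and the bounding-box limits are hypothetical and are not taken from any database described in this box; real limits would depend on the atlas and coordinate system in use, and a box is only a coarse stand-in for the actual shape of a brain.

```python
from typing import Iterable, List, Tuple

Coordinate = Tuple[float, float, float]
Bounds = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]

# Hypothetical bounding box in millimeters (x, y, z ranges).
# These numbers are illustrative assumptions, not values from any real atlas.
BRAIN_BOUNDS: Bounds = ((-90.0, 90.0), (-126.0, 90.0), (-72.0, 108.0))


def find_out_of_bounds(
    coords: Iterable[Coordinate], bounds: Bounds = BRAIN_BOUNDS
) -> List[int]:
    """Return the positions of coordinates that lie outside the bounding box."""
    flagged = []
    for index, point in enumerate(coords):
        # A point is inside only if every axis value is within its range.
        inside = all(
            low <= value <= high for value, (low, high) in zip(point, bounds)
        )
        if not inside:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    records = [(0.0, 0.0, 0.0), (200.0, 0.0, 0.0), (10.0, -130.0, 5.0)]
    print(find_out_of_bounds(records))
```

A check of this kind catches only gross errors. Subtler problems, such as a coordinate that is inside the brain but assigned to the wrong structure, would need the more sophisticated methods the paragraph alludes to.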
Another problem is that most data deposited in neuroscience databases are not adequately annotated, and even annotated data tend to use nonstandard terminology, which turns the databases into “islands” of diverse resources. Such databases may not be useful for comparative studies or other purposes.
Issues of who has rights to use data also are far from resolved. A researcher may work for 5 years to assemble data on a transgenic mouse and be reluctant to give the data away. To make data open and accessible, incentives may need to be developed to encourage scientists to share their data.
Another issue is whether journals should be responsible for receiving and storing all primary or supplementary data. Most publications lack a suitable place to enter and store supplementary data, and who should pay for this service remains unresolved.
These issues, most of which we discuss later in this report, are being extensively explored in the research and policy-making communities. Many questions do not yet have clear answers that extend across all research disciplines.
SOURCE: This box draws on presentations to the committee by David Van Essen, Washington University in St. Louis, and Maryann Martone, University of California, San Diego, on December 10, 2007.