British Oceanographic Data Center
Let me start by providing some background information about the key players in the partnership that has come together to foster data publication in the ocean sciences. The Scientific Committee on Oceanic Research (SCOR) is an international non-governmental organization formed by the International Council of Scientific Unions (ICSU, now the International Council for Science) in 1957. The Committee has scientists from 36 countries participating in different working groups and steering committees. It promotes international cooperation through planning and conducting oceanographic research, and solving methodological and conceptual problems that hinder research.
The second partner organization is the International Oceanographic Data and Information Exchange (IODE), a data and information exchange program of UNESCO’s Intergovernmental Oceanographic Commission (IOC) that was established in 1961. The main goal of this program is to establish national oceanographic data centers or coordinators in IOC member states in order to acquire, enhance, and exchange oceanographic data and information. It also aims to extend the national oceanographic data center network through training and capacity building.
The last player in this partnership is the Marine Biological Laboratory Woods Hole Oceanographic Institution (MBLWHOI) Library. The Woods Hole scientific community library has a strong interest in data publication in digital libraries. The Digital Library Archive (DLA) contains:
• WHOI archives;
• Historical photographs and oceanographic instruments;
• Scientific data, e.g., echo sounding records from WHOI research vessel expeditions;
• Technical report collections; and
• Maps, nautical charts, geologic and bathymetric maps, and cruise tracks.
1 Presentation given by Sarah Callaghan; slides are available at http://sites.nationalacademies.org/PGA/brdi/PGA_064019.
The group held a series of meetings between June 2008 and April 2010, and another meeting is scheduled for November 2011. The group’s objectives are to:
• Engage the IODE data center and marine library communities in data publication issues.
• Provide a network of hosts for cited data.
• Motivate scientists through reward for depositing data in data centers.
• Promote scientific clarity and re-use of data.
However, engaging IODE data centers effectively in data publication runs into a problem: data centers and digital libraries take different approaches to data distribution. The first model, which reflects current data center practice, is as follows.
Data can change significantly as additional value is added by the data center through metadata generation, quality control (e.g., flagging outliers), and the like.
The “best available” data are served by the data center to other users during data evolution, which means that the dataset is continually changing with no snapshots preserved or formal versioning during work-up. This makes it difficult to go back and get the same data that you got a year or six months ago.
The second model is the Digital Library Paradigm.
A dataset is a “bucket of bytes” that is:
• Fixed (its checksum should be a metadata item)
• Versioned (any change generates a new version of the dataset)
• Persistent (previous versions must remain available)
• Accessible online via a permanent identifier
• Usable on a decadal timescale (using standards such as the Open Archival Information System (OAIS) reference model)
• Citable in the scientific literature to provide links to marine libraries
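The properties listed above can be illustrated with a minimal sketch. This is not the design of any system described in this talk; the class names, the identifier scheme (`name/vN`), and the choice of SHA-256 are all assumptions made purely for illustration. It shows a store in which every change produces a new version, earlier versions persist, and the checksum held as metadata is used to confirm that the bytes have not changed.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    identifier: str  # stand-in for a permanent identifier such as a DOI
    checksum: str    # SHA-256 of the payload, kept as a metadata item
    payload: bytes   # the fixed "bucket of bytes"


@dataclass
class VersionedDataset:
    """Append-only collection of snapshots: nothing is ever overwritten."""
    name: str
    versions: list = field(default_factory=list)

    def publish(self, payload: bytes) -> DatasetVersion:
        checksum = hashlib.sha256(payload).hexdigest()
        # Identical bytes do not create a new version.
        if self.versions and self.versions[-1].checksum == checksum:
            return self.versions[-1]
        # Any change generates a new version; older ones stay in the list.
        version = DatasetVersion(
            identifier=f"{self.name}/v{len(self.versions) + 1}",
            checksum=checksum,
            payload=payload,
        )
        self.versions.append(version)
        return version

    def get(self, identifier: str) -> DatasetVersion:
        for version in self.versions:
            if version.identifier == identifier:
                # Fixity check: recompute the checksum before serving.
                if hashlib.sha256(version.payload).hexdigest() != version.checksum:
                    raise ValueError(f"fixity check failed for {identifier}")
                return version
        raise KeyError(identifier)
```

With such a store, a user who cited `example-dataset/v1` a year ago can retrieve exactly the same bytes today, even if quality control has since produced later versions.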
To summarize these data distribution paradigm issues, the problem is to find ways for IODE data centers to engage in digital library practices while leaving current infrastructure largely intact. Change should happen gradually through evolution and not revolution. Probably the best way to do that is through pilot projects at the British Oceanographic Data Center (BODC) and WHOI.
To that end, the BODC has started a pilot project, beginning with a decision to establish a repository at IODE called Published Ocean Data (POD), where data will be accessible to many data centers, with technical quality control and good long-term stewardship credentials in place. The process has taken longer than anticipated because of extended discussions and resource availability. However, specifications are now being produced and accepted, and the actual building of the systems will start in the fall of 2011.
As for the WHOI pilot project, the MBLWHOI library has loaded a number of datasets from the National Science Foundation’s (NSF) Biological and Chemical Oceanography Data Management Office (BCO-DMO). The datasets have been associated with published journal articles. For example, dx.doi.org/10.1575/1912/4199 resolves to https://darchive.mblwhoilibrary.org/handle/1912/4199.
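As a rough illustration of how such an identifier is structured, the sketch below splits a DOI into its registrant prefix and suffix and builds the address of the public resolver at doi.org. The function names are hypothetical. Note that the landing page a DOI resolves to is set by the registering organization and cannot in general be derived from the DOI string; in the example above the suffix happens to match the repository handle, but that should not be assumed for other DOIs.

```python
from __future__ import annotations


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash.

    Accepts a bare DOI or one written with a resolver host in front,
    such as "dx.doi.org/10.1575/1912/4199".
    """
    cleaned = doi.strip()
    # Drop an optional scheme, then an optional resolver host.
    for scheme in ("https://", "http://"):
        if cleaned.startswith(scheme):
            cleaned = cleaned[len(scheme):]
    for host in ("dx.doi.org/", "doi.org/"):
        if cleaned.startswith(host):
            cleaned = cleaned[len(host):]
    prefix, sep, suffix = cleaned.partition("/")
    # DOI prefixes begin with "10."; the suffix must not be empty.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Return the resolver address for a DOI (no network access is made)."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"
```

For the dataset cited above, `split_doi` yields the prefix `10.1575` and the suffix `1912/4199`, and `resolver_url` gives the address a reader would follow to reach the repository record.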
The group is also working with a scientist who is submitting a paper to the American Geophysical Union in September, with a complete publishing process use case including DOI assignments to datasets supporting specific figures. These dataset citations will be incorporated in the final version of the paper, subject to publisher approval. Furthermore, talks are underway concerning incorporation of the Woods Hole Open Access Server (WHOAS) repository in an NSF proposal data management plan. Finally, this partnership also has plans for collaboration with BCO-DMO to develop an automated publication system for all data center accessions.
Let me conclude with a summary of our future plans. We will:
• Complete the pilot projects identified earlier.
• Engage other data centers in data publication through reporting our experiences and disseminating knowledge through appropriate routes, such as workshops, conferences and other publications.
• Engage SeaDataNet II when it starts later in 2011.
• Continue outreach activities to scientific, data management, and marine library communities.
• Hold a further meeting in Liverpool, UK, on November 3-4, 2011.
• Expand BODC activities into an operational service.
• Develop the MBLWHOI Library BCO-DMO ingest system.