• Engage the IODE data center and marine library communities in data publication issues.
• Provide a network of hosts for cited data.
• Motivate scientists by rewarding them for depositing data in data centers.
• Promote scientific clarity and re-use of data.
However, engaging IODE data centers effectively in data publication and distribution runs into the problem that two different models of data handling are in play. The first is the data center model.
Data can change significantly as additional value is added by the data center through metadata generation, quality control (e.g., flagging outliers), and the like.
The “best available” data are served by the data center to other users while the data evolve, which means that the dataset changes continually, with no snapshots preserved and no formal versioning during work-up. This makes it difficult to retrieve exactly the same data that were obtained six months or a year earlier.
The second model is the Digital Library Paradigm.
A dataset is a “bucket of bytes” that is:
• Fixed (the checksum should be recorded as a metadata item)
• Versioned (any change generates a new version of the dataset)
• Persistent (previous versions must remain available)
• Accessible online via a permanent identifier
• Usable on a decadal timescale (using standards such as the Open Archival Information System (OAIS) reference model)
• Citable in the scientific literature to provide links to marine libraries
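The properties listed above can be illustrated with a minimal Python sketch. It is not the design of any IODE or BODC system. The class names, the in-memory storage, and the "example:" identifier scheme are all assumptions made for illustration. The sketch shows three of the listed properties: each published dataset is a fixed sequence of bytes with its checksum stored as metadata, any change produces a new version, and earlier versions stay retrievable.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset (hypothetical record layout)."""
    identifier: str  # illustrative identifier scheme, not a real one
    version: int
    payload: bytes   # the "bucket of bytes"
    sha256: str      # checksum kept as a metadata item


class Repository:
    """In-memory stand-in for a repository that never overwrites data."""

    def __init__(self):
        # Maps a dataset name to the list of all versions ever published.
        self._versions = {}

    def publish(self, name, payload):
        """Store payload as a new version; earlier versions are kept."""
        history = self._versions.setdefault(name, [])
        number = len(history) + 1
        record = DatasetVersion(
            identifier=f"example:{name}/v{number}",
            version=number,
            payload=payload,
            sha256=hashlib.sha256(payload).hexdigest(),
        )
        history.append(record)
        return record

    def resolve(self, name, version):
        """Return exactly the requested version after verifying its checksum."""
        record = self._versions[name][version - 1]
        if hashlib.sha256(record.payload).hexdigest() != record.sha256:
            raise ValueError(f"checksum mismatch for {record.identifier}")
        return record


repo = Repository()
first = repo.publish("ctd-cruise", b"temp,salinity\n10.1,35.0\n")
# Adding quality-control flags changes the bytes, so it becomes version 2.
second = repo.publish("ctd-cruise", b"temp,salinity,flag\n10.1,35.0,good\n")

# The earlier version is still retrievable, byte for byte, after the update.
assert repo.resolve("ctd-cruise", 1).payload == first.payload
assert first.sha256 != second.sha256
```

In contrast to the first model, a user who cited version 1 can later retrieve the same bytes and confirm them against the recorded checksum, even though the data center has since added value to the dataset.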
In summary, the challenge posed by these two data distribution paradigms is to find ways for IODE data centers to engage in digital library practices while leaving current infrastructure largely intact. Change should happen gradually, through evolution and not revolution. Probably the best way to achieve that is through pilot projects at the British Oceanographic Data Centre (BODC) and WHOI.
To that end, BODC has begun a pilot project, starting with a decision to establish a repository at IODE called Published Ocean Data (POD), where data will be accessible to many data centers, with technical quality control and good long-term stewardship credentials in place. Reaching this goal has taken longer than anticipated because of extended discussions and constraints on resource availability. However, specifications are now being produced and accepted, and the actual building of the systems will start in the fall of 2011.