As Professor Bretherton was describing a structure of trees, roots, and branches in his talk,5 it seemed to me that he was characterizing the kind of functionality that DSpace provides. Institutional repositories of this kind offer an opportunity for robust roots in that tree structure, enabling faculty to build repositories of the work they would like to share through just such a model of distribution and management.
There are a number of interesting issues that arise from the design of DSpace. For example, we are interested in the prospect of using Creative Commons licenses as a way of helping those who deposit material in DSpace signal how they would like their material to be used. There are no conventions such as Creative Commons licenses available for submitters right now. So if a faculty member wants to deposit his or her work in a digital repository that will serve it to the world and maintain it over time, there is no existing set of licenses that can be built into the metadata to identify how the work can be used going forward.
We believe that a federation of interested institutions will be needed to establish and maintain sustainability for a digital repository. So our great hope, and our reason for writing the code in open source, is that there will be sufficient interest, first across the United States and perhaps internationally, in the idea of building digital repositories at the institutional level. Despite what one hears about how easy it is to create digital content, preserving, maintaining, and keeping digital content persistently available is a research challenge. Our hope in federation is that we will be able to share that challenge across multiple institutions.
There is also no clearly established model for a relationship of this kind. Libraries themselves have quite a fine and interesting experiment, now well over 30 years old, called the Online Computer Library Center, in which libraries have banded together to share cataloging data in a not-for-profit, library-managed enterprise.6 That enterprise has been an interesting model for us as we think about how one would federate digital repositories across institutions. So we think that the library community can figure out how to do this.
A final challenge to DSpace is that disciplines vary greatly in terms of what their expectations are for a repository. Some of the early adopters of the DSpace repository are faculty in ocean engineering, and they deal in datasets that are terabytes in size. Some faculty have large collections of images. Other faculty have much smaller, more text-oriented expectations, which illustrates the fact that scientists like science, not database administration. They have always expected that libraries would be there for them, and so we are. On the other hand, the challenge to us is to help scientists take advantage of new tools.
At the end of the day, research universities, scientists, and funding agencies need a new alliance. We need strategies to advance and expand research-based education. We need to be able to educate and conduct research without Draconian external rules. This probably means developing our own systems for the exchange of data and information on a direct institution-to-institution basis. We need to assure persistent availability and accessibility of research data. This probably means keeping it as close to scientists as possible, and it means new IP options like the Creative Commons need to be deployed.
We need to solve the challenge of the born-digital world. Researchers and educators now routinely produce work that has no paper analog. We know that a great deal of work has already been lost, and we are deeply concerned that there are no easy ways to approach the long-term archiving of work that is digital only. As such, we are faced with the prospect of sentencing work to a five-year shelf life, or to a life only as long as the makers of the proprietary software remain interested in addressing the problem.
Last, we need to solve the archiving problems. Bruce Perens and I were talking about some work that he has done in restoring works that Disney owns that were damaged. The cost and effort were phenomenal. Clearly, the losses are mounting similarly in the higher education and scientific communities. Yet we do not have Disney's money. We need a long-term solution to archiving.
Through OpenCourseWare and DSpace, MIT is working hard to develop some prototypes, to share the ideas and the software behind those prototypes, and to interest others in joining us in meeting the challenge.
5 See Chapter 8 of these Proceedings, “The Role, Value, and Limits of S&T Data and Information in the Public Domain for Research: Earth and Environmental Sciences,” by Francis Bretherton.
6 See the Online Computer Library Center Web site at http://www.oclc.org for additional information.