The Role of the Research University in Strengthening the Intellectual Commons: the OpenCourseWare and DSpace Initiatives at MIT
The challenging environment that has been described during the course of this symposium has the potential to profoundly affect research universities. The economists among us understand that markets are not normally passive in the face of changing economic forces, and research universities do not have the luxury of standing passively by as intellectual property laws and norms change. Research universities are mission-driven, not-for-profit enterprises. They may host technology licensing offices, but their primary mission is education and nonprofit research.
Those outside the academy sometimes talk about the research university community as though it were some kind of monolithic industry. In truth, not only is there considerable variation among research universities, there is also a distinct lack of uniformity within institutions. A personal story illustrates this point. Shortly after I joined the Massachusetts Institute of Technology (MIT), I was worried about a piece of legislation that was coming before the U.S. Congress. I approached the vice president of research at the institute to express my concern and urged MIT to take a position on the issue. His response was, “if you can figure out who MIT is, then maybe you can persuade them to take a position.”
We need to consider this heterogeneity, as well as traditions of intellectual freedom, when we talk about the research university. It is difficult to generalize, because healthy research universities have many diverse activities going on simultaneously under one roof, which is entirely consistent with the mission of such organizations. The mission of MIT, for example, is “to advance knowledge and educate students in science, technology and other areas of scholarship that will best serve the nation and the world in the 21st century.” The mission statement goes on to say that the institute “is committed to generating, disseminating and preserving knowledge and to working with others to bring this knowledge to bear on the world’s great challenges.” MIT’s Technology Licensing Office, like other university licensing offices, operates within the institute’s mission, policies, and procedures.
Research universities play a significant role in the value chain of new knowledge creation in science and technology. At the risk of stating the obvious, research universities recruit and retain faculty and research scientists. They admit students and support them with financial aid. They cover the growing portion of the nonrecovered expenses associated with research. They invest in education and research technology, and they pay for the network infrastructure on our campuses, as well as the libraries. Libraries, by the way, are investing an increasingly significant percentage of their material and resource budgets in support of databases and database resources.
We have heard a fair amount at this symposium about the role of information technology (IT) and the growth in complexity of the scientific and technical data environment. The fact is that IT has affected other aspects of the university mission as well. Students expect to have ready access to the full panoply of digital content, data included. Faculty need an increasingly sophisticated work environment and a fair amount of IT support. Faculty also need bandwidth, systems support, and library resources delivered to the desktop, all of which require institutional investment, as do new labs, redesigned and updated existing labs, and new buildings in which to put big science.
Research universities also need academic disciplines to consider how new research methods should be calibrated in the context of promotion and tenure decisions. There was an interesting conversation at MIT not too long ago about how faculty members can subject their work to the scrutiny of peer review when the scholar is working in an environment that is entirely electronic and on the Web.
Finally, IT in the legal, regulatory, and compliance context has caused a distinct rise in the cost of doing business for research universities. For example, libraries now license databases. The MIT libraries are probably not unusual in dedicating two full-time people to license agreement negotiation. Likewise, MIT’s information systems department has a staff of lawyers who negotiate software licenses for the university. And there is now a staff member whose job it is to respond to calls relating to the provisions of the Digital Millennium Copyright Act (DMCA). Bear in mind that people can call up and demand that you take content down, and the bias is in their favor. Research activities also incur new expenses. Vice presidents of research work hard at protecting the scientists and students in their institutions.
These new costs of doing business are pure overhead. They are defensive, and they are dead-weight costs to the university. There has been some question earlier in this symposium about whether the DMCA costs money. The answer is yes, the new legal environment costs a great deal of money in dead-weight overhead.
As research universities have confronted the reality of rising dead-weight costs and diminishing flexibility in an increasingly Draconian intellectual property (IP) environment, they have become increasingly aware of the importance of advocating for and supporting openness. The best students will be diverse in their nationalities and religions. These students need access, as we have heard from Harlan Onsrud,1 to high-quality information, and they should not have to choose between buying lunch, buying a dataset, or buying an article. Faculty should be able to teach the best way they know how, without having to plow through endless permissions and approval processes just to use in their courses information that is now (by default) protected by IP regimes.
You have already heard a great deal about the importance of openness in research and about the need for work to have visibility and impact. I want to second Harlan Onsrud’s comments on subscriptions.2 As a practical matter, if the size of a subscription base is reduced to 50 institutions, and the license terms of digital access to that database prohibit interlibrary loan, then one really has to ask whether publishing in that particular journal does indeed provide appropriate visibility and impact.
One might reasonably conclude from these remarks that research universities, and the home that they provide for many scientists and engineers, are in deep trouble. Imagine for a moment that universities are not focused on teaching students, are not conducting not-for-profit research, but rather are engaged in some other enterprise. Imagine a business trying to operate under the various constraints and uncertainties that apply to research universities. Imagine that MIT were not MIT, but rather a major metropolitan newspaper publisher. The business would be in a situation where its staff authors were writing material on the premises and then sending that material to an outside third party. This newspaper would then have to buy back the work of its staff authors at an arbitrary price so that it could be used in the publication of the newspaper. Imagine, worse yet, that the third parties were near monopolies. Imagine, moreover, that the newswire services it dealt with chose to license content to it under arbitrary and unilateral terms, so that it had no control over the data stream. Then imagine that the cost of complying with all the IP requirements that surround the content that goes into the newspaper had become unrecoverable and unsustainable. This is the situation that universities find themselves in today.
We believe, at least at MIT, that new educational and information data management strategies are required. Two initiatives under way at MIT reflect a market response based on a commitment to openness.
See Chapter 25 of these Proceedings, “Emerging Models for Maintaining Scientific Data in the Public Domain,” by Harlan Onsrud.
Both initiatives emerged from faculty desires and needs as they were articulated and are intended to give faculty a new set of tools that will enable them to create new approaches to and methods for managing their intellectual work. Both initiatives illuminate how profoundly the post-DMCA environment has already distorted work in the academy, and both point to the importance of initiatives such as the Creative Commons.
The first of these initiatives is the OpenCourseWare initiative.3 OpenCourseWare intends, eventually, to put all of MIT’s courses on the Web, free of charge. It illuminates the intellectual framework for how MIT approaches the challenge of teaching MIT students. In a sense, we are publishing MIT courses on the Web so that the educational strategy that MIT uses can be shared openly across the world. This is an effort to create a public good, to put into the public domain what MIT knows about teaching the kinds of students who come to MIT.
In developing this initiative, we encounter the post-DMCA problem. For example, who knows what agreements or license terms apply to the material that is embedded in the faculty member’s course notes? Courses are littered with IP that may have no bona fide reference, where there is no way of tracking the ownership, and for which there is no way of understanding whether the person who contributed that information to a colleague’s teaching activities also intended that it should be put up on the Internet free of charge for the world to see. These issues are very complicated, given that everything is in all probability owned by somebody.
We are working through a variety of astonishingly complex issues as a result of our attempt to make a public good out of a traditional way of teaching. Clearly there are some things that are more problematic than others, just by virtue of the way the law works. Recommended and required reading is going to be difficult to post on the Web without permission from publishers. In our first efforts to obtain permission from publishers, 80 percent of publishers denied permission to post materials free of charge on the Internet in this context, even though what was being posted was a minimal part of any one publication, and even though you could imagine it operating as advertising for that publication.
There are also complexities around software. If a faculty member uses a piece of software in his or her course, and that software comes with a particular licensing agreement to MIT, what were the terms of that licensing agreement? Was it negotiated by the department, or by the faculty member, or by the institute? Is the faculty member using a site license or an individual license? What are the terms and conditions of use of the software that faculty use to manipulate and create content that they would ordinarily consider as essential to the course?
Data are an equally big issue in the OpenCourseWare context. Faculty would like to be able to provide actual access to raw data, particularly in the case of social sciences and hard sciences, so that a student visiting the site could understand the pedagogical intention of the faculty member. Given what we have heard during this symposium in terms of reach-through claims in the patent environment, there is a similar concern about the capacity of original publishers to reach through the teaching environment and constrain what can be put on an open Web site.
In the legal environment in which we currently operate, OpenCourseWare is a publication mechanism of MIT. As such it becomes a highly visible target for those who would object to the use of anyone’s IP in an environment that would otherwise pass the four tests of fair use in the institute’s internal teaching activities. OpenCourseWare is not intended for profit, and it does not use a significant percentage of any individual work. It is intended to be factual rather than creative. The market impact of any item is minimal at best. Yet commercial publishers are concerned about putting content up in the OpenCourseWare environment. As a consequence, we are inspired at MIT to think about enabling new tools so that faculty can behave differently. Perhaps the rules of the game can change as well.
The second initiative at MIT is called DSpace.4 This is not a publishing enterprise. It is an institutional response to faculty having called the libraries and asked “Can I put my stuff with you? I have all this digital content, and no place to keep it. Will you take it for me?” DSpace is a way for faculty to put the good material that they have prepared and are ready to share with the world in a secure, stable, preservable, dependable repository with distribution capabilities. DSpace was built in partnership with Hewlett Packard and with additional support from the Mellon Foundation, and it is being written as open source software.
As Professor Bretherton was describing a structure of trees, roots, and branches in his talk,5 it seemed to me that he was characterizing a functionality such as DSpace. Institutional repositories of this kind present an opportunity for robust roots in that tree structure that would enable faculty to build repositories of work that they would like to share through just such a model of distribution and management.
There are a number of interesting issues that arise from the design of DSpace. For example, we are interested in the prospect of using Creative Commons licenses as a way of helping those who deposit material in DSpace signal the way they would like to have their material used. There are no conventions such as Creative Commons licenses available for submitters right now. So if a faculty member wants to deposit his or her work in a digital repository that will serve it to the world, and maintain it over time, there is no existing set of licenses that can be built into the metadata to identify how the work can be used going forward.
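To make the idea of a license carried in the metadata concrete, here is a minimal sketch in Python. It is not DSpace's schema or code: the `DepositRecord` class, its field names, and the short license labels in `KNOWN_LICENSES` are all hypothetical, chosen only to illustrate how a deposit might declare its intended terms of use.

```python
from dataclasses import dataclass

# Hypothetical license labels and plain-language terms (illustrative only;
# not an official Creative Commons or DSpace vocabulary).
KNOWN_LICENSES = {
    "CC-BY": "reuse permitted with attribution",
    "CC-BY-NC": "noncommercial reuse permitted with attribution",
    "ALL-RIGHTS-RESERVED": "contact the depositor before reuse",
}


@dataclass
class DepositRecord:
    """A hypothetical metadata record for one deposited work."""
    title: str
    depositor: str
    # Default to the most restrictive label when the depositor says nothing.
    license_id: str = "ALL-RIGHTS-RESERVED"

    def usage_statement(self) -> str:
        """Return a human-readable statement of how the work may be used."""
        terms = KNOWN_LICENSES.get(self.license_id)
        if terms is None:
            raise ValueError(f"unknown license: {self.license_id}")
        return f"{self.title}: {terms}"


record = DepositRecord("Ocean survey data", "A. Researcher", "CC-BY")
print(record.usage_statement())
```

The point of the sketch is only that the usage terms travel with the record itself, so that anyone retrieving the work later can see how the depositor intended it to be used.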
We believe that a federation of interested institutions will be needed to establish and maintain sustainability for a digital repository. So our great hope, and our reason for writing the code in open source, is that there will be sufficient interest, first across the United States and perhaps internationally, in the idea of building digital repositories at the institutional level. Despite what one hears about how easy it is to create digital content, preserving, maintaining, and keeping digital content persistently available is a research challenge. Our hope in federation is that we will be able to share that challenge across multiple institutions.
There is also no clearly established model for a relationship of this kind. Libraries themselves have quite a fine and interesting experiment that is now well over 30 years old called the Online Computer Library Center in which libraries have banded together to share cataloging data in a not-for-profit library-managed enterprise.6 That enterprise has been an interesting model for us as we think about how one would federate digital repositories across institutions. So we think that the library community can figure out how to do this.
A final challenge to DSpace is that disciplines vary greatly in terms of what their expectations are for a repository. Some of the early adopters of the DSpace repository are faculty in ocean engineering, and they deal in datasets that are terabytes in size. Some faculty have large collections of images. Other faculty have much smaller, more text-oriented expectations, which illustrates the fact that scientists like science, not database administration. They have always expected that libraries would be there for them, and so we are. On the other hand, the challenge to us is to help scientists take advantage of new tools.
At the end of the day, research universities, scientists, and funding agencies need a new alliance. We need strategies to advance and expand research-based education. We need to be able to educate and conduct research without Draconian external rules. This probably means developing our own systems for the exchange of data and information on a direct institution-to-institution basis. We need to assure persistent availability and accessibility of research data. This probably means keeping it as close to scientists as possible, and it means new IP options like the Creative Commons need to be deployed.
We need to solve the challenge of the born-digital world. Researchers and educators now routinely produce work that has no paper analog. We know that a great deal of work already has been lost, and we are deeply concerned that there are no easy ways to approach the long-term archiving of work that is digital only. As such, we are faced with the prospect of sentencing work to a five-year shelf life, or to a life only as long as the proprietary software vendor is interested in addressing the problem.
Last, we need to solve the archiving problems. Bruce Perens and I were talking about some work that he has done in restoring works that Disney owns that were damaged. The cost and effort were phenomenal. Clearly, the losses are mounting similarly in the higher education and scientific communities. Yet we do not have Disney’s money. We need a long-term solution to archiving.
Through OpenCourseWare and DSpace, MIT is working hard to develop some prototypes, to share the ideas and the software behind those prototypes, and to interest others in joining us in meeting the challenge.