The Digital Library: An Integrated System for Scholarly Communication
Richard E. Lucier
University of California
Scientific journals have served scientists well for many decades. They have provided a viable means for scientists to communicate their findings to their peers and have also served as an archival record of scientific progress. Now, however, we are seeing the beginnings of a significant evolution away from the traditional scientific research journal as we know it.
This article is divided into three parts. The first part provides context by outlining the forces driving the issues discussed here. The second discusses what could be meant by a digital library; in doing so, I describe both what we are doing in California, the California Digital Library, and the digital library in general. What we are doing in California is an example of what might occur elsewhere, so it is useful and instructive to discuss some of the specifics of that project. Finally, I discuss alternative forms of scholarly communication, i.e., alternatives to traditional research journals.
Crisis in Scholarly Communication
We have been hearing for the past decade that academic research libraries are in crisis. The fact of the matter is that these libraries, including the services we have come to expect from them, are simply no longer sustainable in their current form. Projected costs for both the acquisition and the storage of information are significantly higher than universities can possibly sustain. While current data indicate that university fees are going up 8 percent, library acquisition costs in the sciences have been rising at a significantly higher rate, namely 15 to 20 percent annually for the past several years.
It is convenient to blame publishers for these increases and this crisis. Certainly some publishers, particularly commercial publishers, are part of the cause of these difficulties. In many respects, however, the productivity of scientists is a more significant causal factor. Scientists are producing more and more information; for example, the American Chemical Society is publishing 10 percent more pages each year, and many other publishers report annual increases of 10 to 20 percent. If this growth in the volume of published information continues, as seems likely, university libraries will be unable to provide scientists with access to that information if the traditional process of scientific communication is maintained.
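The gap between these growth rates compounds quickly. The following is a minimal Python sketch of that arithmetic, not a projection from the source: it assumes the rates stay constant, treats the 8 percent fee increase as annual (the text implies but does not state this), and uses an arbitrary 5-year horizon. The function names are hypothetical.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Multiplier after compounding annual_rate (0.08 means 8%) for `years` years."""
    return (1 + annual_rate) ** years


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at a constant annual_rate to double."""
    return math.log(2) / math.log(1 + annual_rate)


YEARS = 5  # illustrative horizon (assumption, not from the source)

fees = growth_factor(0.08, YEARS)      # university fees, assumed annual
acq_low = growth_factor(0.15, YEARS)   # acquisition costs, low end
acq_high = growth_factor(0.20, YEARS)  # acquisition costs, high end

print(f"Fees after {YEARS} years: x{fees:.2f}")
print(f"Acquisition costs after {YEARS} years: x{acq_low:.2f} to x{acq_high:.2f}")
print(f"Pages growing 10% a year double in about {doubling_time(0.10):.1f} years")
```

Under these assumptions, acquisition costs rising at 15 to 20 percent roughly double or more in five years while fees grow by under half, and a literature growing 10 percent a year doubles in a little over seven years.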
At the same time, we are now at an early evolutionary stage in the use of digital technology in scholarly communication. What will happen as this evolution continues—how scholars and scientists will eventually integrate digital information technologies into their work—is not yet clear. A number of issues like cost, ease of use, and academic culture are going to have a major impact on the future application of digital technology in this area. There are many who believe that the application of digital technologies to scholarly communication is as revolutionary as the use of the printing press, and indeed I think that is the case. But it is going to take several years to see how this evolves and what the implications are.
Solutions and Strategies
A number of universities are exploring strategies for managing this transition in scholarly information and for taking full advantage of the opportunities presented by digital technologies. The University of California (UC) has recently completed a 4-year planning process examining all of these issues. The following conclusions form the basis for strategic action:
- Comprehensive access to information is going to replace comprehensive ownership of information. Remember when you expected to be able to get your favorite journal from your university or corporate library? In the future your library will provide you access to that information in some reasonable period of time.
- Solutions will unfold organically. A traditional plan is not desirable.
- The digital library is an agent of change.
- UC should build one digital library.
The California Digital Library
As part of our planning for libraries and scholarly information at the University of California, we developed a shared vision of the library appropriate for the university: A World-class Research Library for the 21st Century Consisting of Complementary Paper and Digital Libraries Comprising a University-wide Knowledge Network With Services Delivered at the Point of Need.
The digital component of this library has been named the California Digital Library (CDL), to reflect its potential to serve all of California, not just UC. Created in October 1997 by the University of California Board of Regents and the president of the university, the CDL will open its “digital doors” in January 1999.
The digital library can be viewed as an integrated system for the management of scholarly information. This view moves the library beyond its traditional roles of storage, preservation, and access and makes it an active player in scholarly communication through support for alternative forms of publication that exploit digital technology.
In this context, the digital library will have the following components:
- High-quality digital resources;
- Consistent network interface;
- Distributed services integrated with digital resources at the point of service;
- Alternative forms of publication and the digital dissemination of scholarship; and
- Underlying business and economic models that are sustainable.
It is important to note that the digital library is not going to replace the paper library for the foreseeable future. Rather, the two are going to coexist, complementing one another. How we implement this complementarity is going to be critical to the continued smooth functioning of the scholarly enterprise.
Content is the heart of all libraries, and that is true for the CDL as well. Already, we see that there are many different kinds of digital collections. One is the traditional published journal literature in digital format. While this is important at these early stages of building a digital library, it will likely become less important in time. Even at the outset, we have a second kind of content, "digital at birth" content, which has always existed only in digital form. Third is primary source data. For libraries, that includes special collections, but our faculty also have a lot of primary source data that we are trying to build into our collections. A fourth important area of content for digital libraries is museum collections. Stronger relationships between libraries and museums are developing in the digital era. Last is what we refer to as alternative forms of scholarly communication. Over the next 15 years the balance between journals on the one hand and these alternative communication forms on the other is expected to change significantly. By the year 2015, we expect that less than half of the information we provide through the CDL will be in the form of traditional journals in digital format.
It is important to review the changes taking place with respect to the acquisition of digital content. In many instances, libraries are not buying digital journals and databases; instead, they are licensing them. This change has several important implications. License terms and conditions vary greatly, depending on the publisher. Of critical importance to the library and user communities is the notion of perpetual access. The approach of many publishers has been to license information only for the year in which you pay for it. Thus, if you paid for a 1998 subscription in 1998 but not in 1999, you would lose access to the 1998 content in 1999. This is in marked contrast to the past. Traditionally, if the library bought a paper journal, it always had that journal. That is not the case with electronic material. These and many other issues surrounding licensing mark a new relationship between publishers and libraries, one full of challenges and opportunities, requiring great vigilance and care on the part of the library community if it is to ensure access for research and education.
Licensing has allowed us to develop large collections of digital material in a short time frame. When we open the California Digital Library, we will have more than 3,000 electronic journals in scientific and technical areas. That, along with the other databases, is a significant amount of content. Licensing has also provided a platform for cost control. By forming and joining large consortia, or groups of libraries, to license electronic material, we are able to leverage our collective buying power. This has allowed us to buy more material than we could have if each of us had tried to license it individually. It also sets the stage for how we are going to work with the publishing community in the future. One of the first digital journal licenses we signed at UC was with the American Chemical Society. It is important to note that ACS has been willing to develop a model that is more beneficial to users than the models developed by many other publishers, particularly commercial ones. We have appreciated ACS's willingness to listen and respond to our concerns.
Licensing is a useful strategy to aid in the transition from paper to digital journals, in moving from an ownership model to a service and access model. In 10 years, however, its various components may not be as significant as they are now. Hopefully, licensing at some level will evolve into a standard operating procedure, with much less emphasis placed on negotiating individual terms with each content provider.
Alternative Forms of Scholarly Communication
We are currently in the first phase of adopting the technology: modernization; that is, we are replacing traditional paper journals with traditional digital journals. What we really need to do is to lay the foundation for a future in which the way scholars around the world communicate and access the results of research is truly transformed. We need to invest a significant share of our resources in developing innovations that will facilitate this transformation. It is my belief that the digital library provides the appropriate infrastructure to develop and leverage these innovations and to make collective investments across universities in self-publishing and digital publishing that compete directly with the existing model of scholarly communication.
There are a number of activities to pursue in developing alternative forms of scholarly communication. One is to develop prototype projects with faculty, based on their needs for better ways to disseminate and access information. The most important innovations will come from the scholarly community itself, not from administrators or librarians. A second is policy development. In this area, it is important to examine current copyright policies and behaviors such as the assignment of rights to publishers. Third, the development of new forms of scholarly communication is best pursued in concert with colleagues on national and international levels.
There are a number of potential scenarios for alternative forms of scholarly communication that have been put forward in the last few years. One is called NEAR, the National Electronic Article Repository. Under this proposal, made by a university provost, authors would retain certain rights when they publish a journal article. These rights would permit the article to be placed, within 90 days of its appearance in print, on a national server that serves as a repository for scholarly articles. Everybody would then have free, perpetual access to the article.
A second scenario, put forward by the Association of American University Presidents, recommends decoupling certification from publication. It rests on the belief that coupling the promotion and tenure process with publication is the cause of the current financial problems in publishing scholarly materials. In this scenario, universities would pay professional societies to review the work of their faculty and to certify it for promotion and tenure purposes. The universities would then place the work on servers so that it would be available to the external community at no or low cost. Only a small portion of this material would then make its way through the regular publication process. So, instead of continually increasing, expensive paper publications would decrease, to be replaced by electronic publications on readily accessed servers.
A third scenario calls for universities to assume responsibility for publishing the work of their faculty. A fourth identifies peer-reviewed servers as the viable alternative. At Los Alamos, for example, a physics preprint server has been established where not only physicists but also mathematicians and others post articles prior to publication. Adding peer review to that preprint process would ensure technical merit, and it would no longer be necessary to go through the entire publication process.
What is absolutely key to further progress in identifying and implementing sustainable alternatives for scholarly communication is the development of business models for all scenarios. While some of these ideas may sound desirable, their financial feasibility must be demonstrated. Replacement models are not necessarily less expensive, so due diligence must be rigorous.
A 1998 essay, "To Publish and Perish,"1 recommends five actions as we move toward new alternatives:
- "Turn down the volume"; i.e., we need to concentrate more on the quality of publications than on their quantity in the promotion and tenure process.
- Librarians must be smarter shoppers, just as we are trying to be through consortia and licensing.
- Get a handle on copyright and property rights issues.
- Universities must invest in electronic forms of scholarly communication and should support new efforts that faculty are putting forward in this area.
- The decoupling of publication and faculty evaluation should be seriously investigated.
National efforts are under way, which should lead to some breakthroughs in the coming years. One is SPARC, the Scholarly Publishing and Academic Resources Coalition, originally developed by the Association of Research Libraries. It currently consists of more than 100 research libraries from around the country that have joined together for the following purposes:
- To create a more competitive marketplace,
- To reduce journal prices,
- To ensure fair use of electronic materials, and
- To apply new technologies to information creation and storage.
To pursue these goals, SPARC:
- Solicits high-quality, fairly priced publications and guarantees a subscription base;
- Provides start-up capital for new projects; and
- Generates support from important groups such as our own faculty, administrators, and provosts.
The American Chemical Society and the Royal Society of Chemistry are initial SPARC partners.
The challenges that we face in trying to make our way through this evolutionary change are very significant. They range from the political to the technical and financial. As we move toward the realization of digital libraries, basic research is absolutely critical in this area. There are no models; we are in uncharted territory, and we need the research community to inform the way. It is critical that faculty participate in this research. The solutions must reflect your "way of doing business."
Stephen Heller, National Institute of Standards and Technology: I have a couple of questions. First, are you working at all with Highwire Press, and do you have any comments about what they have been doing?
Richard Lucier: Highwire Press is a very interesting operation. It has taken over production for a number of societies that would not have had the money individually to invest in digital technology, and so has allowed those societies to remain competitive as we move into this digital environment. Highwire Press has done great work. We work with them in the sense that we pay for all of those publications. We have had discussions about more substantive cooperation, but nothing has yet come of them between Stanford and UC.
Stephen Heller: The second question is, Should libraries be responsible for the actual archiving when a reasonable solution is found? Right now it is a sort of random process in which the publishers have decided to go into the new business of archiving, which is providing information on a long-term basis, not just selling a subscription and washing their hands of any responsibility to provide anything after the subscription expires.
Richard Lucier: I think it depends on the publisher. We need to have someone of repute take responsibility for archiving the world's knowledge base. I am very leery about saying that commercial entities should take that responsibility.
Commercial entities will take that responsibility only as long as it is profitable for them, and as information gets older it may no longer be profitable.
I don't know if it should be libraries. I don't know if it should be universities. I don't know if it should be the federal government or some other organization. I think there has to be a national strategy initially, and there will have to be an international strategy. No one—not the library, the community, the academic disciplines, societies, or publishers—has yet really tackled that problem very well.
A couple of years ago a commission came out with a report that rather scared everyone, so no one has touched the issue in the last couple of years; I hope the dialogue can resume soon. There are no good answers, and I am not sure who ought to do it, but I guess I would trust universities before I would trust commercial organizations.
I believe, and Lorrin might correct me, that even our license with the American Chemical Society only guarantees access for 5 years, and I am assuming that some of that literature is still important to you after it is 5 years old.
Stanley Sandler, University of Delaware: I guess I am overwhelmed by the amount of information that is available and being generated, particularly the number of journal pages. So maybe this is more an appeal to my colleagues. I think there is an increasing difference between, let us say, the CPU and the LPU. We know that the CPU is increasing in speed and power, and likewise the rate at which we can generate experimental data. The LPU, the least publishable unit, I would say hasn't changed anywhere near as fast as the rate at which we generate simulation and experimental data. Consequently, a paper today may contain about the same information as a paper years ago. Years ago it may have required years of intellectual effort. Maybe 2 years ago it required a month of intellectual effort. Maybe this year it requires a week of intellectual effort, and I submit that we as reviewers are not doing a careful enough job of keeping the LPU in step with the CPU, and that is why we are overwhelmed with so much published literature.
Richard Lucier: I think that one of my frustrations in this uphill battle is that I cannot solve these problems. I am willing to support you in any way that you want, but it is the disciplines and the academic communities that have to come to some solution. That last point is one of the reasons that we are very concerned about making sure that we provide only high-quality information within the digital library, not just everything that is out there, and it is also why groups like the American Association of Universities are looking at this decoupling process. It is a recognition that not everything that is going into print ought to go into print and that somewhere along the line we have to make some qualitative judgments. You have to make some qualitative judgments. I am not trying to impose anything.
Allen Bard, University of Texas: It seems to me that this idea of decoupling publication and certification is a game, because, as everybody will know, if you decouple them, you end up saying, "Yes, we certify this as great work, but it is not worth publishing; we certify that other work, and it is worth publishing." Everybody will know that game.
Richard Lucier: I think there are other ways to look at it. One can say that we certify this work but don't need to publish it in its complete form as we did in the past; we might publish only an excerpt in a formal publication while maintaining access to the complete work on file servers, which would be a lot cheaper over time.
I think what decoupling does is allow us to look at the publication process differently so that we can find a cost-effective way, and one that exploits technology in a way that makes the data more useful to you as well.
Tom Edgar, University of Texas: What is your business model as you are constructing it at the University of California? If you look down the road, say 10 to 15 years, do you see any changes in human resources needs for the collective libraries of the University of California system?
Richard Lucier: How many years into the future?
Tom Edgar: Ten to 20, let us say.
Richard Lucier: I cannot look 10 to 20. The most I can look is 5, and yes, we do see changes.
Tom Edgar: I guess the gradient is what I am interested in; is it positive or negative in terms of the number of people it is going to take to provide the California Digital Library services compared to the number you have today?
Richard Lucier: I think that what we have projected is that it is going to take probably an equal number of resources, but ones more focused on providing quality access to information than they currently are. What I can tell you with respect to saving money is that in the first year we can document that we have saved the campuses, in licensing costs alone, about $2.5 million for access to information. If they had gone about it separately and bought this information themselves, it would have cost them that much more.
The other thing that we are able to do is to provide access independent of location. It doesn't matter any longer if you are a chemist at Berkeley or if you are a chemist at Santa Cruz; you can get access to the same kind of information. We feel that it is really important for our faculty and students to be able to have that kind of access irrespective of their particular physical location.
Tom Edgar: The second question is one I will ask and then head for cover. You said that the physicists and mathematicians have agreed to go toward putting publications on Web servers. My impression is that the chemists have really not agreed to do that in the same way. I am curious. What are the differences between chemists and the other groups that make chemists behave differently?
Richard Lucier: I think you could answer that question better than I, and I would be really interested in the answer.
Stephen Heller: It is a cultural thing. Actually, the story with the Los Alamos preprint server is that physicists had been exchanging preprints for decades before computers, and when computers came they just put the preprints on the computer.
Richard Lucier: Is that true with the mathematics community as well?
Stephen Heller: Yes.
Richard Lucier: There is also a new preprint server for the biological sciences that you may or may not know about. Having spent most of my career in the biomedical sciences, I was very surprised by that, because the exchange of preprints has not been traditional in that field, but they see what is happening in physics and mathematics and have moved in that direction.
Evelyn Goldfield, Wayne State University: First I would like to say something about the preprint servers. One of the problems that chemists feel, at least the ones I have talked to, and I think this is a problem that you are going to see, is the question of peer review, multiple versions, or error corrections, because from what I understand, things can go onto preprint servers without any review at all. As a physicist friend of mine explained, "Oh, we will just correct it as we go along," which is fine if you are in that community and you know. But a student could easily be getting incorrect information, and I think there is a resistance on the part of a lot of people to risk that.
Stephen Heller: That is not true. There is a link between the versions.
Evelyn Goldfield: I believe that many chemists are wary about non-journal Web-based publishing on account of quality control issues, and how it will impact the review process. They are worried about having a lot of non-refereed papers and multiple versions of papers out there.
My question is that if libraries can no longer afford to purchase commercial academic publications or books, then what do you think the future holds? Are academic commercial publishers going to remain viable? What is the future of paper and books? Do you think there is any future at all and if so, what is it? How do you see that?
Richard Lucier: As I mentioned, I don't see electronic versions replacing paper wholesale at this time. I think it is going to be a long evolution, that there are problems such as archiving that have to be solved before one can replace the other.
We are going through a period now, I think, of trying to understand how our faculty and the research community will use the electronic versions, what they prefer about them.
What we are seeing with things like Highwire Press, for example, is that the print version and the electronic version are getting further and further apart, and the electronic technology is being exploited to provide products that are much more beneficial to you than the paper might have been.
So, there is an evolution going on. I hope that at UC we will be able to cancel some print publications in the year 2000. Right now we have as many as, if not more than, nine copies of a particular journal, one at each of our campuses. In 2000, if we provide good electronic access to some of these titles, we could potentially cancel all the paper copies except for two, keeping one in the north and one in the south for archiving purposes.
Evelyn Goldfield: That will cost the publishers money.
Richard Lucier: Right, it will cost the publishers money. That is correct.
Robert de Levie, Georgetown University: You have talked about journals. How about books?
Richard Lucier: It depends on what kind of books you are talking about. I think that digital technology can be very useful for reference books and reference databases. If you are talking about certain kinds of scholarly treatises in the humanities, I don't think we are going to see widespread replacement there at all in the immediate future. I think the digital technologies are going to take much greater hold in the sciences early on. The humanities, and less so the social sciences, are probably 5 to 10 years behind.
Robert de Levie: Even though those books nowadays are produced mostly in digital form?
Richard Lucier: Yes.
Robert de Levie: You mentioned $2.5 million gained. Is that because you reduced the number of subscriptions from nine to one, and what is the offsetting cost of not knowing whether 5 years from now you will have to buy the paper copies anyway?
Richard Lucier: We won't, because we won't have gotten rid of all of the paper. We are making certain that we maintain paper copies in our storage facilities, one in the north and one in the south, should we need them.
The $2 million plus came from expanded access. For example, you might have had 30 ACS subscriptions at Berkeley and at LA but only 5 at Riverside and 8 at Santa Cruz; now everybody at all nine campuses has access to all of them. In addition, access for, let us say, Berkeley alone, which may have subscribed to all of them, costs less because we licensed as part of a consortium. So, there are savings in those two areas.
Gintaris Reklaitis, Purdue University: One of the most important and underappreciated resources in the entire publications review process is the reviewers. Clearly as the publications process continues to expand, the demands on the reviewers will also. Do any of the business models that you are examining for scientific publication take into account this important resource and how we might stimulate it to handle this expansion?
Richard Lucier: The model where the university moves into publishing very much takes advantage of that resource, which is already part of the university. Essentially, what we do now for the most part is give that review work away to the commercial publishers at no cost, so that they can add a huge markup when we buy back the information that has been peer reviewed by our faculty. So it makes perfect sense for the university, or federations of universities, to do that together.
My problem with the Highwire model is that it is one university. Science and scholarship cut across universities too much, and in my opinion it makes much more sense to federate this in some way across groups of universities rather than try to go solo. That is why UC isn't pursuing that particular strategy.