LC 21: A Digital Strategy for the Library of Congress

1
DIGITAL REVOLUTION, LIBRARY EVOLUTION

INTRODUCTION

Every reader of this report knows that the world of information is changing rapidly. Some readers might choose to emphasize the continuities between old and new and others the discontinuities, but all will agree that the change is rapid, powerful, and important. This chapter seeks to present a snapshot of the ferment, albeit with some inevitable blurring. The Library of Congress (LC) shares in the fate of libraries generally but has the size, distinction, and resources needed to help shape that fate. Accordingly, the evocation of context that follows suggests not only the forces affecting LC but also the roles LC might choose to play.

It is no easy thing in these times to conceptualize (or to manage) a library, large or small. Libraries are creatures of their societies’ intellectual accomplishments. For centuries, at least as early as the founding of the library at Alexandria (circa 280 BC), readers in pursuit of knowledge have expected libraries to gather and make available to them a broad range of texts and artifacts. In every age, these creations have been conveyed by the media of the time: handmade tablets, leaves, scrolls, codices, and then, with the advent of printing, books and journals. The technology of printing enabled rapid production and copying and thereby wide dissemination of information and learning and led to the large-scale institutionalization and popularization of libraries. In the nineteenth and twentieth centuries, libraries, particularly public and academic libraries, achieved a fundamental, almost revered place in society. Libraries came to be identified not only with the collections they assiduously gathered but also with the wide and free access they gave readers pursuing personal, social, and public-minded goals. Libraries became the places where citizens of modest means could access books and other materials they could never afford to purchase. Libraries became, as well, places of learning and interaction for their readers. The special status of libraries is attested to by the fact that their communities support them with funding and other resources. As well, the copyright law of the United States recognized the role of libraries in serving and educating the public.[1]

By the 1990s, the development of computers and the networks and protocols that link them together (particularly the World Wide Web protocol developed by Tim Berners-Lee) and then the browsers that were needed to navigate this interconnected world had led to new ways of capturing intellectual creativity and distributing it widely and almost instantaneously to those who sought it. This revolution challenges the traditional role of libraries in contemporary and future society. The computer’s transformation of our world and the upheavals it fosters rival or exceed those that grew out of the development of the printing press in the fifteenth century. It is worth taking a few moments to enumerate some of the recent changes that affect libraries and their service to users, even if we merely skim the peaks and omit much.[2]

This chapter describes some aspects of the rapidly changing, new, untested, and exciting environment for libraries today. The leaders and managers of the libraries of the present must understand and master this environment if they are to find ways to continue to fulfill the purpose of libraries past and to carry out their mission.

1   In particular, Section 109 of the copyright law (contained in Title 17 of the United States Code), under the so-called first-sale doctrine, permits libraries to lend materials, even outside their premises, and Section 108 exempts certain reproductions and distributions of copyrighted works conducted by libraries under specific conditions.

2   For recent discussions of the larger cultural issues, see Future Libraries, Howard R. Bloch and Carla Hesse, eds., (Berkeley, Calif.: University of California Press, 1995) or a special issue of the journal Representations, Spring 1993; also, see Avatars of the Word: From Papyrus to Cyberspace, by James J. O’Donnell (Cambridge, Mass.: Harvard University Press, 1998).

CONTEXT

The Need for Cooperation Among Libraries

Libraries have always made information available. From the earliest days, they housed and facilitated access to information, through the selection, aggregation, organization, service, and ongoing care of their materials, as a shared good. In the last half-century, forces such as the explosion of publishing, the rapid expansion of education at all levels, globalization, and ever-growing funding for many kinds of research have made us realize that no single library, not even the Library of Congress, can today collect and deliver comprehensively, if ever it could, the world’s most important literature and information sources.

For decades now, libraries have bonded into groups that work together to share information requested by their readers. Much, though not all, of this effort has concentrated on developing standards by which libraries describe and transfer information, such as the machine-readable cataloging (MARC) record and the variety of emerging approaches for organizing information that are discussed in Chapter 5. Other strategies have focused on cooperative collection development programs within a group of libraries (for example, the creation in the 1930s of the Triangle research libraries’ cooperative collection development program, which continues to be robust to this day[3]); the creation of highly cost-effective, computer-based, catalog-record-sharing cooperatives (the largest of them all, the Online Computer Library Center (OCLC), was set up in the late 1960s[4]); the conscious move toward having the catalogs of large research institutions serve from their readers’ point of view as one catalog (the initiatives of the Committee on Institutional Cooperation (CIC) are notable in this respect, presenting to their readers in 12 institutions online catalogs and supporting services that aim to behave transparently as one collection[5]); the creation of statewide consortia to facilitate user-initiated interlibrary loans for traditional materials (Ohiolink, with more than 70 consortial library members, records more than half a million transactions every year at a cost of well below $10 each[6]); and the recent accelerated move of libraries to organize themselves into consortia that license electronic works from publishers and vendors, securing broad-based site licenses, often at advantageous costs, for dozens or hundreds of libraries of all types per license.[7]

3   “Cooperative Collection Development at the Research Triangle University Libraries: A Model for the Nation,” by Patricia Buck Dominguez and Luke Swindler, in College & Research Libraries, November 1993, pp. 470-496.

4   See Chapters 2 and 5 for a discussion of OCLC.

5   For a description of the CIC, visit its Web site at <http://www.cic.uiuc.edu>. For a narrative about the goals and visions of the CIC group of libraries, see “Consortial Leadership: Essential Elements for Success,” a presentation by Barbara McFadden Allen for the Association of Research Libraries at its May 1997 membership meeting, available online at <http://www.arl.org/arl/proceedings/130/allen.html>. For more information, follow the links and footnotes in this presentation.

6   Some articles that describe the Ohiolink interlibrary loan concept include “How the Virtual Library Transforms Interlibrary Loans—the OhioLINK Experience,” by David F. Kohl, in Interlending & Document Supply, Vol. 26, No. 2, 1998, pp. 65-69, and “Resource Sharing in a Changing Ohio Environment,” by David F. Kohl, in Library Trends, Winter 1997, pp. 435-437. Ohiolink’s executive director, Tom Sanville, writes that its interlibrary loan (ILL) system is currently filling about 500,000 requests per year and is still growing (private e-mail correspondence, dated January 23, 2000).

7   For a description of library consortia and their activities, particularly in the licensing arena, see the home page of the International Coalition of Library Consortia, available online at <http://www.library.yale.edu/consortia>.

The Rapid Rise of Information Resources in Electronic Formats

The past brought recognizable information formats such as books, journals, film, and other fixed media. Many electronic information resources, however, are brand-new creations that resemble traditional media less and less. They include the millions of Web pages and databases produced all over the world by individuals, companies, institutions, and government agencies. These creations represent a variety of information types, including descriptive materials, corporate reports, datasets, educational offerings, theses and dissertations (some universities now require dissertations to be submitted in electronic form[8]), and many more. But, along with these recognizable publication types are many new kinds of publications beyond anything anyone has ever imagined, in terms of size, scale, complexity, and function. These resources spring out of the energy and creativity of society’s best minds in a ferment of excitement, and it is reasonable to expect that their variety and novelty will only increase.

The committee heard about and reviewed numerous examples of current projects available in digital form that challenge traditional modes of publication, to say nothing of librarianship and collection practices. For example, Corbis offers a huge collection of digital images derived from traditional photo archives, and the Survivors of the Shoah Visual History Foundation collects thousands of videotaped memoirs from Holocaust survivors and integrates them with related materials. Both projects (see Box 1.1) are marked by careful attempts on the part of the creating organizations to limit and focus distribution, in the one case to paying customers, in the other to institutions that will use the resource in a responsible and productive way. Microsoft’s Terraserver project,[9] a collaboration with the U.S. Geological Survey, is a collection of satellite photos of many parts of the United States and selected parts of the rest of the world. The project, which is freely available and offers high-quality information, is an experiment in managing a very large database for public access. It is impossible to describe any of these projects in traditional terms that would allow comparing their collections to items now collected by libraries, nor is it easy to imagine how a given library would collect such items, although they are fully analogous to, and might well substitute for, library holdings (i.e., as in map rooms or special collections).

Even resources that seem to be more traditional, such as the research projects fostered by the Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia,[10] challenge traditional expectations by virtue of their constant state of flux. The institute’s Web site contains no single stable artifact to “collect”; it is very much a work in progress.

Arts and Letters Daily (A&LD) is a Web site that goes one step further. In appearance, it is a daily “newspaper” whose content is cultural. But all that the editors provide is a front page with short descriptions and links to content found on hundreds of other freely available sites, many of them well-known resources in their own right. The repackaging accomplished by linking articles intelligently to a single site makes A&LD a hugely popular and highly useful site.[11] Is it possible to “collect” such a site if one can collect only the daily “front page”? If one can collect all the linked sites, can one reliably collect the work with all its links intact and stable? Long-established standard newspapers are avoiding links to material from other sources (for legal reasons, possibly); many of the newer newspapers and magazines, however, are filled with links to other sites. Preserving the news nowadays requires collecting not just the specific news source, but also the content of links to other Web spaces.[12]

8   For descriptions of activity in the area of electronic dissertations, see <http://www.ndltd.org> and <http://www.theses.org>.

9   The Microsoft Terraserver site with its satellite and aerial images provided by the U.S. Geological Survey is at <http://www.terraserver.microsoft.com>.

10   IATH’s work is described on its Web site at <http://village.virginia.edu/>. IATH’s goal is to explore and expand the potential of information technology as a tool for humanities research. To that end, it appoints fellows and provides them with consulting, technical support, applications programming, and networked publishing facilities. IATH also cultivates partnerships and participates in humanities computing initiatives with libraries, publishers, information technology companies, scholarly organizations, and others interested in the intersection of computers and cultural heritage.

11   This synthesis of news (now a part of Jeffrey Kittay’s Lingua Franca family of publications) from the fields of philosophy, aesthetics, literature, language, ideas, criticism, culture, history, music, art, trends, breakthroughs, disputes, and gossip is found at <http://www.aldaily.com/>.

12   For example, <http://www.mediainfo.com> includes at the bottom of its daily headline page a set of links to stories on other sites. And <http://www.wired.com> provides that kind of link as well as numerous links within each of its own news stories to many other reports.
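The collection problem raised above for link-driven sites such as A&LD can be made a little more concrete. The following is only a minimal sketch, not a description of any tool the committee reviewed: it assumes the front page has already been saved as HTML, and it lists the links that point to other hosts, which is the material a collector would also have to capture. The names `LinkCollector`, `external_links`, and `SAMPLE_PAGE`, as well as the example hosts, are hypothetical; only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of every <a> tag in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def external_links(html, base_url):
    """Return the sorted, de-duplicated links that point to other hosts."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    return sorted(
        {u for u in collector.links if urlparse(u).netloc not in ("", own_host)}
    )


# Hypothetical front page: one internal link and two links to other sites.
SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<a href="http://example.org/essay.html">An essay</a>
<a href="http://example.com/review.html">A review</a>
</body></html>
"""

if __name__ == "__main__":
    for url in external_links(SAMPLE_PAGE, "http://frontpage.example/"):
        print(url)
```

Even in this toy form, the sketch shows why capturing the front page alone is not enough: each listed address is a separate resource, held by someone else, that may change or disappear independently of the page that points to it.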

BOX 1.1 Digital Information, Networks, and the Creation of New Media

The growing importance of documents represented in nontraditional media is challenging the traditional notion of a library collection. How can a library collect, provide access to, and preserve materials that do not have an identifiable tangible form or whose representation depends on technology with a short useful life? Here are two examples of challenging nontraditional collections: the image collections held by Corbis, at <http://www.corbis.com>, and the multimedia collections developed by the Survivors of the Shoah Visual History Foundation, at <http://www.shoahfoundation.org>.

Corbis has assembled a collection of 65 million photographs and fine art images, 2.1 million of which have been digitized (40 terabytes) and are available online. The Corbis collection includes a wide variety of materials, ranging from news photo libraries (e.g., UPI) to stock photo and museum collections. The images in the Corbis collections are made available through professional service centers and directly to consumers over the Internet.

The Corbis collection faces two unique challenges. The first is its size. Most of the 65 million images are cataloged with varying levels of detail. The 2.1 million digitized images make up the largest commercial digital image collection in the world. The second challenge is the rapidly changing technology base. Images in the digitized collection have been captured with different scanning technologies, with such technology having improved markedly since Corbis began building the collections in 1989. At the same time, images are output using a wide variety of display devices ranging from relatively low-quality Web browsers to high-quality film for commercial uses. Thus, a variety of image representations need to be mapped to a wide range of supported output devices.

The Survivors of the Shoah Foundation was formed in 1994 to record firsthand accounts of the Holocaust. The foundation has conducted over 50,000 interviews; each interview consists of an unedited video transcript together with written background and descriptive information. The collection consists of more than 100,000 hours of video and a comprehensive catalog that captures background information and facilitates access to the collection. Interviews are conducted in the language with which the interview subject is most comfortable (32 languages to date) and may incorporate photographs or other image information, which makes the collection not only large but also multilingual and multimedia. The Shoah collection is to be used as an historical and sociological research tool and as a source of educational materials to promote tolerance and cultural understanding.

Developing the infrastructure necessary to provide access to the collection has proved to be a challenge. The high-quality digitized video requires large (by today’s standards) amounts of storage (180 terabytes) and a high rate of data transmission (3 megabits/sec/client); the design parameters look forward to a time soon when suitable storage (terabyte disk drives) and network technology (OC-3, 155 megabits/sec) will be commonplace and affordable.

Shoah is particularly interesting because it is an open question whether the archive has been “published.” Shoah chooses to make its resources available on a contractual basis in settings where it is assured of reasonable security and where the materials it provides will be handled in a way compatible with the goals of the foundation. These restrictions are understandable, but how ought the Library of Congress respond? Is this a publication to be demanded on copyright depository statute conditions?[1] Or is there to be a negotiation, assuring the foundation that LC is a responsible partner? There are no easy answers to these questions.

The Corbis and Shoah collections both contain important items of interest to researchers and other library patrons. Developing the organization, policy, and technology to provide materials of a similar nature to users is a challenge not only for LC but for other libraries as well. In fact, projects like these can serve as models for libraries to emulate as they begin to collect digital materials.

SOURCES: Corbis: “Corbis Images Build Market Momentum,” M2 Presswire, October 15, 1998, and “Corbis Opens Its Art Collection,” by Cameron Crouch, in PC World Online, November 17, 1999, both available at <http://www.corbis.com>. Shoah: “Multimedia Pedagogy for the New Millennium,” by Rhonda Hammer and Douglas Kellner, in Journal of Adolescent & Adult Literacy, April 1999, available online at <http://www.shoah.org>.

1   These conditions are discussed in Chapter 2.

The nation is investing significant sums to advance research in scientific areas, investments that will bear fruit in many directions now little surmised. Both the Los Alamos National Laboratory’s e-print archive[13] and the Human Genome Database[14] are collective works that bring together new global communities based on special interests and have extraordinary power to advance and build scientific research. It is entirely reasonable to imagine that, particularly in scientific and technological fields, such initiatives offer only an inkling of what is to come.[15] As the committee was deliberating, initiatives such as PubMed Central (sponsored by the National Institutes of Health)[16] and the Department of Energy initiative PubSCIENCE[17] were set to be scaled up in the very near future.

Other resources are the electronic equivalents of printed information. The electronic mode makes it possible to deliver the information wherever the reader may be (for instance, to his or her computer in the home or workplace or, by wireless technology, to any place), to present information that cannot be captured in print (such as video attachments, tables that can be manipulated, and so on), and to facilitate use of the information through quality interfaces and search capabilities. Some of the speed with which this transformation is happening in the world publishing industry can be captured by briefly considering two well-established publishing formats: journals and books.

13   The LANL e-Print archive, now known as arXiv.org, was established in 1991. The service blossomed and expanded quickly (by February 1992) much beyond its original focus on high-energy physics. As of January 2000, arXiv.org contained 122,400 submissions. (See <http://arXiv.org/cgi-bin/show_monthly_submissions>.) Sixteen mirror sites operate around the world. Paul Ginsparg, the founder of the service, notes that he could still fit the current hardware under the hatch of his Honda (with plenty of room left over) and take it anywhere.

14   The National Human Genome Database’s home page is located at <http://www.nhgri.nih.gov/Policy_and_public_affairs/Communications/Publications/Maps_to_medicine/>. There one can read about the project and access information and data related to genome mapping from around the world.

15   For example, the development of standards and technology for federating these e-print archives, such as that proposed by the Santa Fe Convention of the Open Archives Initiative, <http://www.openarchives.org>, offers the possibility of creating a global digital library of scholarly e-prints. Further information may be found in “The Santa Fe Convention of the Open Archives Initiative,” by Herbert Van de Sompel and Carl Lagoze, in D-Lib Magazine, February 2000, available online at <http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html>.

16   PubMedCentral is a creation of the National Institutes of Health. Its contents are intended to include both formal and informal articles in the fields of biomedical sciences for free access. It is located at <www.pubmedcentral.nih.gov>.

17   The Department of Energy’s (DOE’s) Office of Science and the Government Printing Office (GPO) announced the development and public availability of PubSCIENCE as of October 1, 1999. GPO is sponsoring the free access through its GPO Access Web site. PubSCIENCE, developed by DOE’s Office of Scientific and Technical Information, focuses on the physical sciences and other energy-related disciplines. It was modeled after the highly publicized PubMedCentral and is located at <http://pubsci.osti.gov/>.

Online Electronic Journals

There is no truly reliable or single source of information on the growth of electronic journals. A careful study of the World Wide Web archives of the e-journal and magazine electronic announcement list NewJour[18] reveals that in 1989, fewer than 10 e-journals were available. They were created in basic ASCII (.txt) form and had to be distributed in small chunks, lest they crash the mailboxes of subscribers. Today, many large publishers around the world, both for-profit and not-for-profit, maintain Web sites that make available their full collections of print journals (with only limited back-file runs, so far) to subscribers. Given this penetration of new technologies into scholarly, scientific, and popular journal and magazine publication, a list such as NewJour will one day soon no longer be needed. In this decade (or even half-decade), most print magazines and journals will have a Web version, if they do not already have one (see Box 1.2). Furthermore, although for the moment it is convenient to think of print and Web versions as providing the same or identical information, the two styles are already beginning to pull apart and will only diverge further. Not only will the same name ultimately denote collections of content that are in fact very different,[19] but some of the e-journals also will evolve into new genres. The prevailing vision is that this wealth of journal literature will be linked through indexing services and search engines and cross-linked internally online;[20] indeed, this vision is rapidly being realized. For as long as print versions continue to be published, tracking and collecting will be much more difficult.

The most significant problem for ubiquitous electronic access is that of long-term archivability and preservation, an issue that begs to be solved if publishers and libraries are not to maintain costly parallel print and electronic systems. So far, very few electronic journals (or any other resource) have had to survive on the Internet for even one decade. While some experts say that long-term sustainability is a trivial matter, studies suggest high costs and tremendous uncertainties.[21] Fundamental to these uncertainties is the matter of ownership, which libraries rarely have, given that electronic information produces no fixed artifact for libraries to possess and cherish (see the section “Digital Materials, Ownership Rights, and Libraries” below).

While rapid growth characterizes scholarly and research journals, the committee found similar or even faster growth in the even broader universe of all continuing publications that includes popular magazines and newspapers as well as annual reports, directories, series, and so on.

18   For the complete NewJour archive back to the start date of August 1993, see <http://gort.ucsd.edu/newjour>. Information about the NewJour announcement list, managed by James J. O’Donnell and Ann Okerson, with archives managed by James Jacobs at the University of California, San Diego Library, can also be found at this URL.

19   Gerry McKiernan, of the Iowa State University Library, maintains a Web site of multimedia journals at <http://www.public.iastate.edu/~CYBERSTACKS/M-Bed.htm>. See also, “Embedded Multimedia in Electronic Journals,” by Gerry McKiernan, in Multimedia Information and Technology, Vol. 24, No. 4, 1999, pp. 338-343.

20   The multipublisher initiative called CrossRef was announced in the fall of 1999 (<http://www.crossref.org>). Participation from numerous scholarly journal publishers is expected. A press release titled “Reference Linking Service to Aid Scientists Conducting Online Research,” from Susan Spilka at John Wiley Publishers, New York, was posted to the liblicense-list (liblicense-l@lists.yale.edu: archive at <http://www.library.yale.edu/~llicense/ListArchives/>) on November 16, 1999.

21   See Chapter 4 for an extended discussion on digital preservation.

BOX 1.2 The Rapid Growth of Online Electronic Journals

Electronic journal publication is relatively new. There were no electronic journals 20 years ago and fewer than 10 in 1989. It is difficult to get an accurate count of journals currently published in electronic form. Two types of electronic journals can be distinguished: electronic versions of journals that are also published in paper form and electronic journals that are available only in electronic form. NewJour (<www.gort.ucsd.edu/newjour>), a notification service that tracks electronic journals, listed 8,404 titles in February 2000 (up from 3,634 in mid-1997 and 6,900 in December 1998).

Electronic versions of traditional journals are by far the most common, as many traditional publishers make their materials available in electronic form. Elsevier Science, at <http://www.elsevier.com>, for example, offers electronic versions of over 1,000 of its journals. In fact, Elsevier’s full text was available to selected institutions by the mid-1990s through the University Licensing Program (TULIP), just as Springer’s was available through Red Sage.[1] Many professional associations (e.g., the Association for Computing Machinery, at <http://www.acm.org>, and the American Mathematical Society, at <http://www.ams.org>) also offer electronic versions of their professional journals.

Efforts to digitize important retrospective journal collections have also been undertaken. JSTOR, at <http://www.jstor.org>, offers digitized versions of back files of 117 scholarly journals. JSTOR plans to expand its coverage and to add titles to its collections.

Scholarly journals published only in electronic form represent a small but growing body of material (Journal of Artificial Intelligence Research, at <http://www.cs.washington.edu/research/jair>, is an early example). A wide variety of electronic-only publications are associated with online discussion groups, newsletters, and other special interest groups.

1   Information about TULIP may be found online at <http://www.elsevier.nl:80/homepage/about/resproj/tulip.shtml>; information about Red Sage may be found online at <http://www.ckm.ucsf.edu/projects/RedSage/>.

Books Finally Go Electronic

Standard periodical indexing and abstracting services (ranging from popular sources such as Public Affairs Information Service to highly research-oriented sources such as Chemical Abstracts) began to become available electronically in the 1970s through specialized vendors such as Dialog and BRS, whose proprietary systems required the mediation of expert searchers; by the mid-1990s, they were available through easy-to-use Web interfaces for any licensed subscriber. Thus it is no surprise that the journal articles (which are relatively short and therefore easily distributed and used through present electronic technology) cited by these sources would become quickly available online (see the section “Online Electronic Journals,” above). However, books (such as novels and scholarly monographs, for example) have seemed far less susceptible to electronic transformation. While some books are consulted in bits and bytes (for particular facts or small sections), many (the argument goes) need to be deliciously savored and contemplated from beginning to end, and an online screen is hostile to such prolonged congenial or intense reading. “You can’t take it to bed or to the beach or onto the plane with you,” is the oft-heard lament.

Accordingly, readers, publishers, and vendors have posed some chicken-and-egg questions: if we build a better e-book reading device or interface, will the critical mass of e-books (and attendant readers) materialize? Or do we first need a critical mass of e-books to bring the improved e-book devices and readers en masse? Maybe these questions are moot, for it seems, as one publisher said of his company’s participation in the Barnes & Noble/Microsoft e-book alliance, that we might be able to skip the chicken and egg and go directly to the omelet.[22]

The vision now being articulated by many players in the book publishing industry and its partners (in printing, distribution, and software) is that the full text of all published books, at least from mainstream publishers, will exist on vast electronic information servers, there to be channeled to the output of the reader’s choice: traditional print formats or digital formats (by accessing a local copy on a PC or portable device or by viewing a remote copy through the Web). That is, the authoritative source file of many books may soon be an electronic version that can generate various derivative versions.

The e-book may be on the verge of acceptance and success[23] because of the convergence of large computer servers, big network pipelines, rapid progress in developing e-book standards, and the increasing sophistication and utility of handheld book-reading devices, as well as business

22   Steven M. Zeitchik in Publishersweekly.com, January 10, 2000, available online at <http://www.publishersweekly.com/articles/20000110_83924.asp>, quoting Dick Brass of Microsoft in “Microsoft, Bn.com in E-book Deal.” The “omelet” is the widespread use of e-books.

23   An instance that suggests acceptance is the release of Stephen King’s short story “Riding the Bullet,” which was made available for sale in downloadable electronic format on the Web only. Barnes & Noble officials reported that the story had the biggest opening day for any book on its Web site, regardless of the format of the book. See “Stephen King Rewrites E-book Biz” by Margaret Kane, in ZDnet News, March 16, 2000, available at <http://www.zdnet.com/zdnn/stories/news/0,4586,2469310,00.html>.

mation technology. Its comprehensive database will cover topics ranging from programming languages to networking and electronic commerce. The service offers one-step searches that can provide quick answers to complex technical questions as well as cut, paste, and download capabilities.

Books24x7, at <www.books24x7.com>, is creating a library focused on marketing, finance, management, and human resources and geared to corporate professionals. The service also offers summaries and synopses, prepared by its editorial board, of the books it selects. Books24x7 has established relationships with technical publishers and has aligned with a major online retailer to fulfill orders for hard-copy books.

SOURCES: “An Ambitious Plan to Sell Electronic Books,” by Vincent Kiernan, in Chronicle of Higher Education, April 16, 1999; “NetLibrary Targets an Early Market for e-Books,” by Lisa Bransten, in Wall Street Journal, November 4, 1999, p. B12; “netLibrary.com,” by Christine K. Oka, in Library Journal, May 1, 2000, available online at <http://www.libraryjournal.com/articles/multimedia/databasedisc/databaseanddiscindex.asp>; “Houston Startup Targets Undergrads,” by Steven M. Zeitchik, in Publishers Weekly, April 17, 2000, available online at <http://www.publishersweekly.com/articles/20000417_85721.asp>; “Ebrary.com Offers Web as Serious Research Tool,” by Paul Hilts, in Publishers Weekly, March 27, 2000, available online at <http://www.publishersweekly.com/articles/20000327_85475.asp>; “Ebrary Solves a Very Big Problem,” I/O Magazine, 2000, available online at <http://www.iowebsite.com/products/3_1.html>; Seybold Seminars Boston Publishing 2000, “The Editors’ Hot Picks: Ebrary.com,” available online at <http://38.241.81.30/SRPS/free/hotpicks/ebooks.html>; EarthWeb Inc., “EarthWeb Launches ITKnowledge: Online Services Provider Goes Live with Subscription-Based Online Technical Reference Library for IT Professionals,” October 7, 1998, available online at <http://www.internetwire.com/technews/tn/archive/tn981060.htx>; and “New Technologies Transform Publishing Industry: Production Time and Costs Are Being Cut While Publishers Gain More Flexibility,” by Barbara DePompa Reimers, in Information Week, March 27, 2000, available online at <http://www.informationweek.com/779/ebooks.htm>.

works and commission close to 100 new ones in the coming few years. Both projects are funded by the Andrew W. Mellon Foundation. How will libraries integrate such new book formats?

Digital Music, Digital Video, and the Convergence of Formats

Traditional libraries are most often conceived of as repositories of textual artifacts. High culture constructs itself around artifacts that can be managed and cataloged in particular ways. But the technological changes of the next decade will no doubt challenge that conservatism in new ways. The technical convergence of data, voice, and video technologies, coming to the desktop (or palmtop) through a single network connection, will encourage consumers to think first of data and only secondarily of media. Libraries will then be particularly challenged to decide whether
and to what extent their traditional focus on textual or mostly textual artifacts fulfills their responsibility. At the very least, libraries in the year 2000 should be actively assessing the possibility that they will be called upon in the future to be the repositories of whole classes of artifact quite unlike what they owned before, placing very different demands on their various skills. These artifacts will be emphatically commercial in purpose and appeal but no less important than traditional artifacts for documenting American culture in decades to come and therefore inescapably part of the collecting mission of the Library of Congress as now articulated. In the past, the Library of Congress successfully accommodated the introduction of media such as sound recordings and film.

High Initial Cost of the Electronic Environment

Formally published online information resources are expensive to license, often costing more than one would expect to pay for print. Start-up costs for both sellers and purchasers of information are higher than the costs of maintaining traditional print information: (1) information providers are investing in new technologies and skills, and a database or subscription price attempts to recover many, if not all, of the start-up costs over time; (2) institutions are developing their own technological and human capabilities, also with significant new costs; and (3) libraries and publishers are maintaining parallel information systems and resources in traditional and electronic media (i.e., the introduction of electronic media has not yet displaced traditional media in most libraries but instead adds cost as it adds functionality). These additional costs will not disappear for some time. But the primary barrier to the use of online information resources is the cost of licensing the electronic resources marketed by publishers and vendors. Most libraries are funded on a model for the acquisition of fixed-format materials. License agreements, or even specialized CD-ROM products and services, are being offered at prices that encourage many libraries to consider cooperative purchasing. Furthermore, the new model for library acquisitions must consider not only the cost of access to the information, but also the cost of maintaining the technical capability (hardware, software, personnel) required to make these resources available to readers.

Various contemporary students of the economics of information have asserted that the only financial survival option of libraries is to scale up into efficient, cooperating entities. Brian Hawkins, now president of EDUCAUSE, wrote as follows: “It is clear that the current unit of analysis … cannot survive in the existing environment. The leveraging of our library resources is clearly called for, with the best solution being the
largest system-level possible.”34 How to achieve that scale is an open question. The large consortia that build national catalogs, the large utilities such as Research Libraries Group (RLG) and OCLC, clearly have a part to play in organizing any such meta-system, but the comprehensive libraries, particularly the Library of Congress, will also necessarily be challenged to join such discussions. The question here is whether such scaled-up participation amounts to a continuation of an old role in pursuit of an established mission, a new role in pursuit of an established mission, or a genuinely new function in which libraries may choose to participate or not, depending on their mission, inclination, and resources.

34   The “largest system-level” is the highest level of aggregation or cooperation possible among libraries (from “The Unsustainability of Traditional Libraries,” by Brian Hawkins, in Executive Strategies, a publication of NACUBO and the Stanford Forum for Higher Education Futures, Vol. 2, No. 3, pp. 1-16).

User Demand for Electronic Resources

Although librarians and publishers have, as yet, little quantitative data or user analysis to show how much or even how electronic information resources are used, there is no question that usage, to the extent it can be measured, is shooting up with every passing month.35 Readers regularly demand more and more such resources and protest loudly if online access to any index, dictionary, encyclopedia, text, or collection is removed, even if the reasons are seemingly good ones (for example, if a library or institution believes the price is too high or the usage too small to justify continuance). This reaction is no surprise and from the reader’s point of view makes perfect sense. Because we live in an age of intense national, institutional, and personal competition, information is essential, and meeting the need for knowledge is an indisputable requirement.

35   Members of the committee asked information providers and librarians for usage data and received numerous replies from individual libraries, consortia, publishers, and information intermediaries. These data repeatedly tell the same story: usage is skyrocketing. Kevin Guthrie, president of the JSTOR core periodicals collection, at <http://www.jstor.org>, wrote in a private e-mail letter dated February 4, 2000:

Usage at JSTOR participating sites more than doubled between November 1998 and November 1999. Overall, usage of the JSTOR database is increasing at a rate of roughly three times per year. Part of this growth comes from the addition of new sites; of course one must control for that. In any case, total 1998 usage was 5,920,398 accesses with 432,714 articles printed; 1999 usage was 17,311,453 accesses and 1,224,400 articles printed. In November 1998, there were 1,063,675 accesses at a little less than 400 sites; in November 1999, there were a total of 2,223,013 accesses from those same sites.

From Min-min Chang, librarian at the Hong Kong University of Science and Technology, the committee learned that the university’s Web server usage page recorded 233,848 hits in November 1997; 370,947 in November 1998; and 859,473 in November 1999 (see <http://library.ust.htk.usage/pwebstats/g-index.html>). The Florida Center for Library Automation reports that its Elsevier server’s articles were accessed 58,842 times in 1998 and 207,006 times in 1999 (Michele Newberry, assistant director for Library Services, in private e-mail dated January 25, 2000). Indiana’s statewide virtual library project INSPIRE delivers journal and periodical databases to everyone in the state and reported serving up 10 million pages of full-text data in 1998 and 20 million pages in 1999; the staff projects that 40 million pages will be delivered in 2000 (private e-mail correspondence from Juck Lowe, project manager, dated January 26, 2000). The American Society for Cell Biology reports that the number of hits for its e-version of Molecular Biology of the Cell in the 2 years since its release increased about 500 percent; by contrast, the number of subscribers to the print journal increased about 6 percent in the same time (private e-mail from Heather Joseph, dated January 28, 2000). Reference publisher InfoUSA reports a 1,391 percent growth in searching on its Web site from August 1998 to August 1999 (Doug Roesemann, in private e-mail correspondence dated January 25, 2000).

Digital Materials, Ownership Rights, and Libraries

Traditionally, libraries have purchased physical objects for their collections and permitted these to be used as allowed by copyright law and best practices. Many electronic resources, on the other hand, are maintained by publishers, vendors, or others designated by them, and libraries or individuals can obtain rights of access through custom or mass-market licenses. A library’s readers then access the works through high-speed information networks. A library’s electronic resources licenses generally permit readers a broad range of educational, research, and personal uses, and, increasingly, the book and journal suppliers promise, in the licenses, to deliver ongoing (“perpetual”) access or to give the institution a residual “product” should the publisher or vendor discontinue the work, sell it to another supplier, or leave the business altogether. Nonetheless, this new business and (non)ownership model raises many questions for libraries and for society, issues that are hotly debated wherever information policy and practices are discussed. What future, if any, is there for the largest research libraries in an era when, increasingly, materials that readers want are available without going to the library? More importantly, when libraries or other entities do not have contractual rights to archive and preserve electronic content, can they develop adequate archival and preservation mechanisms for electronic materials? And, finally, will we develop the types of institutions and agreements that can carry electronically communicated knowledge into the far distant future, just as today’s libraries have effectively stewarded knowledge throughout the past centuries? What institutions should have this mission in the future, and how will they carry it out? These questions speak to large and vital matters of societal well-being.

The traditional regime of copyright, enshrined in the copyright law (Title 17 of the United States Code) and its provisions for fair use (section 107) and libraries (section 108), creates for libraries (institutions that for at least the last century have been designated as holders of materials for the public good, materials that they then make freely available to the reading public) a role that now appears to be challenged if not attacked outright. As libraries generally shift toward a service rather than a collection orientation (see above), and as the economics and technologies of publishing (particularly for digital materials) make it harder for libraries to own their materials, the concept of freely accessible information, at least to the academic and public library user at the point of use, may well be at risk.

The Library of Congress has a unique and privileged position in the acquisition of cultural materials. While all other libraries in the United States must find willing sellers and must have the financial wherewithal to become willing buyers in a freely made commercial transaction, the requirement of legal deposit with the Copyright Office gives LC a presumptive right to full ownership of a copy of each and every artifact published in the United States.36 The publisher in this particular relationship with LC is not a willing seller but a law-abiding citizen paying a kind of “public good” tax on his output. The two questions that arise here are the following: Can LC gain parallel rights for digital information formats? How will it exercise these: to whom and on what terms will LC offer access to the digital works that fall into its purview?
This point is central to all discussions of mission and cannot be emphasized strongly enough. The Library is different from all other libraries when confronting the explosion of digital content and the shift from purchase to licensing.

36   The unique role that the Library of Congress plays in the U.S. copyright system is exceedingly important for the operation of the Library and is discussed elsewhere in this report, beginning in Chapter 2.

THE GREAT LIBRARIES IN THE ELECTRONIC AGE

The multitude of electronic databases, the rapid growth of Web sites, the increase in the number of electronically available print journals, and the availability of numerous full-text resources such as reports, dissertations, and electronic books all represent a dramatic change in the dissemination of scholarly and cultural content. The history of this revolution is short and there is still much to learn, on all levels and in all areas. Not a great deal has been published about how the great libraries are transforming themselves to greet the electronic age, although Maurice B. Line notes that three factors have caused an almost ceaseless questioning of the roles and futures of national libraries: (1) automation, information, and communications technology; (2) the intrusion of the private sector into areas formerly sacred to libraries; and (3) the globalization of libraries. He writes:

It is always useful to ask, ‘If we did not have such-and-such, would we invent it?’ From a strictly utilitarian point of view, it is doubtful if we would now invent monumental national libraries; we would find other and cheaper, if less effective, ways of performing national functions. But we do not start from scratch: big national libraries exist, and it is almost unthinkable, on the grounds of cost and logistics alone, to dismember them and distribute their resources among other libraries …. Secondly, national pride is a major factor …. Another factor is what may be called linguistic pride.37

37   “Do National Libraries Have a Future?” by Maurice B. Line, in LOGOS, Vol. 10, No. 3, 1999, pp. 154-159.

A great deal of information exists about the emerging practices of the largest research libraries in the United States and Canada, particularly in the publications and data gathered by the Association of Research Libraries (ARL).38 In particular, about 25 of these libraries are also members of an initiative named the Digital Library Federation, under the auspices of the Council on Library and Information Resources (CLIR), which leads its members in exploring various key infrastructure issues and helps them to formulate useful projects and experiments.39 While the continent’s large research libraries in no way begin to compare in collecting scope, mandate, or size with the Library of Congress, they have, nonetheless, begun to move aggressively in the direction of becoming hybrid libraries (i.e., libraries that embrace information in numerous formats, now including electronic formats).40 How far have they come?

38   For additional information, including newsletters, reports, and statistics, see the ARL Web site at <http://www.arl.org>.

39   For additional information about the Digital Library Federation and its sponsoring organization, CLIR, as well as numerous full-text reports online, go to the Web site <http://www.clir.org/diglib/dlfhomepage.htm>.

40   For an articulate view from the United Kingdom and its electronic libraries program, funded by the government’s Joint Information Systems Committee (JISC), see “Towards the Hybrid Library,” by Chris Rusbridge, in D-Lib Magazine, July/August 1998, available online at <http://www.dlib.org>.

The transformation of library culture and practice by the adoption of information technology continues apace. An increasing emphasis on service and a decreasing emphasis on collection have already been noted. As the committee surveyed the national scene, it found that libraries are engaging in a broad range of novel activities. It attempts to list them here to give as rich a picture as possible of the nature and momentum of change. Libraries are incorporating electronic technologies and services into the everyday work of all staff by doing a number of things:

Working for the broadest possible access for readers in the electronic environment. Not only are libraries seeking technological standards and presentation of resources in forms accessible to the broadest range of readers, but they are also lobbying to advance the public policy debate in ways that support broad access for the good of society as a whole.

Reallocating an increasing and visible portion of collections budgets to the electronic resources needed by their readers. For example, ARL Supplementary Statistics indicate that in FY98/99, 29 ARL member libraries spent more than $1 million of their collections budgets for licensing electronic databases,41 representing anywhere from 6 to 22 percent of their library materials budgets.

41   See “ARL Supplementary Statistics 1998-99” (Washington, D.C.: Association of Research Libraries, forthcoming).

Building collections of digital resources that, while not yet rivaling traditional collections in scope and bulk, are substantial, of high value, and integrated in the traditional patterns of collection and use.

Working to shape and support initiatives such as community education, online course support, Web page design, teaching-specialist electronic resources, and digitizing of materials for these programs, all with a view to making educational opportunities as broad, rich, and accessible as possible. Lifelong learning is the opportunity and the goal, and “distance learning” is the current buzzword for the tactics librarians seek to support.

Finding new ways to measure the usage patterns and behaviors of readers, so as to anticipate and support their needs, bringing the right resources into play for readers. The digital environment facilitates such measurement and, accordingly, such feedback, giving a better allocation of resources than has ever been possible with print media.

Devoting increasing effort to more sophisticated reader services associated with single and multiple electronic resources. Librarians are more often than ever teachers of how to use electronic resources, and readers spend less time pursuing simple factual information at traditional reference desks.

Cooperating with other libraries in setting up networks that make libraries effectively a single virtual institution (through the locator tool of interoperable online catalogs) that can deliver physical materials, via advanced interlibrary loans and document delivery, to more and more readers more effectively (and more cost-effectively) than ever.

Delivering physical materials by electronic means. As physical materials become increasingly deliverable at a distance, libraries are putting more and more electronic delivery services into operation.

Partnering with other participants in the creation and dissemination of knowledge. Libraries can, for example, work with individual authors, organizations, publishers (commercial and noncommercial), booksellers, and software companies to create and make available functional and well-used online resources.

Digitizing and making available to readers material already in library collections and special collections. Such materials would include, in particular, out-of-copyright material, image collections, sheet music, maps, and other traditional library treasures.

Subscribing to online services that provide statistical data. Libraries would help readers learn to manipulate services containing anything from historical census data to financial market data.

Creating multimedia servers for music, film, and other media. At the same time, thorny questions of access and permitted use must be addressed, and the technological capability to handle significant quantities of such material must be developed.

Using the new generation of library management systems as a springboard not only for integrating forms of access to a wide range of materials and formats but also for reengineering the entire workflow and back-office processes of traditional librarianship. The technical services of libraries are becoming increasingly business-like, streamlined, and closely managed, with closer links than ever to vendors through electronic data interchange (EDI) and other forms of electronic interaction that work to the advantage of all parties.

Working to understand the technical demands, possibilities, and long-term costs and responsibilities of digital media as instruments for the preservation of library information, including material from traditional print media (e.g., the contents of books printed on acid-based paper) and material created in digital form. When we fully understand the challenges of moving digitally preserved information from format to format, from one hardware and software system to a new hardware and software system, we will have made great progress in solving what many think is the biggest remaining problem in establishing truly functional and satisfactory digital libraries.

Working through the issues that must be faced in deciding which kinds of resources are best maintained locally, library by library, and which resources are
best maintained elsewhere, whether by publishers, vendors, library consortia, or third parties. Traditional librarianship achieves security and preservation by having redundant physical copies: the challenge now is to balance redundancy (and thus security) with optimal efficiency and to avoid unnecessary duplication of effort.

Understanding evolving legal regimes such as copyright and licensing. In this arena, librarians seek not only to understand but also to shape and influence developments, thus securing agreements that offer readers high-quality, reliable, and permanent access to resources.

Exercising responsible stewardship of library resources, which are usually purchased with public funds or from not-for-profit institutional budgets. Such stewardship requires keen understanding of the business models and economics of the new information sources in an environment in which libraries find themselves increasingly offered not ownership but access, not a once-for-all price but something closer to annual subscription or by-the-drink pricing.

Cultivating an expertise in technology matters. The technological infrastructure of a library now faces a new degree of volatility and continuing costs as equipment and software need upgrading. The e-marketplace makes it literally impossible to choose not to play the upgrade game: in a very short time, a library’s information would simply become unavailable if it persisted in using even slightly outmoded operating systems or software.

Continuously upgrading human resources and skills. The librarians and support staff at this time of transformation must undergo no less arduous a series of “upgrades.” As in other sectors of our economy, it is impossible in the library sector for staff to acquire and practice skills and then use them for a lifetime; instead, they must grow and adapt, and there are real and substantial costs for supporting the necessary training and for paying a more highly skilled staff.

Seeking new funding sources and opportunities. Traditional funding sources (annual budgets doled out by the government or not-for-profit organizations with a tiny annual increase) no longer suffice. Librarians are increasingly engaged in entrepreneurial efforts, whether soliciting research and development funding from granting agencies, developing partnerships with other entities in the library sector, or participating in cost-recovery projects with the commercial sector that serves and interacts with the library community.

And libraries do more. There remains an underlying question: Do most consumers of information need the intermediation of libraries? That issue will play itself out in the next few years. But the preservation of the overall collection of human creativity is not something that publishers
can achieve unaided. In an odd way, the Library of Congress remains the U.S. library most assured of a future. If that is true, many vital questions remain, including the following:

What should be the goal of our great libraries, including LC, in this time of technological transformation? (The committee confesses to admiring the view articulated by Anthony Appiah: “… the library I never go to is already one of the most important places in my life.”42)

What sort of institutions should libraries become when every current book, dissertation, journal, and so on is available on electronic servers, potentially never out of print?

What is the place of libraries, particularly LC, with its unique copyright role, in such an information-communications system?

If librarianship is increasingly a matter of public service and LC retains its core mission of collection rather than service, how and where will it differentiate itself from the rest of the library community, and how will it cooperate and collaborate with that community to bring about the greatest societal good?

In short: What will be the future role of the great research, national, and academic libraries? Who will define that role? And what do libraries need to do to play it well? In this period of acceleration, scaling up, and ongoing, hard-to-predict change, developing strategies and continuously adjusting them characterize our new environment. It is complicated and it demands much. It offers immense room for innovation and conceptualization. It is tremendously exciting.

42   “Realizing the Virtual Library,” by Anthony Appiah, in Gateways to Knowledge, Lawrence Dowler, ed. (Cambridge, Mass.: MIT Press, 1997), p. 39.

ROADMAP FOR THIS REPORT

This introductory chapter describes some of the remarkable opportunities and formidable challenges created by the digital revolution for the world’s great libraries. These opportunities and challenges are most dramatic for the Library of Congress because of the grand scale and scope of its operation. However, while this report focuses on the Library of Congress, much of the discussion in it is applicable to a broad range of libraries, archives, and other cultural institutions that face many of the same challenges. Moreover, because the digital revolution is still unfolding, many of its challenges (e.g., digital preservation) are not likely to be resolved any time soon.

As is true for many important libraries, the Library of Congress is a complex organization with multiple purposes and stakeholders. Over time, a distinctive organizational culture has developed, evolved, and become institutionalized at the Library. In Chapter 2, the present-day organization and structure of the Library are surveyed, with some emphasis on the Library Services unit, which most laypersons would regard as “the Library.”

For physical artifacts (such as books, periodicals, manuscripts, recordings, films, and numerous other records of the cultural heritage), the core processes for libraries are well established. Materials have to be acquired. They are then organized for two purposes: internal management and use by clients. In the course of organizing, mechanisms such as cataloging exist to make it easy to find materials. Access to the materials is provided by making them available for borrowing and for on-site use in reading rooms. Preservation procedures are implemented to ensure that materials will be available indefinitely. The advent of digital information challenges many of these long-standing practices and raises many questions. Chapters 3, 4, and 5 are dedicated to an exposition of how the core processes of libraries need to evolve, within the special context of the Library of Congress.

In the final chapters the committee’s focus returns to the Library of Congress as an organization. Chapter 6 discusses the Library’s role within the larger context of the library community and the information industry. Historically, the Library has led or been involved with major initiatives in the library community. How should it continue to do so in the future? Chapter 7 addresses the questions of whether the Library is prepared to play an important role in the larger community today and how it needs to evolve to ensure that it remains a leading institution in the digital future. In addressing these questions, the chapter discusses critical organizational issues such as human resources management and strategic planning. Key operational issues surrounding the information technology infrastructure are addressed in Chapter 8. The revolution in information technology raises a host of questions with regard to networks, databases, computer and communications security, and how LC should manage its development projects: through internal development, the purchase of off-the-shelf software, or contracts with integrated systems vendors.