
Appendix A:
Commissioned Papers

A National Library for Undergraduate Science, Mathematics, Engineering, and Technology Education: A Learning Laboratory

Prudence S. Adler

Assistant Executive Director

Association of Research Libraries

Mary M. Case

Director, Office of Scholarly Communication

Association of Research Libraries

Introduction and Summary

Digital technologies are transforming all aspects of education, including scholarly communication. These technologies provide an unprecedented opportunity to rethink how the education community, including the research library community, creates, uses, publishes, accesses, and manages information resources. The greatest impact of these technologies can be seen in the creation of new knowledge and in the new means scholars, researchers, and students are increasingly finding to provide education services and to collaborate via the networked environment. This transition to a new mode of scholarly communication and education generally necessitates a rethinking of the concept of a "digital library." One does not want to recreate the current system, which predominantly reflects the print environment and, as presently constructed, does not benefit all participants equally.

Instead, in designing a digital library for undergraduate science, mathematics, engineering, and technology education, a starting point could be to identify the values, or ethic, of a new scholarly communication medium in support of education and to map these against the potential of the networked environment. As Einstein commented, "The significant problems we face cannot be solved at the same level of thinking we were at when we created them." More than ever, those taking part in these discussions will need to heed Einstein's advice and think creatively.

The development of a digital library for undergraduate science, mathematics, engineering, and technology education (SME&T) presents a unique opportunity to build a "library" that breaks out of the current system and incorporates the unique capabilities of the networked environment into its structure. It also permits the scholarly and research community to recapture and reengineer one facet of the scholarly communication process to meet its needs.

Selected Attributes of an SME&T National Library

The National Library should provide an active learning environment. Using the vast capabilities of the Internet and the Next Generation Internet, the National Library should provide distributed access not only to primary data resources, reference materials, and other resources, but also to a variety of interactive components, such as computer-aided design, lab simulations, research tools such as telescopes, instructional software, virtual reality, multimedia, and teleconferencing, among many others. Given the emphasis on introducing students to research early in their undergraduate careers, the National Library could provide not only instructional packages for basic undergraduate classes, such as calculus and statistics, but also the opportunity for students, even at small institutions, to participate in the research of scientists at major research universities. Simulations, online lab notebooks, and shared problem solving would all support the collaborative environment that makes learning so exciting.

Key to this vision of a National Library is the concept of providing the resources, tools, and collaborative environment to support the creation of new knowledge. The promise of the networked environment lies not in the pipes or the data alone. The great promise of the network is the ability of people to interact with vast amounts of data and with numerous other students and researchers from around the globe. A national network enhances the value of many minds exploring the same problem, as well as the value of serendipitous connections between seemingly unrelated efforts. A National Library system that supports and encourages the creation of new knowledge by undergraduates could serve as a model for reshaping the educational process in other disciplines.

This National Library would need a number of attributes to succeed. First, users must be able to access the available resources transparently, regardless of where they or the information are located. A robust network and effective, affordable delivery systems (bandwidth, scalable systems, and quality of service) are inherent parts of such transparency. Second, applications to retrieve, access, authenticate, evaluate, and utilize data and information, including detailed metadata and/or object content, will be critical to the effectiveness of the National Library and to its adoption and use. Third, applications to perform operations on the data, making it meaningful to the user, will be an essential component of the system; these include providing links to other providers' sites, commercial or noncommercial, and/or compiling data objects to meet a user's needs. Fourth, authoring applications that simplify reporting research results, incorporating data sets or simulations, and building curricula will be critical to ensuring timely and active reporting of research results by faculty and students.
Finally, communications systems that support interactive, real-time text, audio, and visual transmission (the key to student-faculty, student-student, and student-resource interactions) are equally important to the success of the National Library.

In addition to these technical issues, a number of practices and policies need to be considered to achieve the vision of the National Library. Access to robust content is essential to the education and training of tomorrow's scientists and engineers. Content in the National Library must be provided with the understanding that it will be used in various ways to support the educational mission. These uses may include access through multiple sites by students across the country; printing and downloading for individual student and classroom use; and excerpting for inclusion in papers, projects, instructional packages, multimedia presentations, and the like. The uses may also include the making of preservation copies by designated library sites to ensure long-term access to the resources and to the new knowledge created through the system.

Content for the National Library will come from a variety of sources. Several years of backfiles of scholarly journals, varying by discipline, will need to be included; permissions will need to be obtained and broad use rights, as described above, negotiated. Indexing and abstracting files will also be needed to support efficient access to the published literature. Primary resource data, such as photographs from Mars, human genome data, astronomical observations, the periodic table, and geographic information data, should also be at the student's fingertips.

For the National Library to succeed, new practices and policies are required, particularly with regard to copyright and intellectual property. The current balance between users of proprietary information and creators of that information must be maintained while rethinking how the creation of, and access to, that information is managed. The education community is both a creator and a user of proprietary information. Thus members of this community participate in the full spectrum of activities regulated by the laws governing copyright and must be sensitive to the balance of interests. As digital technologies revolutionize how information is recorded, disseminated, accessed, and stored, they eliminate the technical limits that have supplemented the legal framework balancing ownership and public dissemination. Unlimited technological capacity to disseminate works in ways that can violate the rights of copyright holders confronts an equally unlimited technological capacity to prevent works from being used in ways contemplated by law. Carried to its logical extreme, either trend would destroy the balance currently enjoyed, with results that would likely undermine core educational functions as well as radically transform the information marketplace.

New practices could extend this balance, but they would represent a different ethic: balance in support of the goals of the education community. For example, a legal regime that ensures that factual data critical to the progress of science remain in the public domain is essential. It will also be important that the current work of faculty and students be a vital part of the library. Participation in the library as an active learning environment will require faculty and students who choose to publish their work elsewhere to retain the rights needed for full use of that work within the National Library. Participation will also mean that faculty and students accept the responsibility to respect use restrictions where they apply and to respect the principles of attribution and fairness in the use of others' work.

To encourage faculty contribution to and participation in the National Library, implications for promotion and tenure will need to be addressed. If the new knowledge created through the National Library is highly collaborative, with students and faculty from around the country potentially contributing, how should an individual's contribution be evaluated? Technology may help if methods can be found to track unobtrusively the flow of discussion as a group works its way through a problem, simulation, or experiment. But at what point does such tracking invade the privacy of the participants? In addition, how should a faculty member's extensive contribution to the development of a national curriculum in undergraduate science be evaluated within an educational context that emphasizes research and publication?

Finally, the National Library will succeed only if it indeed improves the education of undergraduates in science, mathematics, engineering, and technology. Evaluation methods will need to be designed to measure outcomes before and after library use. Such measures could cover not only skills and knowledge but also interest in the disciplines, interest in research, and career plans.

Selected Concerns/Issues to Resolve

The SME&T National Library would be an extremely useful application for testing the evolving and proposed infrastructure of the Next Generation Internet, the vBNS, and the I2 project. But to be effective, the SME&T National Library should be accessible and available well beyond those participating in these efforts. It needs to reach a much wider audience (e.g., community colleges and selected K-12 institutions) that generally have neither the necessary connectivity nor the resources to gain it. This represents a significant hurdle for the success of the proposal. Collaboration with the networking division of NSF would be essential in designing strategies to promote the needed connectivity at these other institutions. It will be important to view this as a multiphase initiative, one that acknowledges the shortcomings and limitations of the current environment yet continues to promote a broader vision.

As noted above, there will be a need to refocus the expectations and, in many instances, the practices of faculty and students regarding publication, education, and access to resources beyond an individual institution. To be successful, the National Library must meet the timing and access needs of the community and reflect how, and in what settings, the materials are used most productively. Again, connectivity may make this particularly problematic. The availability of trained and committed staff able to build and navigate this complex environment, and to translate the needs of its users, is another key factor in the success of this initiative.
If we think broadly and imaginatively, the digital National Library has the potential to create an exciting new learning environment for undergraduates that may result in better education and an increased interest in pursuing careers in science, mathematics, engineering, and technology.

The Digital Library as “Road and Load”: Partnerships in Carrier and Content for a National Library for Undergraduate Science, Mathematics, Engineering, and Technology Education

Harold Billings

Director of General Libraries

The University of Texas at Austin

Introduction

The "digital library" is still being defined, largely according to how it is perceived by the beholder. A librarian, a computer scientist, an educator, a journal publisher, or a Web master will each perceive a digital library differently. For the majority of participants in the technical construction of the infrastructure that provides access to information on the Internet or the World Wide Web, and for most of the users of that world, a digital library is simply a collection of information stored in electronic format. Defined in this fashion, cyberspace is a virtual wilderness of digital information of frequently dubious content, utility, authority, or longevity.

The rapid growth of information sources of more obvious benefit to the business, governmental, educational, and technical communities (and, prospectively, to the larger social good) has led to an intensified examination of the infrastructure that supports the provision of and access to such resources. In several ways, much more attention has been paid to the "road" that distributes and reaches digital resources than to the "load" it carries.

The digital library is much more than the definition above assumes. Its roots lie much more deeply in the traditional library than is generally assumed, and its users are likely to resemble traditional library customers, in need and habit, more closely than might be imagined. But road and load differ dramatically between the traditional and the digital library. This difference, built around the new information technologies, represents both the opportunity and the challenge in establishing a National Library of Science, Mathematics, Engineering, and Technology Education, because such a library must incorporate the best of each.

Road and Load

Road as carrier and load as content are the most basic elements of a digital library. Carrier is defined by computing and telecommunications, that is, by information technology. Content is defined by the boundaries of several digital library spatial concepts. These include information stored "somespace" in electronic format. They include information in physical format stored somewhere, usually within the traditional library, that can be converted into digital form and delivered by electronic transmission. And they include the interactive distribution and receipt of purposeful scholarly communication, teaching, and distance learning content, all deliverable through the information technologies that provide the road.

Digital content may consist of textual information, sound, video, animation, raw data, response-invoking semiotics such as art or music, streaming communication, electronic curricula or teleconferenced course content, and so on, all of it hypertextually linked. Its sources may be commercial Web sites, "publications" by amateurs on the Web, interactive courseware developed on a faculty member's home page, databases at national laboratories, traditional library Web sites that supply information stored on local servers (including bibliographic, textual, and multimedia content), and a plethora of other information providers with a presence on the Internet. The fact is that the information available via the Internet is dwarfed by the holdings of most academic libraries. The problem is that the road to the content of most libraries continues to be a manual circulation or interlibrary lending system that reaches print-on-paper resources.
Comparatively little useful information is yet available through the Internet's digital road and load, and, as suggested above, this information is a complex mix of digital library contributions. The manager of the digital library selects and organizes this content, supplies access to it, and may provide new types of digital library assistance: e.g., routinely staffed interactive help desks, electronic messaging between user and library, the sharing of human expertise among institutions to provide specialized subject assistance, and information-gathering tools that work around the clock to identify and deliver to the scholar information of interest as it accumulates.

The best traditional libraries are now identifying the most useful digital content providers and listing their Internet addresses on their Web home pages. These libraries are providing access to electronic journals and other information mounted on local or remote servers, and are encouraging the reformatting and delivery of information from library print collections into a load suitable for the national information highway. This same plan is the one most likely to serve as the architecture for the National Library: a library of selective digital library and distance learning linkages.

Question. The selection of resources to be "held" in the National Library will certainly be a major issue. Is the concept of distributed selection the best model, or should there be a central "collection development" office? (It should certainly be assumed that a "collection development policy" similar to those used in contemporary traditional libraries would need to be constructed if the content of the National Library were determined at the top, rather than built up through the accretion of choices made at the local SME&T participant level by faculty and library collection development experts.)

The growth of information available over the Internet has started to attract more attention from the federal, state, educational, and scientific sectors for its potential to enhance learning and research. The recent flooding of the Internet by commercial and personal interests has led to consideration of a means to establish an Internet dedicated to education and research.
It is apparent, however, that there will have to be gates between these roads, since a digital library will require access to loads on each. A number of projects at institutional, state, and library association levels have been established to promote a higher-quality information infrastructure and to foster the development of a more useful national structure of digital libraries.

Question. How can federal government and state activities be related within the changing model of distance learning and distance information? There is a growing federalism of library programs, that is, a distinct push, through funding and legislative mechanisms at the state level, to move libraries into information management organizations or networks that leverage their resources. Will this conflict with the concept of a National Library for SME&T Education?

In my view, the traditional library, which has supported the informational needs of the K-12, higher education, and general public communities in the past, has not been as closely involved in the digital library movement as it should have been. This is a result both of lethargy on the part of the traditional library community and of a failure by the information technology and instructional communities to recognize that the traditional library will have as important a role in the information future as it has had in the information past. As I see it, traditional research libraries are being rapidly enhanced and extended electronically. In the most accurate sense of the word, these libraries are becoming increasingly "bionic": organic, evolving bodies whose collections are growing rapidly in both traditional paper and digital formats and that are increasingly interactive and responsive to their users.
It is from the grass-roots growth of information resources into the contemporary "bionic library," and from pockets of digital information created by multiple agencies and authors and distributed throughout the Web, that the hypertextual fullness of a National Library for Undergraduate Science, Mathematics, Engineering, and Technology Education can be realized. To some extent, just as the definition of a digital library poses uncertainties in a discussion of related issues, the metaphor of the "web" may confuse the necessary definition of the National Library and the prospective role of the National Science Foundation in that enterprise. Given the organic nature of libraries and knowledge, a better metaphor in these circumstances might be a hypertextual Knowledge Tree, with its roots in undergraduate scholarship and research and its limbs and twigs the result of growth triggered at these grass-roots levels.

The information resources that abound today on the World Wide Web are not the result of a top-down effort but rather of a libertarian attitude that lets the many roots and trees and flowers grow. It is almost ironic that much of the current digital library environment was designed by students who had the foresight to seize and exploit information technology opportunities. This same group is likely to continue using the Web in unique and uncontrollable ways, helping to create new mechanisms from which digital libraries and distance learning can grow. It may also be this group, students active in SME&T, who can tell us how a National Library might be constructed and how it might best meet the needs of the undergraduate student.

In terms of the information content of digital libraries and the services required of them, there is little evidence available today to distinguish the methods of users of science, mathematics, engineering, and technology library materials from those of users in other disciplines. Content is the point. A major problem at the moment is the difficulty of attracting the attention of either faculty or students to the richness of the resources available to them through the Internet or the World Wide Web. But, ultimately, it is this audience that will determine which information resources, which faculty, whose courses, and what kinds of learning techniques flourish on the Web and which are consigned to the digital dust heap.

Question. How can the National Library and the National Science Foundation make the best use of the experience of institutions that have established digital library models and have provided both general content and learning resources to accompany the information they provide? An example is the Library of Congress's American Memory Project and its accompanying Learning Page. How might such programs be adapted for SME&T?

Distance Education and Distance Information

The location of information is of little importance until it is needed.
Every hypertext location on the World Wide Web is immediately present to any other location, and every visitor on the Web is in virtual assemblage with other visitors. Time presents fewer constraints on access to information than it has in the past. Through the use of new information technologies, it is increasingly possible for libraries to provide information to the user wherever, whenever, and in whatever format it is needed. The availability of the technology, however, does not ensure access to the needed information. The barriers of cost, copyright, licensing, an absence of economic models, a shortage of skilled knowledge workers and teachers versed in the capabilities of the new technologies, and a lack of understanding at policy-making levels of the advantages to be gained from the new information opportunities have all held progress in this arena below what it should have been.

Perhaps the most serious impediment to progress has been the difficulty of gaining, holding, or securing sufficient attention from the groups that should join in partnership to encourage rapid and orderly progress toward a richer digital library model and, in this instance, to develop the National Library. Michael Goldhaber certainly has a point when he contends that the new natural economy is not information but attention. Whatever catches and commands attention becomes a major currency, although the target of that attention is worth only the informational substance embedded in it.

The economy of digital libraries is a major concern. Digital reference works, electronic journals, and scholarly databases are already here, already on the Web, and already delivered to millions of undergraduates, and the number of these products grows by the minute. But these products do not come cheaply.
The enormous costs of information products in SME&T can be illustrated by the recent deal that Elsevier made with OhioLINK, the digital library alliance of Ohio. Elsevier has licensed its journals to the members of OhioLINK, some 40 libraries, for three years for $23 million, quite a load of another kind. Given this arrangement in one state, from one publisher, for three years, one can gain some perspective on what the costs might be for the digital products of several publishers, for just a few states, for just a few years. Can a National Library offer a lower-cost model?

Question. NSF must ask itself what its most appropriate role would be in helping develop a National Library. Should it place its resources and effort at the top, or at the grass roots? Could it develop cost models, collect background materials, provide expert testimony, underwrite the preparation of electronic curricula, or support an organizational partnership to design the National Library? Or might its best role be that of facilitator and enabler, not just of the National Library as a "collection," but as an agent helping to direct scholarly communication toward a new model of SME&T information creation and delivery, exemplified by the National Library?

Many funding bodies for educational institutions have seen the digital future and believe it holds the answer to reducing costs and achieving efficiencies in the learning process. Rising costs in education have certainly captured the attention of those who provide its funding. Examples of distance education are appearing rapidly on the national educational scene. This rapidly emerging demand for distance education and learning among institutions of higher learning, even though most teachers do not appear ready to embrace the concept, is also creating a need for information to support these programs. Any learning process must include an information source. "Distance information" is just as relevant in this new paradigm as new pedagogical considerations, issues of teaching methodology and learning skills, and new environments for learning.

The traditional teacher-centered environment that has characterized the university as place is beginning to be replaced by learner-centered environments, where the requirements for distance education enterprises and for distance information are unlike those of the classrooms and libraries of the past. There are new road and load issues implicit in this new model that require thoughtful consideration. Any successful transportation or delivery system must pay attention to both road and load, to carrier and content, and to how these elements relate to one another.
One must consider, as part of this entire system, the technological, fiscal, legal, political, and social issues. One must also consider telecommunications, content selection, content organization, indexing, human and machine interfaces, instructional tools, and, yes, even "sand boxes" in which to try the systems out. New alliances must be formed around these issues, since no single entity can address them all. The National Library project can provide a venue for such a partnership.

Diminishing the Boundaries Between Educational Institutions

The development of distance education and learning programs, with their attendant integration of digital libraries, computing, information technology, and telecommunications, may very well shake apart the fundamental distinctions between the institutions that have previously provided the educational experience. A more placeless role for educational institutions could diminish the boundaries between K-12 education, higher education, and lifelong learning, and between the institutions in which they have traditionally been located. It might also bind the teaching, learning, and research processes into a more seamless model. Such a model could help mend the presumed fracture that too many people believe exists between these closely related educational activities.

The dissolution of the university as we have known it is a topic that appears frequently in the literature, and the "virtual university" is already emerging, at least on a limited basis. As suggested above, the K-12 community and the undergraduates of institutions of higher education could very well share the same information content, pedagogical and learning processes, and interactive exchanges between teachers and learners (and their "libraries").
Lifelong learners, who will become an increasingly active component of the distance education population, can very well make use of many of the same resources, as can information users and learners in the research and business sectors. The potential reach of the National Library is probably far greater than can be immediately imagined. More meaningful partnerships need to be established to extract the full advantage of the digital library opportunity. That opportunity lies in improving the knowledge skills required for competition in a new world commerce of goods, services, and ideas, where information resources and a command of attention are the chief currencies. Many parties can help articulate and resolve the complexities of the vision of the National Library. They include information technologists, Internet engineers, computer scientists, human and machine interface designers, information service providers and publishers, and experts in intellectual property rights. They also include classroom teachers and librarians, who know the audience and the content, and especially students themselves. All of them will be important to the success of any project that emanates from distance learning plans and digital library concepts.

Question. What is the best structure in which to incorporate the participation of the parties identified above to establish and maintain the National Library? Would a single, centralized national center of responsibility be the best choice? Or would cooperative, distributed centers of digital library excellence, centrally coordinated as a Virtual National Library, be a better model?

Whatever our preconceptions of a digital library and the behavior of its users, the actual answers will be determined by the use that people make of the library, that is, by the marketplace of learning. That is the chief reason why the National Science Foundation must have as clear an understanding as possible of the road and load digital library issues. Those issues will confront the effective creation of a learning and research resource as potentially important as a National Library for Undergraduate Science, Mathematics, Engineering, and Technology Education.

References

Billings, H. 1996 [i.e. 1997]. Library Collections and Distance Information: New Models of Collection Development for the 21st Century. Journal of Library Administration 24 (1/2): 3-17.
Billings, H. 1991. The Bionic Library. Library Journal 116 (17): 38-42.
Bothun, G. 1997. Seven Points to Overcome to Make the Virtual University Viable. Cause/Effect 20 (2): 55-57.
Goldhaber, M. H. 1997. The Attention Economy and the Net. First Monday. (http://www.firstmonday.dk/issues/issue2_4/goldhaber/index.html)
Library of Congress. American Memory. (http://lcweb2.loc.gov/ammem/)
Wulf, W. 1995. Warning: Information Technology Will Transform the University. Issues in Science and Technology 11 (Summer): 46-52.

A National Library for Undergraduate Science, Mathematics, Engineering, and Technology Education: Needs, Options, and Feasibility (Technical Considerations)

William Y. Arms

Corporation for National Research Initiatives

Digital Libraries and Undergraduate Education

Introduction

This is a discussion paper for the National Research Council workshop on August 7-8, 1997. The paper is arranged as a series of key topics that fall within the theme of the workshop. Although the paper emphasizes technical aspects of a digital library, it is impossible to introduce technical considerations without discussing the overall goals and form of the library.

"Please can I use the Web. I don't do libraries."
Anonymous Cornell student, reported by Carl Lagoze

The fundamental question for the workshop is how a national digital library can enhance undergraduate science education. My basic assumption is that there is little utility in taking existing educational materials, designed for other media, and simply placing them on a computer network. The greatest benefits will come from modifying curricula and creating different forms of materials in parallel with the deployment of the digital library.

Some Personal Examples

Each of us brings to this workshop preconceptions based on our own experiences. Here are two examples of my own.

During the 1980s, as part of the Andrew project at Carnegie Mellon University, we invested heavily in the creation of educational materials. They were delivered over the campus network through a networked file system, in effect a campus digital library for education. The computing initiatives that grew out of the Andrew project have had an impressive impact on education at Carnegie Mellon. Our regular surveys showed more than half the faculty regularly using computing as an integral part of their courses. However, the surveys also showed that most of this impact came from materials that were not developed explicitly for education.
The surveys showed that the dominant educational uses were as follows:

Professional computing tools. Many of the enhancements in education came from providing students with the same tools that the faculty use in their research and professional activities. These include applications programs (e.g., statistical packages such as SAS, symbolic mathematics such as Maple and Mathematica, graphical programs such as AutoCad or Quark) and mainstream computing applications (e.g., electronic mail, databases, and compilers). They also include data sets such as census data, NASA's images from space, and the genome database. Some of these tools began as non-commercial materials. By the time they became widely used in science education, however, they were of a scale and complexity that required a commercial framework of support.

Communication. For many years, the dominant applications over the campus network were electronic mail and bulletin boards. In addition, from 1986 onward, the university libraries provided extensive reference materials over the network. These materials were widely used by both faculty and students. They also appear to have helped stimulate the steady increase in the use of traditional library materials that occurred during the same period. It must be admitted, though, that the use of libraries by engineering and computer science students (and faculty) never reached the level one would hope for. As soon as the Mosaic browser was released, the Carnegie Mellon community adopted the World Wide Web as a very important means of communication, both for finding and for publishing information.

The second example comes from my time as a faculty member at the British Open University in the early 1970s. This was the first large-scale university organized completely around home-based learning. Although Britain has good public libraries, many students do not have easy access to a library. We were therefore forced to construct courses on the assumption that students had access to no materials other than those provided by the course team. The university provided each student with a set of educational materials, including printed texts, reprints of articles, and home experimental kits. Television and radio were used to augment these materials. They were an important part of some courses, but less important in my areas of mathematics and computer science.

The academic achievements of the Open University have shown that good undergraduate education is possible without giving students access to a library. However, the lack of a library places serious limitations on course design; in particular, the options for independent work are severely limited. As distance learning becomes more common, the workshop might ask how modern technology can help a home-based university, or any university, improve on the Open University's approach of thirty years ago.

Both these examples show the importance of creating services where the teaching faculty have a large measure of control over how the services are used in education.

Potential Benefits

To begin to answer the question of how a national digital library can enhance undergraduate science education, here is a list of the benefits that might be hoped for from a digital library aimed at undergraduate science education.

Provide faculty and students with access to original scientific materials. Studying science from original papers, research reports, data sets, etc. is fundamentally different from learning based on distilled materials, such as textbooks. The volume of scientific information crammed into undergraduate courses has grown. As a result, universities have moved away from the ideal of a liberal education, in which students explore a subject by reading original materials, toward heavily structured curricula.
Recently we have seen a trend, at least in some universities, that partially reverses this direction by encouraging students to carry out independent work, which requires easy access to the source materials of science. Independent work requires good libraries, and a digital library has much to offer.

Provide faculty with materials used in preparing courses. Preparation of a good course is extremely labor-intensive. Faculty need ways to discover and evaluate educational materials and scientific source materials. They also need access to curricula, course notes, problem sets, etc. The better the services provided to faculty, the more they are able to build on the successes of others. They also become less likely to use inappropriate materials or to re-create materials that already exist.

Provide communication among faculty and students. Communication can be within a university or college, or across organizations. Many faculty, particularly in small colleges, are quite isolated. Networked services, such as bulletin boards and the World Wide Web, can build a community in which they cooperate in both education and research. In a similar manner, students can interact with others from around the world. Collaborative tools continue to be developed that allow faculty and students to distribute their work to others, including annotations and comments.

Deliver specific educational materials. An increasing variety of educational materials are intrinsically digital. They include computer programs, data sets, various categories of multimedia items, etc. Computer networks and digital libraries provide a cost-effective way to store, retrieve, and deliver these materials.

One topic has been deliberately left out of this list, reflecting a personal bias. Because of a combination of technical and economic issues, my instinct is not to focus on using the digital library as a substitute for traditional textbooks.
Computer networks have long proved to be an effective way to deliver course notes and other supplementary materials. Textbooks, and courses built on textbooks, are so closely tied to the strengths of printed volumes that they are difficult to migrate to digital libraries.

The Technology of Digital Libraries

Assumptions

The following are my basic assumptions about the proposed library.

This is a digital library. Materials will sometimes be printed by the user, and some may be available on CD-ROM. The focus, however, is on materials that are created and stored in digital formats and transmitted to the user over the Internet.

It will be a virtual library. This will not be a conventional library, in that it will not acquire and store all its materials. The digital library collections will be managed by many organizations, with materials stored on many different computers. Three models of delivery of information to faculty and students are possible: (a) directly from the originator of the materials (e.g., a publisher), (b) from a service center at the educational establishment (e.g., a library or media center), or (c) from collections maintained by the national digital library.

The library will contain both proprietary and public materials. Many of the best educational materials are created by companies or individuals who wish to be paid for their efforts. However, as the World Wide Web has shown, there is also an enormous quantity of high-quality material that is made publicly available at no cost. In some areas of science, large amounts of scientific source material are available online with no restrictions on access.

Faculty and students will be able to interact with the collections. In a traditional library, it is a serious misdemeanor to write in the books or otherwise alter the collections. In a digital library the collections can be dynamic. People can annotate the materials or link them to others. Some materials are programs that students can execute or interact with; others can carry out computations, simulations, searches, or other actions on behalf of the user.

A Possible Technical Framework

Today's remarkable growth in digital libraries results from the maturing of several technologies: personal computers, the Internet, the World Wide Web, and protocols for searching online databases.
Major areas where technical barriers remain include interoperability among disparate systems, user interfaces, authentication and security, archiving, real-time and other non-static media, copyright management, payment for services, and searching vast amounts of information. Each of these areas has adequate short-term solutions, backed by extensive research and development. Hardware costs and performance continue to improve rapidly. There are no fundamental technical barriers to the development of digital libraries for scientific education.

A rough technical outline might be as follows:

The digital library will be built on the Internet. Almost every university and college now has a good connection to the Internet. Faculty and students working at home can dial up to their university or connect through an Internet service provider. All protocols will be based on the TCP/IP suite.

Users will have a standard personal computer (PC or Macintosh) running widely available software. For the foreseeable future, the user interface will be a Web browser, such as Netscape Navigator or Microsoft's Internet Explorer.

The library will select a specific set of standard formats and protocols. The aim will be to follow the technical mainstream as it evolves. Even so, the library will probably need to provide some additional software to handle special formats, authentication and payment, and identification of materials. This software will be provided as applets, plug-ins, or other extensions that can be installed over the network.

Materials in the digital library will be stored on a variety of servers. The collections will be managed by a variety of organizations, including universities, publishers, and libraries. In a large-scale library whose collections are maintained by many organizations, it is naive to believe that all the computers will be equally up-to-date or run the same protocols and formats.
The library must accommodate the problems associated with heterogeneity. Today, many of the servers will be HTTP Web servers, but there will also be servers based on other protocols, such as relational databases (SQL) and Z39.50. Object-oriented systems using IIOP may be the next important development. Interoperation among such systems is not easy, but it can be achieved by adopting suitable formats and protocols. (The Stanford University Infobus project has done good work in this area.)

Materials in the digital library will be entered into a registry. The registry is a centrally managed list of materials that have been selected for the library. The registry contains information about each item, but not the item itself. This information includes an identifier, a digital signature, the location of the material, and perhaps indexing information and annotations. (CNRI has developed a registry for the U.S. Copyright Office and is planning to deploy a modified version in other library applications.)
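The registry described in the last item can be pictured as a simple data structure. The Python sketch below is a minimal illustration under stated assumptions, not CNRI's registry design. Each entry records an identifier, a content fingerprint, a location, and optional index terms and annotations, but never the item itself. One simplification is assumed throughout: a SHA-256 digest of the content stands in for the digital signature. A real signature would also bind that digest to a signer's private key. The identifier and URL in the usage lines are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Information about one selected item; the item itself is stored elsewhere."""
    identifier: str   # persistent identifier (hypothetical naming scheme)
    digest: str       # SHA-256 of the content, standing in for a digital signature
    location: str     # where the item lives: an HTTP URL, a Z39.50 target, etc.
    index_terms: list[str] = field(default_factory=list)
    annotations: list[str] = field(default_factory=list)


class Registry:
    """A centrally managed list of materials selected for the library."""

    def __init__(self) -> None:
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, identifier: str, content: bytes, location: str,
                 index_terms: list[str] | None = None) -> RegistryEntry:
        """Record an item's metadata and fingerprint; the content is not kept."""
        if identifier in self._entries:
            raise ValueError(f"{identifier!r} is already registered")
        entry = RegistryEntry(
            identifier=identifier,
            digest=hashlib.sha256(content).hexdigest(),
            location=location,
            index_terms=list(index_terms or []),
        )
        self._entries[identifier] = entry
        return entry

    def lookup(self, identifier: str) -> RegistryEntry:
        return self._entries[identifier]

    def verify(self, identifier: str, content: bytes) -> bool:
        """Check that a copy fetched from the recorded location is unaltered."""
        return hashlib.sha256(content).hexdigest() == self._entries[identifier].digest

    def annotate(self, identifier: str, note: str) -> None:
        self._entries[identifier].annotations.append(note)


if __name__ == "__main__":
    registry = Registry()
    notes = b"Lecture notes: the second law of thermodynamics"
    registry.register("example:thermo-0001", notes,
                      "http://example.edu/notes/thermo.html", ["thermodynamics"])
    print(registry.verify("example:thermo-0001", notes))            # True
    print(registry.verify("example:thermo-0001", b"altered copy"))  # False
```

Keeping only a fingerprint and a location reflects the virtual-library assumption: the registry can vouch for an item and say where it is without storing a copy. Any server, whatever its protocol, can then hold the material itself.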

and elementary curricular needs, especially when funding is limited. If this aspect of a proposed SME&T library is to be adequately addressed, the following questions might well need to be raised.

1. Who will set the goals for what the national SME&T digital library should contain?
2. Which information user groups' needs will be served, and in what order of priority?

The Idea of a Library: Value-Adding Roles

It seems essential that the library proposed here be given a viable administrative and operational infrastructure as well as a viable technological infrastructure. As already implied, a library (any library, including one that is "digital") is more than a warehouse or dumpsite of information that is merely delivered. It is more than simply a collection of inert packages of information, represented by bits and bytes that are merely shuffled about. It is more than a publishing venture headed by an editorial staff that decides what to publish or send out merely on the basis of market studies. In short, it is not merely a disinterested operational structure with some sort of simple delivery system for the people who come to it. It is instead a complex operation of selecting, acquiring, organizing, delivering, and advising about information, which adds value to the information it includes at every possible point. This is the lesson of approximately 130 years of the "modern" library of print information-bearing entities. It is the only kind of library that virtually anyone now working on digital libraries of any kind has ever experienced, though most are not much aware of it. If the library being created for undergraduate SME&T education is to observe this normative, "enriched" idea of a modern library, the following questions will need to be addressed.

1. What value-adding activities must be provided for in the proposed national SME&T digital library?
2. Given the answer to question 1 above, what sort of administrative infrastructure needs to be provided for the library?

Technical Issues: Interfacing with Information Use Styles

The national SME&T digital library proposed here will almost certainly encounter a wide range of information use styles. Some depend on finding bits and pieces of information-bearing entities that are useful for one's current information need, often in some practical or utilitarian way. Others focus on identifying and "reading" whole information-bearing entities so as to interact with the entire idea-sets of their creators. In short, sometimes one needs little more than a character string from a text or a database, or simply an illustrative photograph, and that is quite enough. This kind of information use appears particularly amenable to quick-and-dirty (or even cleaner, more structured) indexing devices. It would also seem to be served well by some of the more recent ideas for the creation of intelligent agents.

However, other kinds of information seeking do not simply assemble bits and pieces of information. They need, and even revel in, the retrieval of and interaction with "whole" information-bearing entities. Such objects are sought not simply to solve some immediate problem but rather to augment and even to reconstruct one's own thoughts and emotions in some creative way. This kind of need requires more than an AltaVista-type search engine. It needs, in fact, the plodding, labor-intensive results of cataloging. In cataloging, information-bearing entities are not simply categorized so that searchers can find potentially useful groups of items. They are also carefully described so as to identify them uniquely and thus to promote efficient "known-item" searches. It needs, in fact, the capacity for a person to look for and precisely find individually created "works" even when they are buried in other collections of information-bearing entities.
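The contrast drawn above, between assembling bits and pieces and finding a specific work, can be made concrete with a toy example. The Python sketch below is an illustration under stated assumptions, not a design proposed in this paper. A keyword index built automatically from each entity's text stands in for the quick-and-dirty devices. Catalog records that name each contained work by author and title stand in for cataloging. The entity identifiers, texts, and records are invented sample data. In that data, the anthology's machine-readable text happens not to contain the words of the titles of the works it holds.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Work:
    author: str
    title: str


@dataclass
class Entity:
    """An information-bearing entity: a file, a website, an anthology, etc."""
    entity_id: str
    text: str                                        # machine-readable content
    works: list[Work] = field(default_factory=list)  # works a cataloger described in it


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(entities: list[Entity]) -> dict[str, set[str]]:
    """Automatic indexing: every word found in an entity points back to that entity."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity in entities:
        for word in tokens(entity.text):
            index[word].add(entity.entity_id)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Entities whose text contains every query word."""
    hits = [index.get(word, set()) for word in tokens(query)]
    return set.intersection(*hits) if hits else set()


def build_catalog(entities: list[Entity]) -> dict[tuple[str, str], set[str]]:
    """Cataloging: each described work gets an exact author/title access point."""
    catalog: dict[tuple[str, str], set[str]] = defaultdict(set)
    for entity in entities:
        for work in entity.works:
            key = (" ".join(tokens(work.author)), " ".join(tokens(work.title)))
            catalog[key].add(entity.entity_id)
    return catalog


def known_item_search(catalog: dict[tuple[str, str], set[str]],
                      author: str, title: str) -> set[str]:
    """Find the entities that contain one specific, uniquely described work."""
    key = (" ".join(tokens(author)), " ".join(tokens(title)))
    return catalog.get(key, set())


if __name__ == "__main__":
    entities = [
        Entity("anthology-1",
               "Classic papers on heat engines, efficiency, and entropy.",
               [Work("Carnot, Sadi", "Reflections on the Motive Power of Fire"),
                Work("Clausius, Rudolf", "On the Moving Force of Heat")]),
        Entity("notes-7",
               "Lecture notes that quote Carnot on the motive power of fire."),
    ]
    keywords = build_keyword_index(entities)
    catalog = build_catalog(entities)
    print(keyword_search(keywords, "motive power of fire"))  # {'notes-7'}
    print(known_item_search(catalog, "Carnot, Sadi",
                            "Reflections on the Motive Power of Fire"))  # {'anthology-1'}
```

The keyword index finds the lecture notes that merely mention the work. The catalog record finds the anthology in which the work itself is buried, which is the kind of known-item retrieval that the text argues cannot be left to automatic devices alone.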
If this wide range of information use is recognized in the proposed library, then some attention must be paid to the following questions.

1. What provisions will be made in the proposed national SME&T digital library for listing the information-bearing entities "included" in the library (as well as the "works" they contain) in such a way that such entities and works can be specifically found?
2. Given the answer to question 1 above, what attempts should be made to adhere to nationally adopted standards of cataloging and indexing?

The Library as an Intellectually Structured Space

One mark of a library, whether of the traditional kind or in digital form, is that its materials are organized in terms of an intellectually cohesive and structured "space" (Miksa and Doty 1994). Much is made these days of "intelligent agents" and of automatic indexing and retrieval devices that will somehow remove the bottleneck of human intervention in the information storage and retrieval process. The quest for this kind of automatic approach to information organization and retrieval began with the computer revolution. It has been kept alive especially by the aforementioned "inert stuff" view of information. But if the history of the West has any lesson, it is that information organization needs an intellectual framework (a knowledge structure) to achieve its greatest impact in a given cultural context. Further, applying such a structure to massive collections of information-bearing entities is a labor-intensive human endeavor that has not yet been successfully turned into an automatic routine. (It may someday be accomplished, and attempts to solve this trenchant problem should therefore not cease. However, it has not yet been accomplished, and that places a particular burden on anyone planning a library of any kind.) Over the centuries a variety of knowledge structures have been brought into information organization so as to make it a rational realm for searching and discovery. It might be difficult to make a case for applying any given current knowledge structure to the library envisioned here. A case can be made, however, that some such structure is needed. Such a structure need not be rigid, at least from the standpoint of an individual's own homepage base for collecting information links.
But for many kinds of information searches, one must have a point of departure for retrieving information from a library, and that point of departure will ultimately incorporate knowledge structures. If the library proposed here is to pay attention to this most basic social and cultural need of information organization, the following questions might well be considered.

1. What kinds of information retrieval search engines should be employed in the proposed national SME&T digital library?
2. What recognition should be given to controlled-vocabulary, structured searching environments, if any?

References

Breivik, Patricia S., and E. Gordon Gee. 1989. Information Literacy: Revolution in the Library. New York: American Council on Education.
Dervin, Brenda. 1976. "The Everyday Information Needs of the Average Citizen: A Taxonomy for Analysis." In Information for the Community, ed. by M. Kochen and J. C. Donohue, 19-38. Chicago: American Library Association.
"Development of Technology Integrated Learning Environments: A Report of the Multimedia Instruction Committee, Spring 1995." The University of Texas at Austin. Available at: <http://www.utexas.edu/computer/mic/>.
Farmer, D. W., and Terrence E. Mech, editors. 1992. Information Literacy: Developing Students as Independent Learners. San Francisco: Jossey-Bass.
Instructional Technology Connections. (Website) University of Colorado at Denver, School of Education. <http://www.cudenver.edu/~mryder/itcon.html>.
Miksa, Francis. 1987. Research Patterns and Research Libraries. Dublin, Ohio: OCLC.
Miksa, Francis. 1989. "The Future of Reference II: A Paradigm of Academic Library Organization." College and Research Library News 50 (no. 9, October): 780-90.
Miksa, Francis. 1996. "The Cultural Legacy of the 'Modern Library' for the Future." Journal of Education for Library and Information Science 37 (no. 2, Spring): 100-119. Also available at: <http://www.gslis.utexas.edu/faculty/Miksa/modlib.htm>.
Miksa, Francis, and Philip Doty. 1994. "Intellectual Realities and the Digital Library." Proceedings of Digital Libraries '94: The First Annual Conference on the Theory and Practice of Digital Libraries. Eds. J. L. Schnase . . . [et al.], pp. 1-5. College Station, Texas: Hypermedia Research Laboratory, Department of Computer Science, Texas A&M University. Also available at: <http://abgen.cvm.tamu.edu/DL94/paper/miksa.html>.
Wilson, Patrick. 1977. Public Knowledge, Private Ignorance: Toward a Library and Information Policy. Contributions in Librarianship and Information Science, no. 10. Westport, Conn.: Greenwood Press.
Wilson, Patrick. 1983. Second-hand Knowledge: An Inquiry into Cognitive Authority. Contributions in Librarianship and Information Science, no. 14. Westport, Conn.: Greenwood Press.

The Case for Creating a Systematic Indexing System for the National SME&T Digital Library

Francis Miksa

Professor, Graduate School of Library and Information Science

The University of Texas at Austin

Joan Mitchell

Editor, Dewey Decimal Classification

OCLC/Forest Press

Diane Vizine-Goetz

Senior Researcher, Office of Research

Online Computer Library Center (OCLC), Dublin, Ohio

Abstract

A case is presented for creating a systematic indexing system for the proposed national SME&T digital library. Two sets of assumptions are provided as background: the first concerns what is "included" in the library's collections, the second concerns typical factors related to indexing in general. Indexing is defined operationally in a very general way: making available, and using for information searches in the library, the attributes of the information-bearing entities that the library identifies as members of its collections. The main features of a systematic indexing system are:

- a controlled vocabulary for the topical and formal attributes of information-bearing entities;
- a taxonomic and faceted structure (with notation) of the concept terms that shows relationships among terms;
- an alphabetical index to the structure.

The idea of the system is illustrated by reference to the Dewey Decimal Classification. A rationale is provided, centered on two questions: how the system supports the undergraduate educational process, and how it supports searches for materials in topical areas. Finally, problems in implementing the system are presented, and questions pertinent to these issues are listed.

Background

To present our approach to indexing the national SME&T digital library, we begin with two sets of assumptions: one concerning the nature of the library's collection, the other concerning factors related to indexing in general.
First, we assume that the proposed national SME&T digital library will "include" graphic and textual information-bearing entities in its "collections." These include texts, audio and graphic files (or combinations of such entities in the form of multimedia files), databases, and websites (which contain still other collections of information-bearing entities). Here, "include" means that such entities are purposefully and intellectually included in what the library considers its realm, and "collections" refers to the sum of such entities included in its realm. It is understood, of course, that in the context of a digital library, "include" essentially means "available in electronic format through telecommunications links."

Second, we assume certain things about indexing itself. Operationally speaking, indexing the national SME&T digital library simply means making available, and using for information searches in the library, the attributes of the information-bearing entities that the library identifies as members of its collections. This is a very broad interpretation of indexing, which includes the widest possible range of systems. Thus, a library catalog is considered an index to a library collection, just as a more specifically named indexing service constitutes an index of the periodicals and other items it includes in its purview. When this operational goal is implemented, the form that indexing takes is controlled by various basic factors, the most important of which are shown in Table 1. The implementation of each factor should be viewed as ranging along a continuum that begins with the statement in column A and proceeds in the same row to column B.
For example, an indexing system might include only carefully assigned attributes, as in 1A. Or it might include all naturally occurring attributes identified in the entities by some automatic algorithm, as in 1B. Most likely, though, a typical system will include some combination of attributes from the two sources. Likewise, an indexing system might carefully segregate kinds of attributes according to some tradition. Traditional library cataloging does this when it distinguishes a name functioning as the author of a document from a name functioning as its subject. Or, again, an indexing system may simply intermix all such functions, as in searches made by AltaVista on the Internet. More likely, however, a planned indexing system will segregate some attributes from others in order to make the system function more efficiently.

TABLE 1. Controlling Factors in Indexing
(Column A: one end of a continuum. Column B: the opposite end of the continuum.)

1. Source of the attributes
   A: Attributes are devised conceptually and assigned to the entities.
   B: Attributes are naturally occurring, such as terms or audio or visual features found in entities, and are used in the form found.
2. Relationship of attributes to the entities they represent
   A: Attributes represent the entity as a whole, or totally (exact, specific match).
   B: Attributes represent part of the entity in extent, or only in terms of some measure of frequency of appearance.
3. How kinds of attributes are handled
   A: Kinds are commonly segregated according to function in relationship to an entity, e.g., subject, form, authorial, producer/publisher, etc.
   B: Kinds are not always distinguished but are rather treated as key terms or key features, mixed and matched.
4. How relationships among attributes are handled
   A: Relationships are handled formally according to a conceptual schema.
   B: Relationships are handled automatically by clustering, set-theoretic routines, etc.
5. How the number of attributes per entity is determined
   A: The number of attributes used is often predetermined by kind and restricted.
   B: The number of attributes used is usually determined by algorithm.
6. The point at which attributes are compiled or used for a search
   A: Attributes are compiled prior to any given search and without specific reference to a given search.
   B: Attributes are compiled when a request is initiated, by searching through the entities in a file.

Note: The list of controlling factors can doubtless be augmented, and some of the individual factors might be appropriately subdivided into parts. However, for the purposes of the argument, the ones listed seem sufficient. The factors basic to the kind of system advocated in this paper are 1A, 3A, 4A, and 6A, with extensions into column B on factors 2 and 5.

We include this table, first of all, to provide a general framework for considering various important aspects of indexing the national SME&T digital library. It also offers a way to distinguish existing indexing approaches. For example, traditional library catalogs, as they evolved from the late nineteenth century to about the 1950s, can most readily be associated with column A of the table. However, as library catalogs have migrated to a computerized context, they have tended to move in some respects toward column B. This is especially evident in efforts to enhance controlled-vocabulary subject heading systems in online public access catalogs. These catalogs now automatically add natural-language keyword searching on the terminology used in the bibliographic records of the entities they list. One thing is certainly true of indexing that follows many of the provisions of column A: it tends to be labor-intensive and, therefore, costly.

In contrast, much of the research done on information storage and retrieval systems over the past four decades has tended to be identified with column B (cf. Belkin and Croft 1987). One reason is that the provisions of column B are strongly related to tapping the computer's capacity to carry out automatic routines. This has generally been viewed as necessary in order to break through what has been considered the labor-intensive and costly "bottleneck" of indexing under the provisions of column A. Recent efforts to combine derived indexing methods with the information-ordering capabilities of established classification schemes are being reported with increasing frequency (Programming Systems Research Group, 1996; Koch and Day, 1997; Thompson, Shafer, and Vizine-Goetz, 1997; Weiss et al., 1996).

The second reason for including the table of indexing factors is to provide a framework for identifying what this paper advocates. Regardless of any other indexing approaches taken for the national SME&T digital library, one approach should be seriously considered. That approach is indexing the library according to a systematic, logically related structure of controlled-vocabulary index terms for the topical and other relevant aspects of the information-bearing entities included in the library. Such a system will adhere at a minimum to 1A in the table (controlled vocabulary), 3A (for topic, form, and other attributes), 4A (a systematic taxonomic structure of term relationships), and 6A (a predetermined structure), with extensions into column B on factors 2 and 5. (See the note to Table 1.) In short, we advocate the creation of a multiple-entry classified index for the library. What remains here is to briefly describe such a system and to provide a rationale and other considerations regarding it.

A Systematic Indexing System

A systematic indexing system of the kind envisioned here will adhere to the following provisions.
It will contain a set of controlled vocabulary concept terms that are assigned to each of the information-bearing entities in the national SME&T digital library (as many for each item as are necessary to highlight useful aspects of each entity) and that are expanded as needed for new entities added to the library. Such concept terms should feature the following attributes of the information-bearing entities when appropriate:
a) topicality of the entities (i.e., "aboutness" attributes);
b) formal aspects of the entities (i.e., such attributes as "genre," medium, arrangement, formal digital characteristics, etc.);
c) other formal aspects of the entities (i.e., those related to the "of-ness" of items, such as saying what a graphic is "of" rather than "about," or those related to the "for-ness" of items, such as saying that an entity has been created "for" such and such an audience or purpose, etc.).
The concepts so assigned are then arranged in a taxonomic order with heavy emphasis on "faceted" structures, so that both indexers and those searching for information-bearing entities with particular attributes of these kinds can use the system as an aid: indexers in assigning concepts to new items, and information seekers in constructing search algorithms. Faceting here means grouping like attributes in "families" (not unlike the particular values in any given field in a database) that are highly adaptable for multiple use in different sections of the structure.
For ease of use, a notation should be attached to the concepts that will "express" the relationships among the concepts and be available as a shorthand way of referring to parts of the system.
An alphabetical arrangement of the concept terms (i.e., an index to the systematic structure) should be maintained so that indexers and information searchers can gain access to starting points in the systematic map of concept relationships, and can also search independently of that structure.
A moment's reflection will show that what is actually proposed is similar to what in the past has been called a "classified catalog." Classified catalogs consisted of three parts: 1) a listing (numerically by notation from the system) of entries representing items in the system in their classified order, any item being represented by as many different notations as necessary; 2) an alphabetical listing of terms used in the system, sometimes with inverted index references to the entries; and 3) an alphabetical listing of items in the system by author, title, etc. Classified catalogs were almost always made as manual systems. More recently, as online public access catalogs (OPACs) have begun to provide access to items by their library classification numbers, some semblance of classified catalog arrangement has been achieved. It is limited, however, because it generally does not provide multiple representations of any particular item under different class numbers.
The foregoing brief sketch for indexing can be illustrated by envisioning the use of a system such as the Dewey Decimal Classification (DDC) for indexing the national SME&T digital library, but with certain variations of the system as now constructed and typically used. The DDC in its present form is a systematic, logical structure of concepts that are assigned to items in a library by attaching the notation representing each concept or combination of concepts to the items. Its structure of concept terms is highly developed, having been modified constantly over many years, through a strong, centralized editorial process, by including new concepts, modifying old ones, and restructuring existing ones. It has adopted faceted structures in various places in the system, has a reasonably thorough index of its concepts, and has many other features that make it one of the best such systems for information retrieval available.
What is envisioned here for the national SME&T digital library is using a system like the DDC to index the information-bearing entities that the library includes in its collections. Multiple index terms or term combinations (represented by classification numbers) would be assigned to each entity for each of the various categories of terms noted above. As a result, those who need to search the library will have both the structured system and the alphabetical arrangement of terms available as ways to search it. In addition, the structured system will serve as a map of the categories in the system quite apart from specific search needs (cf. Cochrane and Johnson, 1996; Bendig, 1997; and more generally, Iyer and Giguere, 1995).
The purpose of invoking the DDC is not to champion that system in particular, excellent as it has become, but simply to use it as an example of what is meant here. Even so, the DDC in its present state does not yet meet all the requirements for fulfilling the goals outlined here, although it has great potential for doing so ultimately. For example, the DDC does not have a fully controlled vocabulary of concept terms and does not always differentiate completely between the various formal and other attributes of entities described above. It also does not yet use faceted concept structures to the fullest extent possible, although these are being incorporated at an increasing rate under the present editorial direction of the system (Mitchell, forthcoming). Finally, the typical application of the DDC in libraries follows a "single-entry" approach, in which each information-bearing entity in a collection is assigned a single concept statement from the system. This follows from the common use of the system as a device for physically arranging library items rather than for indexing them thoroughly. One Internet-based exception is OCLC's NetFirst database, which provides access to Internet and Web-accessible information-bearing entities through multiple classification numbers assigned to each entity (Vizine-Goetz 1997a).
Nevertheless, the DDC is especially adaptable for the present case, particularly in an indexing environment with a layered approach to access. For example, keyword access to information-bearing entities in the national SME&T digital library will be one way to meet its indexing needs. However, most people are doubtless aware of the weaknesses of the straight keyword approach.
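The three parts of a classified catalog, with each item entered under every notation assigned to it, can be sketched in a few lines of Python. This is an illustrative sketch only, not a proposed implementation: the notations (F2, T1.3, T4.2), their captions, the item titles, and the author names are all invented, F2 stands in for a form attribute alongside the topical notations, and plain string sorting of notation is assumed to be adequate for these toy codes (a real scheme would need a notation-aware sort).

```python
from collections import defaultdict

# Hypothetical captions for a few notations (invented for illustration).
CAPTIONS = {"F2": "Textbooks", "T1.3": "Machine learning", "T4.2": "Statistical inference"}

# Hypothetical items, each indexed under as many notations as useful (multiple entry).
ITEMS = [
    {"title": "Sample Text A", "author": "Doe, J.", "classes": ["T1.3", "T4.2", "F2"]},
    {"title": "Sample Text B", "author": "Roe, R.", "classes": ["T1.3"]},
]


def classified_listing(items):
    """Part 1: entries in notation order; an item appears once per notation assigned."""
    by_class = defaultdict(list)
    for item in items:
        for notation in item["classes"]:
            by_class[notation].append(item["title"])
    # Plain string order is assumed to be good enough for these toy notations.
    return [(n, CAPTIONS.get(n, ""), sorted(titles)) for n, titles in sorted(by_class.items())]


def term_index(captions):
    """Part 2: alphabetical index of terms, each pointing into the classified structure."""
    return sorted((term, notation) for notation, term in captions.items())


def author_title_listing(items):
    """Part 3: alphabetical listing of items under author and under title."""
    entries = [(i["author"], i["title"]) for i in items]
    entries += [(i["title"], i["title"]) for i in items]
    return sorted(entries)


for notation, caption, titles in classified_listing(ITEMS):
    print(notation, caption, titles)
```

Restricting each item to its first notation, as in single-entry practice, would drop Sample Text A from two of the three classes in which the classified listing shows it.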
One question raised by the weaknesses of straight keyword searching is how to blend the keyword approach with the context and relationships provided by the structured approach to improve retrieval. In the SME&T library, we assume an increasing number of items may be available in digital form. This offers an opportunity to present a layered approach to information retrieval that in many ways represents previous approaches, but in a more efficient manner. Say we have a textbook on machine learning. A general textbook on machine learning is summarized in Dewey under the number 006.31, and in the Library of Congress subject headings by the phrases Machine learning and Computer algorithms. The book's index makes no mention of computer algorithms but gives many examples of specific algorithms that may or may not be known to undergraduates. "Machine learning" has just a few entries in the index, but it is the central "aboutness" of the book. An undergraduate may be looking for algorithms for machine learning, with or without knowing the specific name of one. The summarizing function of the DDC number and subject headings brings one to a promising initial set of documents, the general texts on machine learning. Once in this set, the search could move to a keyword search of the (back-of-the-book) indexes and browse within those indexes to find a particular algorithm (e.g., the backpropagation algorithm). A bottom-up approach would also work within the same structure: a large keyword retrieval could be sorted and summarized by category using the structure and relationships provided by the DDC and controlled vocabulary.
A look at the OCLC NetFirst database will help to illustrate this possibility. Using the hierarchical structure of the Dewey Decimal Classification, a NetFirst user can select from subject categories (such as health, home, technology), topics (such as health and medicine), and subtopics (such as diseases, preventive medicine, and public health) to reduce a results set numbering nearly 14,000 to a more manageable set of 249 records. Further refinements in searching can be achieved by combining one or more terms with DDC topic categories. For instance, a NetFirst user interested in finding electronic resources containing information about health concerns for travelers can browse to the second-level topic health and medicine under the category health, home, technology and then search for items in this topic area about travel and tourism. Browsing and filtering the database records in this way (using the structure of the DDC but not its class numbers) enables users to retrieve relevant items that may not be as easily discovered using traditional keyword searching capabilities. In this case, a keyword search for health and (travel or tourism) retrieves 143 items; a similar search filtered by DDC topic area retrieves 25 items, with several potentially relevant items included on the first page of the results display (Vizine-Goetz, 1997b).
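The top-down and bottom-up layers described for the machine learning example can be sketched as two small Python functions. This is a minimal illustration under stated assumptions: 006.31 is the Dewey number the text gives for machine learning, but the class "H-1", every title, and every index term are invented; the helper names keyword_match, top_down, and bottom_up are hypothetical; keyword matching is reduced to exact lookup of back-of-the-book index terms; and prefix matching on the class string is assumed as a stand-in for "this class and its subdivisions."

```python
from collections import Counter

# Hypothetical records. 006.31 is the Dewey number the text gives for machine
# learning; the class "H-1" and every title and index term are invented.
RECORDS = [
    {"title": "Machine learning text A", "class": "006.31", "category": "Machine learning",
     "index_terms": {"backpropagation", "decision trees", "pruning"}},
    {"title": "Machine learning text B", "class": "006.31", "category": "Machine learning",
     "index_terms": {"k-means clustering", "pruning"}},
    {"title": "Orchard care handbook", "class": "H-1", "category": "Horticulture",
     "index_terms": {"grafting", "pruning"}},
]


def keyword_match(terms, all_of=(), any_of=()):
    """True if every all_of term is present and, when any_of is given, at least one any_of term."""
    return all(t in terms for t in all_of) and (not any_of or any(t in terms for t in any_of))


def top_down(records, class_number, **query):
    """Narrow to a class (prefix match, assumed to cover subdivisions), then search index terms."""
    return [r["title"] for r in records
            if r["class"].startswith(class_number) and keyword_match(r["index_terms"], **query)]


def bottom_up(records, **query):
    """Run the keyword search over everything, then summarize the hits by category."""
    return Counter(r["category"] for r in records if keyword_match(r["index_terms"], **query))


print(top_down(RECORDS, "006.31", all_of=["backpropagation"]))  # ['Machine learning text A']
print(bottom_up(RECORDS, all_of=["pruning"]))
```

The ambiguous term "pruning" shows why the structure helps: the bottom-up summary separates the machine learning hits from the horticulture hit. A NetFirst-style query such as health and (travel or tourism) would correspond to keyword_match(terms, all_of=["health"], any_of=["travel", "tourism"]), and running it inside top_down rather than over the whole file corresponds to filtering by a topic area first.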
Rationale
The rationale for indexing the national SME&T digital library with the kind of systematic indexing approach outlined here rests chiefly on two assumptions about how such a library might be used: the first has to do with the educational support the library is intended to provide, the second with efficiency in searches that focus on surveying an area of knowledge.
Educational Support
We assume that the national SME&T digital library, being supportive of undergraduate education in science, mathematics, engineering, and technology, will need a searching capability that enhances the ability of undergraduates to engage in the personal exploration of ideas, and that, given this need, the indexing system of the library will have to accommodate a broad range of information search types. We illustrate this broad range of search types by referring to two parts of the taxonomy of kinds of knowledge-information "uses" found in Fritz Machlup's work. He outlined five kinds of knowledge-information uses, of which the first two have special relevance here: the "instrumental" or "practical" use of information on the one hand, and the "intellectual" use of information on the other (Machlup, 1980, 107-9; cf. Miksa, 1985). The first of these two types focuses on the need for (and, therefore, the search for) very specific information found as a result of very specifically defined information searches. This information is often needed quickly, and it is generally needed in order to complete some task, make some decision, etc. This kind of information use and search is predicated in turn on one's knowing exactly what is needed and being able to generate an information search that precisely meets the information need.
It is certainly basic to known-item searching, in which one knows something about an item's attributes and pursues it in the expectation that it will fulfill one's information need in some fashion. This kind of information use is also basic to searches on topical terms for very specific topics differentiated from other closely related topics. This approach to searching underlies many of the information storage and retrieval systems created over the past four decades, especially systems created to serve scientists and other educated researchers who, one supposes, know when they have information needs and have some skill in stating precisely what they want or need in the way of information. We assume that while the undergraduate education supported by the national SME&T digital library will necessitate this kind of searching on the part of undergraduates some of the time, the second important type of knowledge-information use designated by Machlup, and its corresponding kind of search, will play an equally if not more important role.
Machlup's second kind of information use, which he called the "intellectual" use of information and which he associated with information gained in some repose for general educational purposes rather than for specific instrumental ends, involves a much less specific approach to information need and searching. It amounts to an exploratory approach in which information is surveyed by categories in a manner much like mapping knowledge, often for little more than one's personal satisfaction. Its main emphasis is the mental exploration of ideas, and it is characteristically associated with the browsing done by students in the stacks of a library, where books on various topics are surveyed according to the progression of topics they represent on the shelves, books being pulled and examined, often sequentially, with topical hints and ideas coming in a flood from the books themselves, from their association with other books alongside them in the same category, and from their differences from books in nearby categories. We assume that this kind of information use and, by extension, information searching is especially relevant to the national SME&T digital library as a support for undergraduate education, insofar as that education will emphasize the exploration of ideas in the form of personal research and exploration rather than the directed research of seasoned researchers in creating new social knowledge. It is precisely this kind of intellectual activity, in fact, that produces the kind of thinking that appears to be fundamental to the national SME&T digital library idea. If our assumptions about information use are accurate, and we believe they are, then the library needs an indexing system that will support this kind of information use and searching as well as the instrumentally precise kind described above.
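The contrast between Machlup's two kinds of use can be made concrete with a shelf list kept in classified order. In this minimal Python sketch the notations and titles are hypothetical, and the helper names known_item and browse are illustrative: a known-item lookup returns exactly what was asked for, while browsing a class also surfaces its neighbors, which is the "books alongside them" effect described above.

```python
import bisect

# A hypothetical shelf list of (notation, title) pairs kept in classified order.
SHELF = sorted([
    ("T1.1", "Foundations of computing"),
    ("T1.3", "Introduction to machine learning"),
    ("T1.3", "Neural networks primer"),
    ("T1.4", "Robotics basics"),
    ("T2.1", "Discrete mathematics"),
])
NOTATIONS = [notation for notation, _ in SHELF]  # parallel list for binary search


def known_item(title):
    """Instrumental use: the searcher already knows exactly which item is wanted."""
    return [entry for entry in SHELF if entry[1] == title]


def browse(notation, width=1):
    """Intellectual use: the items in one class plus `width` neighbors on either side."""
    start = bisect.bisect_left(NOTATIONS, notation)
    end = bisect.bisect_right(NOTATIONS, notation)
    return SHELF[max(0, start - width):end + width]


for notation, title in browse("T1.3"):
    print(notation, title)
```

Here browsing class T1.3 also shows the items shelved under T1.1 and T1.4; widening `width` widens the survey, much as a student might scan further along the shelf.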
In this respect we conclude that a systematic approach to indexing the national SME&T digital library of the kind we propose will support this need very directly, in a way that no other indexing approach can. Our proposed system will do so because it "maps" knowledge categories into a logical structure and, by making such a knowledge structure available, will promote this kind of information searching among the undergraduates who use it. It will promote and facilitate, in other words, the kind of browsing or exploratory searching described here. As a caveat, it should be noted that the "mapping" of knowledge relationships in the sense meant here is not designed to be some ultimate and absolute set of knowledge categories and their relationships, but merely a beginning point for an information seeker's own personal mapping of knowledge. In short, any such structure constitutes no more and no less than a starting point, for it is in the nature of this kind of mental activity to use such a structure to build one's own personal knowledge structure, redefining and extending as needed the relationships one begins with, rather than simply absorbing the given structure as absolute. The basis for doing so, however, is that some such knowledge structure is available as a beginning point and that one can browse through it with guidance in its use but also with a good deal of freedom (Miksa, 1997).
Efficiency in Surveying Information
The second reason why a systematic indexing system of the kind proposed here will be useful for the national SME&T digital library has to do with a kind of searching that is sometimes, but not always, needed in information retrieval but is hard to come by in other kinds of systems: searching for all aspects of a topic where the aspects are indexed under a variety of names.
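The kind of gathering just described can be sketched as a walk over a broader-to-narrower structure. In this minimal Python illustration the hierarchy and index entries are hypothetical (the Bryophyta groups echo the example the paper uses, but the item titles and aspect labels are invented), the helper names subtree and gather are illustrative, and the hierarchy is assumed to be a simple tree with no cycles. A searcher who names only the broad topic still retrieves items indexed under narrower terms, grouped by aspect.

```python
from collections import defaultdict

# Hypothetical broader-to-narrower relationships (not taken from any published scheme).
NARROWER = {
    "Bryophyta": ["Mosses", "Liverworts", "Hornworts"],
    "Mosses": ["Peat mosses"],
}

# Hypothetical index entries: (title, taxon term, aspect term).
INDEX = [
    ("Peat moss ecology survey", "Peat mosses", "Ecology"),
    ("Liverwort anatomy atlas", "Liverworts", "Anatomy"),
    ("Hornwort cell studies", "Hornworts", "Molecular and cellular issues"),
    ("Fern physiology notes", "Ferns", "Physiology"),
]


def subtree(term):
    """Return the term and every narrower term beneath it (iterative depth-first walk)."""
    found, stack = [], [term]
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(NARROWER.get(current, []))
    return found


def gather(topic):
    """Collect every item indexed anywhere under `topic`, grouped by aspect."""
    scope = set(subtree(topic))
    by_aspect = defaultdict(list)
    for title, taxon, aspect in INDEX:
        if taxon in scope:
            by_aspect[aspect].append(title)
    return dict(by_aspect)


for aspect, titles in sorted(gather("Bryophyta").items()):
    print(aspect, titles)
```

The fern item is left out because "Ferns" lies outside the Bryophyta subtree, while the peat moss item is found even though it is indexed two levels down.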
For example, given a search for various aspects of, say, the realm of Bryophyta, unless one were a seasoned researcher who already knew the classes of plants included in Bryophyta (for example, different kinds of mosses, hornworts, and liverworts) or such aspects of the study of Bryophyta or any of its subclasses as anatomy, physiology, morphology, ecology, molecular and cellular issues, and so on, it would be much easier to find what a library of any kind had on the area if these were all gathered systematically in one place in an indexing system. In short, it would be more efficient to see a concept map of the area than to dive in and try to survey it without a clue about what it includes. Searching for related topics such as these can be done with controlled vocabulary systems such as subject headings if a strong structure of narrower-term and related-term cross-references is available, but such cross-references ultimately must be derived from a systematic structure of the kind the system proposed here would supply as a matter of course. Not all searches are conducted with this goal in mind, of course, but where they are, the system proposed here would make them more efficient.
Other Considerations
Having, we hope, made a case for the kind of systematic indexing system proposed here, we close by pointing out several difficult issues that must be considered in implementing such a system.
The system proposed here is labor-intensive and, therefore, relatively costly to implement, as is any controlled vocabulary and concept-assigned system. However, there seems at present to be no alternative that would yield this kind of system. Further, implementing such a system will require an organized, managed, and funded approach to the indexing process.
Creating any systematic structure will bog down if its goal becomes to create what could be called the "one best system" or knowledge taxonomy, a system considered to be "more correct" than any other. We assume that all knowledge structures are ultimately artificial and capable of growth and evolution. Thus, what is needed is an emphasis on adaptability, in which the official version of the system not only can be easily modified but also can be used in whatever modified or "non-official" form one wants without losing contact with the official version.
Some will claim that a systematic structure of knowledge categories arranged in someone's logical manner will be evidence of little more than what post-modernists such as Michel Foucault and others would consider the blatant exercise of power and authority in the intellectual realm so as to squelch intellectual dissent.
We conclude that this objection has some merit to the extent that any classification of knowledge categories is at base an information-losing process (i.e., it excludes alternative arrangements, at least in any "official" or basic version of the system) and that its purpose is to provide only one approach to knowledge structure. We also conclude, however, that the solution to the problem is not to avoid making taxonomic structures in the first place, or to argue incessantly about what is right or wrong about them, but rather to create a system malleable enough to be arranged and searched in alternative arrangements, much as one can rearrange the reporting structures of databases.
Questions
We conclude with a list of questions for discussion that arise from the foregoing remarks.
1. What indexing implications arise from the meaning of the assertion that information-bearing entities are "included" in the national SME&T digital library and, in fact, from how that process will function?
2. What do the educational objectives underlying the national SME&T digital library imply for the information search needs and patterns of the library's undergraduate users?
3. What other users of the national SME&T digital library are expected besides undergraduates in the areas of science, mathematics, engineering, and technology, and how do the expected information needs of these other users affect the indexing of the library?
4. What combination of typical indexing factors is necessary and sufficient for the users of the national SME&T digital library?
5. If the answer to question 4 consists of a layered approach to indexing, of what should the layers consist?
6. What combination of typical indexing factors for the library is both practical and affordable?
7. What alternatives to a systematic indexing system of the kind envisioned here are presently available for meeting the information use needs described in the "Rationale" above?
8. If a presently available system such as the DDC were used to create a systematic indexing system for the national SME&T digital library, what changes might be recommended with respect to the system and how it is typically applied?
9. Which persons or bodies would be given responsibility for indexing the national SME&T digital library?
10. To what extent should the indexing needs of the national SME&T digital library provide a test-bed for indexing experimentation?

References
Belkin, Nicholas J., and W. Bruce Croft. 1987. "Retrieval Techniques." Annual Review of Information Science and Technology 22: 109-45.
Bendig, Mark. 1997. "Mr. Dui's Topic Finder." Annual Review of OCLC Research 1996. Dublin, Ohio: OCLC. Also available at: <http://www.purl.org/oclc/review1996>.
Cochrane, Pauline, and Eric Johnson. 1996. "Visual Dewey: DDC in a Hypertextual Browser for the Library User." In Knowledge Organization and Change: Proceedings of the 4th International ISKO Conference, 15-18 July 1996, Washington, D.C., ed. by Rebecca Green. Frankfurt/Main: INDEKS Verlag, 95-106.
Iyer, Hemalata, and Mark Giguere. 1995. "Towards Designing an Expert System to Map Mathematics Classificatory Structures." Knowledge Organization 25 (no. 3/4): 141-47.
Koch, T., and Michael Day. 1997. The role of classification schemes in Internet resource description and discovery. [Development of a European Service for Information on Research and Education (DESIRE) project report posted on the World Wide Web.] Retrieved July 29, 1997 from the World Wide Web: <http://www.ukoln.ac.uk/metadata/DESIRE/classification/class_ti.htm>.
Machlup, Fritz. 1980. Knowledge and Knowledge Production. Vol. I of Knowledge: Its Creation, Distribution, and Economic Significance. Princeton, N.J.: Princeton University Press.
Miksa, Francis. 1985. "Machlup's Categories of Knowledge as a Framework for Viewing Library and Information Science History." Journal of Library History 20 (Spring): 157-92.
Miksa, Francis. 1997. The DDC, the Universe of Knowledge, and the Post-Modern Library. Albany, N.Y.: Forest Press, a division of OCLC Online Computer Library Center, Inc.
Mitchell, Joan S. Forthcoming. "Challenges Facing Classification Systems: A Dewey Case Study." In Knowledge Organization for Information Retrieval: Proceedings of the 6th International Study Conference on Classification Research, 16-18 June 1997, London.
Programming Systems Research Group, MIT Laboratory for Computer Science. HyPursuit Homepage. [Document posted on the World Wide Web.] Retrieved July 29, 1997 from the World Wide Web: <http://paris.LCS.MIT.EDU:80/Projects/CRS/HyPursuit/>.
Thompson, Roger, Keith Shafer, and Diane Vizine-Goetz. 1997. "Evaluating Dewey Concepts as a Knowledge Base for Automatic Subject Assignment." Paper presented at the 2nd ACM International Conference on Digital Libraries, Philadelphia, Pa., July 23-26, 1997. Also available at: <http://purl.oclc.org/scorpion/eval_dc.html>.
Vizine-Goetz, Diane. 1997a. "Classification Research." Annual Review of OCLC Research 1996. Dublin, Ohio: OCLC. Also available at: <http://www.purl.org/oclc/review1996>.
Vizine-Goetz, Diane. 1997b. "OCLC Investigates Using Classification Tools to Organize Internet Data." OCLC Newsletter (March/April 1997): 14-18. Also available at: <http://www.oclc.org/oclc/new/n226/frames_man.htm>.
Weiss, R., et al. 1996. HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering. [Compressed file posted on the World Wide Web.] Retrieved July 29, 1997 from the World Wide Web: <http://www.psrg.lcs.mit.edu/ftpdir/papers/hypertex96.ps.gz>.