A Digital Infrastructure to Support Tomorrow's Research Communities
Research in the digital age requires a new kind of infrastructure—digital libraries and databases, access to networks, adequate communications bandwidth, supercomputers, and various support services. These resources are the basis of new kinds of worldwide research communities that are taking advantage of the richer communications possibilities offered by the global network. The elements of this infrastructure are growing cheaper and more widely accessible by the day. But they will not provide themselves; someone must build and sustain them. Traditional academic relationships are likely to be altered by the spread of a digital research environment built around these resources. New relationships, based on community of interest, are becoming the common currency of research.
The responsibility for providing, maintaining, and controlling the many elements of the research infrastructure remains to be determined. Will the authority of academic departments be eroded if researchers learn to value networks (national and international resources) and extended communities more than they value the crucial social organization of departments, schools, and universities? Will the increasing remote use of instruments reduce the power of academic institutions, as researchers organize themselves in virtual institutions centered
on major instruments and facilities? To the extent that these questions can be answered in the affirmative, researchers will experience important shifts in their fundamental professional relationships. Every researcher, in shaping a career, should be aware of this potential.
No one can answer these questions with certainty today. Change will come from the bottom up, through the efforts of individual researchers, seeking their own paths to knowledge and developing the tools and relationships to get them there. The young—because they have grown up immersed in digital media environments—will probably be the most prolific in devising unexpected new tools and techniques. This effort in total will be enormous and hard to monitor, much less predict. Much of it will take place invisibly, in dormitory rooms and computer labs. The applications of these tools will be constantly surprising. Some of them will change the world.
For institutions of higher education, overseeing this unpredictable surge of innovation will be constantly challenging. Unlike other forms of campus infrastructure, such as buildings, laboratories, electric power, and water, the information infrastructure cannot be planned in detail, or in isolation from the needs of the end-users. The best institutions can do is provide enough bandwidth and suitable points of connectivity, while leaving it to researchers, their departments, and their colleagues to configure the details of the local area network. For every university the investment required will be very large.
This large investment means that researchers at every level will have to be informed about choices so that they can guide their institutions in making technology decisions. The growing disparity in Internet access between rich and poor has been called the “digital divide” ( http://www.ntia.doc.gov/ntiahome/digitaldivide/ ). Vigorous efforts are under way to address the economic, policy, training, and design issues ( http://www.digitaldivide.gov/ ), but persistent attention will be needed at every level (Shneiderman, 2000). The large disparity between advanced and developing nations also raises concerns about the ability of researchers everywhere to participate in international projects. Bridging this divide, nationally and internationally, will bring benefits by ensuring diverse participation in research activities and enabling previously marginalized researchers to contribute.
For researchers themselves, of course, these institutional
questions are not the dominant ones (although every researcher would be well advised to be aware of them). The important questions are those that bear directly on research and education. In pursuing their careers, young researchers will experience growing freedom to build relationships on the basis of community of interest—locally, nationally, and globally. It may even become possible to build rewarding research careers outside of the traditional academic structure. The most successful academic institutions will be those that most effectively facilitate the expanding communities of researchers, while providing the local conditions to attract the best faculty and students. Those conditions include stimulating colleagues, rich information resources, and departmental reward structures that are open enough to recognize innovation and creativity in nontraditional areas.
LOCAL (INSTITUTIONAL) INFRASTRUCTURE
Research institutions (colleges and universities, departments, and research laboratories) are responsible for local infrastructure (such as campus networks and libraries, personal computing resources for students, faculty, and staff, and access to specialized resources such as supercomputers or virtual reality environments). Academic departments generally provide instruments and facilities for research, and control their use. Overall, information technology to support faculty, students, and staff often accounts for some 10 percent of the annual operating budget of a university.
But beyond the financial challenge is the challenge of providing the sophisticated computing environments that many young researchers demand. Information technology has become a strategic investment for universities, critical to the vitality of their academic missions and administrative services. The great diversity of researchers' needs will require an equally diverse network infrastructure. Some humanists will need access to digital libraries and graphics processing. Some scientists and engineers will need massively parallel processing. Social scientists will need the capacity to manage massive databases, e.g., data warehouses and data mining technology. Artists, architects, and musicians will require multimedia technology. Business and financial operations will need fast data processing, robust communications, and high security. Emerging areas of research will place their own demands on the university infrastructure.
Expertise and Support
Most young researchers are highly knowledgeable about computing and how to use it for their research. Universities provide basic networking and computing service, but little expertise in the use of academic applications. The extent of this support varies from institution to institution. Much of it is provided informally. Some of the formal support is automated, in the form of remotely accessible databases presenting lists of the answers to “frequently asked questions,” software setup scripts, and the like.
In most institutions, central support diminishes as time goes on, as computing expertise comes to be regarded as a basic skill that every researcher has. The support that remains is increasingly distributed among departments and research groups, and may therefore be harder for users to locate.
Faculty development programs will need to provide graduate students and faculty members with the research and teaching skills needed for success in a digital environment. While many young researchers are highly knowledgeable about computing and how to use it for their research, many senior faculty members lack the knowledge and skills to operate confidently in the digital environment, and find the marvelous new tools of information technology to be baffling sources of job stress (Science, 1999). Students are far ahead of the faculty in their understanding of digital tools. Few universities have devoted adequate attention or resources to the needs of faculty (in sharp contrast to the massive amounts spent on student needs). The Faculty Development Institute of the Virginia Polytechnic Institute and State University ( http://www.edtech.vt.edu/idi.html ) represents one serious attempt to provide those tools for faculty and staff. Others can be found at George Mason University, MIT, and Cornell University.
Personal Computing and Communication
Universities and colleges are considering to what extent they should be responsible for providing faculty and students with computers and communications tools such as cellular phones and PDAs. The costs involved in these decisions are substantial. For example, some may provide their staffs with
personal computers in both home and office, but require undergraduate students to supply their own. Some may provide home computers and ISDN or cable-modem connections, wireless telephones, and other devices. Others will regard computers as commodities, much like telephones or calculators, to be provided by the individual. The dramatic declines in the prices of digital systems make it increasingly feasible to place these responsibilities on individuals (many of whom will prefer to exercise their own judgment in choosing the systems they need).
It was once the norm for researchers to write or customize their own special-purpose computer software. This practice had the advantage of giving researchers a hands-on appreciation of the logical structures and appropriate applications of the software. On the other hand, for outsiders, results obtained through such special purpose software could be hard to interpret and were generally impossible to audit.
Over the past two decades, home-coded software has been steadily replaced by off-the-shelf commercial software packages for calculation, data collection and analysis, and data mining. These general-purpose packages are comparatively well documented and reliable. But many researchers have little understanding of the limitations and constraints of the software tools they use. Many of these users do not need to understand the operation of their tools in detail; for some, however, such understanding is vital. Fortunately, powerful software tools and high-level languages are available to make this task easier.
Local Area Networks
To nurture the digitally mediated research communities of tomorrow, universities will need to provide students, faculty, and staff with robust, high-speed networks. Both Internet access to off-campus resources and “intranet” networks to link students, faculty, and staff are essential. While the processing power of computers is continuing to increase, it is often the growing bandwidth of communications that is more vital to researchers. Today, switched Ethernet at 100 million bits per second is common.
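To get a feel for what such a figure means in practice, a back-of-envelope calculation helps. The dataset size and the 80 percent efficiency factor below are illustrative assumptions, not measurements:

```python
# Rough transfer-time estimate for a network link. The efficiency
# factor is a stand-in for protocol overhead and shared traffic.
def transfer_seconds(size_bytes: int, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate transfer time in seconds over a link of the given speed."""
    bits = size_bytes * 8
    effective_bps = link_mbps * 1_000_000 * efficiency
    return bits / effective_bps

# A hypothetical 10 GB dataset over switched 100 Mb/s Ethernet:
seconds = transfer_seconds(10 * 10**9, 100)   # 1000 seconds, about 17 minutes
```

Even at full campus-network speed, moving large research datasets takes minutes to hours, which is why bandwidth, more than processing power, so often sets the practical limit.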
At some universities the information technology infrastructure is centrally coordinated. Other institutions have allowed it to evolve with few constraints, perhaps encouraging some units to serve as “islands of innovation,” able to look ahead, explore new technologies, and serve as pathfinders for the rest of the university.
NATIONAL INFRASTRUCTURE

A national infrastructure—long-distance communications, nationally significant instruments (many accessible by digital networks), software libraries, and high-performance computing facilities—is also needed. It is appropriate for these resources to be provided at the national level, by government, consortia of universities, or industry (or combinations of these institutions). Like local infrastructure, they must be continually developed, maintained, and calibrated.
Digital Libraries and Databases
The long-term impact of digital networks on academic libraries is impossible to predict. Libraries are in a state of sweeping change, in both technology and function. The most obvious sign is the proportion of library information that is now available in digital formats. Online catalogs and information services have almost totally replaced card catalogs and printed indexes. Over half of all scientific journals are online, with the proportion in other fields rising rapidly. Activities such as JSTOR and the American Memory project at the Library of Congress are digitizing large quantities of historic documents. Researchers in some disciplines can rely almost entirely on digital information. While only extreme enthusiasts envision a world where all library materials are online, the importance of digital information looks likely to expand indefinitely.
However, the changes are much greater than simply replacing traditional materials with digital equivalents. One major change is that a vast amount of information that is important to research is provided on the Internet, with access open to all (in open archives, research Web sites, open-access periodicals, and government repositories). In some disciplines, it is possible to teach advanced courses or carry out research using open access materials only. Open access has benefits to both the author and reader, in encouraging the flow of information. Many people
see it as the natural form for research information, but somebody has to pay for managing information and no general economic model has yet been established for providing open access to all materials. Authors have to decide whether to publish their research in an open access form.
Another major effect of networked information is that readers no longer need to visit a library building. Libraries are increasingly providing services to remote users. Often they are connecting users who never enter the library with online services provided by remote publishers.
Digital libraries are surrounded by considerable hyperbole. Advocates often imply that multimedia and virtual reality are replacing text as the medium of communication, and that digital libraries in the near future will supply tools for navigating oceans of information, provide researchers with immediate access to all relevant literature, and allow them to search by concept rather than by conventional text-based retrieval. All of these concepts have been demonstrated in research laboratories, but the real world is more prosaic. Deriving new knowledge from materials in libraries remains a painstaking, skilled task, whether the materials are digital or print.
New formats of information are emerging, albeit slowly. Networked databases are important in some fields, such as the Protein and Genome databases. Research papers are providing more of the raw data and intermediate results. Online collections, such as the Perseus Project's collections of classical and humanities materials, are rich in links that relate concepts at a much finer granularity than conventional footnotes and references. Surely, such new approaches to information will continue to grow.
Digital information provides new challenges to libraries and to researchers. Libraries are very conscious of the difficulties in long-term preservation of digital material, with its fragile media and ever changing formats. Digital information is so easy to change that materials go through countless versions and revisions; this leads to difficulties in citation. Although hard data is lacking, many observers sense an increase in academic plagiarism, made easier by digital information on the Internet.
Finally, digital libraries and electronic publishing have placed stress on the framework of copyright and patents, which facilitates the exchange of information while providing income for publishers and authors. Who should own the product of research: researchers, universities, publishers, or perhaps the
taxpayer who funds so much research? In the current state of uncertainty, every researcher needs to understand the basic concepts of copyright, both as an author and a reader. In most cases, authors can be guided in their decisions by asking themselves—and their colleagues and mentors—whether a proposed transaction, such as transfer of copyright, helps or hinders the underlying objectives of communication.
The Internet in its present form was the result of initiatives by the federal government, the technical development of the ARPANet, which began in the 1960s, and the national expansion led by the National Science Foundation in the late 1980s. Today, the federal government continues to support research in the technology of information processing and communications, but the Internet is almost entirely a commercial enterprise.
Funds for specific disciplines are provided through numerous agencies, including the National Science Foundation, the National Institutes of Health, the National Endowment for the Humanities, the Department of Defense, the Department of Energy, the Department of Education, and others. In recent years, Congress has voted considerable funds to support computing in education. Although there are continual efforts to present these activities as a coherent national plan, in practice any coordination is informal. There is general agreement, however, that such coordination deserves sustained national attention.
A multi-agency program (the National Coordination Office for Computing, Information, and Communications) has been established to coordinate federal information technology R&D. Among those activities is a proposed federal research and development initiative that would support (a) long-term research in software, interfaces, and high-end computing; (b) development and acquisition of newly powerful computer systems and associated software; and (c) research on economic, social, and work force implications of the Information Revolution ( http://www.cra.org/Policy/it2.html ).
Advanced Scientific Computation
Many important scientific and engineering problems are so computation-intensive as to be beyond the capacities of today's systems. These problems can be found in many fields, including atmospheric modeling; simulation of black holes and general relativity; chemical reaction studies; cryptography; drugs by design; ecological analysis of dynamic systems; electron transport in complex media; fusion energy studies; genomics; geophysical and astrophysical turbulence; global systems simulation and modeling; high energy and nuclear physics; microbial interactions in soil, water, and vegetation; modeling metabolic systems; prime number mathematics; protein folding; quantum chromodynamics; subsurface transport; supernova science; and theoretical biology.
The federal government traditionally has supported the advance of computing by funding national supercomputer centers to support both particular government missions and the needs of the broader scientific community. It will continue to play these roles.
Online Access to Important Instruments
One significant use of the world's data networks and computational grids has been to enable remote access to a diverse array of sophisticated research instruments. Such “telescience” represents a new model for scientific collaboration and investigation, and it poses challenges that are advancing our thinking about how to surmount limitations of distance and access. Telescience tools are now evolving to include the seamless linking of remote data acquisition with distributed computation-intensive analysis and intelligent comparison of new data with data in repositories. The most demanding telescience applications of the future will employ computation-based steering, “closing the loop” of remote control of data acquisition, data refinement, and data analysis. This advance will enable the process of data acquisition to be improved, in more or less real time, by feedback from the analysis, allowing intelligent, knowledge-based remote steering of research or diagnostic instrumentation.
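The “closed loop” of acquisition, analysis, and steering can be sketched in a few lines of code. Everything here is illustrative: the instrument, its simulated signal, and the steering rule are invented stand-ins, not a real telescience API.

```python
# A toy closed-loop telescience cycle: acquire data remotely, analyze
# it, and feed the result back to steer the next acquisition.
def acquire(pointing: float) -> list[float]:
    """Stand-in for remote data acquisition at a given pointing angle."""
    # Simulated signal: strongest when the instrument points at 42.0
    return [max(0.0, 10.0 - abs(pointing - 42.0) - k) for k in range(3)]

def analyze(samples: list[float]) -> float:
    """Distributed analysis reduced here to a single quality score."""
    return sum(samples) / len(samples)

def steer(pointing: float, score: float, best_score: float) -> float:
    """Knowledge-based steering: advance while results improve, else back off."""
    return pointing + (1.0 if score >= best_score else -0.5)

pointing, best = 40.0, float("-inf")
for _ in range(5):                     # five acquisition/analysis cycles
    score = analyze(acquire(pointing))
    pointing = steer(pointing, score, best)
    best = max(best, score)
# After five cycles the loop has homed in near the strongest signal.
```

The essential point is the feedback path: the output of analysis becomes an input to the next round of data acquisition, in more or less real time.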
Automated sensors are available in a wide and growing variety (including high-resolution optical, chemical, and temperature sensors). Placed on oceanographic buoys, spacecraft, and many other platforms, these sensors output their data to computers on the global network, so that a scientist or engineer at a desktop workstation anywhere can obtain direct information about environmental or physical conditions anywhere, in nearly real time. The data thus acquired can be processed,
analyzed, and shared with colleagues. These instruments are doing the work once done by expensive oceanographic voyages, for example, but with better spatial and temporal resolution and vastly lower cost.
Another kind of remote access that the network offers is the ability to study human behavior and relationships in immensely greater detail, and in near real time. Psychologists are studying interpersonal relationships in electronic groups; sociologists and anthropologists are studying electronic group dynamics; economists are studying electronic auctions. This new tool raises ethical and methodological questions involving privacy, informed consent, protected populations, and other considerations.
Access to these new resources is controlled in the traditional manner, by the government agencies and other organizations that own them. Competition for their use may be intense. NSF's nanofabrication facility, for example, is available on a competitive basis to anyone in the country, in any discipline, for remotely specifying and building specific micro-electromechanical devices needed for research projects. It is always fully booked.
Preservation of Data
A new kind of data facility will be needed to archive the unpublished data underlying research, such as laboratory data and notes. Unlike laboratory notebooks, electronic notes and data are extremely perishable; their accidental or deliberate erasure may undermine the support for important research results. Furthermore, electronic notes do not bear the personal stamp of handwritten notes; techniques for electronic “watermarking” or “fingerprinting” of notes and other data will be needed to validate their sources. Secure electronic time stamping will also be needed to verify the sequence of events (and to settle questions of research priority). Extensive work has been done on electronic notebooks, their legality, fingerprinting, time stamping, and other issues (see, for example, http://www.emsl.pnl.gov:2080/docs/collab/research/ENResearch.html ).
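One simple way to make electronic notes tamper-evident is to chain them together with cryptographic hashes, each entry incorporating a timestamp and the digest of its predecessor. The sketch below shows only the idea; a real facility would rely on a trusted external time-stamping service, and the entries are invented examples.

```python
# Tamper-evident note chain: each entry's hash covers the previous
# entry's hash, so altering any entry breaks all later hashes.
import hashlib

def add_entry(chain: list[dict], text: str, timestamp: str) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    digest = hashlib.sha256(f"{prev_hash}|{timestamp}|{text}".encode()).hexdigest()
    chain.append({"text": text, "timestamp": timestamp, "hash": digest})

def verify(chain: list[dict]) -> bool:
    prev_hash = "0" * 64
    for entry in chain:
        expected = hashlib.sha256(
            f"{prev_hash}|{entry['timestamp']}|{entry['text']}".encode()
        ).hexdigest()
        if expected != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

notebook: list[dict] = []
add_entry(notebook, "Prepared sample A", "2001-03-05T09:14")
add_entry(notebook, "Measured absorbance 0.83", "2001-03-05T10:02")
assert verify(notebook)
notebook[0]["text"] = "Prepared sample B"   # tampering...
assert not verify(notebook)                 # ...is detected
```

Because each digest depends on every earlier entry, the chain both fingerprints the notes and fixes their sequence, which is what makes questions of priority answerable.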
SUPPORT FOR THE GLOBAL RESEARCH COMMUNITY
Advancing information technology will enable a globalization of research. Collaborators will increasingly be located in separate countries, often continents apart, and will increasingly use remotely sited automated instruments and facilities. Standards for communications protocols, data acquisition and data processing software, and data preservation must be applied globally, to ensure that data and results are comparable, reliable, and verifiable. These tasks extend beyond the limits of national sovereignty. They will be carried out by the members of the research community (often acting individually), and sometimes through national governments as signatories to international technical conventions.
A New Infrastructure for Collaboration: The Collaboratory
Information technology's prime virtue as a builder of relationships is its ability to enhance teamwork. Researchers in many fields are experimenting with this aspect of the technology by building “collaboratories” (a portmanteau word, coined in the 1980s, combining the words “collaboration” and “laboratory”). In a collaboratory, the available digital tools are integrated to provide researchers access to scientific resources (such as instruments and shared databases), wherever they are located. A collaboratory, according to one of the concept's earliest promoters, is “a center without walls in which the nation's researchers can perform their research without regard to geographical location—interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries” (Cerf, 1993; National Research Council, 1993; New Scientist, 1997; Wulf, 1989). A collaboratory provides the technological basis for interaction among researchers, instruments, and data independent of distance.
The collaboratory approach, built around networks of distributed computers, is expected to change not only scholarly work, but many other activities involving human teamwork, including the arts, business, and education (Rosenberg, 1991; McDaniel et al., 1994; Edelson et al., 1996; Kouzes et al., 1996; Kilman and Forslund, 1997; Casper et al., 1998; Olson et al., 1998). Some form of collaboratory may be the appropriate infrastructure for the great universities of the future. Experimental research on both the social and technical aspects of collaboratories has been sponsored by the National Science Foundation since the mid-1980s; more recently the Department of Energy ( http://www.emsl.pnl.gov:2080/docs/collab/ ) and the National Institutes of Health have begun collaboratory initiatives.
The most developed of the NSF-sponsored collaboratory projects is the Space Physics and Aeronomy Research Collaboratory (SPARC) ( http://www.si.umich.edu/SPARC ). The SPARC environment allows far-flung teams to conduct real-time observational campaigns using hundreds of coordinated instruments (both ground- and satellite-based). Both the instruments and technical support experts are available online. Computational models of the object of study (the magnetosphere, in this case) run in parallel with data gathering and help predict where to point the instruments to capture the most interesting phenomena. Collaborative data-gathering sessions can be annotated, stored, and played back for others to experience at a later date. Students participate in authentic projects in ways previously impossible due to constraints of travel and the physical size of instrument rooms. Archival data and the relevant journal literature are also available. “Electronic workshops” held between data-gathering campaigns allow participants to share insights about and interpretations of the experimental data and to collaborate on publications.
Other collaboratories include the following:
The NSF-sponsored Graphics and Visualization Center, founded in 1991, includes some of the most advanced work in virtual reality applications and shared virtual environments. In particular, the center's Telecollaboration Project aims to “develop a distributed collaborative design and prototyping environment in which researchers at geographically distributed sites can work together in real time on common projects.” Research is underway to provide both the hardware and software required.
The Cooperative Online Resource Catalog (CORC) project of the Online Computer Library Center (OCLC) is exploring the cooperative creation and sharing of metadata (literally “data about data”—data that helps identify, describe, and locate networked electronic resources) by libraries, through the involvement of libraries and other institutions throughout the world. Its aim is to help libraries cope with the huge amounts of material becoming available on the Web ( http://www.oclc.org/oclc/research/ ).
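Dublin Core metadata of the kind CORC works with is, at heart, a small set of element/value pairs. The record below is a hypothetical example (the title, creator, and identifier are invented) using several of the fifteen Dublin Core elements:

```python
# A minimal Dublin Core record for an imaginary networked resource.
# Element names (title, creator, subject, date, type, identifier,
# language) come from the Dublin Core element set; values are made up.
record = {
    "title": "Space Physics Observation Log, Campaign 12",
    "creator": "Example Research Group",                   # hypothetical
    "subject": "space physics; aeronomy",
    "date": "1999-11-03",
    "type": "Dataset",
    "identifier": "http://example.org/sparc/campaign12",   # placeholder URL
    "language": "en",
}

# Rendered with the conventional dc: prefix, one XML element per pair:
xml = "\n".join(f"<dc:{k}>{v}</dc:{k}>" for k, v in record.items())
```

Simple as it is, a shared element set like this is what lets thousands of independent libraries describe Web resources in a form the others can search and reuse.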
SOFTWARE AND COMMUNICATIONS STANDARDS
Since investments in academic computing are small compared to those driven by commercial or consumer markets, a service often cannot develop fully without agreement across the global research world. Standards thus are essential, but they often lag far behind the innovations that continuously appear in the world of information technology. Researchers therefore must unite around the standards their communities need, and must develop educational programs that allow them to benefit quickly from emerging standards.
There are two kinds of standards: those that involve human users and those that operate between digital systems. The former should be changed very slowly, and only for very good reasons, to avoid frustration and wasted time (since the effort to learn a language is considerable). Standards that smooth communications among machines are more straightforward to change, although, as we all know, such changes frequently are not carried out correctly.
One example of the former type of standard concerns documents. In 1985, SGML, the Standard Generalized Markup Language, became an international standard. Even after more than fifteen years, few researchers really understand it, and tools supporting it remain too expensive for widespread adoption. However, two descendants of SGML, namely HTML and XML, have been widely exploited by the scholarly community as well as the broader commercial and consumer sectors, and are generally thought to be important mechanisms for interchange. Backed by the World Wide Web Consortium (W3C, see http://www.w3.org ), these two languages should be as well understood as familiar word processors are, since they allow the creation of documents that support long-term preservation as well as dual rendering (e.g., in electronic as well as paper publication forms).
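The dual-rendering point can be illustrated in a few lines: one XML source feeding two output forms. The element names in the toy document are invented for the example; real document types define their own.

```python
# One XML source, two renderings: HTML for the screen, plain text for
# a print-oriented pipeline. Element names here are illustrative only.
import xml.etree.ElementTree as ET

source = "<article><title>Field Notes</title><para>First results.</para></article>"
tree = ET.fromstring(source)

# Rendering 1: HTML for the Web
html = f"<h1>{tree.findtext('title')}</h1><p>{tree.findtext('para')}</p>"

# Rendering 2: plain text for print
text = f"{tree.findtext('title').upper()}\n\n{tree.findtext('para')}"
```

Because the XML says what each piece of content *is* rather than how it should look, the same file can be preserved for the long term and restyled for whatever medium comes next.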
Other standards efforts in machine-to-machine communication have shown considerable progress. Agreements on communication protocols have been at the heart of the success of the Internet. TCP/IP, along with other core services like FTP, Telnet, and SMTP (for mail), have made it possible for the Internet to flourish. The national standard Z39.50, and the corresponding international standard ISO 23950, enabled the WAIS service to flourish briefly just prior to the emergence of the World Wide Web, and have supported broad access to library catalogs. Simpler schemes have allowed federated
search across distributed information collections, as in the NCSTRL service ( http://www.ncstrl.org ) for computer science technical reports. This and related work on metadata, through the Dublin Core initiative, has enabled further sharing of scholarly resources through efforts like the Open Archives Initiative [Van de Sompel and Lagoze] ( http://www.openarchives.org ), which promises to move us toward universal access to broad segments of open literature. Researchers should partake of the new services stimulated by such efforts, both as readers and contributors, lest they miss out on the latest findings in their fields.
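The Open Archives Initiative protocol that underlies such sharing is deliberately simple: a harvester issues ordinary HTTP requests whose query parameters name a “verb” and a metadata format, and parses the XML that comes back. A minimal sketch of request construction (no network access; the base URL is a placeholder):

```python
# Build an OAI-PMH request URL. The protocol defines verbs such as
# ListRecords and Identify; metadataPrefix names the record format.
from urllib.parse import urlencode

def oai_request(base_url: str, verb: str, **params: str) -> str:
    """Construct an OAI-PMH request URL for the given verb and parameters."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"

# Ask a (hypothetical) repository for all its Dublin Core records:
url = oai_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")
```

The repository answers with an XML document listing records in the requested format; oai_dc, the Dublin Core rendering, is the format every OAI-PMH repository must support, which is what makes harvesting across archives practical.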
Other standards will no doubt emerge in key areas to support research activities. In the area of software development, “platform-independent” languages such as Java and Perl have made it easier to reuse and share information and tools. With so much research depending upon software, researchers will save enormous amounts of time by investing in reliable and relevant tools, whether purchased commercially or adopted from free software efforts (e.g., Linux or the GNU Project), as they pursue particular solutions.
INTO UNEXPLORED TERRITORY
The exploding technology of computers and networks promises profound changes in the fabric of our world. That goes for everyone, of course, not just researchers. As seekers of knowledge, researchers will be among those whose lives change the most. What these changes will mean, for academic work and the larger society, remains to be seen. Researchers themselves will build this New World largely from the bottom up, by following their curiosity down the various paths of investigation that the new tools have opened. It is unexplored territory. At the same time, the hoped-for benefits of these systems will depend on their being made available widely and equitably. That is a challenge that the community of researchers, working with public and private funders and regulators of research and technology development, will need to take on over the long term.