The Research Process in a Digital World
Digital tools can make the routine aspects of nearly every research task easier, from observation to publication. Computer systems have become more powerful and are used to create models of highly complex phenomena—such as cloud cover patterns. The proliferation of data on the world's networks is making available at a mouse-click information that has traditionally required hard, time-consuming work to gather. Digital technologies not only facilitate traditional research tasks, but also enable previously impractical analysis (for example, simultaneously retrieving information from diverse archives and correlating it). Furthermore, it is increasingly feasible to provide network access to remote instruments such as telescopes, microscopes, and specialized manufacturing services (Roberts, 2000). Equally important, research results can be made publicly available online much sooner than by traditional means.
Through access to e-mail and desktop videoconferencing, researchers may now work more informally and quickly with colleagues. Traditional institutional boundaries (and loyalties) are blurring, as researchers more frequently engage in long-term projects with colleagues from other institutions and disciplines.
But these new digital tools—because they are so powerful and easy to use—can be misused by the unsophisticated or abused by the dishonest. They may offer new temptations for plagiarism and fraud. Researchers may find themselves overwhelmed by the massive volume of data on the networks as they seek ways to winnow sound information from nonsense. Protecting the integrity of research will require vigilance and ingenuity and probably the development of new technologies to enhance the security of data and prevent forgeries, use of false identities, and unauthorized changes to publications or data.
Just as important, researchers need better ways to select from among the proliferating sources of digital information. After all, searching for and reading an article takes time, a precious commodity. In the age of hard print, researchers had cues about what to read, such as tacit hierarchies among journals, but these tools have yet to be developed for electronic publishing.
Well-intentioned researchers may be seduced by the power of their computers, and substitute off-the-shelf software packages for their own careful analysis. (Powerful statistical software packages, for example, may be misused by people who do not understand the proper application of statistics.) Blind use of the powerful tools of digital technology, without regard to underlying assumptions, can lead to errors that are difficult to avoid and detect.
The tools of information technology, like other products of our industries, can often fall short of our needs and expectations. It is easy to be carried away with their great promise, but we need to temper our enthusiasm. In real life, we need to remember that human factors may dominate the impact and acceptance of electronic tools. Technology changes rapidly; keeping up to date can be expensive. In particular, the more complex systems for global communications and collaboration involve major investments in engineering and logistics (see, for example, National Research Council, 2000).
For purposes of discussion it is useful to divide the research process into several interactive activities: articulating a hypothesis, creativity and analysis, observation and inquiry, experimentation and simulation, data archiving and access, and publication and dissemination. For most researchers, digital computing and communications technologies are changing each of these activities. This chapter attempts to trace some of the
changes and to forecast the impacts on the lives and careers of researchers.
OBSERVATION AND INQUIRY
“Credit must be given rather to observation than to theories, and to theories only if what they affirm agrees with the observed facts.” (Aristotle, On the Generation of Animals, 350 B.C.)
Discovering facts remains the foundation of research. The new tools of information technology give every researcher increasingly powerful means of acquiring data, verifying it, and manipulating it in search of patterns. Researchers can now observe from the subatomic level to the outer reaches of the universe. Huge masses of data, collected for many purposes over many decades, are accessible online to researchers everywhere.
The proliferation of remote-sensing devices that are increasingly reliable, sophisticated, and less costly is adding to the flood of data. For example, digital sensors on satellites, aircraft, and buoys collect weather and other environmental data that are processed and made available for downloading, accessible to anyone with a modem.
Sifting all this data in a timely way is an enormous challenge to researchers, but it is critical in order to maximize the value of the data collections. Computer systems powerful enough to browse files of hundreds of gigabytes and to display millions of pixels may also provide powerful tools for gleaning insights from massive data collections. One such system at the Jet Propulsion Laboratory enabled environmental researchers to examine detailed radar data of the entire Amazon basin and zoom in and out of areas of particular interest; previously they could examine data only on small areas of the region, one at a time.
These developments pose a challenge for researchers, since results must be replicable. It must be possible for other researchers to repeat an experiment and confirm the results. Software programs and hardware will need to be validated and documented, as will records of provenance and other items of “metadata” (data about data, especially about documents themselves and their histories), to ensure consistency and accuracy of data that is produced in one laboratory and used by another.
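Provenance metadata of the kind described above can be as simple as a structured record archived alongside each data file. The following Python sketch is illustrative only; the field names are hypothetical, not drawn from any actual metadata standard:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class ProvenanceRecord:
    """Metadata describing how a data file was produced (field names are illustrative)."""
    dataset_id: str
    instrument: str          # device or sensor that produced the data
    software: str            # analysis program and version used
    collected_on: str        # ISO 8601 date of collection
    derived_from: list = field(default_factory=list)  # ids of upstream datasets

record = ProvenanceRecord(
    dataset_id="amazon-radar-001",
    instrument="airborne SAR",
    software="radar-cal v2.3",
    collected_on="1999-06-15",
    derived_from=["raw-pass-17"],
)

# A plain dictionary form is easy to store and exchange alongside the data itself.
print(asdict(record))
```

A record like this lets a laboratory that receives the data trace which instrument and which version of the processing software produced it, which is exactly what replication requires.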
CREATIVITY AND ANALYSIS
Creativity for the researcher is the ability to approach the solution to a problem by using existing tools and information in new or unconventional ways and/or by creating new tools. Easier access to information and more powerful ways to manipulate it may speed up the creative process, but also may bias or redirect the process in ways that we cannot predict.
New tools also will make possible “brute force” exploration of possible problem solutions, and powerful techniques for composing, manipulating, and assessing alternatives. These techniques and the accompanying improvements in access to peers and mentors for consultation could make research a more sociable activity, some believe (for example, Shneiderman, 1998). At the same time, the ability to reach a colleague by electronic mail does not guarantee an instant collaboration. Many researchers find that they are swamped with requests for advice, help, and mentoring, and are simply unable to respond to requests from those they do not already know.
Information tools also may increase pressure to draw research topics more directly from observations, rather than predominantly from the curiosity of researchers. The sheer quantity of digital information about the world is sure to shape many researchers' choices in defining their future activities. This can be a daunting task, but the potential rewards are great. Information abundance can contribute to much progress in many fields if the information is well organized, tagged with descriptive metadata, and presented with appropriate textual or graphic displays. For example, there will be an increasing need to create compact textual displays that are controllable by users so that the data can be organized chronologically, clustered by source, ordered by relevance, and identified for trustworthiness. Even richer possibilities exist for visual displays that show trends in data over time, variations by source, gaps in normal patterns, or outliers that are significant. Exploratory visual “data mining” has proven to have powerful benefits in pharmaceutical drug discovery, digital libraries, and gene data presentation. Novel strategies include parallel coordinates or starfield visualizations for multidimensional data, zoomable timelines for temporal data presentations, and tree maps or hyperbolic trees for hierarchies. Rapid progress is expected as research prototypes are made commercial and embedded in scientific research systems (Card, Mackinlay, and Shneiderman, 1999).
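The controllable displays described above rest on simple reorderings of records tagged with descriptive metadata. A minimal sketch, assuming each record carries date, source, and relevance fields (the records themselves are invented for illustration):

```python
# Illustrative records tagged with descriptive metadata.
records = [
    {"title": "A", "date": "1998-03-01", "source": "journal",  "relevance": 0.4},
    {"title": "B", "date": "1997-11-20", "source": "preprint", "relevance": 0.9},
    {"title": "C", "date": "1999-01-05", "source": "journal",  "relevance": 0.7},
]

# Organized chronologically.
by_date = sorted(records, key=lambda r: r["date"])

# Clustered by source.
by_source = {}
for r in records:
    by_source.setdefault(r["source"], []).append(r)

# Ordered by relevance, most relevant first.
by_relevance = sorted(records, key=lambda r: r["relevance"], reverse=True)

print([r["title"] for r in by_date])       # chronological order: B, A, C
print([r["title"] for r in by_relevance])  # relevance order: B, C, A
```

Each ordering is cheap to compute once the metadata exists; the hard problem, as the text notes, is producing trustworthy tags in the first place.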
For the researcher, the continuing rapid increases in computing power offer not only access to information but whole new ways of communicating with colleagues. While some researchers can remember the 100 bit-per-second modems of the 1960s, others are already using the gigabit-per-second networks provided by Internet2 and other network providers. High speed networks in the gigabit-to-terabit-per-second range and high resolution digital displays (tens of megapixels) will allow an increasingly high degree of fidelity in reproducing human interaction at a distance, from routine conversation to multisite conferencing. Telepresence and virtual environments enabled by high speed networks are likely to become important tools in supporting scientific collaboration.
Digital communications with other humans are advancing in step with new digital technology. Students and their mentors are in touch with other researchers, through e-mail, Web sites, and discussion lists. Through these means, online communities are flourishing.
Enlarging Research Boundaries
Research communities are enlarging their boundaries as the global research community is knit together by digital networks. Synchronously, researchers working in the morning in Washington, D.C., can interact with counterparts working late in the afternoon in Europe. Writing late at night, they can exchange ideas with colleagues in Indonesia who have begun the morning. Almost any request theoretically can be addressed in less than 24 hours by members of a research team no matter where they happen to be. Geographically distributed groups can now better collaborate, enabled by ready exchange of data, programs, videoconferencing, and software version control systems that function across networks.
The widespread adoption of English for international research communication, along with evolving digital tools for machine translation and multilingual information retrieval, has also reduced barriers to collaboration. Most conferences have become international, with most of the submission and reviewing, and an increasing portion of the discussion and selection, aided by electronic tools.
These human communications are increasingly including voice and video, along with text. As sensors and tele-operated actuators improve, researchers may use interfaces that offer touch and action-at-a-distance. With the techniques of “virtual reality,” some users may soon communicate with one another through simulated environments, using “telepresence,” perhaps guiding their own software representations, or “avatars,” to interact in a virtual world with those of their colleagues (IEEE Spectrum, 1996). Collaboration is supported and mediated by information technology in many ways, and often coordinated by digital libraries.
Digital tools to support research collaboration can begin with basic tools such as e-mail, audio/video conferencing, and screen-sharing programs that are freely available. The next generation of synchronous and asynchronous tools will be tailored to the needs of specific communities such as environmental scientists using NASA remote-sensing data sets or physicists sharing complex and expensive real-time experimental equipment. Scientists could benefit from software tools to facilitate and speed scheduling, transform data to accommodate each other's needs, and even negotiate delicate questions of authorship.
Researchers will always want to meet in person. Still, as the capacity of electronic networks grows, the physical distance among collaborators may become less of a consideration. With the trend toward remote use of instruments, databases, and other resources, researchers may find that they identify less with their home institutions and academic departments (which have traditionally been responsible for supplying and maintaining the facilities and infrastructure of research), and more with their far-flung collaborators.
Blurring Disciplinary Boundaries
Some people argue that the traditional boundaries among disciplines also are blurring. The global scope of researchers' information networks works against narrow specialization. Many researchers in the traditional academic environment build careers within their disciplines or subdisciplines. Will the disciplinary specialist be relevant in the future, when many new, interesting and significant problems will require a multidisciplinary approach? Some say the need for specialized knowledge will fade as researchers are able to roam through
networks that contain the knowledge of the world, effortlessly extracting whatever they want to know.
On the other hand, digital technologies can facilitate contributions to multidisciplinary projects from a number of diverse specialists. Multidisciplinary modeling and simulation is becoming more tractable through emerging techniques for integrating discipline-specific modules into an overall simulation program, for example, those that link chemistry and fluid flow modules to model the behavior of combusting materials. And the Internet—with its capacity to help members of small groups find one another and sustain themselves as communities—may increase the academic tendency toward specialization.
EXPERIMENTATION AND SIMULATION
Digital technologies make it possible to examine the physical world at an increasingly fine scale, both in space and time. They also make it possible to simulate the world in more lifelike ways, for example, in modeling the complexity of natural systems such as climate. Researchers have direct access to increasingly sophisticated instruments, and to the archived data from those and other instruments.
Simulation: A “Third Modality”?
In fact, many researchers speak of computation as the “third modality” of scientific investigation, on a par with theory and experimentation. For many years computers have been used to simulate natural phenomena, through statistical methods—such as the Monte Carlo codes used to analyze the transport of neutrons in fission chain reactions—or deterministic solutions of the equations of motion for complex systems—such as the molecules comprising a physical system or the stars contained in a galaxy. Today the extraordinary power of advanced scientific computers enables predictive simulation of complex phenomena directly from fundamental microscopic principles. Numerical simulations may tend to draw more closely together modelers, theoreticians, and experimentalists, because their tools (digital sensors, simulations, networks, and databases) are increasingly related.
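The statistical approach mentioned above can be illustrated with the textbook Monte Carlo example of estimating pi by random sampling; it is a toy stand-in for problems such as neutron transport, but it shows the essential idea of replacing an analytic calculation with repeated random trials:

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # close to 3.14159
```

The error of such an estimate shrinks roughly as the inverse square root of the number of samples, which is why serious Monte Carlo codes consume so much computing power.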
Digital tools enhance our understanding of natural phenomena. They do not reduce the importance of reliable experimental and theoretical analyses.
One important advantage of new digital research tools is remote access to facilities that must be shared because they are costly or are located in extreme environments such as Antarctica or outer space. These facilities include, for example, both instruments for data gathering and machines for fabricating semiconductor devices. Remote access eliminates the need for expensive and time-consuming travel. More important, it allows a virtually unlimited number of observers to gain simultaneous access to instruments and technical support and to obtain multiple views of the same phenomena. Researchers using remote facilities can also continue to meet responsibilities at their home institutions, such as teaching and supervising research assistants.
Remotely accessible facilities include the following:
The National Science Foundation's (NSF) National Nanofabrication Facility ( http://www.nnf.cornell.edu );
The intermediate voltage electron microscope at the University of California at San Diego ( http://www-ncmir.ucsd.edu/CMDA/ and http://www-ncmir.ucsd.edu/MIBC/);
MOSIS, a remotely accessible facility for prototyping integrated circuits;
The synchrotron X-ray data collection facility at the Stanford Linear Accelerator ( http://ssrl.slac.stanford.edu/ );
The NSF-funded chain of incoherent scatter radars for measuring electromagnetic phenomena in the magnetosphere;
The planned NSF Polar Cap Observatory;
The Telescopes in Education (TIE) program, which pioneered the remote operation of telescopes by computer to provide educators and students with convenient access to a growing network of professional-quality telescopes;
The virtual nuclear magnetic resonance facility at Battelle Pacific Northwest Laboratory ( http://www.emsl.pnl.gov:2080/docs/collab/virtual/EMSLVNMRF.html ).
“Supercomputers” were originally designed for highly efficient parallel processing (for computation-intensive calculations such as chemical interactions and fluid dynamics). In the
future, they are just as likely to be used as “superstorage” systems for handling huge amounts of digital data. Some of the largest parallel-processing supercomputers today, for example, are in “video-on-demand” servers for communities of users, such as in museums, to handle the enormous volumes of graphical data that must be transferred. Managing terabytes or even petabytes of geographic or other spatial information, in addition to datasets, text, and multimedia information, is the purpose of specially designed superstorage systems with hundreds or thousands of disks and processors. One demonstration of such a system, used for mapping the surface of the earth, is Microsoft's Terraserver project ( http://terraserver.microsoft.com). Another example is Knowledge System's ( http://www.ks.com ) PetaPlex series, originally funded by the National Security Agency.
Supercomputers and superstorage systems also enable the use of advanced, discipline-specific “search engines” for data-intensive applications that also involve large computations. For example, by comparing, pixel–by–pixel, co-registered Landsat satellite images of the same area that are taken at different times, it is possible to detect ground motion due to earthquakes. This type of computation requires large memories and takes months on a workstation; on a supercomputer it can be carried out in a few hours. Automated classification of large volumes of remote-sensing data (such as information on soil types) is also possible on powerful enough systems.
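The pixel-by-pixel comparison described above amounts to differencing two co-registered images and flagging pixels whose change exceeds a threshold. A minimal NumPy sketch, with tiny synthetic arrays standing in for Landsat scenes:

```python
import numpy as np

def changed_pixels(before: np.ndarray, after: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of pixels whose absolute change exceeds the threshold.
    Assumes the two images are co-registered (same shape, aligned pixels)."""
    if before.shape != after.shape:
        raise ValueError("images must be co-registered to the same grid")
    return np.abs(after.astype(float) - before.astype(float)) > threshold

# Tiny synthetic example: one pixel brightens sharply between acquisitions.
before = np.array([[10, 10], [10, 10]])
after = np.array([[10, 80], [10, 10]])
mask = changed_pixels(before, after, threshold=30.0)
print(mask.sum())  # one pixel flagged as changed
```

The operation is embarrassingly simple per pixel; the computational burden the text describes comes entirely from applying it to billions of pixels across many image pairs.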
One area of research that the burgeoning networks have made possible is so-called distributed computing: using varied collections of computers, data collection and visualization engines, linked by communications networks, to solve large-scale computing problems. Projects such as “Distributed.Net” or the “Search for Extraterrestrial Intelligence,” which rely on loosely knit groups of volunteer computer users from all around the world, are changing the paradigm for solving complex problems by distributing processing activity over networks of thousands or even millions of computers.
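The core pattern of such volunteer-computing projects is to split a large problem into independent work units, hand each to a client machine, and merge the partial results. The sketch below simulates the pattern locally, using prime counting as a stand-in for the real workload:

```python
def make_work_units(start: int, stop: int, unit_size: int):
    """Split a large search range into independent work units, as a
    volunteer-computing coordinator does before handing them to clients."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]

def process_unit(unit):
    """Stand-in for one volunteer machine's computation: count primes in its range."""
    lo, hi = unit

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    return sum(1 for n in range(lo, hi) if is_prime(n))

# The coordinator merges the partial results returned by each client.
units = make_work_units(2, 100, 25)
total = sum(process_unit(u) for u in units)
print(total)  # total primes below 100
```

Because the units are independent, they can run on any number of machines in any order, which is what lets these projects scale to millions of loosely coordinated volunteers.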
DATA ARCHIVING AND ACCESS
Preserving knowledge is one of the most vital and yet rapidly changing functions of the university. For centuries the
intellectual focus of every university has been its library—a collection of written works, maps, and special collections preserving the knowledge of civilization. Today, such knowledge exists in many forms—as text, graphics, sound, algorithms, and virtual reality simulations—and it exists almost literally in the ether, distributed in digital representations over worldwide networks, accessible to anyone, and certainly not the prerogative of the privileged few in academe. The library is becoming less a collector of information and more a navigator of information—a facilitator of retrieval and dissemination.
In a sense, some have observed, the library and the book are merging (see, for example, O'Donnell, 1998; Wulf, 1995). Hypertext links, embedded in tomorrow's books, will lead the reader through the maze of information seamlessly. The Web provides a model for this new form of information navigation. Some think the advanced search engines of today will pale by comparison with the software now being developed, which will enable us to use artificial intelligence techniques to collect, organize, relate, visualize and summarize information.
Ensuring the Quality of Information
The most challenging function of such a digital library will be to ensure the quality of information. As anyone who has searched the Web knows, the information there is highly variable in usefulness. The development of tools for capturing the useful and rejecting the useless is an important area of research in itself. So too will be work on tools to help humans rapidly review summaries, visualizations, and other organizations of information as we move from a black-and-white view of “relevance” to a fuzzier structuring that allows flexible human control.
Archiving Research Data
Another growing function of the university library may be the archiving of research data. Electronically stored information is highly perishable, unlike paper documents, which can last for centuries. In particular, research notes or supporting data for a report can be erased in the blink of an eye if they are not systematically preserved. The individual researcher is generally responsible for maintaining these records, but in the
long run a more comprehensive, consistent, and reliable method is needed.
This role may be assumed by digital libraries, which would thereby become true information utilities, ranging from the personal to the global (Communications of the Association for Computing Machinery, 1998). In the broad sense, digital libraries are human and technological systems and institutions, with collections and processes that have at least some digital aspects, usually at least an electronic catalog. At the upper end of the development scale are information systems that support rich hypermedia collections, helping people communicate and collaborate over space and time as they use data, information, and knowledge (Lesk, 1997).
Digital libraries must have effective preservation plans, since they are assuming the roles traditionally assigned to archives and museums, as well as libraries. By integrating the collection, organization, discovery, retrieval, reuse, publication, and dissemination roles now spread across many institutions, digital libraries will make it easier to relate data, information, and knowledge and their many representations. But disciplines need to develop data format and metadata standards to enable fusion, cutting through data in different ways.
Digital methods will allow these representations to be viewed according to their origins (for example, a state or national digital library), topic or discipline (a digital library of computing), genre (a digital library of reports), media (a digital library of music), or combinations of these and other perspectives. In any case, an effective preservation program requires planning (e.g., specification of allowable standard representations) as well as ongoing activity to accommodate changes in media and format.
PUBLISHING AND DISSEMINATION: THE CHANGING MEANING OF RESEARCH COMMUNICATIONS
Publishing is a fundamental part of research. It is the means by which researchers share their work with others, expose it to critical examination, give credit to previous work, and establish their own claims. Traditional methods of publication are intertwined with the technology of printing, and the economics of distributing and storing printed volumes. Information technology opens a variety of new communication channels to researchers. These channels—many under the
direct control of the researcher—are powerful means of communication. They allow researchers to publish incrementally, to include interactive materials, and to include supporting information for which there would not be room in a traditional publication. Using the new technology, researchers can communicate more broadly and directly with colleagues and the public.
Researchers today have on their desktops the technology for sophisticated multimedia production, incorporating text, graphics, audio, and video. Many of them have established their own Web sites and other channels for communicating with colleagues and the public throughout the world. Information that is openly available on the Internet is accessible to people around the world. This is a great advance over print journals or specialist monographs, which are available only to members of organizations with expensive research libraries.
These techniques place a heavy responsibility on the researcher. He or she must be accountable not only for the quality of the work, but for the completeness and candor of the presentation. The rise of cheap and capable information technology in the past decade or so has been paralleled by a growing tendency among researchers to publish their own results, before serious peer review. In physics the practice has emerged of depositing an open access version in a preprint archive at the same time as the paper is submitted to a journal. Some researchers, however, who seek to establish their priority in achieving a certain result in this way may jump the gun, propagating statements that turn out later to be misleading or incorrect when subjected to review. Questions of research priority have grown sharper for many researchers in the past decade or so, since, in some fields, research results are potentially valuable intellectual property. (For a discussion of the changing status of research priority, see Nissenbaum, 1998.)
For the consumer of information, the early publication of research is valuable, since these communications contain the latest information. On the other hand, the information may be of uncertain validity. Professionals need to develop their own techniques for assessing such information. Members of the public may lack those techniques.
An additional concern is archiving. If results are published on the Internet, the duty of preserving all relevant information (including background data and research notes) falls on both the researcher and the webmaster.
The peer-reviewed journal has been the touchstone of authoritative research results for more than 350 years. Today thousands of journals, printed on paper, record progress in specific disciplines and subdisciplines. Publication in an established journal implies that the report has been subjected to peer review and careful editing. Readers may therefore rely on such reports. Publication in a leading journal also can confer status on a researcher. However, the number of journals poses a serious financial challenge for many libraries. Some journals can cost libraries $10,000 or more for annual subscriptions. Some journals have dabbled in electronic publishing and others have switched completely to that mode of distribution. Most have hesitated to abandon their traditional way of operating.
Meanwhile, researchers themselves have taken advantage of electronic information technology to introduce new kinds of informal “publication,” such as press releases and personal Web sites. These channels have the virtue of openness and speed. They also make it possible to include background information such as laboratory data, additional graphs, and even sound and video, which would not be available in a journal publication. The information conveyed, once it is presented, can be tested by the standard techniques of research. But many researchers have pointed out that these informal channels, as well as electronic versions of report series, lack quality control, and often muddy the waters of research more than they clarify them. (Ginsparg ( http://www.lanl.gov/blurb/pg96unesco.html ), Walker ( http://www.amsci.org/amsci/articles/98articles/Walker.html ), and Harnad (1991, 2000; http://www.princeton.edu/~harnad/nature.html ), among others, discuss the possibilities of electronic publication.)
The National Institutes of Health (the federal agency that sponsors most of the nation's fundamental biomedical research) has proposed a new “electronic publishing site” to be known as “PubMed Central” (Pear, 1999). This site would serve many of the purposes of a journal. It is envisioned as a “two-tiered” site, with one tier containing peer-reviewed reports much like those to be found in a printed journal, and the other (larger) tier containing information on virtually all publicly funded biomedical research. The proposal can be found at http://www.nih.gov/welcome/director/ebiomed/ebiomed.htm . Such a site would give researchers everywhere, regardless of their
professional or geographic positions, equal access to the latest news of progress in biomedical science. (For example, researchers at institutions whose libraries cannot afford subscriptions to all the important journals would not be handicapped.) It would thereby speed progress.
Critics of the proposal, including editors of many conventional journals, worry that it would propagate poor quality science, which, rather than speeding progress, might impede it as large numbers of researchers attempt to replicate incorrect results that would have been exposed by vigorous peer review.
Other models also are evolving. The Association for Computing Machinery (ACM) is typical of academic publishers who have converted traditional journals to digital production. The processes of selection, editing, and review are unchanged. Currently, individuals and libraries can subscribe to either online or print versions, but demand for the print versions is falling rapidly and the ACM expects to phase them out over the next few years. ACM is engaged in a “portal” project, to bring the entire computing literature into its digital library. In Japan, NACSIS, a government agency, is stimulating change in the journal literature by providing electronic publishing and digital library services to scores of professional societies. In the United States, the National Science Foundation (NSF) is developing the National Science, Mathematics, Engineering, and Technology Education Digital Library (NSDL) to support undergraduate learning. It will be built as a “federated enterprise” (with resources submitted by educators and researchers). The Computer Science Teaching Center ( http://www.cstc.org ) has taken a similar approach (carefully testing and reviewing submitted materials for quality and usability). In other collections, resources may be submitted, made available after brief editor review, and then receive additional tags identifying how widespread their adoption is, as well as annotations describing use cases; such collections could operate in a fashion similar to the NIH's PubMed Central.
A FUTURE OF CONTINUED INNOVATION
No one is wise enough to predict the future of research in the digital environment. That future will be shaped by the individual efforts of researchers, students, administrators, entrepreneurs, and others working to make the most of their tools. Our research system's greatest strength is its capacity for
innovation at the individual level, and innovation by its nature is unpredictable.
One possible outcome is that, to the extent that distance and time lose importance, it will become possible to do good research anywhere, without losing access to the necessary information tools. However, the significance of this effect remains to be seen; researchers at poorer colleges and universities are likely to lag behind those at richer institutions in gaining access to the new digital tools.
Another possible result is a shift in the emphasis of scientific research, away from more-or-less direct observation of nature toward observation that is mediated by the instruments and databases available on the world's information networks. We may also confidently predict a growing reliance on computer simulation as an adjunct to experimentation.
The extent to which the university campus will lose its relevance to scientific research as virtual communities of scholars evolve is uncertain. Researchers and administrators have been discussing the impacts for a decade or more (see, for example, Lenzer, 1977; National Research Council, 1993, 1994, 1996; Sproull and Kiesler, 1991; Casper, 1995; Noam, 1995; Wulf, 1995; Casper et al., 1998; O'Donnell, 1998; and the Vision 2010 project supported by the Carnegie Foundation [ http://www.si.umich.edu/V2010/home.html#indexmap ]). It is clear that many of the functions of the local campus (such as the traditional library and some of the delivery of “mass market” undergraduate and technical education) are being threatened by information technology.