The Use of Information Technology in Research

In this chapter we examine the effect of information technology on the conduct of research. New technologies offer new opportunities, although pervasive use of computers in research has not come about without problems. Some of these problems are technological, some financial. Underlying many of them are complex institutional and behavioral constraints.

Nearly five decades ago, the first programmable, electronic, digital computer was switched on. That day science acquired a tool that at first simply facilitated research, then began to change the way research was done. Today these changes continue, and now amount to a revolution.

Electronic digital computers at first simply replaced earlier technologies. Researchers used computers to do arithmetic calculations previously done with paper and pencil, slide rules, abacuses, or roomfuls of people running mechanical calculators. The benefits offered by the earliest computers were more quantitative than qualitative; bigger computations could be done faster, with greater reliability, and perhaps more cheaply. But computers were large and expensive, required technically expert operators and programmers, and consequently were accessible only to a relatively small fraction of scientists and engineers.

One human generation and several computer generations later, with the advent of the integrated circuit (the semiconductor "chip"), computational speed increased by a factor of 1 trillion, computational cost decreased by a factor of 10 million, and the smallest useful calculator went from the size of a typewriter to the size of a wristwatch. At present, personal computers selling for a few thousand dollars can put significant computing power on the desk of every scientist. Meanwhile, advances in the software through which people interact with and instruct computers have made computers potentially accessible to people with no specific training in computation. More recently, computer technology has joined telecommunications technology to create a new entity, "information technology."

Boxes supplement or expand points in the text; the first two below deal with specific disciplines.

Information technology has done much to remove from the researcher the constraints of speed, cost, and distance. On the whole, information technology has led to improvements in research. New avenues for scientific exploration have opened. The amount of data that can be analyzed has expanded, as has the complexity of analyses. And researchers can collaborate more widely and efficiently.

Different scientific disciplines use information technology differently. Uses vary according to the phenomena the discipline studies and the rate at which the discipline obtains information. In such disciplines as high energy physics, neurobiology, chemistry, or materials science, experiments generate millions of observations per second, and these must be screened and recorded as they happen. For these disciplines, computers that can handle large amounts of information quickly are essential and have made possible research that was previously impractical. Other disciplines, such as economics, psychology, or public health, gather data on events that accumulate slowly over relatively long periods of time. These disciplines also need computers with large capacities, but do not need the capability to react in "real time." Most disciplines use information technology in ways that fall somewhere between these two extremes.

HIGH ENERGY PHYSICS: SCIENCE DRIVES THE LEADING EDGE OF INFORMATION TECHNOLOGY

An example helps to illustrate the direction in which many disciplines are moving: high energy physics could not be done without information technology, and offers an extreme example of the trends in computing and communication needs in many scientific disciplines.

Most high energy physicists work on the same set of questions: what is the behavior of the most elementary particles, and what is the nature of the fundamental forces between them? Their experiments are conducted in machines called accelerators, devices that produce beams of protons, electrons, or other particles that are accelerated to high speeds and huge energies. There are two types of accelerators: those in which two beams of particles are made to collide with each other (colliders), and those in which a beam hits stationary targets. Physicists then reconstruct the collision to find new phenomena.

Remarkable results have emerged from high energy physics experiments conducted over the past two decades. For instance, a Nobel prize-winning experiment carried out at the proton-antiproton collider at the European Center for Nuclear Research (CERN) in Switzerland discovered two new particles known as the W and the Z. Their existence had been predicted by a theory claiming that the weak and electromagnetic forces, seemingly unrelated at low energy levels, were in fact manifestations of a single force, called the electroweak interaction, which would appear at sufficiently high energies. This discovery is a significant step toward describing all known interactions as manifestations of a single unifying force: gravity, electromagnetism, the strong (nuclear) force, and the weak (radioactive decay) force.

The Panel recognizes the diversity in research methods, and differences in needs for information technology. But the needs of researchers show sufficient commonalities across research fields to make a search for common solutions worthwhile.

THE CONDUCT OF RESEARCH

The everyday work of a researcher involves such activities as writing proposals, developing theoretical models, designing experiments and collecting data, analyzing data, communicating with colleagues, studying research literature, reviewing colleagues' work, and writing articles. Information technology has had important effects on all these activities, and more change is in the offing. To illustrate these effects, we examine three particular aspects of research: data collection and analysis, communications and collaboration, and information storage and retrieval. In each area, we discuss how researchers currently use information technology and what difficulties they encounter. In a final part of this section, we discuss new technological opportunities and their implications for the conduct of research.

The process by which some tens of these new W and Z particles were isolated from millions of collision events in the CERN accelerator offers a striking illustration of the dependence of high energy physics on the most advanced aspects of information technology. Three steps are involved. First, data are acquired in real time as the experiment progresses; second, the data obtained are transformed into flight paths, from which the particles making the paths are identified; and third, the event itself is reconstructed, and those few events exhibiting the very special characteristics of the new phenomenon are identified. In each of these steps computers are vital: to trigger the identification of interesting events; to establish particle tracks from the data; and to carry out analysis and interpretation.

In the future, high energy physicists will demand more from information technology than it can now deliver. Proposed new particle accelerators, such as the Superconducting Super Collider (SSC), are expected to produce several million collisions every second, of which only one or two collisions a second can be recorded. Selecting this tiny fraction of the produced events in a manner that does not throw away other interesting data is a tremendous challenge. It is hoped that "farms" of dedicated microprocessors might be able to examine tens of thousands of collision events per second, so that sophisticated selection mechanisms can screen all collisions and select the very few that are to be recorded. The computer programs that need to be developed for these tasks are of unprecedented size and complexity, and will challenge the capabilities of both the physicists programming them and the information technology software support available to the programmers. Even the small fraction of recorded events will result in some ten million collisions to be analyzed in a year. Processing one year's worth of saved data from the SSC would take a modern mid-sized computer 500 years; obviously, a faster processing rate is required.
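Whether for the CERN analysis or the proposed SSC trigger farms, the underlying pattern is the same three-stage pipeline: trigger, reconstruct, select. The fragment below is a minimal sketch of that pipeline in Python; the event format, thresholds, and selection rule are invented for illustration and bear no relation to the actual experiment software.

    import random

    def trigger(event):
        """Stage 1: a cheap real-time test that discards most events."""
        return event["energy"] > 90.0  # illustrative threshold only

    def reconstruct_tracks(event):
        """Stage 2: turn raw detector hits into candidate flight paths."""
        return [hit for hit in event["hits"] if hit["quality"] > 0.5]

    def is_interesting(tracks):
        """Stage 3: does the reconstructed event show the special signature?"""
        return len(tracks) >= 2

    def analyze(events):
        candidates = []
        for event in events:
            if not trigger(event):          # most events stop here
                continue
            tracks = reconstruct_tracks(event)
            if is_interesting(tracks):
                candidates.append(tracks)
        return candidates

    # A purely synthetic stream of collision events.
    events = [{"energy": random.uniform(0.0, 120.0),
               "hits": [{"quality": random.random()} for _ in range(8)]}
              for _ in range(10_000)]
    print(len(analyze(events)), "candidate events kept of", len(events))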

DATA COLLECTION AND ANALYSIS

Current Use

Collecting and analyzing data with computers are among the most widespread uses of information technology in research. Computer hardware for these purposes comes in all sizes, ranging from personal computers to microprocessors dedicated to specific instrumentational tasks, large mainframe computers serving a university campus or research facility, and supercomputers. Computer software ranges from general-purpose programs that compute numeric functions or conduct statistical analyses to specialized applications of all sorts.

The Panel has identified five trends in the use of information technology in data collection and analysis:

Increased use of computers for research. This trend coincides with large and continued increases in the speed and power of computers and corresponding declines in their costs.

Dramatic increases in the amount of information researchers can store and analyze.

Although no computer currently on the market would handle this load in reasonable time, existing plans suggest that, by the time it is needed, some combination of dedicated microprocessors and large mainframe systems will be available.

High energy physicists are also highly dependent on networks. Accelerators are located in only seven main laboratories, in the United States, Switzerland, West Germany, the Soviet Union, and Japan; the physicists who use them are located in many hundreds of universities and institutions scattered around the world. Almost every high energy experiment, large or small, is a result of international collaboration: for instance, one detector installed around one of the collision points of the accelerator at the Fermi National Laboratory is run by a collaboration of four foreign and thirteen U.S. institutions, involving some 200 physicists. Physicists at several institutions designed different parts of the detector; since the detector has to work as an integrated apparatus, the physicists had to coordinate their work closely. Different physicists are also interested in different aspects of the experiment, and subsequent analysis of the data depends crucially on adequate networking.

Future networking needs for high energy physics involve very high transmission speeds (as high as 10 megabits per second) between laboratories, with provision for exchange of collision event files, graphics, and video conferencing. Present long distance communication links are limited to lower transmission speeds (typically, 56 kilobits per second); each university physics group could use a 1.5 megabit per second line for its own research needs. The provision of these facilities would be of enormous benefit to university-based physicists and students who cannot travel frequently to accelerator sites.
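These transmission speeds translate directly into waiting time. A back-of-the-envelope calculation, assuming a collision event file of 10 megabytes (an illustrative size, not a figure quoted in the text), shows why the faster links matter:

    # Time to move one event file across links of different speeds.
    # The 10-megabyte file size is an assumption for illustration only.
    file_bits = 10 * 10**6 * 8  # 10 megabytes expressed in bits

    links = [("56 kbit/s long-distance link", 56_000),
             ("1.5 Mbit/s university line", 1_500_000),
             ("10 Mbit/s inter-laboratory link", 10_000_000)]

    for name, bits_per_second in links:
        seconds = file_bits / bits_per_second
        print(f"{name}: {seconds:,.0f} seconds ({seconds / 60:.1f} minutes)")

At 56 kilobits per second the single file ties up the link for nearly 24 minutes; at 10 megabits per second it moves in 8 seconds.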

For example, researchers can now process and manipulate observations in a database consisting of 18 years x 3,400 individuals x 1,000 variables per individual for each year, create sets of relationships among these observations, and then subject the data to complex statistical analyses, all at a cost of less than $100. Two decades ago, that kind of analysis could not have been conducted, and a much simpler analysis would have cost at least ten times as much.

The creation of new families of instruments in which computer control and data processing are at the core of observation. For example, in new telescopes, image-matching programs on specialized computers align small mirrors to produce the equivalent light-gathering power of much larger telescopes with a single mirror. For instruments such as radio-telescope interferometers, the computer integrates data from instruments that are miles apart. For computer-assisted tomographic scanners, the computer integrates and converts masses of data into three-dimensional images of the body.

Increased communication among researchers, resulting from the proliferation of computer networks dedicated to research, from a handful in the early 1970s to over 100 nationwide at present. Different networks connect different communities. Biologists, high energy physicists, magnetic fusion physicists, and computer scientists each have their own network; oceanographers, space scientists, and meteorologists are also linked together. Networks also connect researchers with one another regionally; an example is NYSERNET, the New York State Education and Research Network. Researchers with defense agency contracts are linked with one network, as are scientists working under contract to the National Aeronautics and Space Administration (NASA). Such networks allow data collection and analysis to be done remotely, and data to be shared among colleagues.

Increasing availability of software "packages" for standard research activities. Robust, standardized software packages allow researchers to do statistical analyses of their data, compute complex mathematical functions, simplify mathematical expressions, maintain large databases, and design everything from circuits to factories. Many of these packages are commercial products, with high-quality documentation, service, and periodic updates. Others are freely shared software of use to a specialized community, without the costs or benefits of commercial software.

One example illustrating several of the above trends is a system that geophysicists have set up to predict earthquakes more accurately. Networks of seismographs cover the western United States. One such network in northern California is called CALNET. Information from the 264 seismographs in CALNET goes to a special-purpose computer called the real-time picker. The software on the real-time picker looks at data as they come in and identifies exceptional events: patterns that indicate a coming earthquake. Then it notifies scientists of the events by telephone and sends graphics displays of locations and magnitudes, all within minutes.
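The real-time picker's job (scan incoming readings, flag exceptional patterns, notify scientists) can be sketched in a few lines. In this minimal Python illustration a fixed amplitude threshold stands in for CALNET's actual detection logic, which the text does not describe in detail:

    def notify(station, amplitude):
        # The real picker telephoned scientists and sent graphics displays;
        # here a printed alert stands in for both.
        print(f"ALERT: station {station}, amplitude {amplitude:.1f}")

    def pick_events(readings, threshold=4.0):
        """Scan (station, amplitude) readings and flag exceptional ones."""
        events = []
        for station, amplitude in readings:
            if abs(amplitude) > threshold:   # stand-in for real detection logic
                events.append((station, amplitude))
                notify(station, amplitude)
        return events

    readings = [("CAL-001", 0.3), ("CAL-002", 5.2), ("CAL-003", 0.8)]
    pick_events(readings)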

Difficulties Encountered

The difficulties that researchers encounter using information technology to collect and analyze data vary in importance depending on the particular discipline.

One difficulty is uneven access to computing resources. Information technology is not equally accessible to all researchers who could benefit from its use, even though broadening access is a continuing focus of institutions and funding agencies. To take an example from the field of statistics: according to a 1986 report on the Workshop on the Use of Computers in Statistical Research, sponsored by the Institute for Mathematical Statistics, "...the quality and quantity of computational resources available to researchers today varies dramatically from department to department.... Perceived needs appear to vary just as dramatically.... [While] departments that already have significant computer hardware feel a strong need for operating support, ... departments that do not have their own computational resources feel an equally strong need for hardware." (Eddy, 1986, p. iii.)

Exclusion from resources happens for a variety of reasons, all reducible in the end to financial constraints. Not all academic or research institutions have links to networks; in addition, access to networks can be expensive, so not everyone who wants it can afford it. In some cases, since access to networks often mediates access to resources such as supercomputers, exclusion from networks can mean exclusion from advanced computing.

See box on software, page 18.

One of the most frustrating difficulties for researchers is finding the right software. Software that is commercially available is often unsuited to the specialized needs of the researcher. In those fields in which industry has an interest, however, commercial software is being developed in response to a perceived market. Software could be custom designed for the researcher, but relatively few researchers pay directly for software development, partly because research grants often cannot be used to support it.

RESEARCH MATHEMATICS AND COMPUTATION

Computation and theory in mathematics are symbiotic processes. Machine computing power has matured to the point where mathematical problems too complicated to be understood analytically can be computed and observed. Phenomena have been observed for the first time that have initiated entirely new theoretical investigations. The theory of the chaotic behavior of dynamic systems depends fundamentally on numerical simulations; the concept of a "strange attractor" was formulated to understand the results of a series of numerical computations. Recent advances in the theory of knots have relied on algebraic computations carried out on computers. These advances can be directly applied to such important topics as understanding the folding of DNA molecules. In the field of geometry, numerical simulation has been used recently to discover new surfaces whose analytic form was too difficult to analyze directly. The simulations were understood by the use of computer graphics, and led to the explicit construction of infinite families of new examples.
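A numerical experiment of the kind described is easy to reproduce. The Python sketch below integrates the Lorenz system, the standard example of a strange attractor (our choice of example, not the text's), with simple Euler steps; plotting the resulting points reveals the attractor's structure.

    def lorenz_trajectory(steps=10_000, dt=0.01, sigma=10.0, rho=28.0, beta=8/3):
        """Integrate the Lorenz system with forward Euler steps."""
        x, y, z = 1.0, 1.0, 1.0
        points = []
        for _ in range(steps):
            dx = sigma * (y - x)
            dy = x * (rho - z) - y
            dz = x * y - beta * z
            x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
            points.append((x, y, z))
        return points

    # The trajectory never settles down or repeats; plotted in three
    # dimensions, the points trace out the butterfly-shaped attractor.
    print("final point:", lorenz_trajectory()[-1])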

Consequently, most researchers, although they are not often skilled software creators, develop their own software with the help of graduate students. The result meets researchers' minimum needs but typically lacks documentation and is designed for one purpose only. Such software is not fully understood by any one person, making it difficult to maintain or transport to other computing environments. This means that the software often cannot be used for related projects, and the scientific community wastes time, effort, and money duplicating one another's efforts. In sections to follow we examine how this problem is being addressed by professional associations, nonprofit groups, and corporations.

Some disciplines are limited by available computer power because the computers needed are not on the market. Some contemplated calculations in theoretical physics, quantum chemistry, or molecular dynamics, for example, could use computers with much greater capacity than any even on the drawing boards. In other cases, data gathering is limited by the hardware presently available. Most commercial computers are not designed to accommodate hardware and programs that select out interesting information from observational data, and scientists who want such computers must build them.

Another difficulty researchers encounter is in transmitting data over networks at high speed. For researchers such as global geophysicists who use data collected by satellite, a large enough volume of information can be sent in a short enough time, but transmission is unreliable. Researchers often encounter delays and incur extra costs to compensate for "noise" on high-speed networks. Technological solutions such as optical fiber and error-correcting coding are currently expensive to install and implement and are often unavailable in certain geographic regions or for certain applications.

The modern computer is the first laboratory instrument in the history of mathematics. Not only is it being used increasingly for research in pure mathematics but, equally important, the prevalence of scientific computing in other fields has provided the medium for communication between the mathematician and the physical scientist. Here modern graphics plays a critical role. This interaction is particularly strong in materials science, where the behavior of liquid crystals and the shapes of complex polymers are being understood through a combination of theoretical and computational advances.

In spite of all this, mathematics has been one of the last scientific disciplines to be computerized. More than other fields, it lacks instrumentation and training. This prevents the mathematician from using modern computing hardware and techniques in attacking research problems, and at the same time isolates him/her from productive communication with scientific colleagues. Of course, mathematics is an important part of the foundation and intellectual basis of most of the methods that underlie all scientific use of computational machinery.

To use today's high-speed computing machines, new techniques have been devised. The need for new techniques is providing a serious challenge to the applied mathematician, and has placed new and difficult problems on the desk of the theorist; algorithms themselves have become an object of serious investigation. Their refinement and improvement have become at least as important to the speed and utility of high-speed computing as the improvement of hardware.
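The point about algorithmic refinement is easy to illustrate. The sketch below computes the same quantity with an unrefined and a refined algorithm, and the algorithmic improvement dwarfs anything a faster machine of the period could offer. (Fibonacci numbers are our illustrative example, not the text's.)

    import time
    from functools import lru_cache

    def fib_naive(n):
        """Exponential-time recursion: the unrefined algorithm."""
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        """The same recurrence with memoization: linear time."""
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    for fib in (fib_naive, fib_memo):
        start = time.perf_counter()
        value = fib(30)
        elapsed = time.perf_counter() - start
        print(f"{fib.__name__}(30) = {value} in {elapsed:.4f} s")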

COMMUNICATION AND COLLABORATION AMONG RESEARCHERS

Current Use

Researchers cannot work without access to collaborators, to instruments, to information sources and, sometimes, to distant computers. Computers and communication networks are increasingly necessary for that access. Three technologies are concerned with communications and collaboration: word processing, electronic mail, and networks.

Word processing and electronic mail are arguably the most pervasive of all the routine uses of computers in research communication. Electronic mail, sending text from one computer user to another over the networks, is replacing written and telephone communication among many communities of scientists, and is changing the ways in which these communities are defined. Large, collaborative projects, such as oceanographic voyages, use electronic mail to organize and schedule experiments, coordinate equipment arrivals, and handle other logistical details.

See box on document processing, page 19.

IF KITCHEN APPLIANCES WERE LIKE SOFTWARE

If kitchen appliances were like programs, they would all look alike sitting on the counter. They would all be gray, featureless boxes, into which one places the food to be processed. The door to the box, like the box itself, is completely opaque.

On the outside of each box is a general description of what the box does. For instance, one box might say: "Makes anything a meal"; another: "Cooks perfectly every time"; another: "Never more than 100 calories a serving." You can never be exactly sure what happens to food when it is placed in these boxes. They don't work with the door open, and the 200-page user's manual doesn't give any details.

Working in a kitchen would be a matter of becoming familiar with the idiosyncrasies of a small number of these boxes and then trying to get done what you really want done using them. For instance, if you want a fried-egg sandwich, you might try the "Makes anything a meal" box, since a sandwich is a sort of meal. But because you know from past experience that this box leaves everything coated with grease, you use the "Never more than 100 calories" box to postprocess the output. And so on. The result is never what you really want, but it is all you can do.

You aren't allowed to look inside the boxes to help you do what you really want to do. Each box is sealed in epoxy. No one can break the seal. If the box seems not to be working right, there is nothing you can do. Even calling the manufacturer is no help, because the box is not under warranty to be fit for any particular purpose. The manufacturers do have help lines, but not for help with broken boxes; rather, they help you figure out how to use functioning boxes. But don't try to ask how your box works. The help-line people don't know, or if they do, they won't tell you.

Several times a year you get a letter from the manufacturer telling you to ship them your old box and they will send you a new one. If you do so, you find yourself with a shinier box, which does whatever it did before a little faster, or perhaps it does a little more; but since you were never sure what it did before, you cannot be sure it's better now.

SOURCE: Mark Weiser, 1987. "Source Code," IEEE Computer, 20(11): 66-73.

With the advent of electronic publishing tools that help lay out and integrate text, graphics, and pictures, mail systems that allow interchange of complex documents will become essential.

Networks range in size from small networks that connect users in a certain geographic area to national and international networks. Scientists at different sites increasingly use networks for conversations by electronic mail and for repeated exchanges of text and data files.

The Panel has identified two major trends in the way information technology is changing collaboration and communication in scientific research:

Information can be shared more and more quickly. For example, one of the first actions of the federal government after the discovery of the new high-temperature superconductors was to fund, through the Department of Energy's Ames Laboratory, the creation of a superconductivity information exchange. The laboratory publishes a biweekly newsletter on advances in high-temperature superconductivity research, available in both paper and electronic forms; the electronic version is sent out to some 250 researchers.

Researchers are making new collaborative arrangements. The technology of networks provides increased convenience and faster turnaround times, often several completed message exchanges in one day. For shorter messages, special software allows real-time exchanges.

See box on collaboration, page 20.

DOCUMENT PROCESSING

[An] area of significant change is document processing. This began in the 1960s with a few simple programs that would format typed text. In the context of UNIX in the 1970s, these ideas led to a new generation of document processing programs and languages, such as SCRIBE and the UNIX-based tools troff, eqn, tbl, and pic. The quintessence of these ideas are Knuth's TeX and METAFONT systems, which have begun to revolutionize the world's printing industry. In workstations, these ideas have produced WYSIWYG (wizzy-wig, or "what you see is what you get") systems that display formatted text exactly as it will appear in print.

International standards organizations are considering languages for describing documents, and some software manufacturers are constructing systems, such as the POSTSCRIPT protocols, embodying these ideas. The NSF-sponsored EXPRES project, at the University of Michigan and Carnegie Mellon University, illustrates a serious effort to develop a standard method of exchanging full scientific documents by network. Low-cost laser printers now make advanced document preparation and printing facilities available to many people with workstations and personal computers. It is now possible for everyone to submit high-quality, camera-ready copy directly to publishers, thus speeding the publication of new results; however, it is no longer true that a well-formatted document can be trusted to have undergone a careful review and editing before being printed.

SOURCE: Peter J. Denning, 1987, Position Paper: Information Technology in Computing.
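A small fragment shows the flavor of such markup: the author records logical structure and mathematics in plain text, and the program decides fonts, spacing, and line breaks. (The plain TeX fragment below is our own illustration, not an example from the EXPRES project.)

    % Plain TeX: the markup records the mathematics; TeX typesets it.
    The decay rate satisfies
    $$ N(t) = N_0 e^{-\lambda t}, $$
    where $\lambda$ is the decay constant.
    \bye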

As Lederberg noted a decade ago (Lederberg, 1978), digital communication allows scientists to define collegial relationships along the lines of specialized interests rather than spatial location. This is immensely beneficial to science as a whole, but causes some consternation among administrators who find researchers more loyal to disciplines than to institutions.

Technologies in the process of development show the networks' remarkable potential. Multimedia mail allows researchers to send a combination of still images, video, sound, and text. Teleconferencing provides simultaneous electronic links among several groups. Electronic chalkboards allow researchers to draw on their chalkboard and have the drawing appear on their computer and on the computers of collaborators across the country. Directory services, or "nameservers," supply directories of the names and network addresses of users, processes, and resources on a given network or on a series of connected networks. Program distribution services include the supply of mathematical software to subscribers. A spectacular new technology is represented by the Metal Oxide Semiconductor Implementation System (MOSIS), a service that contracts for the manufacture of very large-scale integrated (VLSI) chips from circuit diagrams pictured on a subscriber's screen. Fabrication time is often less than 30 days. In one notable example, the researchers designing a radiotelescope in Australia designed custom chips for controlling the telescope. MOSIS returned the chips in a matter of days; the normal manufacturing process would have taken months and would have delayed the development of the instrument considerably.

NEW FORMS OF COLLABORATION THROUGH THE NETWORKS

The development of COMMON LISP (a programming language) would most probably not have been possible without the electronic message system provided by ARPANET, the Department of Defense's Advanced Research Projects Agency network. Design decisions were made on several hundred distinct points, for the most part by consensus, and by simple majority vote when necessary. Except for two one-day face-to-face meetings, all of the language design and discussion was done through the ARPANET message system, which permitted effortless dissemination of messages to dozens of people, and several interchanges per day.

The message system also provided automatic archiving of the entire discussion, which has proved invaluable in preparation of this reference manual. Over the course of thirty months, approximately 3000 messages were sent (an average of three per day), ranging in length from one line to twenty pages.... It would have been substantially more difficult to have conducted this discussion by any other means, and would have required much more time.

SOURCE: Guy Steele, 1984. COMMON LISP: The Language. Bedford, MA: Digital Press, pp. xi-xii. Reprinted with permission. Copyright Digital Press/Digital Equipment Corporation.

To share complex information (such as satellite images) over the networks, researchers will need to be able to send entire pictures in a few seconds. One technique that is likely to receive more attention in the future is data compression, which removes redundant information and converts data and images to more compact forms that require less time to transmit.

Among the most important of potential applications of information technology is the emergence of a truly national research network, that is, a set of connections, or gateways, between networks to which every researcher has access. The National Science Foundation has announced its intention to serve as a lead agency in the development of such a network, beginning with a backbone, called NSFNET, that links the NSF-supported supercomputing centers, and widening to include other existing networks.

Widespread access to networks will also offer much more than just communications links. They can become what the network serving the molecular biology community aims to be: a full-fledged information system.

Difficulties Encountered

The principal difficulty with communicating across research communities via electronic mail and file transfer technologies is incompatibility. The networks were formed independently, evolved over many years, and are now numerous. Consequently, networks use different protocols, that is, different conventions for packaging data or text for transmission, for locating an appropriate route from sender to receiver over the physical network, and for signaling the start and stop of a message. For example, a physicist on the High Energy Physics network (HEPNET) trying to send data to a physicist on one of the regional networks would first have to ask "What network are you on?"; "How do I address you?"; and "What form do you want the information in?" In the gateway between two networks, the protocols of the first network must be removed from the message and the protocols for the second added. Under heavy traffic loads, the gateways can become bottlenecks. As a result, navigating from one network to a researcher on another is time-consuming, tiresome, and often unreliable; navigating over two networks to a researcher on a third is prohibitively complex.

Text can frequently be moved from one word processing system to another only with significant loss of formatting information, including the control of spacing, underlining, margins, or indentations. Graphics can only rarely be included with text. Such issues of compatibility may delay the expansion of electronic publishing as well as electronic proposal submission and review, the goals of the National Science Foundation's EXPRES project.

The issues are summarized succinctly by Denning: "Most word processors are inadequate for scientific needs: they cannot handle graphs, illustrations, mathematics and layout, and myriad file formats make exchange extremely difficult. With so many experts and so much competition in the market, it is hard to win agreement on standards. There is virtually no electronic support for the remainder of the process of scientific publication: submission, review, publication, and

training, and documentation have continued. Efforts to move research support into indirect cost categories have not succeeded, as many research institutions and universities face caps on indirect cost rates and have no room to accommodate new costs.

Advances in communications and computing generate new services that require subsidy during the first years of their existence if they are to be successfully tested. This is particularly true of network-related services. Building services into a national network for research will require significant federal, state, and institutional subsidy, which cannot be recovered from user service charges until large-scale connectivity has been achieved and services are mature. Sources for these subsidies must be determined.

Methods used for cost recovery can have significant impacts on usage. Two alternatives are to charge users for access to services or to charge users for the amount of service used. Networks such as BITNET have grown substantially in connectivity and use because they have fixed annual institutional charges for membership and connection, but charge no fees for use. Use-insensitive charge methods (often referred to as the library model) are attractive to institutions because costs can be treated as infrastructure costs and are predictable.

A REASONABLE MODEL

Although the Panel is unaware of anything precisely like the vision it holds for sharing information, proposals for the newly established National Center for Biotechnology Information (NCBI) at the National Library of Medicine may come close. The NCBI proposes to facilitate easy and effective access to a comprehensive array of information sources that support the molecular biology research community. Many, but not all, of these sources are electronic. They encompass raw data, text, bibliographic information, and graphic representations. Ownership and responsibility for development and maintenance of these sources range from individual researchers to departmental groups, institutes, professional organizations, and federal agencies. Each was designed to serve specific needs and audiences, created in many different hardware configurations and software applications. Consequently, NCBI's mission requires experts in both information technologies and biotechnologies. NCBI staff must

Provide directories to knowledge sources;
Create useful network gateways between systems;
Assist users in using databases effectively;
Reduce incompatibilities in retrieval approaches, vocabulary, nomenclature, and data structures;
Promote standards for representing information that will reduce redundancy and detect inconsistencies or errors;
Provide useful tools for manipulating and displaying data; and
Identify new analytic and descriptive services and systems.

Some computing-intensive universities (e.g., Carnegie Mellon University and Brown University) and medical centers (e.g., Johns Hopkins University, the University of Utah, Baylor University, and Duke University) are also attempting to develop instances of the vision.

Charges for amount of use, in contrast, can inhibit usage; a major inhibitor to use of commercial databases for information searches, for instance, is the unpredictability of user charges for time spent searching the databases. During the development of network services, it seems desirable to recover costs through fixed access charges wherever possible.

The Problem of Standards

The development of standards for interconnection makes it possible for every telephone in the world to communicate with every other telephone. The absence of commonly held and implemented standards that would allow every computer to communicate with every other computer and to access information in an intuitive and consistent way is a major impediment to scholarly communication, to the sharing of information resources, and to research productivity.

Standards for computer communication are being developed by many groups. The pace of these efforts is painfully slow, however, and the process is intensely political. The technologies are developing faster than our ability to define standards that can make effective use of them. Further, standards that are developed prematurely can inhibit technological progress; standards developed by one group (for example, an equipment vendor) in isolation create islands of users with whom effective communication is difficult or impossible.

Development of standards not only improves efficiency but also reduces costs. Open interconnection standards permit competition among vendors, which leads to lowered costs and improved capabilities. Proprietary standards restrict competition and lead to increased costs. Federal government procurement rules have been major sources of pressure on vendors to support open standards. Current mechanisms for reaching agreement on standards need examination and significant improvement. Such examination needs input from user groups, which will have to exert pressure on standards bodies and on the vendors who are major players in the standard-setting process.

Legal and Ethical Constraints

The primary legal and ethical constraints to wider use of information technology are issues of the confidentiality of, and access to, data. The following discussion will only illustrate these issues; we believe they are too important and too specialized to be adequately addressed in a document as general as this one. In the report's final section, we recommend the establishment of a body that will study and advise on these issues.

Information technology has made possible large-scale research using data on human subjects. For the first time, researchers can merge data collected by national surveys with data collected in medical, insurance, or tax records. For instance, in public health research, long-term studies of workers exposed to specific hazards can be carried out by linking health insurance data on costs with Internal Revenue data on subsequent earnings, Social Security data on disability payments, and mortality data, including date and cause of death (Steinwachs, 1987, Position Paper: Information Technology and the Conduct of Public Health Research).
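The mechanics of such a merger are simple; the sensitivity lies entirely in the data. A minimal Python sketch, using wholly synthetic records joined on an invented shared identifier:

    # Wholly synthetic records; a shared subject identifier links them.
    insurance = {"A17": {"exposure_years": 12, "claim_cost": 8400},
                 "B42": {"exposure_years": 3, "claim_cost": 950}}
    earnings = {"A17": {"subsequent_earnings": 31000},
                "B42": {"subsequent_earnings": 45000}}

    def merge(*sources):
        """Join several record sets on their shared identifier."""
        merged = {}
        for source in sources:
            for subject_id, fields in source.items():
                merged.setdefault(subject_id, {}).update(fields)
        return merged

    for subject_id, record in merge(insurance, earnings).items():
        print(subject_id, record)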

The scientific potential of such data mergers is enormous; the actual use of mergers is small, primarily because of concerns about privacy and confidentiality.

The right to confidentiality of personal information is held strongly in our society. Concerns about the conflict between researchers' needs and citizens' rights have been extensively explored by a number of scientific working groups, under the auspices of both governmental agencies (such as the Census Bureau) and private groups (for example, the National Academy of Sciences). As more information about individuals is collected and cross-linked, fears are raised that determined and technically sophisticated computer experts will be able to identify specific individuals, thus breaching promises of confidentiality and privacy of information. The Census Bureau, in particular, fears that publicity surrounding such breaches of confidentiality will undermine public confidence and inhibit cooperation with the decennial censuses.

Although there have been discussions and legislative proposals for outright restrictions on mergers of government survey or census data, a reasonable alternative seems to be to impose severe penalties on researchers who breach confidentiality by making use of information on specific individuals. The issue here, as elsewhere in public policy problems, is the balance of benefits against costs. Does better research balance the risk of compromising perceived fundamental rights to privacy? This is a topic that will need to be debated among both researchers and concerned constituencies in the general public.

A related issue is that of acceptable levels of informed consent for human subjects. At present, consent is usually obtained from each respondent to a survey; it is described as informed because the respondent understands what will be done with responses, usually, that they will be used only for some specific research project.

THE FAR SIDE OF THE DREAM: THE LIBRARY OF THE FUTURE

"Can you imagine that they used to have libraries where the books didn't talk to each other?" [Marvin Minsky, MIT]

The libraries of today are warehouses for passive objects. The books and journals sit on shelves, waiting for us to use our intelligence to find them, read them, interpret them, and cause them finally to divulge their stored knowledge. "Electronic" libraries of today are no better. Their pages are pages of data files, but the electronic page images are equally passive.

Now imagine the library as an active, intelligent "knowledge server." It stores the knowledge of the disciplines in complex knowledge structures (perhaps in a formalism yet to be invented). It can reason with this knowledge to satisfy the needs of its users. The needs are expressed naturally, with fluid discourse. The system can, of course, retrieve and exhibit (the electronic textbook). It can collect relevant information; it can summarize; it can pursue relationships. It acts as a consultant on specific problems, offering advice on particular solutions, justifying those solutions with citations or with a fabric of general reasoning.

Data-collecting organizations protect the confidentiality of the information obtained from respondents, but guarantee only that information about specific individuals will not be released in such a way that they can be identified. The extent to which informed consent can be given to unknown future uses of survey data, in particular to their merger with other data sources, is of great concern to survey researchers. Controlling the eventual uses of merged, widely distributed data sets would be difficult.

Another concern that needs to be addressed is one of responsibility in computer-supported decision making. Scientists, engineers, and clinicians more and more frequently will use complex software to help analyze and interpret their data. Who then is morally and legally responsible for the correctness of their interpretations, and of actions based on them? Experiments involving dangerous materials or human lives may soon be controlled by computers, just as many commercial aircraft landings are at present. Computers may be capable of faster or more precise determinations in some situations than humans. But software designers lack strong guidelines on assignment of responsibility in case of malfunction or unforeseen disaster, and lack the expertise to guarantee against malfunctions or disasters. With complex software overlaid on complex hardware, it is impossible to prove beyond a doubt in all circumstances that both hardware and software are performing precisely as they were specified to perform.

Gaps in Training and Education

The training and education necessary for using information technology are lacking. Two decades ago many researchers dealt with computers only indirectly, through computer programmers who worked in data processing centers. The development of information technology has brought computing into the researcher's laboratory and office. As a result, the level of computing competence expected of researchers, their support staff, and their students has increased manyfold.

If the user can suggest a solution or a hypothesis, it can check this, even suggest extensions. Or it can critique the user viewpoint, with a detailed rationale of its agreement or disagreement....

The user of the Library of the Future need not be a person. It may be another knowledge system, that is, any intelligent agent with a need for knowledge. Such a Library will be a network of knowledge systems, in which people and machines collaborate.

Publishing is an activity transformed. Authors may bypass text, adding their increment to human knowledge directly to the knowledge structures. Since the thread of responsibility must be maintained, and since there may be disagreement as knowledge grows, the contributions are authored (incidentally allowing for the computation of royalties for access and use). Knowledge base maintenance ("updating") itself becomes a vigorous part of the new publishing industry.

SOURCE: Edward A. Feigenbaum, 1986. Autoknowledge: From file servers to knowledge servers. In: Medinfo 86. R. Salamon, B. Blum, and M. Jorgensen, eds. New York: Elsevier Science Publishers B.V. (North-Holland).

Computers are changing what students need to learn. Undergraduate students of chemistry, for example, need more than the standard courses in organic, inorganic, analytic, and physical chemistry; in the view of many practicing chemists, they should also have courses in calculus, differential equations, linear algebra, and computer simulation techniques, and, through formal courses or practical research experience, should be competent in mathematical reasoning, electronics, computer programming, numerical methods, statistical analysis, and the workings of information management systems (Counts, 1987, Position Paper: The Impact of Information Technologies on the Productivity of Chemistry).

Neither students nor researchers can obtain adequate training and education through one-time training courses. Because the numbers of new tools are multiplying, researchers need ways to continuously learn about, evaluate, and, if necessary, adopt these new tools. Using commercial programs and tutorial systems only partly alleviates the problem because the technologies often change faster than such supports can accommodate to the changes. Instructors in the uses of information technologies within the disciplines are rare. Senior researchers are especially hard hit. The Panel took no formal survey, but informal discussions suggest that most senior researchers have had exposure to no more than a one-semester programming course and have few of the skills needed to evaluate and use the available technology.

For all researchers, learning advanced computing means taking a risk. They must interrupt their work and pay attention to something new and temporarily unproductive. They must become novices, often where sources of appropriate instruction and help are unclear or inaccessible. The investment of time and level of frustration are likely to be high. Understandably, many researchers cannot find the time and the confidence to learn technical computing.

DOCUMENTS AS LINKED PIECES: HYPERTEXT

The vision of computing technology revolutionizing how we store and access knowledge is as old as the computing age. In 1945 Vannevar Bush proposed MEMEX, an electro-optical-mechanical information retrieval system that could create links between arbitrary chunks of information and allow the user to follow the links in any desired manner. In the early 1960s, Ted Nelson introduced "hypertext," a form of nonsequential writing: a text that branches and allows choices to the reader, best read at an interactive screen. In 1968, Doug Engelbart demonstrated a simple hypertext system for hierarchically structured documents (that is, a list of sections, each of which decomposes into a list of subsections, each of which decomposes into a list of paragraphs, and so on) to which annotations could be added during a multiple-workstation conference. Today hypertext refers to information storage in which documents are preserved as networks of linked pieces rather than as a single linear string of characters; readers can add links and follow links at will.

Nelson's XANADU system is perhaps the most ambitious hypertext system proposed.

Some justify their choices with negative attitudes, for example: "I get enough communications as it is; I don't need a computer network," or "If I put my data on the computer, others will steal it," or "We are doing fine as things are; why change at this point?" Given these natural but negative attitudes, organizations are sometimes slow in responding to demands for new information technologies. Some research organizations view these attitudes as unchangeable and wait to introduce advanced computing until existing researchers move or retire. Others are actively replacing personnel or creating new departments for computational researchers. Still others are attempting to change attitudes by giving researchers the necessary time and support systems. While we have no data on changes in productivity, there is some evidence that in organizations following the latter course, existing researchers at all ranks can achieve as high computing competence as new personnel (Kiesler and Sproull, 1987).

Because people are now being introduced to computing skills at earlier stages of schooling, the lag in computer expertise is disappearing. Over time, alternatives to personal expertise in the form of user-friendly software or individual assistance from specialists will also develop.

Risks of Organizational Change

Changing an organization to make way for advanced information technology and its attendant benefits entails real risks. Administrators and research managers are often reluctant to incur the costs (financial, organizational, behavioral) of new technology. In some cases, administrators and research managers relegate computer resources (hardware, software, and people-based support services) to a lower priority than the procurement and maintenance of experimental equipment. The result can be a long-term suppression of the development and use of the tools of information technology.

See box on electronic laboratory notebook, page 42.

XANADU would make all the world's knowledge accessible in a global distributed database to which anyone can add information, and in which anyone can browse or search for information. A document is a set of one or more linked nodes of text, plus links to nodes already in the global database; a document may be mostly links, constructed out of pieces already in the database. Users pay a fee proportional to the number of characters they have stored. Anyone accessing an item in the global database pays an access charge, a portion of which is returned to the owner as a royalty. Individuals can store private documents that cannot have public links pointing to them and can attach annotations to public documents that become available to everyone reading those documents. Documents can be composed of different parts, including text, graphics, voice, and video.

INTERMEDIA, a hypertext system with some of these properties, has been implemented at Brown University and has been used to organize information in a humanities course for presentation to students. Small-scale hypertext systems, such as Apple's HyperCard for the Macintosh, are available on personal computers; their promoters claim these systems will change information retrieval as radically as spreadsheets changed accounting a few years ago.

SOURCE: Peter and Dorothy Denning, personal communication, 1987.
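The data model underlying hypertext is compact enough to sketch. The Python fragment below illustrates documents as networks of linked pieces to which readers may add links at will; the class and field names are ours, not XANADU's.

    class Node:
        """One piece of a hypertext document: text plus outgoing links."""
        def __init__(self, text):
            self.text = text
            self.links = []      # readers may add links at will

        def link_to(self, other):
            self.links.append(other)

    intro = Node("Hypertext stores documents as networks of linked pieces.")
    memex = Node("MEMEX (Bush, 1945): links between arbitrary chunks.")
    xanadu = Node("XANADU (Nelson): a global database of linked documents.")

    intro.link_to(memex)         # a document may be mostly links to
    intro.link_to(xanadu)        # pieces already in the database

    def follow(node, depth=0):
        """Follow links in any desired manner; here, depth-first."""
        print("  " * depth + node.text)
        for target in node.links:
            follow(target, depth + 1)

    follow(intro)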

In other cases, administrators are misled into underestimating the time and resources required to deploy new information technology. Efforts to develop effective networks have been insufficiently supported by government planners and research institution administrators, who have been led to assume that technology and services to provide network access are easily put in place. Some administrators have promoted change, but without adequate planning for the resources or infrastructure needed to support users. Problems such as these are exacerbated by overly optimistic advice given to administrators by technological enthusiasts. This particular impediment probably cannot be overcome. It can, however, be alleviated by establishing collaborative arrangements to develop plans for, and share the costs of, change. EDUCOM, for example, is a consortium of research universities with large computing resources that promotes long-range planning and sharing of resources and experiences.

Absence of Infrastructure

Most fundamental of all the institutional and behavioral impediments to the use of information technology is the absence of an infrastructure that supports that use.

LEGAL CONSTRAINTS TO AN ELECTRONIC VERSION OF A LABORATORY NOTEBOOK

Today, the paper laboratory notebook is the only legally supportable document for patent applications and other regulatory procedures connected with research. Some organizations, however, routinely distribute electronic versions of laboratory notebook information to managers and other professionals who would otherwise have to visit the research site physically or request photocopies. The benefits of legal electronic notebooks are speculative but attested to by those using them informally (Liscouski, 1987). First, they would help give researchers access to information or expertise that is otherwise lost because people have moved or reside in different departments. Second, they would allow research managers and researchers to observe and compare changes in results over time. Third, they would eliminate or make easier the assembly of paper versions of documents needed for government agencies.

The barrier to an electronic notebook is social: its lack of acceptance as a legal document. Such acceptance could take place if legal conditions for an electronic system (storage, format, security) were delineated. However, researchers, scientific associations, and government agencies have failed to develop such guidelines. This failure is probably connected to the traditions of privacy in laboratory notebooks, to the inability to forecast how an electronic system would stand up in court (and, related to that, the risk and unacceptable cost to any single institution of developing a system), and to the uncertainty of the ultimate benefits on some widely accepted index of research effectiveness. Whatever the reasons, the end result is that a complete and accepted electronic notebook remains undeveloped.

An infrastructure that supports information technology applications to research should provide:

- Access to experts who can help;
- Ways of supporting and rewarding these experts;
- Tools for developing software, and a market in which the tools are evaluated against one another and disseminated;
- Communication links among researchers, experts, and the market; and
- Analogs to the library, places where researchers can store and retrieve information.

Several different kinds of experts in information technology help researchers. Some are specialists in research computing. Some are programmers who develop and maintain software specific to research. Others are specialists who carry out searches. Still others are "gatekeepers," who help with choices of software and hardware. Gatekeepers are members of an informal network of helpers centered around advocates and specialists, experts in both a discipline and in information technology who become known by reputation. Overdependence on gatekeepers creates other problems: as with any informal service, some advice received may be narrowly focused or simply wrong, and the number of persons wanting free information often becomes larger than the number of persons able to provide it. As a result, the gatekeepers may become overloaded and eventually retreat from their gatekeeping roles.

To hold on to expert help of all types, research and funding institutions must find ways of supporting and rewarding it. While institutions and disciplines have evolved ways of rewarding researchers (publication in refereed journals, promotion, tenure), no such systems yet reward expert help.

Another aspect of the needed infrastructure is some formal provision for developing and disseminating software for specific research applications. Tools for constructing reliable, efficient, customized, and well-documented software are not used in support of scientific research. Computer science, as a supporting discipline, needs to facilitate rapid delivery of finished software, and easy extension and revision of existing software. The Department of Defense has recently pioneered the creation of a Software Engineering Institute at Carnegie Mellon University. Efforts to create tool-building and research resources for nondefense software are worth encouraging.

Development and dissemination of scientific software could be speeded in many cases by adoption of emerging commercial standards. These standards are supported by many vendors for a variety of computing environments. The temptation to narrowly match software to specific applications should be resisted in favor of standard approaches.
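A small illustration of the standards argument, using present-day tooling: results written in a widely supported interchange format remain readable by any colleague's tools, whereas a bespoke binary layout ties the data to custom software indefinitely. The format chosen here (CSV, via Python's standard library), the file name, and the data are our own hypothetical choices.

```python
# Illustrative sketch: writing measurement results in a standard interchange
# format so that any standards-aware tool (a spreadsheet, a statistics
# package, a colleague's script) can consume them without special code.

import csv

measurements = [
    {"sample": "A1", "temperature_K": 293.1, "pressure_kPa": 101.3},
    {"sample": "A2", "temperature_K": 295.4, "pressure_kPa": 100.9},
]

# Write the results in a standard format rather than an ad hoc binary layout.
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sample", "temperature_K", "pressure_kPa"])
    writer.writeheader()
    writer.writerows(measurements)

# Any other program can now read the file with off-the-shelf tools.
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["sample"], row["temperature_K"])
```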

Software, once developed, needs to be evaluated and disseminated. The research establishment now evaluates research information principally through peer review of funding proposals and manuscripts submitted for publication. Software needs to be dealt with in a similar manner. EDUCOM has recently announced its support of a peer-review process for certain kinds of academic software. Other prototypes of systems for evaluating and disseminating software already exist (see boxes on BIONET and on IBM's software market). These prototypes couple an electronic "market," through which software can be disseminated, with a conferencing capability that allows anyone with access to contribute to the evaluation of the market wares. The system provides an extremely important feature: those contributors who are most successful in the open market can automatically be identified and given credit in much the same way as authors of books and research papers now are.

The infrastructure for information technology also depends on communication links. The Panel believes that one of the most important services that computer networks can provide is the link between users and expert help. Existing links often take the form of electronic bulletin boards on various networks; other mechanisms also exist. Until more formal mechanisms come about, open communication with pioneers, advocates, and enthusiasts is one of the best ways to allow new technologies to be disseminated and evaluated by research communities.

A final piece of infrastructure largely missing is housing and support for the storing and sharing of information. Such a function could be performed by disciplinary groups or, more generally, at the university level. Many university libraries have a professional core staff whose members hold faculty rank and function not only as librarians but also as researchers and teachers. Some university computer centers operate similarly. National laboratories, like astronomical observatories and accelerator facilities, have a core staff of astronomers or physicists whose main task is to serve outside users while also maintaining their own research programs. The existence of such a professional staff involved in the storage and retrieval of information for a discipline would provide a means of recognizing and rewarding these people and according them status. In some cases, a university might wish to consider integrating its information science department with its computer center and its library.

AN EXAMPLE OF A SOFTWARE MARKET INFRASTRUCTURE: IBM RESEARCH

IBM's internal computer network connects over 2,000 individual computers worldwide, providing IBM's researchers, developers, and other employees with communications facilities such as electronic mail, file transfers, and access to remote computers. In recent years, software repositories and online conferencing facilities have grown and flourished, and become one of the primary uses of the network. With a single command, any IBMer has access to some 3,000 software packages, developed by other IBMers around the world and made available through the network. Many of these packages are computer utilities and programming tools, but others are tools for research. They include statistical and graphics applications, simulation systems, and AI and expert system shells, as well as many everyday utilities to make general use of the computer simpler. The high level of interconnection offered by the network and the centralization of information offered by the repositories allow scientists with a particular need to see if software to satisfy that need is available, to obtain it if it is, and to develop it if it is not, with confidence that they are not duplicating the efforts of some colleague.

The online conferences (public special-purpose electronic bulletin boards), which are as widespread and accessible as the software repositories, allow users of the software (and of commercial and other software) to exchange experiences, questions, and problems. These conferences provide a form of peer review for the software developer. For internally developed software, they provide a fast and convenient channel between the software author and the users; authors with an interest in improving their programs have instant access to user suggestions and to eager testers.

Users with a special need or a hard question have equally fast access to the author for enhancements or answers. The conferences also allow users with common interests to exchange other sorts of information in the traditional bulletin board style. AI researchers debate the usefulness of the concept of intentionality or discuss how software engineering methodologies apply to expert systems development; computer graphics and vision workers talk about the number of bits required to present a satisfactory image to the human eye. Over 100 individual conferences support thousands of separate discussions about computer hardware and software and virtually all other aspects of IBM's work. The software repositories provide a "reviewed" set of tools and applications for a broad population on a wide spectrum of problems.

The organization that originally sets up a repository or a conference generally provides user support for it (answering "how to do it" questions), and installation and maintenance of local services is usually handled either by an onsite group that has an interest in the specialty served by the facility, or on a more formal basis by the local Information Systems department.

The benefits of these repositories and conferences are widely distributed and hard to quantify, but the success of these software libraries and online conferences within IBM should serve as an encouraging sign for others with the same sorts of needs. A market can be made to succeed, provided that high levels of standardization and compatibility in both hardware and software can be achieved. Such levels of interoperability have, so far, been easier to achieve at commercial institutions such as IBM Research than at research universities.
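The market-plus-conference pattern described in the main text and in the box can be reduced to a small data model. The sketch below is illustrative only, not a description of IBM's or EDUCOM's actual systems; all names, the retrieval-count credit rule, and the interfaces are our assumptions.

```python
# Illustrative sketch of a software market coupled to a conference: packages
# are stored centrally, every retrieval is counted so successful contributors
# can be identified and credited, and an open discussion thread serves as a
# form of peer review.

from collections import defaultdict

class SoftwareMarket:
    def __init__(self):
        self.packages = {}                   # name -> (author, code)
        self.retrievals = defaultdict(int)   # name -> retrieval count
        self.conference = defaultdict(list)  # name -> list of (user, comment)

    def contribute(self, name, author, code):
        """Make a package available to everyone on the network."""
        self.packages[name] = (author, code)

    def retrieve(self, name):
        """Fetch a package; the count is the basis for author credit."""
        self.retrievals[name] += 1
        return self.packages[name][1]

    def discuss(self, name, user, comment):
        """Append to the package's open conference, a form of peer review."""
        self.conference[name].append((user, comment))

    def credit_report(self):
        """Rank authors by how widely their contributions are used."""
        credit = defaultdict(int)
        for name, (author, _) in self.packages.items():
            credit[author] += self.retrievals[name]
        return sorted(credit.items(), key=lambda item: -item[1])

# Usage: a researcher checks whether a tool already exists before writing it,
# and the author of a popular package accumulates visible credit.
market = SoftwareMarket()
market.contribute("plotlib", "j_smith", "<source text>")
market.retrieve("plotlib")
market.discuss("plotlib", "a_jones", "Handles log axes well; documentation is thin.")
print(market.credit_report())  # [('j_smith', 1)]
```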
