Trends and Issues in Information Technology

Advances in information technology offer unprecedented opportunities as well as new challenges in the international exchange of scientific data. Rapid improvements have led to ever greater computational speed, communication bandwidth, and storage capacity at costs within reach of even small-scale users—a trend that appears likely to continue well into the future.1 Moreover, technical advances in satellites, sensors, robotics, and fiber-optic and wireless telecommunications are extending the range of technologies affecting the acquisition, refinement, analysis, transmission, and sharing of scientific data.

In this chapter, the committee examines some of the concerns that rapid changes and growing reliance on information technology have raised with respect to the exchange of scientific data. Table 2.1 frames some of the profound advances in technology that are having an impact on access to and exchange of scientific data and thus on research-related capabilities. The committee's overview of associated technical trends provides some context for its discussion of six barriers to and concerns regarding global access to scientific information, including access by scientists in developing countries. Its recommendations for technical improvements to facilitate the international sharing of scientific data are addressed to a range of participants.

OVERVIEW OF TECHNICAL TRENDS

The committee's discussion focuses on 10 trends (Table 2.2) that represent major forces of change in data and information technology. These trends interact with and reinforce each other, often further accelerating change and complicating



Copyright © National Academy of Sciences. All rights reserved.





TABLE 2.1 Advances in Technologies Relevant to the Generation and Exchange of Scientific Data and Information

High-density storage and memory
  Relevance: Capacity to deal with large volumes of data and high rates of transmission for today's science demands.
  Prognosis: Precipitous drop in cost. Holographic and high-density optical memory technology will enter the market.a

Encryption/authentication
  Relevance: Ability to protect copyright, the privacy of individuals, and data integrity.
  Prognosis: Embedded encryption in numerous products expected to make privacy and security applications manageable. Widespread application of public-key encryption.

Packet asynchronous transfer mode (ATM) communications
  Relevance: Support for high-speed, flexible transmission of video and images.
  Prognosis: Long-term steady growth in ATM applications over high-speed fiber-optic links. Use of ATM within local area networks, competitive with other local area network (LAN) technologies (e.g., 100-Mbps Ethernet).

Sensors
  Relevance: Extension of the range of that which can be observed (more precision, more spectral range, higher sampling frequency, less calibration effort).
  Prognosis: New multispectral sensors, improved resolution, smaller and more numerous satellites. Additional terrestrial applications (e.g., agriculture).

Small satellites (and inexpensive launches)
  Relevance: Lowering of barriers to entry for remote sensing applications.
  Prognosis: Increased space and ground remote sensing activity. Broader array of applications.b

Wireless (space and ground) communications
  Relevance: Capability to enhance communication in remote areas or areas where post, telephone, and telegraph systems have limited capability and capacity.
  Prognosis: Worldwide access to voice and high-speed data transmission within 5 years. Wireless systems filling in to meet communications needs that ongoing investments in fiber cable or wireline systems have been unable to accommodate.

High-performance computer processors
  Relevance: Enhanced capability for computationally intensive science activities (e.g., models, transformation of large data sets).
  Prognosis: More expensive fabrication processes expected to cause a reduction in supplier alternatives.c Potential for "single-electron" circuits.d Moore's law applying through 2005, or longer.

Robotics for exploration and for data transmission, from or to inaccessible places
  Relevance: Improved autonomy of vehicles for ocean and atmosphere studies and for planetary missions.
  Prognosis: New frames for small submarines and pilotless aircraft (driven in part by military applications); "downhole" oil, gas, and geologic exploration; micro-electromechanical systems (MEMS) applications.e

Hybrid analog/digital computers
  Relevance: Capacity for new sensing and reasoning power, helping machines to do "intelligent" work.
  Prognosis: Potentially rapid advancement portended by breakthroughs in computation and neural science research.f

Language processing by computer
  Relevance: Assistance in getting relevant information to scientists on time. Capability for speakers of different languages to improve collaboration.
  Prognosis: Improved filtering and organizing of information in response to the WWW information glut. Many graphical tools have capabilities to deal with special fonts and 16-bit character sets. Limited to "major" language pairs in the near term.

Database technology (including information retrieval, "knowbots")
  Relevance: Capability to deal with the extremely complex variety of information in the natural sciences and medicine; support in organizing relevant, current information.
  Prognosis: Object-oriented databases entering the market; massively parallel representations (e.g., Paradise at the University of Wisconsin).g Widespread use of electronic agents to assist research.

Fiber-optic communications
  Relevance: Vastly increased capacity to accommodate rates of many gigabits per second.
  Prognosis: New, erbium-based systems expected to reduce power and improve reliability.h Improved connectivity using undersea cables.i

NOTES:
a. Demetri Psaltis and Fai Mok (1995), "Holographic Memories," Scientific American, November:70-76; Praveen Asthana and Blair Finkelstein (1995), "Superdense Optical Storage," IEEE Spectrum, August:25-31; Robert F. Service (1995), "Pushing the Data Storage Envelope," Science, July 21:299-300.
b. K.C. Cole (1996), "NASA's Mission: Think Small," Los Angeles Times, January 21:1.
c. A discussion of the increasing cost of semiconductor production capability is provided in G. Dan Hutcheson and Jerry D. Hutcheson (1996), "Technology and Economics in the Semiconductor Industry," Scientific American, 274(1):54-62.
d. Chappell Brown (1996), "Electron Switching Simplified," EE Times, January 8:35.
e. MEMS are nanoscale machines. Applications include instrumentation within living tissue. See <http://mems.isi.edu/>.
f. R. Colin Johnson (1995), "Mead Envisions New Design Era: Analog and Digital Techniques to Create a New 'Art Form'," EE Times.
g. David J. DeWitt (1994), "The Trend Toward Object Oriented DBMS," briefing to NASA. See <http://www.hq.nasa.gov/office/oss/aisr/1994_minutes.html>.
h. George Gilder (1997), "Fiber Keeps Its Promise," Forbes ASAP, April 7:90-94; and Frank J. Denniston and Peter K. Runge (1995), "The Glass Necklace," IEEE Spectrum, October:24-27.
i. See <http://www.teleport.com/~simoriah/scow/sub.htm> for more information on undersea cables and plans. Also, AT&T has announced support of "Africa One," a $1.9 billion project intended to link coastal countries in Africa.

application choices. Each is discussed below, and their actual or potential effect on international scientific data exchange among the member countries of the Organization for Economic Co-operation and Development (OECD) is broadly characterized. The impact of these technical trends on access to scientific data in developing countries is also discussed.

Decreasing Cost of Computing and Communications

The cost of owning and operating increasingly powerful computers has dropped dramatically over the past several decades. Today's personal computers, for example, offer the processing speed of workstations from less than 5 years ago at a fraction of the cost. The availability of information technology products with ever-increasing computing, communication, and storage capability has contributed to the ubiquitous assimilation of computers into modern daily life, and complex applications taking advantage of continually improving computer performance have emerged. Among other uses, information technology is being applied increasingly to product development, manufacturing, and distribution, as well as to new financial services such as debit/credit transactions and investment portfolio management. One effect of this phenomenon is an opportunity for "technology leapfrogging": late entrants to the use of information technology can enjoy the immediate advantage of low-cost systems, without having had to make earlier investments in more expensive and less capable technologies and then carry the burden of depreciation of that investment. Modern computing technology is thus increasingly accessible to low-budget endeavors as prices fall under the pressure of mass production and competition.2 Even though the pace of change can be daunting to information technology newcomers, in general it should become easier and cheaper with time to obtain technology to participate in the global sharing of scientific information.
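The compounding effect described above can be made concrete with a short calculation. The figures below are purely illustrative: they assume that the cost of a fixed amount of computing halves every 18 months, a commonly quoted Moore's-law rule of thumb rather than a number taken from this report.

```python
# Illustrative compounding of a Moore's-law-style cost decline. The 18-month
# halving period is an assumption used here only for demonstration.
def cost_after(years: float, start_cost: float = 1.0, halving_months: float = 18.0) -> float:
    """Relative cost of a fixed computing capability after `years` years."""
    return start_cost * 0.5 ** (years * 12.0 / halving_months)

for years in (0, 3, 6, 9):
    print(f"after {years:>2} years: {cost_after(years):6.1%} of the original cost")
```

Under this assumption, a buyer entering the market six years late pays roughly one-sixteenth of the original price for the same capability, which is the arithmetic behind "technology leapfrogging."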
In the context of the natural sciences, these falling costs mean that scientists and other users in developing countries or in economically depressed regions such as those in Eastern Europe and the former Soviet Union are increasingly able to acquire new computing and communications tools for carrying out their work.

Enhanced Capabilities for Collecting Scientific Data

The natural sciences produce prodigious amounts of data. Earth observation and weather systems lead the way, with the potential for collecting terabytes3 per day. The same trends in low-cost microelectronics that are fueling the information and network revolutions also are driving the development of low-cost sensors and (relatively) low-cost storage systems. Major "big science" efforts such as the International Geosphere-Biosphere Programme (IGBP) and the Human Genome Project involve the collection and distribution of large volumes of data

TABLE 2.2 Summary of Technical Trends Affecting Exchange of Scientific Data and Information

Decreasing cost of computing and communications; "technology leapfrogging"
  Following Moore's law, the cost of computing, data storage, and communication has fallen consistently for more than 25 years. Developing countries and the newly independent states of the former Soviet Union have, in some cases, been able to acquire modern communications and computing equipment. New users have been able to avoid substantial capital expense and the burden of depreciation of that investment.

Enhanced capabilities for collecting scientific and other data
  The collection power of modern instruments enables major scientific enterprises such as the Human Genome Project, climate modeling, and satellite remote sensing studies to generate very large volumes of data.

Increasing exploitation of broadband networks and emerging dominance of the video data type in networks
  The investment in fiber-optic cable over the past two decades is increasingly being exploited to support demanding new applications with high-capacity or real-time delivery requirements (video, medical imaging, large-scale science). The entertainment industry and new applications such as video teleconferencing, movies on demand, and interactive television have attracted substantial investment and will be the dominant factors in the development of networks in the next 10 years. Voice communication will require a minor share of telecommunications capacity.

Advent of digital wireless communications
  Wireless networks are rapidly connecting the world in new ways, and at low cost. Ground-based wireless systems are creating modern infrastructure in cities that have had unreliable phone systems with inadequate capacity. Proposed satellite ventures will provide data and voice connections on a global basis.

Shifting dominance in data networks from primarily science/defense to commercial/entertainment applications
  The Internet was developed to support advanced science and technology activities. Recent changes (in particular, the advent of World Wide Web browsers) have transformed the Internet into a tool for a vast array of both commercial and noncommercial applications (including shopping, entertainment, education, and general publication).

Increasing facility in collaborative work
  Teams of scientists (remote from each other and often in different countries) are able to work together on a project, facilitated by high-performance communication for active, real-time interaction with each other using data and other information resources.

Increasing capabilities for language processing
  Machines using natural language processing techniques are helping to organize the vast amount of information available in electronic form. New tools are providing transparent access (via rudimentary machine translation) for speakers of the world's major languages.

Increasing recognition of the importance of standards
  Standards provide the means for interoperability and help to support competition and product evolution. Recognition of the role of standards (whether de facto, industry driven, or supported by formal national or international bodies) has grown, further accelerating the acceptance and application of standards.

Growing acceptance of a need for cooperation in monitoring and controlling network activity
  Mechanisms have been built into authentication systems, retrieval systems, and networks to account for specific activities of users and to support flexible billing systems. Public-key encryption technology is increasingly accepted as a means to protect data and authenticate users. This activity is being driven primarily by the needs of commercial users of the network.

Increasing use of intranets
  The use of dedicated networks, particularly among private firms, is growing.
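The public-key mechanisms cited in the table can be illustrated with a toy RSA-style sketch. The primes and message below are tiny textbook values chosen for readability; real deployments use keys hundreds of digits long.

```python
# Toy RSA-style public-key demonstration: the same key pair supports both
# confidentiality (encrypt with public key) and authentication (sign with
# private key). All numbers here are illustrative, not secure.
p, q = 61, 53                 # secret primes
n = p * q                     # public modulus (3233)
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent (modular inverse; Python 3.8+)

message = 65                  # a small integer "message"
ciphertext = pow(message, e, n)    # encrypt with the public key
recovered = pow(ciphertext, d, n)  # decrypt with the private key
signature = pow(message, d, n)     # "sign" with the private key
verified = pow(signature, e, n)    # anyone can verify with the public key
print(recovered, verified)         # both recover the original message, 65
```

The asymmetry is the point: anyone holding the public pair (n, e) can encrypt for, or verify signatures from, the holder of d, without any shared secret.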

and data products. Other observational science and engineering projects4 involving large-scale models, simulations, or sampling volumes also produce enormous quantities of data. NASA's Earth Observing System (EOS) is perhaps the best-known example of a high-volume, long-term scientific observational system.5 EOS is expected to collect a terabyte per day of satellite sensor data by the beginning of the next century. The desire to collect, manage, and preserve scientific information always appears to exceed the financial and technical capabilities to do so, even in the wealthiest nations. Scientific communities must organize themselves better to select information for acquisition and for retention.

Advent of Digital Wireless Communications

Wireless communications received a major boost from the effort to develop mobile communications systems in the United States. Interest and investment also have been stimulated by the possibility of creating competition in local telephone service, heretofore a 100-year monopoly. Moreover, the end of the Cold War has forced aerospace companies to seek new markets for satellite technology, including direct-broadcast television and satellite-based cellular telephony. Wireless communications links are being installed worldwide, enabling mobile communication—and, for developing countries and other nations with historically weak telecommunications infrastructure and rapid growth, avoidance of much of the capital cost of a wired communication infrastructure. New competition will drive down the cost of telephony and offer new applications. Video broadcast from space or from fixed terrestrial sites may offer new ways to deliver data in interactive communications systems.
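The collection rates quoted above for EOS-class systems imply sustained transfer requirements that are easy to estimate. The sketch below is a back-of-the-envelope calculation only; it assumes the decimal convention of 10^12 bytes per terabyte.

```python
# Sustained bandwidth implied by "a terabyte per day" of sensor data.
TERABYTE = 1e12          # bytes (decimal convention, assumed for illustration)
SECONDS_PER_DAY = 86_400

bytes_per_second = TERABYTE / SECONDS_PER_DAY
megabits_per_second = bytes_per_second * 8 / 1e6
print(f"{bytes_per_second / 1e6:.1f} MB/s, about {megabits_per_second:.0f} Mbit/s sustained")
```

Roughly 93 megabits per second, around the clock—which is why a single observational system of this class could saturate the backbone and intercontinental links discussed in this chapter.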
Increasing Exploitation of Broadband Networks and Capabilities for Transmission of Video Data

Commercial providers believe that new applications such as video conferencing, interactive television, and the ability to access movies on demand from a large archive will be the dominant factors in the development of networks over the next 10 years. Voice communications will require an ever smaller share of telecommunications capacity. The widely discussed convergence of personal computers and television has been accelerated through the widespread licensing of new tools for interactive World Wide Web (WWW) applications and through emerging standards by which cable television companies can provide high-speed Internet access. Much of this activity is driven by the goal of providing interactive access to large video databases in "real time" (at least 1 megabit per second). New, higher-bandwidth protocols such as the high-performance parallel interface (HIPPI), the first gigabit-per-second standard, begun at Los Alamos National Laboratory in the early 1990s, and the MBone (a virtual multicast backbone network for delivery of audio and rudimentary real-time video across the Internet) are being developed. In the short term, however, the impact of high-bandwidth applications will be negative (especially for high-data-rate users in OECD countries), since the need for higher bandwidth has already been outpacing bandwidth improvements, both on major backbone networks and on bridges between them (see the section below titled "Specific Technical Concerns").

Shifting Dominance in Data Networks

The international public infrastructure for data communications is built around the Internet. Originally developed in the United States by the Department of Defense, the National Science Foundation, and other agencies to support scientific and technical collaboration,6 the Internet now serves a much wider range of purposes. In recent years, it has become a high-visibility source of entertainment as well as an indispensable tool for many commercial and noncommercial applications (e.g., catalog sales, news, social interaction, dissemination of company and product information). Advertisers use the Internet to promote themselves and their wares as "high tech" and, moreover, view the current demographics of Internet users (who have disposable incomes that are typically much higher than average) as extremely favorable. In 1995, the total number of commercial (".com") sites on the Internet grew to exceed the number of educational and government sites for the first time, and this continues to be the sector of most rapid growth. For example, the percentage of Web sites on the Internet running from the ".com" domain in the United States increased from 1.5 percent in June 1993 to 50 percent in January 1996.7 This trend toward commercial use of the Internet could have a significant impact on the scientific community.
What has been until now a government-subsidized activity could become a significant cost factor for scientists as networks become privatized. Further, the scientific community originally played a major role in developing the technologies and standards for the Internet, but this is no longer the case. Scientific activity will have to follow (and potentially benefit from or suffer because of) the standards and pace set by others.

Increasing Technical Support for Collaborative Work

Scientists are increasingly aware of the importance of information technologies that facilitate collaborative work. The electronic messaging capabilities of operating systems used widely in the context of the ARPANET and in private, commercial messaging systems, as well as text retrieval systems such as IBM's STAIRS, System Development Corporation's ORBIT, NASA's RECON, Battelle's BASIS, and the work at Cornell University by Gerard Salton on SMART, provided much of the early technical framework for knowledge management and sharing. In recent years, electronic mail (e-mail) systems, mailing lists, and bulletin boards have enabled rapid information sharing among groups of people distributed throughout the world. Other commercially available computer-based tools and technologies have enhanced collaborative work by facilitating cooperative research involving, for example, the use of remote instruments, and electronic data publishing that speeds the dissemination of research results.8 Indeed, the success of many complex scientific investigations now is predicated on bringing the capabilities of diverse researchers from multiple institutions together with state-of-the-art instruments. In addition to the purely technical issues raised by these requirements, however, the research agenda for creating such "collaboratories" must address fundamental psychosocial questions.9 Desktop video conferencing is a next logical step in the use of collaborative tools and may be as widely available within 10 years as e-mail is currently, provided that adequate bandwidth can be supplied. Users can now obtain rudimentary desktop video conferencing systems for as little as $100 using the CU-SeeMe software from Cornell University;10 such systems provide crude service today but offer great promise. The Internet Engineering Task Force (IETF) and several universities are using the MBone to broadcast symposia and conference events worldwide.11 Video conferencing systems based on integrated services digital network (ISDN) services and asynchronous transfer mode (ATM) are now available commercially, offering high-quality images and advanced application sharing features.12 "Plain old" telephone service (POTS)-based video conferencing is expected to be available with the next release of major PC operating systems.
The low cost of desktop video conferencing equipment and the ability to operate over a variety of media types will enable scientists who have access to these technologies to communicate more readily. These types of technologies can help improve the efficiency of scientific fieldwork, especially in remote areas, but only if they are supported by links with sufficiently high bandwidth. Investment in commercial products that support information sharing and workflow has accelerated as vendors recognize the importance of multiuser support to acquiring and sustaining market share.

Growing Capabilities for Natural Language Processing

Natural language processing has been an active branch of artificial intelligence for decades. Recent approaches and products have significantly improved automated document subject classification.13 In addition, the Internet has greatly increased interest in capabilities for indexing and locating knowledge, thus contributing to the rapid growth of the text retrieval industry. Users can now gain more rapid access to a wider base of scientific information.14 Moreover, numerous products (e.g., Fulcrum, Context, Limbex, InQuizit, Excalibur, Excite, Systran) and services (e.g., Digital Equipment's AltaVista, Yahoo, Lycos, Dejanews, InfoSeek)
are now using natural language processing capabilities to help organize information. More advanced products from the U.S. government's TIPSTER project are maturing for "information robot" ("knowbot") applications, such as agent-based information gathering, data overload filtering, and extracting key facts from raw text. These new tools accelerate work by reducing the volume of information that needs to be evaluated. Slow but steady advances in machine translation are already beginning to produce acceptable levels of quality for some applications. New applications in handwriting and voice recognition as well as voice synthesis promise to bring the world's information resources within reach of many who previously had been excluded because of language differences or disability. The development of new language-processing capabilities is increasingly important as the historical dominance of English in data networks gives way to multilingual communications. The ability to perform automated language translation, though still crude, facilitates global data and information access by helping users with native languages other than English to participate in scientific activities. Although current investment is limited to a small number of the languages most widely used for political and economic purposes (e.g., English, French, Chinese, Japanese, Spanish, Russian, German), advancing techniques in language processing and computer power will make extension to new language domains less costly and time consuming. Some databases, such as the European Dictionnaire Automatique, have been developed explicitly to facilitate machine translation and semantic analysis.

Increasing Recognition of the Importance of Standards

Standards play a major role in the evolution of telecommunication networks because of the importance of interoperability of these networks, which also must provide continuous paths for improvement without disruption of existing infrastructure.
In computing, vendors put substantial effort into proprietary approaches to protect market share. But the U.S. government's championing of "open systems" and the Portable Operating System Interface for Computer Environments (POSIX) standards has allowed a new class of vendor to emerge and create entirely new market forces, with many suppliers in every niche of computing. IBM's decision to make the PC an open, standard product provided another major force toward standardization in computing. Standards for products and for the representation of information have advanced rapidly over the last decade. Industry standards such as Transmission Control Protocol/Internet Protocol (TCP/IP), Simple Mail Transfer Protocol (SMTP), Simple Network Management Protocol (SNMP), X.400, and Standard Generalized Markup Language/HyperText Markup Language (SGML/HTML), along with easy-to-use browser products such as Netscape and Mosaic, were necessary for the rapid expansion of the Internet. Companies still use proprietary approaches to gain short-term market advantages,
often with the hope that their products will become the standard (e.g., Microsoft's OLE). Sun's Java language is an interesting example of a company-sponsored effort that is becoming a standard through rapid expansion of licensing agreements. The marketplace today often converges rapidly on one or a few standards, the standard for the high-density CD-ROM (and, more recently, the digital versatile disc) being an excellent example.15 The music and entertainment community realized that competing standards would risk an expensive competitive battle. Other examples include the widespread application of HTML. Technical standards increase competition and product availability while reducing price. The downside is that standards themselves evolve and can contribute to a kind of industry-driven obsolescence. Also, when multiple standards apply in the same area, buyers are forced to try to choose prospective winners and losers (recall the battle for consumer support of the Beta and VHS videotape standards). Within the scientific disciplines, there is increased attention to system interoperability in terms of both data and software. In the astronomy community, for example, the interchange of data has become fairly simple because of effective coordination in the United States and internationally. Radio astronomers developed a voluntary standard format for data interchange (the flexible image transport system, or FITS) that was widely adopted in the astronomy community during the 1980s. This standard is maintained by an international committee, with support from several organizations, including NASA. There are related standard formats for planetary data, as well as a trend toward the development and adoption of a few comprehensive data analysis systems that can be used with a variety of types of astronomical data from different observatories, instruments, and subdisciplines.
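Part of what made FITS so durable as an interchange standard is its simplicity: header metadata is stored as fixed-width, 80-character ASCII "cards." The sketch below parses a few such cards; the sample header is invented for illustration, and the parser handles only the simplest card forms, not the full standard.

```python
# Minimal sketch of reading FITS-style header cards (80-character records of
# the form "KEYWORD = VALUE / comment"). Illustrative only; a real reader
# must follow the complete FITS specification.
def parse_fits_cards(header: str) -> dict:
    """Split a header string into 80-char cards and extract KEYWORD = VALUE pairs."""
    cards = [header[i:i + 80] for i in range(0, len(header), 80)]
    values = {}
    for card in cards:
        key = card[:8].strip()                 # keyword occupies columns 1-8
        if key in ("COMMENT", "HISTORY", "END", ""):
            continue                           # commentary and end markers carry no value
        if card[8:10] == "= ":                 # value indicator in columns 9-10
            value = card[10:].split("/")[0]    # drop the trailing comment, if any
            values[key] = value.strip().strip("'").strip()
    return values

# A hypothetical three-card header, each card padded to 80 characters.
header = (
    "SIMPLE  =                    T / conforms to FITS standard".ljust(80)
    + "BITPIX  =                   16 / bits per data value".ljust(80)
    + "NAXIS   =                    2 / number of data axes".ljust(80)
    + "END".ljust(80)
)
print(parse_fits_cards(header))  # {'SIMPLE': 'T', 'BITPIX': '16', 'NAXIS': '2'}
```

Because every card is plain fixed-width text, any observatory on any platform could read any other observatory's headers—exactly the interoperability the chapter attributes to effective standards coordination.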
Sharing of analysis software and commercially developed computing tools among the different systems is encouraged. Of course, the need for standards for effective data exchange is not confined to telecommunications, computer languages, and storage media. Even within a narrow discipline or subdiscipline, true data exchange with proper interpretation of numbers, symbols, words, and graphics depends on standards for data structures, database management systems, and even terminology.

Cooperation in Monitoring and Controlling Network Activity

The rapid growth in networks over the last 15 years has led to the need for appropriate levels of cooperative monitoring and control. Initial ad hoc activity in developing protocols such as SNMP has given way to more elaborate standards and tools today. Authentication systems, retrieval systems, and networks can now account for specific activities of users and can support flexible billing systems. Public-key encryption technology is increasingly accepted as a means of protecting data and authenticating users. Such developments are being driven by needs associated with the network as a marketplace. Version 6 of the Internet Protocol (developed by the IETF and often referred to

OCR for page 24
--> TABLE 2.3 Summary of Major Technical Barriers to International Transfer of Scientific Data and Information Concern Impact Internet congestion is becoming a serious problem. Scientific activities are disrupted through lack of control of network capacity. High-bandwidth applications are impeded or blocked, and urgent communications are slowed. Description and indexing of data are inadequate to support their use by others. Data must be transformed, recomputed. Data cannot be located, and there is potential for error. Cost and delay in performing scientific work are generally increased. Electronic storage media have limited life spans. Data are lost or rendered unusable in the absence of long-term commitments for transferring them to new media on a regularly scheduled basis. Tools for authentication and privacy are immature. Data and networks are vulnerable. Valuable assets (data and infrastructure) could be lost or corrupted; intellectual integrity could be compromised. If tools depend on restricted (for export) encryption technology, barriers to free, protected exchange of information could emerge. Encryption technology for digital identification, authentication, and privacy safeguards might remain inconsistent from country to country, limiting information commerce. Alternatively, data would remain unprotected. Scientific requirements for computer technology could be left unsatisfied by priority support for other larger market needs (e.g., entertainment and business). Increased expense to scientists for equipment specifically tailored to research needs (e.g., supercomputers). There is the risk that specific requirements will not be met. Few international networks support real-time data. Lack of support for conferencing and collaborative work, large file transfer, or shared scientific infrastructure. collectively dictated its priorities and use. 
Today, the network is available for a wide range of activities, which sometimes significantly reduce or block access by scientists. Even scientific events, such as the crash of Comet Shoemaker-Levy 9 into Jupiter, have generated such high public interest, and such high-volume transfer of images, that scientific access to research resources has been impeded or blocked. The requirements of scientific research, whose results often serve the world community on an urgent basis, are not now met by responsive mechanisms that give them priority.

Generally, the rate of international exchange of scientific information has risen steadily as the means to carry out such exchange have improved. Now scientists on different continents commonly share ideas and data daily, or even hourly. A researcher in one country can remotely connect to a computer in a different country to perform calculations and data analyses and sometimes to complete experiments. When linking to a remote computer, one expects—or at least hopes—to transfer information between the local and remote computers at approximately the rate at which data move in the local computer alone. As the Internet developed in its first decade, it usually operated in this way. But as the Internet's popularity and use have grown, more people have come to expect instantaneous service for all their activities, including using the Web, linking to a distant site to extract data of all sorts (from accumulated electronic mail to numerical tables to animations), and running programs remotely to obtain the results. As the speed and efficiency of computers have improved, scientists using them have generated, and have expected to obtain, ever more complex kinds of data. Images, particularly animations, require extremely large data sets if they are to be transmitted electronically.22

In fact, the burgeoning interest in images and video animation has led to a dramatic increase in the amount of information people want to transmit over the Internet, especially as more people use the Web and as the Web serves more commercial and entertainment functions. This explosion of use has strained the carrying capacity of the Internet, particularly on many of the most heavily used intercontinental links.23 Direct trans-Atlantic links between the United States and Germany, for example, functioned virtually as efficiently as local area networks until sometime in 1993 or 1994. Then delays began to occur at about 12:00 or 1:00 p.m.
GMT, when users on both sides of the Atlantic are active. The delays have grown ever longer, as has the time period during which delays occur, so that now, from about 8:00 a.m. until midnight GMT, delays can be so lengthy that the user is "timed out" by the system in the middle of a (delayed) transaction—not once, but many times. This problem can occur even with data exchanges such as e-mail, which computers transmit whenever the lines are open. In nations with very limited network or gateway facilities, e-mail may take a day or more to be transmitted to or from some countries. Since the inception of this study, network congestion and delays have become severe. If the Internet were to become saturated, it would be rendered ineffective as a means of transmitting scientific information directly. Although several satellite and undersea fiber-optic cable systems now being developed may be expected to supply sufficient transmission capacity worldwide, some near-term remedies will be necessary to ensure that scientists and others with professional needs can continue using the Internet with at least moderate efficiency until the new systems become operational.
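The strain that image and animation data place on long-distance links can be made concrete with a rough back-of-the-envelope calculation. The file sizes and the link speed below are illustrative assumptions (a single dedicated T1 circuit), not measurements from the study; real shared links deliver far less than their nominal rate:

```python
def transfer_time_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and congestion."""
    return size_bytes * 8 / link_bits_per_sec

# Illustrative payload sizes (assumed, not measured).
email_bytes = 10 * 1024            # a short text message
image_bytes = 5 * 1024**2          # one scientific image
animation_bytes = 2 * 1024**3      # a short animation sequence

t1_bits_per_sec = 1_544_000        # nominal capacity of a T1 circuit, 1.544 Mbit/s

for label, size in [("e-mail", email_bytes),
                    ("image", image_bytes),
                    ("animation", animation_bytes)]:
    print(f"{label}: {transfer_time_seconds(size, t1_bits_per_sec):,.1f} s")
```

Even at the full nominal T1 rate, the 2-gigabyte animation takes more than three hours; on a congested intercontinental link carrying commercial and entertainment traffic as well, the effective rate, and hence the delay, can be far worse.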

Inadequate Description and Indexing of Data

Responses to the committee's "Inquiry to Interested Parties" (see Appendix D) revealed the lack of common representations for data to be the primary technical challenge for international scientific exchange. Interdisciplinary, collaborative work builds on shared understanding and agreement on terminology. Standards for representation of data, including units and formats, as well as for description (metadata), are vital. Effective directories and navigation tools are needed to help scientists locate relevant information. Shared understanding of the operation of algorithms is important to the application of information, and this understanding must evolve coherently over time as algorithms, standards, and collection instruments are developed. For all scientists, the lack of shared understanding can lead to duplicated effort, additional work to "normalize" data, or limited capability to integrate research results. In extreme cases, information collectors duplicate each other's information because they have no a priori agreement on effective data representations. Problems of data compatibility and integration, even within the United States alone, were reviewed in some detail at a 1987 CODATA conference from three different perspectives: government, geography, and technology.24

Rapid Obsolescence of Electronic Storage Media

The media on which scientific data are stored are vulnerable to decay and obsolescence. The standard lifetime of a particular disk or tape appears to be less than a decade; the data stored on these media must be copied or refreshed at regular intervals. A recent National Research Council study25 discussed the effects and implications of long-term commitments in scientific data management, with respect to both selection of data for long-term retention and media obsolescence.
Further (and paradoxically), data collected before the advent of computers and stored on "archival media" (paper) must be put into electronic form to be used widely and effectively today. Such data can add enormous value to research efforts, particularly for studies examining long-term trends, but are costly to transform.26 Valuable records may fail to be transferred to new media or transformed to electronic form because of a lack of resources (funds or appropriate equipment) or lack of motivation. Scientists without long-term support commitments will face the discouraging fate of losing precious data assets over the long term. With the extraordinary volumes of data being collected, transferring data to new media and managing high-value data sets for active use will increasingly challenge the scientific community, particularly since the time frames for rescuing old, deteriorating data are frequently quite short. Examples include scientific publications printed on high-acid paper and data sets stored on magnetic tapes that are crumbling, some after only a score of years.
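Part of the media-migration burden described above is verifying that nothing was corrupted in the copy. A minimal sketch of checksum-based fixity checking using Python's standard hashlib module follows; the function names, the chunk size, and the choice of SHA-256 are illustrative assumptions, not a prescription from the committee:

```python
import hashlib

def fixity(path: str, algorithm: str = "sha256") -> str:
    """Compute a checksum of a file, reading in chunks so that
    large data sets need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(original: str, copy: str) -> bool:
    """True only if the migrated copy is bit-for-bit identical to the original."""
    return fixity(original) == fixity(copy)
```

In practice an archive would record the checksum at the time a data set is written to new media and recompute it on a regular schedule, so that decay is detected while an intact copy still exists.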

Vulnerability of Electronic Data Networks

Some scientific data must be treated with special care to ensure their dissemination only within a prescribed community (e.g., to protect the privacy of individuals, to allow for verification of results, or to maintain the proprietary advantage of a private enterprise). Today, tools for authentication and for protecting the privacy of data are difficult to use, do not follow widespread standards, and, in some cases, involve encryption technologies that cannot be universally distributed. Such tools, however, can help researchers maintain control over the research environment. Because effective use of authentication and privacy measures involves the collaborative effort of numerous scientists and institutions, "top-down" leadership in standards setting across the scientific community may help speed the acceptance and use of emerging tools designed primarily for electronic commerce on the Internet.

Beyond the basic issue of protecting information privacy and integrity, the scientific community must prepare itself for disruptions of basic network and computing infrastructure. The complexity of software and networks, the large number of users, the dynamic changes in staff, and the relative sophistication of programmers worldwide leave networks vulnerable to attack and catastrophic accident. We already have experienced large-scale disruptions of the Internet and telephone systems. International scientific data collection and dissemination activities are similarly vulnerable to both intentional and unintentional disruptions. A proper balance thus needs to be maintained between open, but vulnerable, access and secure, but not overly rigid, control.

Scientific Requirements for Computer Hardware Potentially Unmet

The demands of entertainment (in particular, interactive multimedia, animation creation, and delivery of large numbers of simultaneous video channels) are driving the frontiers of computer and communications technology.
It is possible that computers and networks will be optimized for entertainment applications, making it difficult for some scientific endeavors to have access to cost-effective computer power. For example, a computer optimized for video streaming might not be suitable for running a large chemical model or ocean simulation. In the past, scientific applications—and funding for the advanced computers to support them—have driven computer design. Advances such as floating point and vector accelerators, massively parallel computers, gigabit networks, large-volume storage media, and visualization software were developed because of scientific needs. Continued funding on an international basis for research leading to these kinds of advances is necessary if vendors are to respond to the technical needs of science. The goal is to incorporate advanced features for scientists within commercially available products.
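The scale mismatch between consumer workloads and sustained scientific computation can be illustrated with a rough operation count for a modest ocean simulation. Every figure below (grid size, cost per time step, machine speed) is an assumption chosen only to show the order of magnitude, not a benchmark:

```python
# Rough floating-point operation count for a modest ocean model
# (all figures are illustrative assumptions).
grid_points = 1000 * 1000 * 40         # horizontal grid times 40 depth levels
flops_per_point_step = 500             # assumed cost of one time step per grid point
time_steps = 100_000                   # assumed length of the simulated run

total_flops = grid_points * flops_per_point_step * time_steps   # 2e15 operations

machine_flops = 1e9                    # a fast workstation sustaining ~1 GFLOP/s
hours = total_flops / machine_flops / 3600
print(f"~{hours:.0f} hours of sustained floating-point work")
```

Under these assumptions the run needs on the order of 2 x 10^15 floating-point operations, i.e., weeks on a workstation. A machine optimized for decoding fixed-rate video streams contributes little to such sustained floating-point demand, which is why the text argues that scientific requirements could go unmet if entertainment alone drives hardware design.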

Lack of Sufficient International Real-Time Data Networks

Scientists increasingly need real-time communications capabilities for collaborative scientific activities and for optimal use of major experimental and observational facilities. Advanced networking and computing services can make large file transfers practical and provide for remote access to (and control of) large-scale scientific and medical facilities. However, even with the growth of new telecommunications capacity, the availability of high-bandwidth circuits to data acquisition and analysis sites and to the desktop will continue to lag behind demand. Also, the Internet protocols now in wide use do not effectively support time-synchronized activity. Circuits to the developing countries will be limited to the relatively low speeds of voice circuits until new (submarine and extension fiber) cable connects all corners of the world and affordable capacity becomes available. Wireless communication systems will operate at lower speeds than comparable wired services for the most cost-effective use of spectrum. In some countries the existence of outmoded government-operated facilities could impede the development of new high-speed or alternative-capacity links, because government-run post, telephone, and telegraph ministries (PTTs) are used to subsidize nontelecommunication governmental activities and to maintain monopolistic control over access and use.

DATA ACCESS ISSUES IN DEVELOPING COUNTRIES

Although at first sight the gap between the "haves" and "have-nots" in access to scientific data and information seems to widen each day, the long-term outlook for such access in developing countries is far better than it was before the advent of electronic communications, primarily because the cost of the technology continues to decline even as its capabilities improve.
It is potentially more cost-effective to buy computers, networks, and mirror sites of the libraries and data centers in developed countries than to try to maintain autonomous libraries with up-to-date collections of books, journals, and data compilations. As scientists in developing nations obtain computers with connections to networks linking them to international collections of scientific information, they greatly increase their research capabilities. Low-cost computers and modern software approaches are available to help developing countries "leapfrog" multiple generations of equipment and approaches. Satellite ventures are planned to provide worldwide access for short messages, voice telephony, and broadband digital links.27 Direct-broadcast television also is having an impact by offering hundreds of channels of high-quality video to a growing percentage of the world's population. Many developing countries may thus soon become beneficiaries of a broadly distributed modern telecommunications infrastructure, supporting scientific efforts even in remote areas. For example, the planned Teledesic wideband satellite communications system has pledged to give excess capacity to developing countries for a variety of uses, including applications in education, science, and medicine.28 Direct-broadcast television could serve to raise education levels. The anticipated low cost of desktop video conferencing equipment and the ability to communicate with multimedia functions, as described above, can enable scientists and others in the less developed countries to participate more fully in global scientific research, subject to the availability of high-bandwidth transmission capabilities and the mitigation of local cost barriers. Hardware and software for electronic communication in the sciences therefore offer particularly high leverage for return on investment in foreign aid to developing nations.

Unfortunately, many of these technologies are not yet widely available in the most developed nations, much less in the developing countries. For example, roughly half the nations now served by some form of Internet connection have access only to electronic mail.29 Even in those countries with at least one full-service Internet node, the proportion of users actually accessing all services is low. In addition, as noted throughout this report, full Internet service does not automatically mean full access to information. For example, the University of Chile has high-speed Internet service. Two components of the U.S. National Institutes of Health (the National Library of Medicine and the National Cancer Institute) provide Chilean researchers with excellent Internet access to abstracts from journals through outreach programs. Although requests for search results are answered quickly, the journals themselves usually are not in the Chilean libraries. Thus the researchers frequently must wait for months to receive the full-text reprints of published papers.
This situation is improving, however, and the system will become truly effective with full-text on-line access to journals. Nevertheless, most scientists in less developed nations now have little or no access to the Internet and the World Wide Web and still must depend on inadequate library facilities for full-text access to scientific data and literature. There are various other reasons for low Internet usage in developing nations, aside from the lack of infrastructure. These include internal institutional policies stipulating the placement of computers in administrative offices rather than in laboratories, governmental restrictions on the free flow of information, poor-quality telecommunications systems, and, most commonly, lack of funds for use of whatever communications infrastructure does exist. The costs of telephone services, for example, generally bear an inverse relationship to the per capita income of a country. International calls that cost $1.00 when originating in the United States frequently cost many times that when originating in a less developed country or a country where the tariffs are a general source of revenue for the government. For example, a call from Nairobi or Moscow to Washington, D.C., can cost seven times as much as the same call originating in Washington, with the higher costs often being borne by those least able to afford them.

Various strategies can help mitigate these differences or eliminate the end user's cost entirely. The Internet itself, by institutionalizing communications facilities, can make the costs transparent to the end user. However, persons in developing nations often must limit their Internet use or drop their subscriptions to list servers because of cost.30 One biologist in Indonesia who dropped his subscription observed that the communication costs per month were considerably more than his salary and that his institution was passing these costs on to the end users. A number of Kenyan scientists have obtained calling cards from U.S. providers and are directly dialing the United States, with the billing going to their Kenyan addresses. People in other countries also have adopted this strategy.

One current approach to reducing communication costs in African countries is use of the message-forwarding Fidonet system, a low-cost network of individual computerized bulletin board services that uses regular dial-up telephone lines and high-speed modems to transfer electronic messages.31 Although most of Africa currently lacks direct TCP/IP Internet and WWW connections,32 individuals can send and receive electronic mail via the Fidonet service of the Association for Progressive Communications, a U.S. nongovernmental organization dedicated to bringing low-cost communications to developing nations throughout the world.33

On other continents, the situation is somewhat better with respect to direct Internet access. However, even where Internet connections do exist, access still tends to be spotty in all but the most prestigious or centrally located institutions. For scientists in developing countries, another difficulty is competition for access to large remote data sets, which is made even more acute by the increasing volume of data, particularly from new observational sensors.
In addition, given the vast amount of data being collected, small data sets that they might contribute may be viewed as less important, limiting the ways in which researchers in the developing countries can participate in the scientific community. One result of such disparities is the perception by some scientists in developing countries that the OECD countries take information but seldom return it on an equitable basis. Currently, developing countries severely lag the OECD countries in bandwidth for emerging applications.34 If the majority of communication in developing countries is wireless, end users may not be able to take advantage of the more bandwidth-intensive applications. Moreover, as noted above, problems arise even after advanced communication capabilities are installed. Transoceanic and intercontinental communication and exchange of scientific information must compete with all the other electronic traffic—increasingly business and entertainment. Unless bandwidth is improved, the "information superhighway" becomes the electronic equivalent of many urban highways during rush hour. Furthermore, in many of the developing nations, the decreasing costs and increasing bandwidths that might be available generally are not passed on to the scientific end user by the government communications monopolies.
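The store-and-forward principle behind the Fidonet-style systems described earlier can be sketched as a queue that accumulates messages locally and releases them in a batch when a brief (and cheap, typically off-peak) dial-up window opens. The class and method names below are hypothetical illustrations of the idea, not Fidonet's actual protocol:

```python
from collections import deque

class StoreAndForwardNode:
    """Toy model of a Fidonet-style node: messages queue locally and are
    exchanged in batches during short dial-up sessions, so the line is
    paid for only while the batch is moving."""

    def __init__(self) -> None:
        self.outbound: deque = deque()   # messages waiting for the next call
        self.inbox: list = []            # messages received from peers

    def send(self, destination: str, body: str) -> None:
        # Nothing is transmitted yet; the message simply waits.
        self.outbound.append((destination, body))

    def dial_up(self, peer: "StoreAndForwardNode") -> int:
        """Exchange all queued traffic with a peer in one short session;
        returns the number of messages delivered."""
        delivered = 0
        while self.outbound:
            peer.inbox.append(self.outbound.popleft())
            delivered += 1
        return delivered
```

The design trade-off is the one the text describes: hours of delivery delay are accepted in exchange for communication costs low enough for institutions that cannot afford continuous TCP/IP connectivity.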

RECOMMENDATIONS ON ISSUES IN INFORMATION TECHNOLOGY

Based on the areas of concern discussed above, the committee makes the following recommendations for improving technical support for the international flow of scientific data and information.

The principal scientific societies and the Internet Engineering Task Force (IETF) should begin a long-term planning effort to assess the carrying capacity and distribution capability of the Internet, using projections of storage and transmission capacity and of demand, and taking into account the next generation of Internet protocols. Scientific societies should encourage their publication committees to maintain contact with the IETF and keep their members abreast of advances in technologies useful for scientific information management. One option that science societies and government science agencies should evaluate is the creation of dedicated international science networks, such as the Internet II now being developed.

To improve the technical organization and management of scientific data, the scientific community, through the government science agencies, professional societies, and the actions of individual scientists, should do the following:

- Work with the information and computer science communities to increase their involvement in scientific information management;
- Support computer science research in database technology, particularly to strengthen standards for self-describing data representations, efficient storage of large data sets, and integration of standards for configuration management;
- Improve science education and the reward system in the area of scientific data management, providing incentives and recognition for papers dealing with data representation standards, archiving strategies, data set creation, data evaluation, data directories, and service to users;
- Encourage the funding of data compilation and evaluation projects, and of data rescue efforts for important data sets in transient or obsolete forms, especially by scientists in developing countries, where substantial cadres of highly educated but underemployed scientists can carry out such work relatively inexpensively.

U.S. government science agencies, working with their counterparts in other nations, should improve data authentication and apply security safeguards more vigorously. They should implement the means to protect data, including safe storage of data copies, and support policies that make it easier to exchange encryption technology.35
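The public-key authentication recommended here can be illustrated with a deliberately tiny textbook RSA signature. The primes below are far too small to offer any security, and real systems also use standardized padding; this is a toy sketch of the idea that anyone holding the public key can verify what only the private-key holder could have signed:

```python
import hashlib

# Toy RSA parameters (cryptographically worthless; illustration only).
p, q = 61, 53
n = p * q        # public modulus, 3233
e = 17           # public exponent
d = 2753         # private exponent: (e * d) % lcm(p - 1, q - 1) == 1

def sign(message: bytes) -> int:
    """'Sign' by raising a short hash of the message to the private exponent."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Anyone holding only (n, e) can check the signature."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest
```

Because signing requires d while verification requires only the public pair (n, e), a data center can publish its public key and let collaborators worldwide confirm the origin and integrity of a data set without any shared secret, which is the property the recommendation above seeks to make routine.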

Government science agencies also should continue funding for research and development in information technologies that are important to the pursuit of science. Examples include high-performance computing and communications, advanced database technology, higher-density storage media, and basic research in microelectronics.

A consortium of intergovernmental and nongovernmental organizations concerned with the international exchange of scientific data and information—including the International Telecommunication Union, the World Bank, the U.N. Environment Programme, the U.N. Industrial Development Organization, the U.N. Commission on Economic Development, and other specialized agencies of the United Nations, as well as the International Council of Scientific Unions—should mount a global effort to reduce telecommunications tariffs for scientists in developing countries through differential pricing or direct subsidy. This reduction in tariffs would have to be coupled with more timely access to new telephone lines in some countries. The result would be increasing rates of scientific data transfer in the developing countries and a significant improvement in their research capabilities and economic development.

Foreign aid to developing countries in the form of computers, computer networks, and associated software, coupled with the training and resources necessary to operate and maintain those technologies, should be given high priority, on the basis of the potential for long-term socioeconomic returns. The communication systems must have adequate carrying capacity to meet growing demand.

NOTES

1. Moore's Law, named for Intel cofounder Gordon Moore, predicts that the transistor density of microprocessors will double every 18 months, thus halving the price per transistor. A by-product of this long-lived phenomenon has been the doubling of processor speed over the same 18-month period.
Moore's "Law" is in fact a representation of the speed of change in the microelectronics industry over the last 20 years. It is expected to continue to apply to technology change for at least the next 5 to 10 years. See Ashley Dunn (1996), "The Demise of Moore's Law Signals the Digital Frontier's End," New York Times, August 14, at <http://www.nytimes.com/library/cyber/surf/0814surf.html>. See also <http://www-us-east.intel.com/product/tech briefs/man_bnch.html>.
2. See Brian Grimes (1995), "Modeling and Forecasting the Information Sciences," Knowledge Science Institute, University of Calgary, at <http://ksi.cpsc.ucalgary.ca/articles/BRETAM/InfSci/> for a discussion of exponential change in the performance of these technologies and its impact.
3. One terabyte is 10^12 bytes, or 1,000 gigabytes—roughly the equivalent of 40,000 four-drawer files holding 500 million pages of paper documents.
4. Oryx Energy Co. estimates that in petroleum prospecting, a three-dimensional seismic survey of a 3-square-mile block in the Gulf of Mexico involved hundreds of gigabits of data, requiring three months of supercomputer time to digest. See J. Dubashi (1990), "Images and Imaginations," Financial World, 24 (July):8.

5. See NASA's EOS Project Science home page at <http://eospso.gsfc.nasa.gov/> for additional information and related sites.
6. For a description of the origin of the Internet, see Vinton Cerf (1996), "Computer Networking: Global Infrastructure Drive for the 21st Century," On the Internet, 1(5):18-27.
7. See Matthew Gray of the Massachusetts Institute of Technology (1996), "Web Growth Summary," at <http://www.mit.edu:8001/people/mkgray/net/web-growth-summary.html>.
8. National Research Council (1993), National Collaboratories: Applying Information Technology for Scientific Research, Computer Science and Telecommunications Board, National Academy Press, Washington, D.C.
9. Richard T. Kouzes, James D. Myers, and William A. Wulf (1996), "Collaboratories: Doing Science on the Internet," IEEE Computer, 29(8):40-46.
10. See the Cornell University CU-SeeMe Welcome Page at <http://cu-seeme.cornell.edu>.
11. See the MBone Information Web home page, sponsored by ICAST Communication, Inc., at <http://www.best.com/~prince/techinfo/mbone.html>.
12. See the Bits Scout Software home page at <http://www.bitscout.com/>.
13. Kenneth W. Church and Lisa F. Rau (1995), "Commercial Applications of Natural Language Processing," Communications of the ACM, November:71-79. See also the home page of the American Society for Information Science at <http://www.asis.org>.
14. A recent prototype is the Net Advance of Physics, a master hierarchical index of on-line review articles in physics. It is hoped that this free service eventually will be expanded to include the entire content of the physics e-print archives (see Science 272 (1996):15-16).
15. A technology going beyond CD-ROM is HD-ROM (high density-read only memory), originally developed at Los Alamos National Laboratory, which achieves much greater storage density than conventional CD-ROMs at a fraction of the cost.
It uses an ion beam to etch pits in stainless steel, iridium, or other similarly long-lasting materials. The etching is done in a vacuum, which allows the high densities, but the reading can be done in air. It is now in the process of commercialization. The DVD standard is being supported by Sony and Toshiba. See <http://www.islandtel.com/newsbytes/headline/dvddisputeen/dsincompro_350.html>.
16. See the IP Next Generation (IPng) home page, developed by Robert Hinden of Ipsilon Networks, Inc., at <http://playground.sun.com/pub/ipng/html/ipng-main.html>.
17. See the IPng implementations home page at <http://playground.sun.com/pub/ipng/html/ipngimplementations.html>.
18. See the ATM Forum Communications home page developed by Raj Jain's group at Ohio State University in Columbus, Ohio, at <http://www.cis.ohio-state.edu/~jain/atmforum.htm>.
19. Caryn Gillooly (1996), "Blazing a Trail in Intranet Usage," Information Week (September 9):98-102; additional information on intranet products can be found on the Internet Design magazine home page at <http://www.innergy.com>.
20. See White House Press Release, "Background on Clinton-Gore Administration's Next-Generation Internet Initiative," Office of the Press Secretary, October 10, 1996, Washington, D.C.
21. The economic aspects of Internet congestion are discussed in Chapter 4 in the section titled "Electronic Access and Internet Congestion."
22. Text is far less demanding; for example, the 20-volume Oxford English Dictionary is available now as a single CD-ROM. However, storing a full-length movie on a single CD-ROM will be feasible only when the new, high-density CDs, with much greater storage capacity than the current variety, become available.
23. See <http://www.nlanr.net/ISMA/Report/#background> for the results of the NSF-sponsored workshop on Internet statistics measurement and analysis; see also the Corporation for National Research Initiatives' home page at <http://www.cnri.reston.va.us>, and "The Interminablenet," The Economist, February 3, 1996, pp. 70-71.
24. L.J. Allison and R.J. Olson, eds. (1988), Piecing the Puzzle Together: A Conference on Integrating Data for Decision Making, U.S. National Committee for CODATA, Integrated Data Users Workshop, National Governors Council, published by the National Governors Association, Washington, D.C.: 264 pp. plus appendices. For a more recent study regarding the barriers and issues inherent in data integration efforts, see National Research Council (1995), Finding the Forest in the Trees: The Challenge of Combining Diverse Environmental Data, National Academy Press, Washington, D.C.
25. National Research Council (1995), Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources, National Academy Press, Washington, D.C.
26. W. Grattidge, J.H. Westbrook, C. Brown, and W.B. Novinger (1987), "A Versatile Data Capture System for Archival Graphics and Text," Computer Handling and Dissemination of Data, P.S. Glaeser, ed., Elsevier Science Publishers B.V. (North-Holland), CODATA; W. Grattidge, W.B. Lund, and J.H. Westbrook (1990), "A Data Capture System for Printed Tabular Data," Proc. 11th Int'l CODATA Conf.: Scientific and Technical Data in a New Era, P.S. Glaeser, ed., Hemisphere Press, Karlsruhe, FRG, 302-306; W. Grattidge, W.B. Lund, and J.H. Westbrook (1992), "Problems of Interpretation and Representation in the Computerization of a Printed Reference Work on Materials Data," Computerization and Networking of Materials Databases, Vol. 3, ASTM STP 1140, Thomas L. Barry and Keith W. Reynard, eds., American Society for Testing and Materials, Philadelphia.
27. See the Norwegian University of Science and Technology Department of Computer Systems home page at <http://www.idt.unit.no/>.
28. Patrick Seitz (1995), "Firms Battle for Spectrum," Space News, November 27:1.
29. International Internet connectivity levels improve constantly and cannot be generalized about. For a continuously updated report, see the Internet Connecting Chart, copyright by Larry Landweber 1995, at <http://www.infopro.spb.su:8000/info/internet/tableeng.html>.
See also the International E-mail Accessibility home page compiled by Oliver M.J. Crepin-Leblond at <http://www.ee.ic.ac.uk/misc/country-codes.html>; e-mail access information is based on International Organization for Standardization (ISO) standard 3166 names. For a survey of international Internet and K-12 connectivity done by the NASA Science Internet Program, compiled by Antony Villasenor, see <http://nic.nasa.gov/ni/survey/survey.html>.
30. See, generally, the series of articles in "Eye on Emerging Nations" in On the Internet, the Internet Society, Reston, Va.
31. National Research Council (1996), Bridge Builders: African Experiences with Information and Communication Technology, National Academy Press, Washington, D.C. See also African Academy of Sciences/American Association for the Advancement of Science (1993), Electronic Networking in Africa, AAAS, Washington, D.C.
32. See the U.S. Agency for International Development's Africa Link home page at <http://www.info.usaid.gov/alnk/connect/conmap.html> for a detailed description of Internet and other types of network connectivity on the African continent.
33. See <http://www.apc.org/about.html> for more information about the activities of the Association for Progressive Communications in Africa and other regions of the world.
34. See Trudy Bell, John Adam, and Sue Lowe (1996), "Communications," IEEE Spectrum (January):40.
35. See National Research Council (1996), Cryptography's Role in Securing the Information Society, Computer Science and Telecommunications Board, National Academy Press, Washington, D.C.