Summary and Findings: Research for National-scale Applications
RESEARCH CHALLENGES OF CRISIS MANAGEMENT
Crises can make enormous demands on the widely distributed information resources of a nation (see summary in Box 3.1). Responding to the Oklahoma City bombing disaster required a national call for search and rescue experts and their tools to help find survivors and to reinforce unstable areas of the damaged Alfred P. Murrah Building so that rescuers could enter safely, as well as massive coordination to focus a diverse set of teams on a common goal. Hurricane Andrew and the Northridge, California, earthquake caused widespread devastation and placed pressure on relief authorities to distribute food, water, shelter, and medicine and to begin receiving and approving applications for disaster assistance without delay.
Crises often bring together many different organizations that do not normally work together, and these groups may require resources that they have not used before. To mount an organized response in this environment, crisis managers can benefit from the use of information technology to catalog, coordinate, analyze, and predict needs; to report status; and to track progress. This kind of coordinated management requires communications networks, from handheld radios and the public telephone network to high-speed digital networks for voice, video, and data. Rapidly deployable communications technologies can help relief teams communicate and coordinate their actions and pool their resources. Crisis managers also need computers to help them retrieve, organize, process, and share information, and they rely on computer models to help them analyze and predict complex phenomena such as weather and damage to buildings or other structures.
NOTE: See Chapter 1 for a detailed discussion of the crisis management characteristics and needs that create demands for computing and communications.
When disasters occur, the public deserves and demands a rapid response, and so the ability to anticipate events is at a premium. For example, when a hurricane approaches, relief agencies deploy mobile communications centers to places where sophisticated computer models predict the storm will strike land. Damage simulations help planners decide where to send food, medicine, shelters, blankets, and other basic necessities even before the damage has occurred. As the response to California's Northridge earthquake demonstrated, relief agencies can
use computer simulation to speed the approval of disaster relief (e.g., home rebuilding loans) in areas that the model estimates are hard hit, even before agents have visited the site.
Preparing for and responding to crises place demands on information technology that cannot be satisfied readily with existing tools, products, and services. These unmet demands point to many promising research directions, which this chapter summarizes. They encompass most aspects of computing and
communications technology, including networks and the architectures and conventions that organize and manage them; the services and standards that unite a network of communications devices, sensors, computers, and databases into a useful information infrastructure; and the applications that rely on that infrastructure.
One common thread among the steering committee's findings is that some of the most severe technical challenges stem from the sheer scale of requirements that must be met. Scale in this context has many dimensions—the large number of people and devices (e.g., computers, databases, sensors) involved; the diversity of information resources and software applications that must be accessible; the amount of computing power needed to run models and process information quickly enough to meet the urgent demands of crises, along with the ability to
access and use that power rapidly and easily; and the complexity of the interactions among systems (both human and technical) that must interwork to deal with crises.
Another theme is that technologies must be easy enough to use that they complement people, rather than distract them from their mission. Technology does nothing by itself; people use technology, and designers and developers of technical systems must consider people and their methods of working as integral to the systems and their successful performance. For example, a secure, distributed information system may fail to remain secure in practice if it is so cumbersome that users ignore necessary procedures in favor of convenience. Too often, unfortunately, users are given too little consideration in the design of complex systems, and the systems consequently fail to be as useful as they could or should be. In the extreme case of a crisis, a system that is difficult to use will not be used at all.
Research on and development of computing and communications technologies that help crisis managers cope with extreme time pressures and the unpredictability of crises will likely be useful in other application domains.1 For example, breakthroughs in meeting the time-critical information discovery and integration requirements of crisis management would benefit broader digital library applications as well. Distributed simulation and the need to compose existing, legacy information sources and software smoothly into new, case-specific systems are among the overlaps with manufacturing. Secure, mobile communication in a crisis is also valuable for emergency medicine, particularly as confidential medical records begin to be communicated over networks. Tools that are easy to use in a crisis will probably also be usable for electronic commerce, which similarly must span a wide range of personal skills, computer platforms, and network access mechanisms.
Although many of the research issues identified throughout the workshop series are not new to the computing and communications research community, placing them in the context of crisis management and other national-scale applications suggests priorities and sharpens the research focus. The priorities fall across a spectrum. Research projects tied relatively closely to specific crisis management application needs are valuable both because of the potential benefit to the applications and for the advances they may produce in technology usable in other areas. Box 3.2 presents promising examples from the workshops.
To secure the full benefits of this application-specific research, there must also be recognition of the broader, increasingly interconnected context in which national-scale applications operate. These interconnections allow components to be called on in unforeseen ways. This presents powerful opportunities for creative new uses of resources, but only if technical challenges to these novel uses can be overcome. During Hurricane Andrew, for example, it was not only the difficulty of translating between different standards that delayed Dade County authorities from making data available to federal relief officials, but also their hesitancy to share private data, which relates to the lack of reliable, in-place mechanisms for ensuring privacy and payment for those data. Therefore, applications require both efforts focused on specific needs and a broadly deployed information infrastructure, including services that help people and their tools to achieve integration, and standards and architectures that provide consistent interactions between elements at all levels.
The discussions between crisis management experts and technologists at the three Computer Science and Telecommunications Board workshops led to identification of a variety of compelling, application-motivated computer science and engineering research topics. A selection of these topics is presented here. It is not an exhaustive list of the technologies needed to solve problems in crisis management, nor does it imply that technological advances are crisis management's most dire needs. However, these topics do appear promising in terms of advancing the state of technology and testing broader architectural concepts.
Information infrastructure, of course, does not spring into existence from a vacuum. The workshops reinforced the observation that in crisis management and other national-scale applications, diversity—of people, organizations, methods of working, and technologies (e.g., databases, computers, software)—impedes creating national architectures from scratch. (See Box S.2 in the chapter "Summary and Overview" for further discussion.) Although it might be possible to imagine a single, uniform architecture that met crisis managers' needs for communications interoperability, data interchange, remote access to computation, and others, deploying it would not be practicable. The technical challenge of incorporating legacy systems into the new architecture would slow such an effort. In addition, many public and private organizations would have to agree to invest in new technologies in concert, but no single architecture could conform to all organizations' needs and work modes. Retraining and reorganizing organizations' processes to accommodate new systems would take time. Finally, crisis management illustrates that even a coherent architecture created for one domain would be called upon to integrate in unexpected ways with other domains.
Therefore, there is a need for—and the steering committee's findings address—research, development, and deployment efforts leading both to consistent architectural approaches that work on national scales and to general-purpose tools and services that make ad hoc integration of existing and new systems and resources easier. Specific applications, such as those listed in Box 3.2, should serve to test these approaches, to advance key technologies, and to meet important application needs. The organization of the findings reflects this view that both application-targeted and broader infrastructural research is needed. Finding 1 emphasizes the importance of experimental testbeds as a context for coordinating the crucial interplay among research, development, and deployment in one important and challenging application area, crisis management. Finding 2 highlights the value of investigating the features of existing national-scale architectures to identify principles underlying their successes and failures. These findings are discussed in the section "Technology Deployment and Research Progress."
The remaining findings identify architectural concerns that represent technological leverage points for computing and communications research investments, the outcomes of which could benefit many national-scale applications. The research underlying these findings is discussed in greater detail in Chapter 2. The findings abstract the common threads among the networking, computation, information, and user-centered technologies of Chapter 2 to indicate high-priority
application-motivated research needs that cross multiple levels of computing and communications. There is necessarily some overlap in the research issues discussed in these areas, because some technological approaches can contribute to meeting more than one architectural objective. These findings are presented in four subsequent sections:
- Support of Human Activities
Finding 3: Usability
Finding 4: Collaboration
- System Composability and Interoperability
Finding 5: Focused Standards
Finding 6: Interoperability
Finding 7: Integration of Software Components
Finding 8: Legacy and Longevity
- Adapting to Uncertainty and Change
Finding 9: Adaptivity
Finding 10: Reliability
- Performance of Distributed Systems
Finding 11: Performance of Distributed Systems
Outcomes of testbed and architecture study activities (see Findings 1 and 2) can and should inform future reexamination of findings in these architectural areas, which represent the best understanding of a range of technology and application experts in 1995-1996.
The findings frame research derived primarily from addressing the requirements of crisis management. However, the steering committee believes that such research would have much broader utility, because of the extreme nature of the demands that crises place on technology. In addition, many of the research directions relate to increasing the capabilities of information infrastructure to meet extreme demands for ease of use, integration, flexibility, and distributed performance, which will benefit any application using it. The findings are illustrated by practical needs identified in the workshops and examples of specific directions that researchers could pursue. These suggestions are not intended to be exhaustive, nor are they presented in priority order; deployment and experimentation are required to determine which approaches work best. However, they are promising starting directions, and they illustrate the value of studying applications as a source of research goals.
TECHNOLOGY DEPLOYMENT AND RESEARCH PROGRESS
The workshop series focused on applications partly in the recognition that computing and communications research and development processes depend on the deployment and use of the technology they create. This is true not only in the sense that efficient allocation of research investments should ideally lead to products and services that people want, but also in the sense that it is ultimately through deployment and use that technologists can test the validity of their theories and technical approaches. This is not a unique recognition; it fits within a stream of recent analyses, including a strategic implementation plan of the Committee on Information and Communications, America in the Age of Information (CIC, 1995a), and the Computer Science and Telecommunications Board review of the High Performance Computing and Communications Initiative (HPCCI), Evolving the High Performance Computing and Communications Initiative to Support the Nation's Information Infrastructure (CSTB, 1995a). The opportunities that the steering committee's first two findings identify for learning from study of deployed technologies, however, have not received extensive attention to date.
The Committee on Information and Communications (CIC) has called for "pilot implementations, applications testbeds, and demonstrations, presenting opportunities to test and improve new underlying information and communications technologies, including services for information infrastructures" (CIC, 1995a, p. 8). The CIC plan anticipates these testbeds and demonstrations as fitting into three broad classes of user-driven applications related to long-term National Science and Technology Council goals.
The crisis management application area bridges all three of these classes. A testbed or technology demonstration that supported distributed training and planning exercises in crisis response, for example, could incorporate modeling of natural and man-made phenomena in real time; secure and reliable communications; interoperability and integration of existing information resources, such as public and commercial databases; and adaptive, user-centered interfaces.
Evolving the High Performance Computing and Communications Initiative to Support the Nation's Information Infrastructure, a comprehensive review of the High Performance Computing and Communications Initiative conducted by the Computer Science and Telecommunications Board (CSTB, 1995a), also supported the notion of emphasizing nationally important applications (what the initiative called National Challenges) to test and guide the development of basic infrastructure:
Recommendation 8: Ensure that research programs focusing on the National Challenges contribute to the development of information infrastructure technologies as well as to the development of new applications and paradigms. The National Challenges incorporate socially significant problems of national importance that can also drive the development of information infrastructure. Hardware and software researchers should play a major role in these projects to facilitate progress and to improve the communication with researchers developing basic technologies for the information infrastructure. Awards to address the National Challenges should reflect the importance of the area as well as the research team's strength in both the applications and the underlying technologies. The dual emphasis recommended by the steering committee contrasts with the narrower focus on scientific results that has driven many of the Grand Challenge projects.1
Testbeds for computing and communications technologies to aid crisis management would support the dual focus on applications and infrastructure by emphasizing the participation of crisis managers and technology experts in limited-scale deployments for training, planning, and to the extent practical, operational missions.
Finding 1: Crisis Management Testbeds
Testbeds and other experimental technology deployments can enable researchers and technologists to develop and test technologies cooperatively in a realistic application context (see Box 3.3). They can serve as a demanding implementation environment for new technologies and as a source of feedback to identify and refine research objectives. Such testing is particularly important in progressing toward deploying national-scale applications, in order to verify theoretical concepts about the scalability of system characteristics, interoperability with other systems, and usability by people in realistic situations—all of which are difficult or impossible to predict in the laboratory.
Test projects and technology demonstrations are under way in most national-
scale application areas (NSTC, 1995). However, in civilian crisis management, additional government funding and leadership are required if the research and development benefits of these activities are to be realized. Crisis management is primarily a public service function, led and funded by government agencies. The relatively small operational budgets of federal, state, and local emergency management agencies do not include significant research and development funding. The small size of the commercial marketplace for computing and communications in civilian crisis management, in comparison to areas such as health care, medicine, commerce, and manufacturing, may limit the likelihood of large-scale industry investment in a testbed effort. Greater public investment has been made in the military context, but much of this is related to command and control in warfare, which overlaps with nonmilitary applications only partially. The JWID '95 (Joint Warrior Interoperability Demonstration of 1995, discussed in Chapter 1) military exercise was an exceptional case in incorporating civilian participation in a nonwar crisis response. JWID '96 will not repeat this activity, and no analogous large-scale exercise exists for civilian crisis management organizations to test and experiment with technologies on a similar scale.
Testbeds should provide a context for the participation of crisis management practitioners (such as the Federal Emergency Management Agency (FEMA) and state, local, and nongovernment emergency services organizations) in system testing and development. Application users' input is essential to assess the fit among systems, tools, and the needs of users and organizations and to ensure that technology is focused on usable, practical solutions. In addition, the workshops suggested that testbeds could make a direct contribution to crisis management by supporting realistic training and simulation. Training is crucial to effective crisis management. Workshop participants from the crisis management community made it clear that to be useful, training exercises must be realistic. High-fidelity simulations (including distributed simulation across networks) and field exercises in conjunction with technology demonstrations could help provide this realistic training. Realistic scenario-based "what-if" simulations could also be a useful tool to test operational plans and choices.
1. Establishment of one or more technology testbeds for computing and communications systems in crisis management would be valuable. In these testbeds, government, academic, and industrial researchers should work with application users—crisis managers—to test and validate technologies under research and development by subjecting them to demands in realistic applications, such as training and planning missions for civilian and military crisis management personnel through simulations and field exercises and, potentially, actual field operations.
Finding 2: Studies of Existing National-scale Information Infrastructure
Crises can occur anywhere, at any time, and their management and resolution may require expertise, information, and resources from around the world. The unpredictability of crisis locations and sources of aid leads quickly to the idea that what is needed is an infrastructure that increases the chance of harnessing far-flung information and computing resources on short notice to address a problem, anytime and anywhere they are needed. What are the technical characteristics that make it possible to create or leverage large-scale information infrastructure? How and where should these characteristics be deployed within the information infrastructure to enable ubiquity in the services it supports?2
Examples of narrowly focused but successful information infrastructure that functions on a large scale include bank electronic funds transfer systems (e.g., check clearinghouses), automated teller machine networks, airline reservations systems, and airline communications systems such as the Société Internationale de Télécommunications Aéronautiques
(SITA). More general systems include the Internet and applications it supports, including electronic mail and the World Wide Web. What aspects of these systems have led to their ability to scale—in terms of their complexity and the quantity and geographic distribution of their end points—and to accommodate diverse components and evolve over time? These and related questions should form the basis for an exploration of architectural imperatives for national-scale systems.3
As the workshops unfolded, it became apparent that many existing, large-scale computing and communications systems could be studied to understand better the general architecture of information infrastructure that can scale successfully to a national or global level. This finding proposes an analytical research effort to study the design and operation of such systems, which could produce valuable results by exposing architectural and design features that contribute to successful operation and might be generalized to apply to many, if not all, information and communications systems.
Many benefits for development of new systems could arise from better understanding of existing systems.4 For example, in crisis management, access to specialty databases is often crucial; examples include hazardous substance databases, the general databases of the National Library of Medicine, databases at the Centers for Disease Control and Prevention, census databases, corporate human resources records, patient treatment databases, and the like. How are these databases organized and do methods exist that would permit them to interoperate more smoothly in a crisis? Replacing lost local communications resources is another common factor in crisis situations. Are there design principles for large-scale infrastructures that would support integration of rapidly deployed
emergency communications facilities with existing resources? Could an electronic "emergency lane" be made to work in some fashion? How can national and regional sensor networks (such as the Oklahoma Mesonetwork and NEXRAD (Next Generation Weather Radar)) be integrated easily into other information processing systems? How can this be done on short notice in crisis situations? Relevant research should draw not only on developers or maintainers of major existing systems, but also on communities that typically use the systems.
2. Existing systems that provide widely accessible computing, communications, and information resources for applications such as electronic commerce, health care support, manufacturing, air traffic control, electronic reservations, and public information dissemination should be examined to identify and understand the critical features that make systems scalable, reliable, and broadly usable. The object is to improve understanding of what information infrastructures are, what components they should include, how they should be structured, what services they should provide, and how they serve the needs of particular applications.
SUPPORT OF HUMAN ACTIVITIES
In national-scale applications, people are a critical component—humans are "in the loop." Discussions with experts in a number of application domains revealed that although support of people—both as users and as integral parts of the system design—is of primary importance, this need often gets too little emphasis in system designs. To the extent that user roles do receive attention, it is frequently in terms of traditional views of information technology usage in a tool-user relationship, stressing easy-to-use human-machine interface technologies such as speech recognition and graphical displays. These are important and appropriate subjects for continued development; however, independent development of individual technologies will not extend the utility of information technology to the extent that national-scale applications require. Instead, these applications require an integrated approach to system design that recognizes that, increasingly, people and technologies work together as parts of a larger system; both sides provide inputs and add value. The challenge of integration complicates research design and adds to the difficulty of achieving useful results.
An example is an information system that integrates information from sensors, databases, and field workers and supports processing of the information by both computer models and human analysts. People affect the performance of systems both positively and negatively, and system designs should seek to improve and extend human performance. As components of systems, people may provide information; they may direct the activities of computer applications; they may check or verify that decisions or conclusions made by applications are reasonable; they may provide decisions when the applications reach an impasse
and cannot make decisions. People may make errors, particularly when under stress. The activities and needs of humans who are not computer specialists, but rather, specialists in their own application areas, must be considered as a component of any system architecture.5
Finding 3: Usability
Crisis management requires systems that people can use well without a great deal of specialized expertise. Improving the usability of systems for crisis management, as well as other national-scale applications, faces many challenges. Users are often geographically dispersed and have widely differing skills. Human-computer interaction technologies that will be deployed in national-scale applications cannot depend on high-performance computing and communications being available to all users; technologies must scale across widely varying systems and infrastructures. They must also adapt to a variety of needs. Usability is more than a technical issue. People use technologies that are deployed in the context of a social organization, and usability depends on meeting the user's needs and adapting to the user's capabilities in the context of organizational and social structures and objectives.
Basic preferences of users and their styles of interacting vary greatly depending on domain-specific practices, training, skill level, and the situation. A doctor on rounds may be accustomed to dictating observations, reading charts, and writing prescriptions. Speech recognition, ready access to sensor-supplied patient information and x-rays, and pen-based devices are examples of technologies and devices that, when integrated with patient and health care information systems, not only can enhance the effectiveness of patient care and the efficiency of the caregivers, but also can aid emergency personnel in times of crisis to obtain immediate and current information.
Progress is being made on many of the component technologies and mechanisms, but they remain largely fragmented among domain-specific and situation-specific modes of interaction. There is a need for interactive systems that can be tailored easily and quickly to fit the specific requirements and preferences of users. Research on modeling and understanding user needs in various situations should guide the development of techniques for tailoring interfaces.
In crisis management, situations arise in which transportation infrastructure has been damaged (e.g., during earthquakes). Mobile emergency personnel need very specific information about where to go and how to get there, taking into account that the status of local roadways and bridges may be in flux or unknown. They must be able to understand the information quickly; this is where research and experimentation with actual users under stressful conditions can help. In some situations, a brief text message may be the best way to convey information quickly without leading to confusion; in others, a visual display such
as a map or aerial photograph may be appropriate, with the crucial locations highlighted automatically. Moreover, the information flow must be two-way. If a road is not where the database says it should be or a bridge has been knocked out, the mobile user should be able to mark that information on a handheld display and disseminate it to update other users about the situation. Appending a spoken annotation to the data, for example, could be an effective way to enrich the information value added. Implementing such technologies will require new presentation and format standards that link multiple data types easily, while conserving scarce radio bandwidth. The development of standards should be based on clear understanding of what factors matter most to users. Deployment in realistic crisis exercises and actual crises should serve as a means of exposing these factors and quantifying their value.
The granularity and specificity of information needed by different users at different times can vary greatly, as can the capability of users' equipment (e.g., display, storage, processing). Thus, technologies and capabilities are needed to deliver multilayered data views derived from many sources to users with limited resources. How this can be accomplished in one context is illustrated by the University of Oklahoma Center for the Analysis and Prediction of Storms' weather model discussed in Chapter 2: by zooming in on one area of a larger picture—the state weather map—detailed, very specific information can be developed (e.g., local, time-specific thunderstorms). Mobile users obviously do not have the computational capacity for this type of weather modeling or even to display the complete results of a regional simulation. Therefore, it would be valuable to be able to focus weather models on the scale and location of greatest interest to the crisis manager and rapidly present the information he or she most needs.
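The idea of matching the level of detail delivered to a user's equipment and focus of interest can be made concrete with a small sketch. The following is purely illustrative—the layer names, sizes, and transfer-time budget are assumptions invented for this example, not features of any deployed system:

```python
# Hypothetical sketch: choose which data layer to send a mobile user,
# weighing the client's link speed against the zoom level requested.
# All names and thresholds here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ClientProfile:
    bandwidth_kbps: int   # effective link speed
    display_pixels: int   # total pixels available on the device

# Coarse-to-fine layers, each with an approximate payload size in KB.
LAYERS = [
    ("state_summary", 20),     # statewide text summary
    ("regional_map", 200),     # regional raster map
    ("local_forecast", 800),   # local, time-specific model output
]

def choose_layer(client: ClientProfile, zoom_level: int) -> str:
    """Pick the most detailed layer the client can handle.

    zoom_level: 0 = statewide view; higher values = tighter focus.
    A layer is deemed feasible if it can be delivered in about
    10 seconds over the client's link.
    """
    budget_kb = client.bandwidth_kbps * 10 / 8  # 10-second transfer budget
    # Never send more detail than the user asked for.
    candidates = LAYERS[: zoom_level + 1]
    # Take the finest candidate that fits within the budget.
    for name, size_kb in reversed(candidates):
        if size_kb <= budget_kb:
            return name
    return LAYERS[0][0]  # fall back to the coarsest summary

# A handheld on a 64-kbps radio link gets only the summary even at
# high zoom, while a well-connected workstation gets the full detail.
handheld = choose_layer(ClientProfile(64, 100_000), 2)        # "state_summary"
workstation = choose_layer(ClientProfile(1000, 1_000_000), 2)  # "local_forecast"
```

The design choice worth noting is that detail is bounded by both the user's request (zoom level) and the equipment's capacity, so the same data source can serve both the field worker and the operations center.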
People are error prone, particularly if they are tired and under stress as is common in crises. Depending on the application and instance, human errors may be acceptable or disastrous; in either case, they often are not considered adequately in system designs. Applications should be able to cope with reliability problems caused by errors or failures in any component, including the human component. For example, adaptive mechanisms are needed to elide and compensate for errors and, ideally, to associate and propagate confidence factors with information that other users and applications can interpret.
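One way such confidence factors might be associated and propagated can be sketched as follows. This is only an illustration under stated assumptions: the weakest-link combination rule, the 0-to-1 scale, and all names are choices made for this example, not an established standard:

```python
# Illustrative sketch of attaching confidence factors to crisis
# information and propagating them through derived conclusions.
# The combination rule and all names are assumptions for this example.

from dataclasses import dataclass, field

@dataclass
class Report:
    """A piece of crisis information tagged with a confidence factor."""
    text: str
    confidence: float            # 0.0 (unverified) .. 1.0 (confirmed)
    sources: list = field(default_factory=list)

def derive(text: str, *inputs: Report) -> Report:
    """Combine reports into a derived report.

    Weakest-link rule: a conclusion is no more trustworthy than its
    least trustworthy input. Other rules (e.g., products, Bayesian
    updating) are possible; this is one simple, conservative choice.
    """
    conf = min(r.confidence for r in inputs)
    return Report(text, conf, sources=list(inputs))

# Example: a tired field worker's hurried observation (low confidence)
# combined with automated sensor data (high confidence).
field_obs = Report("Bridge on Route 9 appears down", 0.4)
sensor = Report("Seismic sensor: strong shaking near Route 9", 0.9)
conclusion = derive("Route 9 likely impassable", field_obs, sensor)
# conclusion.confidence == 0.4, so downstream users and applications
# can see at a glance that the conclusion is still unconfirmed.
```

Because the confidence travels with the data, a later confirmation (say, an aerial photograph) can raise it, and applications can be written to treat low-confidence inputs with appropriate caution rather than failing outright.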
Systems should also account for the obvious strengths that humans bring to applications—for example, unparalleled inferencing skills and vast arrays of experience and knowledge. Because crises overload people with demands on their time and attention, their strengths should be applied where they are most effective. Formally defining complex problems in a way that identifies where uniquely human abilities are most effective would be a significant advance.
Security is another usability consideration that is particularly important in
crisis management. For example, if crisis team workers are setting up a network from the local police radio communications network and a local hospital network, the security mechanisms associated with the combined network must reflect the existing security policies of both parts, yet allow authenticated emergency workers access to appropriate information. This must happen rapidly and may have to be implemented by nonexperts in security, which implies a strong need for highly usable network security management tools.
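The requirement described above, combining two networks' security policies while granting authenticated emergency workers appropriate access, can be sketched in code. All names here (the Policy class, the roles, the resources) are hypothetical, and the merge rule shown (honor both constituent policies' restrictions, with a narrow emergency override) is one possible rule among many, not a real emergency-management system:

```python
# Minimal sketch of merging two networks' access policies in a crisis.
# The networks, roles, and resources are illustrative assumptions.

class Policy:
    """Maps (role, resource) pairs to an allow/deny decision."""
    def __init__(self, rules):
        self.rules = rules  # dict: (role, resource) -> bool

    def allows(self, role, resource):
        return self.rules.get((role, resource), False)  # default deny

def merged_allows(policies, role, resource, emergency_roles=frozenset()):
    """Combined-network decision: authenticated emergency roles may reach
    any resource a constituent policy opens to emergency responders;
    all other roles keep only privileges that every policy permits."""
    if role in emergency_roles:
        return any(p.allows("emergency", resource) for p in policies)
    return all(p.allows(role, resource) for p in policies)

police = Policy({("dispatcher", "radio-log"): True,
                 ("emergency", "radio-log"): True})
hospital = Policy({("nurse", "patient-db"): True,
                   ("emergency", "patient-db"): True})

print(merged_allows([police, hospital], "medic", "patient-db",
                    emergency_roles={"medic"}))          # True
print(merged_allows([police, hospital], "nurse", "radio-log"))  # False
```

A real tool would also need authentication, auditing, and revocation; the point of the sketch is only that a combined policy can be expressed and evaluated mechanically, which is what highly usable security management tools would automate for nonexperts.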
3. Research is necessary to gain a better understanding of users' needs, capabilities, and limitations in working with computing and communications technologies in diverse contexts. This understanding should be used to develop principles for optimizing the effectiveness of people in the overall performance of applications in which humans and machines work together.
Suggested Research Topics:
- Enabling natural, interactive communication across a variety of devices and mechanisms used by individuals with a wide range of skills, needs, and financial constraints (which directly affect available computing power, displays, etc.) should be pursued. Building on streams of research already under way in many areas, such as speech and language recognition, graphics and visualization, human-computer interaction, human factors, organizational behavior, and other subdisciplines, a principal multidisciplinary research challenge is to integrate approaches (e.g., multimodal interfaces combining graphics, text, speech, gesture) to meet specific needs and conditions.
- Information management tools are necessary to fuse data from multiple sources, filter information from a potentially overwhelming flood of inputs, integrate it, and present the most crucial information to users under severe time pressures, distractions, and other stresses. These should include a capability to adapt the information management processes in real time in response to user feedback about relevance, timeliness, understandability, and other factors.
- Visually representing the quality of information would be of value to users. It is difficult to represent the quality of three-dimensional-plus-time information (e.g., full-motion video). Error bars may be adequate for showing statistical variance of a data point (such as a sensor input or a casualty estimate) but are less so for indicating the probability that a data point has been misread accidentally or falsified deliberately.
- Visualization of complex data and/or large quantities of information is needed, along with interactive virtual environments that enable users to explore the effects of various assumptions about uncertain or missing inputs and possible future courses of events.6
Finding 4: Collaboration
Crisis management is collaborative. Crisis managers interact synchronously (through face-to-face meetings, by telephone, and so on) and asynchronously (through e-mail, voice mail, and information services). They share and organize information, contributing to information stores shared by others, and aggregating and otherwise adding value to diverse pieces of information to enable that information to be a basis for planning, decision, and action. They share work flow, with deadlines for decisions and plans, schedules for actions, and precisely timed operational activities.
In crisis management, collaboration is currently dependent on human transportation—bringing together the key players so they can interact face-to-face—and on the telephone. The most common vehicle for collaboration is the situation room, serving as a physical locus for incoming information and outgoing decisions and plans. This creates delays and inefficiencies, particularly when key players are geographically dispersed or involved in multiple simultaneous efforts. Telephone and occasional video teleconferencing accommodate physical remoteness, but at the cost of reducing the efficiency of sharing and accessing information.
Collaboration, in this context, does not merely include crisis managers sitting in a situation room. It also includes a wide range of other interactions, from short-duration actions focused on specific decisions to longer-term efforts at gathering and collaboratively integrating information. An open collaboration technology should support the full range.
Research in distributed computing, human-computer interaction, information management, and other areas is beginning to create a foundation for a technology of collaboration, enabling effective interaction across space and time. To support collaboration, research on communications models should address how users communicate with each other.7 The models must reflect not just speech, but also many other modes of communication. They should include not just the people collaborating, but also the information resources they interact with. For example, extensions to the collaborative perceptual space could include not just video-conferencing, but also a rich synthetic space including visualizations of terrain, buildings, and participants in the crisis response. This kind of shared perceptual space would enable rapid planning iteration, including simulation.
The information space shared by collaborators could include databases, documents, working notes, and other material that crisis managers need to share. Commercial groupware, World Wide Web technology, and some tools associated with the Internet-based Multicast Backbone (MBONE) constitute important initial steps, but there is an unmet need for an Internet-based open-protocol collaboration technology that can support a very broad range of collaborations.
4. Research is needed to develop concepts for new, open, network-based collaboration tools. Collaboration is essential in crisis response and many other national-scale applications. Although networks are used to support collaboration, currently available collaboration tools are not adequate to suit application needs. Specific tools, technologies, and open protocols to support collaborative efforts could yield significant benefits.
Suggested Research Topics:
- Common perceptual spaces should be developed that mix video, synthetic (virtual) environments, and information visualizations, to facilitate collaboration among geographically dispersed participants.
- Common information spaces should be developed that support sharing, organizing, and evolving a jointly viewed collection of complex information interactions.
- A virtual situation room, combining common information and perceptual spaces in a collaborative virtual environment, would assist users to gather information, make plans and decisions, initiate actions, and monitor execution.
- Work flow management and judgment support tools (discussed in Box 3.2) are necessary to augment and enhance the capabilities of decision makers by exploiting collaborative problem-solving infrastructures.
- A collaborative problem-solving environment for software development by distributed sets of designers, programmers, and application users could aid more rapid, higher-quality development of software applications.8
SYSTEM COMPOSABILITY AND INTEROPERABILITY
Crisis management is a first-rate illustration of the need to rapidly incorporate a potentially globally distributed collection of communications, information resources, and system components into an application solution. The first requirement is to gain rapid access to information resources of many different types; the second, to integrate these resources into the information that each user needs. Computers and networks can help achieve these goals, but there is a strong need for improved ways to use existing resources (including both old, legacy resources such as city maps and new ones such as the latest satellite imagery from the World Wide Web) and to communicate with existing services. Research aimed at generic tools and services for integrating such systems is discussed in this section. However, scaling to national and global levels requires more than ad hoc approaches, so this section also addresses research toward more forward-looking architectural solutions.
Because crisis management places a premium on information access and integration, workshop discussions focused on composing information systems from multiple sources. However, other types of networked computing systems in
national-scale applications—distributed transaction processing and large-scale collaboration, for example—are increasingly interconnected and composed from many parts. The World Wide Web, for example, is a platform across which information, communications, and distributed software applications all work together.
Crisis management requires that information cross heterogeneous interfaces—a translation problem. For example, data files must be translated from one format to another; handheld radio signals must be translated between standards. These translation problems are what is usually meant by "interoperability." Interoperability between systems is crucial for crisis management, but it is not enough. Truly collaborative solutions require the ability to compose or integrate systems. For example, an information system that must perform searches across different databases with different structures requires a closer integration with deeper understanding of semantics than the term "interoperability" often implies.
Semantic composition implies that the user and resource or service have some deeper level of understanding and agreement about an invocation or communication. Semantic composition requires agreement about the meanings of functionality across interfaces. For example, there are different syntactic formats for geographic information systems made by different vendors, such as ArcInfo and MapInfo. Files made for one must be translated to be used by the other, but both share the semantic idea of latitudinal and longitudinal coordinate reference and ideas such as "X is adjacent to Y" and "X surrounds Y."
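The distinction can be made concrete with a small sketch. The two file syntaxes below are invented stand-ins (neither is the actual ArcInfo or MapInfo format); what matters is that both reduce to the shared semantic model of latitude and longitude coordinates, over which a spatial relation such as adjacency can be defined once:

```python
# Sketch of semantic composition: two hypothetical vendor syntaxes share
# the semantic model of lat/long coordinates, so a spatial predicate can
# be written once against the common model.

def parse_vendor_a(line):
    # hypothetical syntax: "name;lat;lon"
    name, lat, lon = line.split(";")
    return {"name": name, "lat": float(lat), "lon": float(lon)}

def parse_vendor_b(line):
    # hypothetical syntax: "lon lat name"
    lon, lat, name = line.split()
    return {"name": name, "lat": float(lat), "lon": float(lon)}

def adjacent(f1, f2, threshold_deg=0.01):
    """Shared semantic predicate, independent of source syntax."""
    return (abs(f1["lat"] - f2["lat"]) <= threshold_deg and
            abs(f1["lon"] - f2["lon"]) <= threshold_deg)

bridge = parse_vendor_a("Main St Bridge;35.472;-97.520")
shelter = parse_vendor_b("-97.515 35.470 Red-Cross-Shelter")
print(adjacent(bridge, shelter))  # True: within 0.01 degree on both axes
```

Syntactic translation (the two parsers) is the easy part; the semantic agreement (what a coordinate means, what "adjacent" means) is what makes composition across vendors possible at all.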
Finding 5: Focused Standards
In a crisis, many diverse information and communications resources have to be brought together. Because the diversity of resources available over networks is increasing rapidly, they must be linked together by standard protocols and other elements. Without widespread agreement on critical common elements, systems cannot scale up, and national-scale applications will be unworkable.
At the network level, the Internet suite of standards is a particularly compelling example of how well-chosen technical standards create a powerful avenue for interconnection and interoperation. The Internet protocols serve many functions, such as domain name management, addressing, packet transmission, reassembly, and fault tolerance. They are organized into relatively small component elements, enabling users to select particular elements of the Internet suite without having to use the entire set. One example of how this promotes flexibility is that many firewall systems transmit only certain kinds of packets and respond to certain kinds of protocols (e.g., for electronic mail, but not for file transfer). This flexibility has enabled the Internet standards to serve as a de facto basis for many emerging national-scale applications.
In the geographic information systems (GISs) widely used in crisis
management applications, a small set of critical commonalities—such as the latitude and longitude coordinate system and depiction of features such as roads and rivers as arcs between points—are accepted widely by a broad community of users and tool vendors. Although translation between different vendors' proprietary data syntaxes is still necessary, these commonalities make interoperation and sharing through GIS possible. For example, the Consequences Assessment Tool developed by FEMA and the Defense Nuclear Agency (discussed in Chapter 1) integrated the outputs of weather models, building-stress analysis models, and nuclear explosion blast-pressure models with databases from the Census Bureau and other sources to project the impacts of Hurricane Emily on neighborhoods in its path. These components had not been designed to interoperate nor were they developed to be part of a crisis management solution. The interoperation was accomplished by accepting a single geographic coordinate system representation.
There are, however, many areas in which multiple standards exist that overlap in function, yet whose service models may not be entirely consistent. For example, a variety of standards apply to different ways of querying and accessing information. Examples include Structured Query Language (SQL; for accessing relational databases), Common Object Request Broker Architecture (CORBA; for accessing structured objects), Object Linking and Embedding (OLE; for accessing distributed structured documents), and American National Standards Institute (ANSI) standard Z39.50 (for digital library information retrieval). Although these services have different functions, there is nonetheless significant overlap among them. Interconnecting these services (e.g., to provide a single client interface for retrieval and presentation of information objects related to a particular application) presents formidable interoperability problems at multiple levels. Existing interoperability solutions are often ad hoc and do not scale well. Although Finding 6 suggests there may be ways to address these problems for current technologies, it is also important to consider new approaches to protocol design in the future that will be more resistant to proliferation and divergence of protocols.
The World Wide Web provides good examples. While there is a wide diversity of clients (browsers) and servers, these diverse components are unified by a set of common protocols and formats, such as HyperText Transfer Protocol (HTTP) for connection management, Uniform Resource Locators (URLs) to name Internet information resources, and HyperText Markup Language (HTML) to describe Web pages. These protocols enable interchangeability and interoperation among the increasingly diverse set of Web tools and technologies. However, the complexity of the HTML language has increased rapidly, and there are now multiple versions in simultaneous use and increasing numbers of browser-specific Web pages.
These examples highlight the need for a "focused standards" approach, in which individual standards are narrowly focused on specific requirements rather than locking in a large, interrelated set of requirements. They should be modular
and composable in the same way system components are composable. This facilitates adoption and offers more flexibility for vendors to compete and for integrators to evolve systems. For example, in the Internet, this approach has already led to significant success in achieving interoperation among data communications networks atop which many applications run. Workshop participants and others have argued that the Internet standards are successful because they follow a focused, minimal approach (CSTB, 1994b), with no single standard incorporating all the functions of Internet protocols and services, but instead a large suite (Transmission Control Protocol (TCP), Internet Protocol (IP), File Transfer Protocol (FTP), HTTP, and many others) allowing more flexible interaction with other services and standards than a large set of interlocked "all-or-nothing" standards would have done.
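The composition of focused standards is visible even at the level of a single Web request: the URL syntax and the HTTP message format are separate, minimal standards that a client combines. A short sketch (the URL itself is an illustrative example):

```python
# Sketch of two focused standards composing: URL parsing is one concern,
# building an HTTP request line from the parts is another. Neither
# standard needs to know about the other's internals.
from urllib.parse import urlparse

def build_http_request(url):
    """Compose URL parsing with a minimal HTTP/1.0 GET request."""
    parts = urlparse(url)
    path = parts.path or "/"
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {parts.hostname}\r\n\r\n")

req = build_http_request("http://example.org/maps/region1.html")
print(req.splitlines()[0])  # GET /maps/region1.html HTTP/1.0
```

Because each standard is small and self-contained, a tool can adopt one without the other, which is precisely the flexibility the focused-standards approach aims to preserve.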
Technology that supports this kind of plug-and-play approach between applications could be called "component middleware."9 Component middleware technology is just beginning to emerge, and research should aim to accelerate its development. For example, although various developers have produced application programming interfaces (APIs) and libraries of enabling software (such as enabling software for databases distributed across a variety of network substrates), it is not yet apparent that they can interact in a modular plug-and-play fashion on a large scale. APIs introduce constraints on the communications between computing components, and collections of APIs drawn from various sources may not be compatible: even if they allow syntactic interoperation, the semantic meaning across the APIs may be incompatible. Development of general principles for middleware design and implementation that enable the creation of focused, modular standards could contribute greatly to the likelihood of successful interoperation of two components that had not interworked before. It is not yet clear what those general principles might be, but such an approach is clearly more likely to work well on a national scale than trial-and-error efforts.
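The semantic-mismatch problem is easy to illustrate. In the sketch below, two hypothetical components exchange a plain floating-point number, so the call is syntactically valid, yet one speaks miles and the other kilometers; an explicit adapter makes the unit translation part of the interface:

```python
# Sketch of syntactic compatibility masking semantic incompatibility.
# Both hypothetical components pass a bare float, so the call
# "type-checks," but the units differ; a wrapper carries the semantics.

def legacy_range_miles():
    """Legacy sensor component: reports range in statute miles."""
    return 10.0

def plot_range_km(distance_km):
    """Mapping component: expects kilometers."""
    return f"range ring at {distance_km:.1f} km"

MILES_TO_KM = 1.609344

def miles_to_km_adapter(source):
    """Adapter making the semantic (unit) translation explicit."""
    return source() * MILES_TO_KM

print(plot_range_km(legacy_range_miles()))                     # silently wrong
print(plot_range_km(miles_to_km_adapter(legacy_range_miles)))  # correct
```

Nothing in the API signature reveals the mismatch; general middleware design principles would need to make such semantic assumptions explicit and mechanically checkable rather than leaving them to integrators' vigilance.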
Two kinds of research are needed: service protocol design and protocol design principles. Service protocol design research focuses on creation of protocols through which new kinds of services can be delivered. Service protocols effectively insulate users of the service from details of implementations and their continued evolution in capability, and they insulate service implementors from details of how the services are used in particular applications. This lessens dependence, facilitates service evolution, and thereby stimulates greater competition for supply of services and consequent growth in capability. Protocol design principles are needed to guide designers of the protocols through which these services are delivered and provide them with some confidence that a new protocol supporting a specific service will interact effectively with other service protocols already in use or emerging. This notion of composition of information services is analogous to concepts related to the composition of software components.
Once services become broadly commercialized, the dependence of multiple organizations on specific protocols and standards makes evolution of those standards difficult. Therefore, early and active involvement by researchers in developing prototype protocols for new service concepts can significantly hasten the acceptance and growth of these new services. Any research program that involves creation of new service concepts should also involve creation, promotion, and evaluation of associated protocols. Researchers developing new concepts for information services should consider protocol design issues from the outset, since the protocol design will be the most likely instrument for scaling up the concepts and moving the technology into broader practice. When new protocol concepts are unencumbered by dependence on specific implementation or platform details (i.e., keeping them focused and minimal), they are more likely to be accepted and to serve as a basis for growth.
5. Research is needed to identify design principles that can yield open standards (such as communications protocols and application programming interfaces) that interact well with other related standards and allow for diversity and evolution. Individual research and development efforts aimed at setting standards should focus on more narrow component functionalities or services, rather than promote aggregation into larger multifunction standards.
Suggested Research Topics:
- Service protocol designs must be developed with the participation of researchers. Areas in which new service protocols are being developed include multimedia (e.g., representation of various multimedia and virtual reality objects, compression, meta-data for indexing and search), database distribution (e.g., distributed queries, object management, agents), collaboration technologies (e.g., video-conferencing, shared virtual spaces, information sharing), and distributed or "migratory" software systems constructed from components available over a network (e.g., the Java programming language). Many of these protocol development efforts are now at the point of commercial standardization, but many others are still in nascent stages, and it is in these cases that participation from the research community can have significant benefit.
- Protocol design principles should be identified. Results of research on protocol design principles would be in the form of design principles that service developers can apply, along with reasoning tools that could be used to assess critical characteristics of protocol design, for example, freedom from deadlock and avoidance of emergent phenomena as use scales up.10
Finding 6: Interoperability
The assembly of resources in a crisis includes formation of teams, configuration of communications systems, and interlinking of data resources. It can also include interconnection of tools for planning, decision, visualization and presentation, and work flow management. The diverse players in crisis response are reaping significant benefits from their growing reliance on information technology, and that reliance creates opportunities for effective and rapid integration of resources. However, it also presents significant technical challenges related to interoperability.
For example, in Hurricane Andrew relief operations, enormous efforts were required to incorporate data from county tax assessor registers into the GIS coordinate system being used by relief officials. In the aftermath of the Oakland Hills fires, integration of location data from portable global positioning system (GPS) sensors with paper maps from the local utility proved valuable, but it was accomplished manually.
David Kehrlein, of the Office of Emergency Services, State of California, noted that many of the data gathered at relief stations following the Northridge earthquake, such as names and locations of injured survivors, were very difficult to integrate into a coherent, overall picture. Had the relief stations been using any of a variety of commercial personal computer databases, there would have been the problem of integrating data from those formats into the map-based GIS databases used for command and decision making at higher levels. In fact, many of the data were not even in databases, but in word processors; to officials at the station, that was a "database." These data had to be integrated manually by crisis managers even to yield an accurate overall listing of the locations of survivors, much less to correlate the data with a GIS representation of the disaster area or a logistical information system for ordering and allocating supplies.
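The integration burden described above can be made concrete with a sketch. The two station formats below are invented (a spreadsheet-style table and word-processor-style free text), but they illustrate the kind of normalization that, after Northridge, was performed by hand:

```python
# Sketch of normalizing survivor records from two hypothetical
# relief-station formats into one structure that a GIS or logistics
# system could consume. Formats and names are illustrative assumptions.
import csv, io, re

def from_csv(text):
    """Station A kept a spreadsheet: name,lat,lon per row."""
    return [{"name": r["name"], "lat": float(r["lat"]), "lon": float(r["lon"])}
            for r in csv.DictReader(io.StringIO(text))]

def from_notes(text):
    """Station B kept word-processor notes: 'NAME at LAT, LON' per line."""
    pat = re.compile(r"(.+?) at (-?\d+\.\d+),\s*(-?\d+\.\d+)")
    return [{"name": m.group(1).strip(), "lat": float(m.group(2)),
             "lon": float(m.group(3))}
            for m in map(pat.match, text.splitlines()) if m]

station_a = "name,lat,lon\nJ. Ortiz,34.22,-118.53\n"
station_b = "M. Chen at 34.21, -118.55\n"
survivors = from_csv(station_a) + from_notes(station_b)
print(len(survivors), survivors[0]["name"])  # 2 J. Ortiz
```

Each new source format requires a new hand-written parser; the research challenge is to replace this per-format effort with generic mechanisms that discover and translate formats with far less human intervention.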
Workshop participants identified multimedia data fusion as a valuable research area. This could provide capabilities for integration based on tagging of amateur and professional video with location and time of the images, along with real-time sensor (e.g., atmospheric, seismological) data, keyed to a GIS representation of the crisis area. These data sources are more diverse than in the case of multiple formats for essentially similar database or word processing applications. Significant research challenges relate to the difficulty of fusing data with different formats and access protocols, some with fundamentally different kinds of meta-data (time, location, image resolution, sensor network scale).
The interoperability challenge is particularly acute in crisis management because of stringent deadlines and the inability to anticipate fully the range of resources that must be made to work together. Of course, planning and coordination before crises occur can mitigate problems of interoperation to some degree. However, the planning and coordination have two roles. The first and more
obvious role is to reduce the relative proportion of unanticipated cases. The second role, which has potentially far greater leverage, is to create general mechanisms for interoperability and rapid integration that can lower the cost (and uncertainty) of dealing with those unanticipated cases. These factors should certainly be incorporated into the design of any nationally accessible information repository for crisis management, such as the National Emergency Management Information System described by John Hwang, of FEMA. A repository should include not only information for which the need can be anticipated, but also mechanisms for locating, obtaining, interoperating with, and integrating other, unanticipated information sources.
It must also be recognized that information systems and software applications are rarely developed from scratch. Legacy software and databases persist and are the key assets of many enterprises, even in cases where little record remains of the details of their implementation or design rationale. The increasing use of these resources in an interconnected environment raises the stakes for interoperability.
The result of the above factors is that interoperability challenges are increasing, and any technology specifically focused on the problem could have a significant impact. Solutions to interoperability problems need not always be ad hoc, and research focused on this problem may have great impact. For example, wrapper and mediator concepts (see Box 2.4) were initially used entirely in ad hoc ways, crafted by hand to solve specific problems as they arose. Researchers are starting to develop more generic methods (e.g., using data mining and other artificial intelligence techniques) that could lead to tools for semiautomatic generation of intermediate components such as wrappers and mediators. The problem is exacerbated by the increasing diversity of data models and new ways to manage (e.g., name and share) complex object types. However, even very simple approaches to allow sender and receiver to communicate a shared identification of types, such as the Multipurpose Internet Mail Extension (MIME) hierarchy used for structured e-mail documents, have been seen to have considerable benefit. As this report was being prepared, designers of languages for network-centered computation such as Java were still struggling with how to manage the name space for meta-information (types), and groups involved with Web-related technologies were still working out how the name space for Web objects is to be organized.
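The wrapper and mediator concepts can be sketched in a few lines. The sources below are toy stand-ins for legacy systems with different native shapes; each wrapper exposes a common query interface, and the mediator fans a query out across them:

```python
# Sketch of the wrapper/mediator pattern. The two "legacy" sources are
# toy stand-ins; each wrapper hides its source behind the same query
# interface, and the mediator merges results across wrapped sources.

class ListWrapper:
    """Wraps a legacy in-memory list of (key, value) pairs."""
    def __init__(self, pairs):
        self._pairs = pairs
    def query(self, key):
        return [v for k, v in self._pairs if k == key]

class DictWrapper:
    """Wraps a legacy dict-of-lists with a different native shape."""
    def __init__(self, table):
        self._table = table
    def query(self, key):
        return list(self._table.get(key, []))

class Mediator:
    """Presents many wrapped sources as one queryable resource."""
    def __init__(self, sources):
        self.sources = sources
    def query(self, key):
        results = []
        for s in self.sources:
            results.extend(s.query(key))
        return results

med = Mediator([ListWrapper([("shelter", "Armory"), ("water", "Tank 3")]),
                DictWrapper({"shelter": ["High School Gym"]})])
print(med.query("shelter"))  # ['Armory', 'High School Gym']
```

Today each wrapper is written by hand; the research direction noted above is toward semiautomatic generation of such components from descriptions (or discovery) of the underlying sources.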
6. Specific research efforts should be undertaken to develop generic technology that can facilitate interlinking of diverse information resources. National-scale applications present new challenges to interoperability and integration of information resources. Emerging ideas could yield advances such that interoperability problems need not always be met with ad hoc solutions.
Suggested Research Topics:
- While ad hoc approaches to wrappers and mediators have long been in use, generic approaches should be developed that rely on consistent knowledge representation models. Research efforts should be undertaken to assess for what kinds of information resources such generic technologies are feasible.
- Methods are needed to better support the explicit identification and management of meta-information about information resources such as information about types, data models, schemas, and meta-data in networks. Meta-information provides a basis for identifying how information objects are to be interpreted—for example, what coordinate systems are used? What is the quality of the data? Without this basis, reverse engineering is required before information resources can be integrated successfully into composite systems.
- Advances in data fusion of multimedia information from sources such as sensors, relief officials, and amateur citizens are necessary. The usefulness of automatic tagging of inputs (e.g., tagging video images with location and time, as they are generated) should be evaluated.
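The automatic-tagging idea in the last topic can be sketched simply. The field names and the region and time filter below are illustrative assumptions rather than any existing standard:

```python
# Sketch of tagging field inputs with time and location at capture, so
# a GIS-style view can later filter them. Field names are assumptions.
from datetime import datetime, timedelta

def tag(payload, lat, lon, when=None):
    """Attach minimal meta-data to any media payload at capture time."""
    return {"payload": payload, "lat": lat, "lon": lon,
            "time": when or datetime.utcnow()}

def in_window(item, lat_range, lon_range, since):
    """Keep items inside a bounding box and not older than 'since'."""
    return (lat_range[0] <= item["lat"] <= lat_range[1] and
            lon_range[0] <= item["lon"] <= lon_range[1] and
            item["time"] >= since)

t0 = datetime(1995, 6, 1, 12, 0)
feed = [tag("video-clip-001", 34.20, -118.54, t0),
        tag("seismo-trace-17", 36.10, -120.00, t0 + timedelta(hours=1))]
recent_local = [i for i in feed
                if in_window(i, (34.0, 34.5), (-119.0, -118.0), t0)]
print([i["payload"] for i in recent_local])  # ['video-clip-001']
```

Once inputs as different as amateur video and seismometer traces carry common time and location meta-data, keying them to a shared GIS representation becomes a filtering problem rather than a manual correlation effort.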
Finding 7: Integration of Software Components
Rapid response to a crisis may involve integration of applications that were not originally intended to be used together. Crisis management most often involves the need to compose data from different information systems, although integration of software applications such as FEMA's Consequences Assessment Tool (which incorporates federated simulation models and databases) and meso-scale weather models can also be involved. However, the need for composability generalizes to almost every national-scale application area as new systems must be assembled or modified in response to new requirements. For example, design for manufacturing typically involves composing many programs (sometimes numbering in the thousands) that are used for parts of a complex design. These programs are typically legacy systems whose code will never be rewritten.11 Compounding this problem is the unwillingness of collaborating competitors to share source code.
The ability to compose new solutions out of existing parts is needed to control costs, reuse legacy code, meet competitive time-to-market requirements, solve crisis problems, manage complexity, and reduce programming effort. These depend in part on standards to support interoperability between software subsystems, but more fundamentally on an ability to predict or understand the properties of large software systems.
Currently, composability is often implemented in an ad hoc way. Research on more broadly applicable methods is needed. Solutions are likely multifaceted, encompassing a variety of technologies, including application program
interfaces, standards, wrappers, data fusion, cataloging, registering, and common object models. A framework for composability might include a communications model, a distributed computation model, an interface definition language, interface generation tools, protocol or interface translation, a negotiation protocol, and communications facilities among layers of abstraction within and between applications, the infrastructure, and the networks.
The Java language and programming environment serve as a clear example of an environment that was designed from the ground up with these sorts of models, in an effort to ensure composability. To incorporate legacy systems in new architectures is more difficult, and there is a need for an infrastructure with enough standardization to allow interoperation, but not so much as to stifle growth. The ability to address this need likely varies among application areas. For example, civilian crisis management applications are likely to pull together components developed for other purposes. Other industries such as manufacturing are developing application-level architectures for interface specifications, collaboration software, and metacomputing support systems. However, even in this application there are challenges: Lee Holcomb, of the National Aeronautics and Space Administration, noted, "Most companies aren't willing to change their computational infrastructure to work in partnership. So one of the difficulties is . . . trying to get . . . tools to work across different computational platforms through firewalls in each company that are different."
Software composability techniques have the potential to improve the state of the art in a relatively old and intractable problem area, large-scale software system development. Historically, the cost and complexity of large software projects have led to delays in application deployment. More fundamentally, they have also produced systems that do not perform—they do not provide the functionality or reliability required. One avenue toward improving the ability to produce software involves methods for composing systems out of components, including some that may not have been designed to work together, such as legacy systems. This is a very complex problem, requiring advances in many areas.12
A very difficult challenge lies in modeling the behavior of composed software systems. It is currently not possible to predict or reason about the functionality, performance, and correctness of most software systems; in practice, the ability of most large software systems to meet requirements is often not determined until the system is built, tested, and tuned. A direction for research that can perhaps address a subset of this problem is the identification and design of composability properties—properties of software components that also characterize systems composed from them. Unless these properties are chosen carefully, it is typically impossible to predict whether a composed system will have the properties of its components. This is why, for example, security or fault tolerance of software systems must virtually always be reevaluated when they are combined into larger systems, even if all the components are individually secure
or fault tolerant. Better understanding of composability properties might eliminate this need in some circumstances.
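The notion of a composability property can be made concrete with a toy sketch. The components, type names, and the simple matching rule below are invented for illustration; the point is only that some properties, such as interface type compatibility, can be checked for a composed system directly from its components, without building and testing the whole system:

```python
# Illustrative sketch (not from the report): interface type compatibility
# as a composability property that can be verified before the composed
# system is ever run. Component and type names are hypothetical.

def composable(components):
    """Return True if each component's output type matches the next
    component's input type, so the composition is well-formed."""
    for a, b in zip(components, components[1:]):
        if a["out"] != b["in"]:
            return False
    return True

pipeline = [
    {"name": "sensor feed", "in": "raw", "out": "records"},
    {"name": "damage model", "in": "records", "out": "estimates"},
    {"name": "map display", "in": "estimates", "out": "image"},
]
print(composable(pipeline))  # True: types line up end to end
```

Properties such as security or fault tolerance do not compose this simply, which is why they must be reevaluated for the composed system; identifying properties that do compose predictably is the research direction described above.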
7. To reduce both cost and time, national-scale applications often require construction of software systems from components that already exist or are provided by different suppliers. Research is needed on composability of software systems, including ways of predicting performance, reliability, and other features of composed systems.
Suggested Research Topics:
- Programming models must be developed that facilitate interoperable, composable system construction, as well as prediction and reasoning about the scalability, performance, and correctness (conformance to specified operating parameters) of the resulting system throughout its life cycle.
- Research should address creating a capability for virtual secure groups across different computational platforms.
- Active software objects that users access across networks can provide computing and communications functions, provided they are constructed according to a model that enables them to integrate with each other and with existing applications. For example, the Java language provides a framework of assumptions within which new functionalities can be provided as small, relocatable software components called applets. Whether these assumptions remain valid when deployed on a national scale is yet to be determined and is worthy of research attention.
- Tools, frameworks, and infrastructure mechanisms are necessary to complement current work on composable, reusable objects. Examples of tools are registries and locators; examples of infrastructure mechanisms are dynamic, distributed linkers and loaders.
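A registry-and-locator service of the kind called for above can be sketched minimally as follows. The names, interface labels, and locations are assumptions chosen for illustration; the report calls for such tools but does not specify their design:

```python
# A minimal registry/locator sketch (hypothetical API). A real service
# would be distributed and persistent; this shows only the core idea of
# registering components and locating them by name.

class Registry:
    def __init__(self):
        self._entries = {}

    def register(self, name, interface, location):
        """Record a component's interface description and location."""
        self._entries[name] = {"interface": interface, "location": location}

    def locate(self, name):
        """Return the registered location, or None if unknown."""
        entry = self._entries.get(name)
        return entry["location"] if entry else None

reg = Registry()
reg.register("flood-model", interface="simulate(v1)", location="host-a:9000")
print(reg.locate("flood-model"))  # host-a:9000
print(reg.locate("missing"))      # None
```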
Finding 8: Legacy and Longevity
Crisis management places a premium on using known, stable resources and avoiding surprises, because there is no time for training or learning new tools during a crisis. As James Beauchamp, of the U.S. Commander in Chief, Pacific Command, observed (see Chapter 1), "The last [communication equipment] I need in a time of crisis is something I have never worked with before." This statement underscores the premium organizations place on maintaining the usefulness of resources that represent a significant investment of time, understanding, and money over their life cycle and cannot be abandoned lightly. These resources (e.g., radios, maps, databases, word processors) and the infrastructure they rely upon should be designed so they can remain accessible and, ideally, can
evolve and incorporate new technologies and services throughout their lifetime. The capacity to adapt and evolve is a necessary feature of the long-lived bodies of information that are central to many national-scale applications.
For example, important text documents containing information needed in a crisis may outlive the word processing or desktop publishing software through which they were created. These documents can always be preserved as images or printer-coded files (such as PostScript), but use of these representations sacrifices access to the content (e.g., for indexing and searching) as well as mutability (to make revisions or to take advantage of new features made available in newer versions of word processing software, for example). Compounding this further is the increasing proliferation of compression and encoding formats that make even basic textual information such as ASCII characters unreadable if knowledge about the formats is lost.
A consistent, lasting way of associating document-type information with long-lived documents would enable, for example, servers to be made available on networks that can interpret documents of various types and, depending on the present task, translate them into other formats, search and index them, or carry out other operations. Network services may also help with problems such as the following: How can data from legacy databases be preserved without preserving the actual database system that hosted the data? How can old software be used today, for example, through emulation or translation? Can computer-aided design (CAD) data be managed over the lifetime of a major system without requiring all the CAD tools and their platforms to be preserved as well? Clearly, the more complex the object type, the more challenging this problem becomes. In addition, other legacy assets may create more difficult problems that network-based format translators alone cannot solve. For example, features other than functional interfaces to resources may be long-lived, such as locations and access control lists associated with resources.
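The idea of a type-aware network service can be sketched as a registry that maps document types to the operations a server knows how to perform on them. The type name and handlers below are hypothetical, chosen only to illustrate the shape of such a service:

```python
# An illustrative sketch of a service that associates document types with
# operations (indexing, translation). Type names and handlers are
# assumptions, not a specification from the report.

HANDLERS = {}

def register_type(doc_type, operations):
    """Associate a document type with a dictionary of named operations."""
    HANDLERS[doc_type] = operations

def perform(doc_type, operation, document):
    """Apply a registered operation to a document of a known type."""
    ops = HANDLERS.get(doc_type)
    if ops is None or operation not in ops:
        raise LookupError(f"no {operation!r} handler for type {doc_type!r}")
    return ops[operation](document)

# A hypothetical plain-text type supporting two operations.
register_type("text/plain", {
    "index": lambda doc: sorted(set(doc.lower().split())),
    "to_upper": lambda doc: doc.upper(),
})

print(perform("text/plain", "index", "send water send blankets"))
```

Because knowledge of the format lives in the service rather than in each document's original software, the document remains usable after that software is gone.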
The need to support longevity is also paramount in other application domains reviewed in the workshop series. Medical records must be able to follow people as they move through life and must remain useful as technology for creating, organizing, and managing medical information changes.13 Libraries and other repositories for human expression have similar problems of evolving representations and abstractions of objects (e.g., books, paintings, indices). At Workshop I, David Jack, of the Boeing Company, expressed the same concern in reporting that Boeing must keep available in an accessible form the engineering plans for every airplane they make for the lifetime of that plane, which may be 40 years or more.
The problem in all these cases is that there is a feature of hard copy that must be duplicated in the networked computer environment; a hard copy of a piece of information or expression continues to be usable for the lifetime of the medium. In computer systems, however, the evolution of storage technologies means that
the medium may outlast the hardware or software for accessing the information on it, leaving the information inaccessible.
This problem is particularly critical for national-scale applications because these applications and the data supporting them should not be bound to particular software components, computer platforms, data formats, and other technological artifacts that will be outlived by the specific information being managed. Otherwise, application users in the future will be unable to optimize specific technology decisions to meet their needs because they will be shackled by a legacy of old information objects and software. The constraints placed on current technical options by the need to maintain access to technologies developed in the past are the essence of the technological hand-from-the-grave influence that currently restrains the evolution of many large, complex systems, such as the nation's air traffic control systems. An approach to the management of information objects and systems architectures that is based on sound general principles can prevent such constraints in the future.
There are three directions in which further research is needed to address problems of longevity—(1) naming and addressing, (2) resource discovery, and (3) support for evolution. With respect to naming and addressing, a key problem is that information and other resources and services are mobile, and over long periods of time anything that survives is extremely likely to move. For example, network hosts disappear or move to different locations, file systems are reorganized, and whole institutions split, merge, or move. As a result, the situation with respect to URLs, which identify the location of resources in the World Wide Web, is unstable. URLs contain not only the location (including both host name and path name within a host), but also the access method or protocol. Although the widespread deployment of the Web is only a few years old, many URLs have already become obsolete, often providing no recourse to discover whether the information sought has moved elsewhere or is simply unavailable.
One significant direction for improvement, whose requirements were recently defined within the Internet Engineering Task Force, is to separate naming from addressing.14 This would involve the definition of Uniform Resource Names (URNs), a new type of name intended to be long-lived, globally unique, and independent of either location or access method. These, in turn, are translated (resolved) into URLs as necessary, but it is the URNs that should be embedded in objects for long-term storage, enabling future identification and use. There is still significant work to be done in this domain, because the problems of how to do name-to-location resolution have not been solved. This undertaking is larger in scale by orders of magnitude than the host-name resolution provided by the Internet's Domain Name Service, which is probably inadequate to handle the degree of volatility and mobility needed for URNs because information probably can move much more frequently than hosts. A follow-on problem is that even if a service arises that scales and handles the rate of updates more effectively, in the long run it may well fail or be replaced. Research at the Massachusetts Institute
of Technology (MIT; the Information Mesh project) is attempting to address problems of allowing for both a multiplicity of resolution services and an architecture that provides fallback mechanisms, so that if one path to finding resolution fails, another may succeed; this is all very preliminary, however, and more research is needed.
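The fallback idea can be sketched as follows. The resolver tables, the URN, and the URL are invented for illustration; a real resolution service would be a large-scale distributed system:

```python
# A sketch of URN-to-URL resolution with multiple resolution services and
# fallback, in the spirit of the Information Mesh idea described above.
# All names and addresses here are hypothetical.

def make_resolver(table):
    """A resolution service is modeled as a lookup over a local table."""
    return lambda urn: table.get(urn)

def resolve(urn, resolvers):
    """Try each resolution service in turn; return the first URL found."""
    for resolver in resolvers:
        url = resolver(urn)
        if url is not None:
            return url
    return None  # every resolution path failed

primary = make_resolver({})  # this service has lost the record
fallback = make_resolver({"urn:example:quake-report": "http://host-b/reports/17"})

print(resolve("urn:example:quake-report", [primary, fallback]))
```

The long-lived URN stays embedded in documents; only the resolvers' tables change as information moves.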
The second part of the solution is to help users find resources. URNs are intended to be computer friendly rather than user friendly. Because they should be globally unique, they are unlikely to be mnemonic or to fit into the various naming schemes that suit human preferences. For this, additional resource discovery services are needed, such as keyword searching and similarity checking. There are some significant early efforts in this direction,15 but there continues to be a need for more sophisticated searching tools, especially as less computer-savvy people become frequent users. It is difficult to build a local naming and search tool that is tuned to particular application domains or to private use. All too frequently these services point to dead ends, such as outdated URLs; the services should be better able to weed out bad data. In a crisis, if a search engine overwhelms the user with an indistinguishable mix of good and bad information, the overall result may be useless.
A third area, discussed further in Finding 9 (''Adaptivity"), relates to the ability of information and other resources to evolve. Although it is desirable for new capabilities and technologies to be employed within equipment and services (e.g., to use new, enhanced interfaces), the evolution must be smooth and easy for people and their applications to adapt to or else the new capabilities may not be used. Application designers cannot know in advance all possible directions for evolution of useful resources, and so to support evolution, applications and infrastructures should be designed to enable applications to learn about and utilize new and evolving resources. The specific research directions implied by this need are discussed in Finding 9.
8. Technological and architectural methods should be developed for reconciling the need to maintain access to long-lived information and software assets with the need to enable application users to exploit new technologies as they become available. Such methods should be applied both at the middleware level and in the architectural design of national-scale applications.
Suggested Research Topics:
- Research is necessary to specify the minimal component services in an information infrastructure that allow for identifying, finding, and accessing resources, and to develop protocols for service definitions that are both minimal in terms of needs and extensible to allow for improved service. Some specific examples following the library analogy are services to help people determine which objects and resources they want (a service like that of a librarian who suggests books), the registration of individual resources (e.g., Library of Congress catalog numbers), the location service (e.g., a catalog), and mechanisms for user authentication and access control policies (e.g., placing books on reserve for students registered in a particular class). Mechanisms to implement these services require, in particular, ways to manage information about how to interpret typed information objects (ranging from documents to data in databases and software components) at the network level.
ADAPTING TO UNCERTAINTY AND CHANGE
A crucial problem faced by all national-scale application areas, but particularly crisis management, is that of dealing effectively with uncertainty in three areas: infrastructure (e.g., networks and network-based services such as naming and addressing), components of integrated solutions, and the nature and behavior of potentially useful resources. Uncertainty and change are involved in all of these areas. In a crisis, changes can produce uncertainty on a scale of minutes: Are the telephone lines in the disaster area down? How soon will they be restored? Change on a longer time scale can also produce uncertainty: Can a firm adapt its new computer system to work with its old databases? These problems highlight the need for systematic, architectural solutions to the problems of adaptivity and reliability. Progress in these specific areas will benefit any application domain that is sensitive to factors such as human errors, overloading of resources, and other unpredictable situations. Indeed, as all application domains grow in scale, these conditions will become more common.
Finding 9: Adaptivity
During and after a crisis, it is critically important that network services and resources be available. This need implies an adaptivity to unusual or extenuating circumstances beyond traditional network operational criteria. Other national-scale application areas could also benefit from increased adaptivity, for several reasons: sharing of network-based resources implies significant fluctuations in demand for and availability of those resources; human errors and system failures are inevitable; and new applications and unusual uses of existing applications can generate entirely unanticipated circumstances. Network-based systems (e.g., communications systems, computer networks, and sensor networks) should be prepared not only to route around points of congestion or failure, but also to adapt to changing availability of resources. Methods for achieving this adaptivity in a crisis are likely to be broadly useful in many domains.
Crisis management demonstrates a number of specific ways in which adaptivity is critical to system design. At the network level, for example, if the local,
preexisting network infrastructure is at least partially operational, it may be valuable to integrate it with components brought by the crisis management team. This could involve attaching portable computers preloaded with crisis-related data and software to existing local area networks or connecting predeployed sensors, such as security cameras, into a network deployed for the crisis response.
In practice, identifying and making use of the existing infrastructure are difficult; consequently, relief workers frequently arrive with an entirely separate system whose parts and operation they understand. Yet this approach does not eliminate their problems, because in many cases, multiple organizations arrive, each with its own equipment, networks, and policies for using them (such as access priority and security), making effective integration of all available resources difficult. Adaptivity in this case may reflect the ability to rapidly implement compromise positions where resources owned or controlled by different parties are integrated with agreements about policies for shared use.
Applications that run in uncertain environments also should be designed for adaptivity. For example, if network service is available only intermittently, applications such as shared information repositories and collaboration tools should be prepared to adapt to varying network resources. They should also be reconfigurable or able to configure themselves to take advantage of new or evolving resources. For example, information sent from a crisis command center might be sent to field workers as maps and diagrams when sufficient bandwidth is available, but as text when the bandwidth is reduced. Multiple, distributed copies of databases could be designed to replicate updates to each other (maintaining overall coherence) only when bandwidth is available or to restrict updates only to the highest-priority information such as locations of people needing medical attention. During a videoconference, if congestion occurs, a shift to a lower image resolution could enable the conference to continue. An attractive feature in such circumstances would be support for choice by the users between reduced resolution and fewer frames per second as appropriate to their needs.16
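The bandwidth adaptation described above can be sketched as a simple policy: choose the richest representation the current link supports. The thresholds (in kilobits per second) and representation names are invented for illustration:

```python
# A sketch of representation selection under varying bandwidth, as in the
# command-center example above. Thresholds are hypothetical.

def choose_representation(bandwidth_kbps):
    if bandwidth_kbps >= 512:
        return "map"        # full maps and diagrams
    if bandwidth_kbps >= 64:
        return "diagram"    # simplified graphics
    return "text"           # text-only fallback for the worst links

print(choose_representation(1000))  # map
print(choose_representation(100))   # diagram
print(choose_representation(9.6))   # text
```

The same pattern applies to replication and videoconferencing: the policy layer observes available resources and degrades the service gracefully rather than failing outright.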
A different kind of example is the application that can adapt to changes in the availability of information inputs. Crisis managers must make judgments in the absence of complete data. Judgment support applications (e.g., building damage simulations, logistics planners to estimate emergency supply requirements, map-based evacuation route planners) must adapt not only to statistical uncertainty, but also to gaps, mistakes, and deliberate falsifications in their input data. This requires much more than simplistic interpolation of missing data—it demands an ability to make inferences about what the correct data are likely to be.
Applications also need to evolve and adapt to changes on a longer scale. For example, if simulation-based training programs are designed to train people by providing accurate maps and images of possible crisis locations, adaptivity should enable incorporation of new, better sources for that information over time. Originally, the simulation may use line drawings with altitude designations, later
incorporating information from aerial photographs and weather prediction systems.
In crises, it would be especially valuable for applications to discover and exploit automatically, without the need for time- and attention-consuming human intervention, the capabilities of resources whose usefulness could not have been anticipated when the application was written. These might include new objects or services with enhanced functionalities that did not exist when the application was written (e.g., new kinds of environmental sensors); legacy resources that have existed for a long time, but have a structure or form that the application designer did not anticipate having to access (e.g., records of earthquake damage patterns from past years); and resources created for use in a different application area (e.g., architectural designs used to plan evacuation routes during a crisis).
To enable successful use of unanticipated resources in all these cases, continued research should address the question of how applications might learn about and make use of such objects. This problem has two parts. First, the application must be able to learn about the functionality of the new resource, which can be expressed in its type. To find the type of the new resource, the application must be able to ask the resource itself or some other service to identify the type of the resource. Both CORBA and the Information Mesh project at MIT make a first cut at this by requiring that all objects (resources) support a function to answer such a query, if asked. Second, the application may have to import code to access the new type of resource. The importation of code at run time generally is possible only in programming environments that support interpreters, such as the Lisp programming language and its derivatives or Java; importing code at run time to interface into other languages such as C or C++ generally is not feasible. Thus, the problem of utilizing resources of unanticipated types can be split into two research directions, one directed toward protocols for querying objects and the services to support that activity, and the other advancing work in language, compiler, and runtime technologies.
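The two-part problem can be illustrated with a sketch: (1) ask a resource for its type, then (2) load code to handle that type at run time. Here `importlib` stands in for importing code over a network, and the type-to-module table is an assumption for illustration:

```python
# A sketch of type querying plus run-time code loading. The Resource class,
# the "json-record" type name, and the handler table are hypothetical;
# importlib.import_module models loading handler code on demand.

import importlib

class Resource:
    """Every resource supports a type query, in the spirit of CORBA and
    the Information Mesh requirement described above."""
    def __init__(self, type_name, payload):
        self.type_name = type_name
        self.payload = payload

    def query_type(self):
        return self.type_name

def handler_for(resource, modules={"json-record": "json"}):
    """Map the reported type to a module and import it on demand."""
    module_name = modules.get(resource.query_type())
    if module_name is None:
        raise LookupError(f"no handler for type {resource.query_type()!r}")
    return importlib.import_module(module_name)

r = Resource("json-record", '{"need": "water", "qty": 500}')
handler = handler_for(r)
print(handler.loads(r.payload)["need"])  # water
```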
9. Research is needed to increase the adaptivity of networks and applications by providing them with the tools, resources, and facility to function during and after unusual or extenuating circumstances. National-scale applications, especially those supporting crisis management, must be able to function in an environment marked by variability of available resources and of requirements for resources.
Suggested Research Topics:
- Self-organizing networks are those in which the components and resources of the network can discover and learn about each other without the need for a centralized management structure. Self-organizing networks will have less need for human intervention than is otherwise required. There is both theoretical and practical research to be done, ranging from whether such networks can stabilize themselves, to the protocols by which components learn about each other and the specific kinds of information that components must share to enable self-organization.
- Supporting mobile users and resources is a particular challenge, since the network must be able to reorganize continually. Mobile IP is one way of accounting for mobility by forwarding packets to the user's current location (similar to roaming in cellular telephone systems). However, it introduces latency that is often unacceptable for real-time communications such as voice and video (Katz, 1995).
- Improvements in network management are needed, including tools for discovering the state of existing infrastructure17 and extensions to current models of network capabilities to reflect such aspects as reliability, availability, security, throughput, connectivity, and configurability. These could enable management tools with new paradigms of merging the access, priority, and security parameters of networks that interconnect with each other during crises in unanticipated ways. One approach might be to develop a priority server that could administer access rights flexibly within a network as users and their needs change during a crisis.
- Methods are needed for reconciling network adaptivity with minimizing vulnerabilities to intruders and other threats. Legitimate actions taken by adaptive self-organizing networks to conform to changes in the available infrastructure may in some cases be difficult for network managers to distinguish from hostile infiltration by an intruder. Significant challenges exist in making secure, adaptive networks that recognize self and do not launch "autoimmune" attacks. Artificial intelligence methods in network management may be a fruitful area for research to meet this need.
- Security should adapt to the mobility of people and changing configurations of networks. For example, how can federal officials arrive in California after an earthquake and provide valid identification recognized by the network without requiring that the infrastructure assign everyone new identities and passwords? How do those officials access useful files from their home offices while in some other security domain? How do the secure domains decide they can trust each other? Research is needed to support composition of security policies across administrative domains and mobility of access rights.
- Crisis managers have a clear need for better tools for discovering what network-accessible resources are available to them in time of crisis. More powerful search and retrieval mechanisms than keyword matching are necessary, as are solutions that allow searching within an unanticipated application domain.
- Rapidly configurable virtual subnets are required that span multiple underlying network resources but provide services such as privacy and access control, as though users were isolated on a private network. Research is needed both to develop the actual protocols necessary to create functional virtual subnets and to provide a clearer understanding of how well virtual subnets can be isolated from broader network environments to support features such as security, access control, reliability, and bandwidth on demand.
- Application component interface specification and exploration protocols are needed to enable applications to interact with evolving or new resources. There has been some research into interface specifications, but uniformity is lacking. To provide application adaptivity that works at a national scale, either one architecture must be selected (which is unlikely) or protocols must be written to allow negotiation between applications and services of the interface specification language and support tools to be used in any particular case. For example, new protocols would be needed to allow an application that accesses both CORBA objects and OLE objects to discover from objects which kind they are and then use the appropriate model to query the object or resource about its capabilities.
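The priority server suggested among the topics above can be sketched in miniature. The user names, priority levels, and preemption rule are invented for illustration:

```python
# A sketch of a priority server that administers access rights as users
# and their needs change during a crisis. API and policy are hypothetical.

class PriorityServer:
    def __init__(self):
        self._priority = {}

    def set_priority(self, user, level):
        """Levels can be raised or lowered as the crisis evolves."""
        self._priority[user] = level

    def may_preempt(self, user, other):
        """A user may preempt another's resources only at strictly
        higher priority; unknown users default to the lowest level."""
        return self._priority.get(user, 0) > self._priority.get(other, 0)

ps = PriorityServer()
ps.set_priority("medical-team", 3)
ps.set_priority("press", 1)
print(ps.may_preempt("medical-team", "press"))  # True
print(ps.may_preempt("press", "medical-team"))  # False
```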
Finding 10: Reliability
The utility of an application or application component often depends on an assessment of its reliability. Maximum reliability is not always necessary; what the user requires is to understand the degree of reliability, to determine whether or not it is within acceptable tolerances, and to decide appropriate actions. In managing a crisis, for example, decision makers must constantly judge the accuracy of the information they are using in making decisions. (They do not necessarily ignore questionable information, but they weigh it differently than more certain information.) Aircraft manufacturers assess the reliability of a subcontractor's part design before incorporating it into an airplane design. Health care workers assess the probable correctness of each item of data about a patient before making a diagnosis or taking action. The quality of inputs, the predictability of events, the validity of simulations, the correct functioning of large-scale applications, and similar factors underlie the quality of information yielded by computer and network applications. These must be understood for people to rely on information and computation technologies in national-scale applications.
To facilitate these assessments for computing and communications systems on which the nation increasingly depends, reliability attributes of system components need to be formalized and exposed whenever possible. This will require research. For example, a crisis response application constructed dynamically from disparate parts must continually predict and assess the reliability of each of its parts. Some of the parts, such as remote computing facilities running a well-tested modeling program, may be assumed by the crisis application to be highly reliable with known probabilities of correctness and measures of precision. More typically, however, many of the components contributing to a crisis management solution do not have such known attributes. This is particularly true if people are part of the system or if untested, previously unintegrated subsystems are used. Furthermore, the nature of the crisis may change a reliable system into an unreliable one through unanticipated scaling problems. Therefore, an important unmet
application need is the ability to develop confidence factors based on the reliability of parts of a system.
Assessment of confidence factors can complement other approaches to improving reliability. Many application areas, such as manufacturing, use design and testing processes and redundant subsystems to achieve reliability goals. Adaptive systems, such as those discussed in Finding 9, represent another set of approaches to achieving reliability. Some components of an application solution, however—particularly those involving people—do not have well-defined ways of developing reliability factors. New insights and approaches are needed to improve the reliability of the weak links in a system and, as a separate topic, to capture, quantify, and communicate the reliability status (whether strong or weak) of each component.
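A confidence factor of the kind called for above can be sketched from per-component reliability estimates. Treating the system as a series of independent parts, so that confidence multiplies, is an assumption chosen for illustration; real systems, especially those with people in the loop, need richer models:

```python
# A sketch of a composite confidence factor under a series-of-independent-
# parts assumption. Component names and estimates are hypothetical.

def system_confidence(component_confidences):
    """Multiply per-component confidence estimates; the weakest links
    (here, the untested subsystem) dominate the result."""
    result = 1.0
    for c in component_confidences.values():
        result *= c
    return result

parts = {
    "tested model": 0.99,
    "untested data feed": 0.80,   # previously unintegrated subsystem
    "human operator": 0.95,
}
print(round(system_confidence(parts), 3))  # 0.752
```

Even this crude model makes visible where added redundancy or testing would raise overall confidence most.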
The latter topic is particularly important in national-scale applications, which have high public visibility and must provide the public with a high level of confidence that they function correctly and, when they do not, that the problem can be identified and corrected quickly. When an airplane crashes, investigators retrieve the "black box" and analyze recorded data to determine what may have caused the crash, so that steps can be taken to avoid future problems and reestablish public confidence. It would be valuable in national-scale applications to develop a black-box analog (perhaps a set of required procedures) for identifying and correcting errors.
10. Research is needed to enable accurate assessments of the reliability of systems composed of potentially unreliable hardware, software, and people. Consistent methods for evaluating reliability should lead not only to more reliable systems, but also to better ways of using systems in applications, such as crisis management, where absolute reliability is unattainable but reliability factors might be assessable. The ultimate goal of these efforts is to develop measures of confidence in the behavior of systems that support national-scale applications.
Suggested Research Topics:
- A black box technology should be developed for national-scale applications, analogous to that in aircraft, that enables the rapid identification and correction of errors, coupled with procedures for responding to problems that ensure continuing confidence in the viability of the application.
- Basic and applied research in chaotic processes is needed to better understand the reliability of applications in the presence of poor-quality information (e.g., errors, incompleteness, internal inconsistencies). Research might examine the trade-offs between urgency and fidelity of information collection in crises and methods for validating and reconciling poor-quality information.
- To adapt to errors, whatever the source, applications must be robust. Applications should be self-adapting and have self-describing, self-propagating metrics of component and information reliability. These metrics should reflect the implications of having people as an integral part of applications.
- Reliability attributes should be developed and propagated as meta-data associated with system components.
PERFORMANCE OF DISTRIBUTED SYSTEMS
As the scale of applications grows, not only in the geographical distance between components but also in the complexity of the interrelationships among the components and the utilization of lower-level resources (e.g., networks, processors, memory, storage), the performance of systems that support applications must increase if they are to achieve results rapidly enough to be usable. In addition, the performance of the various infrastructural resources must be balanced to produce effective results.
Finding 11: Performance of Distributed Systems
Crisis management presents an especially challenging set of requirements for balanced performance in both computer systems and networks. Because timeliness is nearly always paramount, extraordinary computing power and network bandwidth are required to ensure that results can be delivered soon enough to be relevant. Moreover, there is rarely time in a crisis to tune software performance, and the easier a computer program is to use effectively, the more likely it is to be used in the stress-laden working environment of a crisis.
Since crises are infrequent and seldom predictable as to place and time, establishing dedicated computing and communications resources is economically impractical. Whatever large-scale, high-performance computing and communications capabilities are made available for responding to a crisis will need to be preempted from less urgent work. The potpourri of data needed to help answer queries and supply input for simulations must be marshaled from its many resident locations as quickly as possible, and high-bandwidth networking must be delivered to the scene for transmission of imagery, including simulation results.
Achieving computer system interoperability, adaptivity, and reliability, especially in connection with a crisis, calls for exceptional computing power and storage capacity. For crisis management, capabilities even beyond those appropriate to ordinary circumstances are required to manage a largely ad hoc and unreliable interconnection of computer systems that were never designed to work together in the first place. The software that makes these deficiencies tolerable adds to the computing burden.
The deployment of computations across networks and the use of distributed and possibly heterogeneous computer systems to address single problems are attractive for crisis management and other national-scale applications.
Increasing the size of many candidate computations to national scale may be impractical because of poor performance. For example, storm and wildfire simulations may perform more poorly as distributed computations than data acquisition and reformatting do. As MIT's Barbara Liskov said, "Everyone knows scalability is important. But no one knows how to show [that] you have it, short of running experiments with huge numbers of machines, which is usually not practical. We need a way to reason about scalability." At every point in the parallel and distributed software design and development cycle, scalability in performance should be treated as a first-class problem.
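Liskov's point, that we need ways to reason about scalability short of running huge experiments, is illustrated by the oldest such reasoning tool, Amdahl's law (the 5 percent serial fraction below is an illustrative assumption, not a figure from this report): even a small fraction of non-parallelizable work caps the speedup obtainable from adding machines.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Predicted speedup on n machines when a fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# A simulation with 5% unavoidable serial work (e.g., coordination overhead):
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.05, n), 1))
# Speedup saturates near 1/0.05 = 20 no matter how many machines are added.
```

A back-of-the-envelope model like this cannot replace measurement, but it lets a designer reject a candidate decomposition before committing large numbers of machines to an experiment.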
11. Research is needed to better understand how to reason about, measure, predict, and improve the performance of distributed systems. Crisis management and other national-scale applications demand high-performance systems and tools that balance processing speed, communications bandwidth, and information storage and retrieval.
Suggested Research Topics:
- Current capability to model the performance of systems that are distributed across heterogeneous networks and computing platforms is very limited.18 Predicting the performance of large, distributed software systems is particularly difficult but would be quite valuable in addressing national-scale application needs. Research is needed to identify what parameters of network, processing, and storage components are critical to systems' ability to meet specified performance criteria, such as capacity and responsiveness, and to develop appropriate metrics for these parameters. Research should include a measurement program to evaluate the ability of models to predict how systems will perform under normal conditions and in crises. These models could be tested, for example, in the context of the crisis management testbeds discussed in Finding 1.
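As a minimal illustration of the kind of predictive model the topic above calls for (the rates below are illustrative assumptions, not from this report), a single-server M/M/1 queueing model maps two measurable parameters, arrival rate and service rate, to a responsiveness criterion, and shows how sharply response time degrades as a shared resource nears saturation.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue (Poisson arrivals, exponential service).

    Valid only while utilization = arrival_rate / service_rate < 1.
    """
    if arrival_rate >= service_rate:
        raise ValueError("system is saturated; response time is unbounded")
    return 1.0 / (service_rate - arrival_rate)

# A server handling up to 100 requests/s: response time grows sharply near saturation.
for load in (50, 90, 99):
    print(load, round(mm1_response_time(load, 100), 3))
```

Real distributed systems require far richer models than this, but even the single-queue case makes the research topic's point: small changes in offered load near capacity produce large changes in responsiveness, which is exactly the regime a crisis creates.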
NOTES

The reverse, however, is not necessarily true; technologies that have been developed for other domains may not meet the needs of crisis management for coping with urgency and unpredictability.
The "where" of deployment includes physical as well as conceptual locations, such as a layer or layers in the technical architecture.
For a discussion of the relationship between the study of deployed systems and the development of new research directions, see CSTB (1989) and CSTB (1992).
A different kind of negative effect that people may have on systems occurs when, in hostile situations such as crime or warfare, they attack systems to harm their performance.
The diversity of organizations with different structures and patterns of working makes it necessary for these communications models to accommodate different modes, when collaboration crosses organizational boundaries as it frequently does.
Middleware provides services within an information infrastructure that are used in common among multiple applications. For a discussion, see CSTB, 1994b, p. 49.
Revisions to code are no guarantee of improvement; managing the proliferation of different versions of the same code is another formidable challenge.
In fact, because genetic influences on medical conditions may be understood increasingly, maintaining medical histories longer than a lifetime may become more and more valuable to descendants.
Kunze, John A., "Functional Recommendations for Internet Resource Locators," Internet Request for Comments (RFC) 1736, February 1995; and Sollins, Karen, and Larry Masinter, "Functional Requirements for Uniform Resource Names," Internet RFC 1737, December 1994. Both are available online at http://www.cis.ohio-state.edu/hypertext/information/rfc.html.
For example, doctors might decide that only the full level of performance is acceptable, whereas medical insurers might opt for lower resolution and professors showing chalkboard diagrams might opt for fewer frames per second.
Such tools exist, but are difficult to use and require a higher level of technical expertise than is readily available in a crisis response.