Technology: Research Problems Motivated by Application Needs
Chapter 1 identifies opportunities to meet significant needs of crisis management and other national-scale application areas through advances in computing and communications technology. This chapter examines the fundamental research and development challenges those opportunities imply. Few of these challenges are entirely new; researchers and technologists have been working for years to advance computing and communications theory and technology, investigating problems ranging from maximizing the power of computation and communications capabilities to designing information applications that use those capabilities. What this discussion offers is a contemporary calibration, with implications for possible focusing of ongoing or future efforts, based on the inputs of technologists at the three workshops as well as a diverse sampling of other resources.
This chapter surveys the range of research directions motivated by opportunities for more effective use of technology in crisis management and other domains, following the same framework of technology areas—networking, computation, information management, and user-centered systems—developed in Chapter 1. Some of the directions address relatively targeted approaches toward making immediate progress in overcoming barriers to effective use of computing and communications, such as technologies to display information more naturally to people or to translate information more easily from one format to another. Others aim at gaining an understanding of coherent architectures and services that, when broadly deployed, could lead eventually to eliminating these barriers in a less ad hoc, more comprehensive fashion. Research on modeling the behavior of software systems composed from heterogeneous parts, for example, fits this category.
NETWORKING: THE NEED FOR ADAPTIVITY
Because of inherently unpredictable conditions, the communications support needed in a crisis must be adaptable; the steering committee characterizes the required capability as "adaptivity." Adaptivity involves making the best use of the available network capacity (including setting priorities for traffic according to needs and blocking out lower-priority traffic), as well as adding capacity by deploying and integrating new facilities. The network also must support different kinds of services with fundamentally different technical demands; doing so efficiently requires adaptivity. This section addresses specific areas for research in adaptive networks and describes the implications of a requirement for adaptivity; the importance of adaptivity at levels of information infrastructure above the network is discussed in other sections of this chapter.
Box 2.1 provides a sampling of networking research priorities discussed in the workshops. Although problems of networking that arise in national-scale applications are not entirely new, they require rethinking and redefinition because the boundaries of the problem domains are changing. Three issues that influence the scope of networking research problems are (1) scale, (2) interoperability, and (3) usability.
- Scale. High-performance networking is often thought of in terms of speed and bandwidth. Speed is limited, of course, by the speed of light in the transmission medium (copper, fiber, or air), and individual data bits cannot move over networks any faster. However, the overall speed of networks can be increased by raising the bandwidth (making the pipes wider and/or using more pipes in parallel) and reducing delays at bottlenecks in the network. High-speed networks (which include both high-bandwidth conduits or "pipes" and high-speed switching and routing) allow larger streams of data to traverse the network from point A to point B in a given amount of time. This makes possible the transmission of longer individual messages such as data files, wider signals (such as full-motion video), and greater numbers of messages (such as data integrated from large numbers of distributed sensors) over a given path at the same time. Research challenges related to the operation of high-speed networks include high-speed switching, buffering, error control, and similar needs; these were investigated with significant progress in the Defense Advanced Research Projects Agency's (DARPA's) gigabit network testbeds.
Speed and bandwidth are not the only performance challenges related to scale; national-scale applications must also scale in size. The number of information sources involved in applications may meet or even far exceed the size of the nation's or world's population. In theory, every information producer may be an information consumer and vice versa. Consequently, there is the need not only to reduce the amount of time needed for quantities of bits to be moved but, even at the limits of technology in increasing that speed, to transport more bits to more places. The set of people, workstations, databases, and computation platforms on networks is growing rapidly. Sensors are a potential source of even faster growth in the number of end points; as crisis management applications illustrate, networks may have to route bits to and from environmental sensors, seismometers, structural sensors on buildings and bridges, security cameras in stores and automated teller machines, and perhaps relief workers wearing cameras and other sensors on their clothes, rendering them what Vinton Cerf, of MCI Telecommunications Corporation, called "mobile multimodal sensor nets." Medical sensors distributed at people's homes, doctors' offices, crisis aid stations, and other locations may enable health care delivery in a new, more physically distributed fashion, but only if networks can manage the increased number of end points. In response, the communications infrastructure must be prepared to transport orders of magnitude more data and information and to handle orders of magnitude more separate addresses.
A particular case, such as a response to a single disaster, may not involve linking simultaneously to millions or billions of end points, but because the specific points that will be linked are not known in advance, the networking infrastructure must be able to accommodate the full number of names and addresses. The numbering plan of the public switched telecommunications network provides for this capability for point-to-point (voice circuit) calling under normal circumstances. In the broader context of all data, voice, and video communications, the Internet's distributed Domain Name System servers map the names associated with end points to the numerical addresses that identify them. The explosive growth in Internet usage has motivated a change in the standard, Internet Protocol version 6, to accommodate more addresses.1
- Interoperability. The need for successfully communicating across boundaries in heterogeneous, long-lived, and evolving environments cannot be ignored. In crisis management, voice communications are necessary but not sufficient; response managers and field workers must be able to mobilize data inputs and more fully developed information (knowledge) from an enormous breadth of existing sources—some of them years old—in many forms. Telemedicine similarly requires a mix of communications modes, although not always over as unpredictable an infrastructure as crises present. Interoperation is more than merely passing waveforms and bits successfully; interoperation among the supporting services for communications, such as security and access priority, is highly complex when heterogeneous networks interconnect.
- Usability. The information and communications infrastructure is there to provide support to people, not just computers. In national-scale applications, nonexperts are increasingly important users of communications, making usability a crucial issue. What is needed are ways for people to use technology more effectively to communicate, not only with computers and other information sources and tools, but also with other people. Collaboration between people includes many modes of telecommunication: speech, video, passing data files to one another, sharing a consensus view of a document or a map. In crises, for example, the ability to manage the flow of communications among the people and machines involved is central to the enterprise and cannot be reserved solely to highly specialized technicians. Users of networks must be able to configure their communications to fit their organizational demands, not the reverse. This requirement implies far more than easy-to-use human-computer interfaces for network management software; the network itself must be able to adapt actively to its users and whatever information or other resources they need to draw upon.
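The address-space arithmetic behind the move to Internet Protocol version 6, noted above, can be sketched briefly. The address sizes are standard; the population figure is an illustrative assumption.

```python
# Rough comparison of IPv4 and IPv6 address spaces, illustrating why
# a larger address space was needed to accommodate vastly more
# network end points (people, devices, and sensors).

IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS   # about 4.3 billion
ipv6_addresses = 2 ** IPV6_BITS   # about 3.4 x 10**38

world_population = 8_000_000_000  # order-of-magnitude assumption

# IPv4 cannot even provide one address per person...
print(ipv4_addresses < world_population)   # True
# ...while IPv6 provides an astronomical number of addresses per person.
print(ipv6_addresses // world_population)
```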
For networks to be adaptive, they must be able to function during or recover quickly from unusual and challenging circumstances. The unpredictable damage and disruption caused by a crisis constitute challenging circumstances for which no specific preparations can be made. Unpredicted changes in a financial or medical network, such as movement of customers or a changing business alliance among insurers and hospitals that exchange clinical records, may also require adaptive response. Mobility—of users, devices, information, and other objects in a network—is a particular kind of challenge that is relevant not only to crisis response, but also to electronic commerce with portable devices, telemedicine, and wireless inventory systems in manufacturing, among others. Whenever the nodes, links, inputs, and outputs on a network move, that network must be able to adapt to change.
Randy Katz, of the University of California, Berkeley, has illustrated the demands for adaptivity of wireless (or, more generally, tetherless) networks for mobile computing in the face of highly diverse requirements with the example of a multimedia terminal for a firefighter (Katz, 1995). The device might be used in many ways: to access maps and plan routes to a fire; examine building blueprints for tactical planning; access databases locating local fire hydrants and nearby fire hazards such as chemical plants; communicate with and display the locations of other fire and rescue teams; and provide a location signal to a central headquarters so the firefighting team can be tracked for broader operational planning. All of the data cannot be stored on the device (especially because some data may have to be updated during the operation), so real-time access to centrally located data is necessary. The applications require different data rates and different trade-offs between low delay (latency) and freedom from transmission errors. Voice communications, for example, must be real time but can tolerate noisy signals; users can wait a few seconds to receive a map or blueprint, but errors may make it unusable. Some applications, such as voice conversation, require symmetrical bandwidth; others, such as data access and location signaling, are primarily one way (the former toward the mobile device, the latter away from it).
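The distinct requirements in Katz's firefighter example can be summarized as a small traffic-class table. The class names and numerical limits below are illustrative assumptions, not figures from the report.

```python
# Hypothetical traffic classes for a firefighter's multimedia terminal,
# capturing the trade-offs described above: voice tolerates errors but
# not delay; map/blueprint transfer tolerates delay but not errors;
# location beacons are one-way, away from the mobile device.

TRAFFIC_CLASSES = {
    # name:             (max_latency_s, loss_tolerant, direction)
    "voice":            (0.15,          True,          "symmetric"),
    "map_download":     (5.0,           False,         "to_mobile"),
    "location_beacon":  (1.0,           True,          "from_mobile"),
}

def needs_retransmission(traffic_class: str) -> bool:
    """Error-sensitive traffic must be retransmitted until correct;
    loss-tolerant traffic (e.g., voice) should never be delayed for it."""
    _, loss_tolerant, _ = TRAFFIC_CLASSES[traffic_class]
    return not loss_tolerant
```

A network scheduler built on such a table could, for example, drop late voice packets outright while queuing map data for reliable delivery.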
Research issues in network adaptivity fall into a number of categories, discussed in this section: self-organizing networks, network management, security, resource discovery, and virtual subnetworks. For networks to be adaptive, they must be easily reconfigurable either to meet different requirements from those for which they were originally deployed or to work around partial failures. In many cases of partial failures, self-configuring networks might discover, analyze, work around, and perhaps report failures, thereby achieving some degree of fault tolerance in the network. Over short periods, such as the hours after a disaster strikes, an adaptive network should restore services in a way that best utilizes the surviving infrastructure, enables additional resources to be integrated as they become available, and gives priority to the most pressing emergency needs. Daniel Duchamp, of Columbia University, observed, "Especially if the crisis is some form of disaster, there may be little or no infrastructure (e.g., electrical and telephone lines, cellular base stations) for two-way communication in the vicinity of an action site. That which exists may be overloaded. There are two approaches to such a problem: add capacity and/or shed load. Adding capacity is desirable but may be difficult; therefore, a mechanism for load shedding is desirable. Some notion of priority is typically a prerequisite for load shedding."
Networks can be adaptive not only to sharp discontinuities such as crises, but also to rapid, continuous evolution over a longer time scale, one appropriate to the pattern of growth of new services and industries in electronic commerce or digital libraries. The Internet's ability to adapt to and integrate new technologies, such as frame relay, asynchronous transfer mode (ATM), and new wireless data services, among many others, is one example.
Self-organizing networks facilitate adaptation when the physical configuration or the requirements for network resources have changed. Daniel Duchamp cast the problem in terms of an alternative to static operation:
Most industry efforts are targeted to the commercial market and so are focused on providing a communications infrastructure whose underlying organization is static (e.g., certain sites are routers and certain sites are hosts, always). Statically organized systems ease the tasks of providing security and handling accounting/billing. Most communication systems are also pre-optimized to accommodate certain traffic patterns; the patterns are in large part predictable as a function of intra- and inter-business organization. It may be difficult or impossible to establish and maintain a static routing and/or connection establishment structure, because (1) hosts may move relative to each other, and (2) hosts, communication links, or the propagation environment may be inherently unstable. Therefore, a dynamically "self-organizing" routing and/or connection establishment structure is desirable.
Crisis management provides a compelling case for the need of networks to be self-organizing in order to create rapidly an infrastructure that supports communication and information sharing among workers and managers operating in the field. Police, fire, citizen's band, and amateur radio communications are commonly available in crises and could be used to set up a broadcast network, but they provide little support to manage peer-to-peer communications and make efficient use of the available spectrum. Portable, bandwidth-efficient peer-to-peer network technologies would allow information systems to be set up to support communications for relief workers. The issues of hardware development, peer-to-peer networking, and multimedia support are not limited to crisis management; they may be equally important to such fields as medicine and manufacturing (e.g., in networking of people, computers, and machine tools within a factory). Thus, research and development on self-organizing networks may be useful in the latter fields as well.
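One simple form of self-organization is distance-vector routing, in which nodes repeatedly merge their neighbors' route tables until routes converge, with no statically designated router. The sketch below is a minimal illustration of that idea; the node names and link costs are hypothetical.

```python
# Minimal sketch of self-organizing routing: shortest-path costs
# between all pairs emerge from repeated local updates, in the style
# of a distance-vector (Bellman-Ford) protocol.

def compute_routes(links):
    """links: dict mapping node -> {neighbor: cost}.
    Returns dict of converged shortest-path costs between all pairs."""
    nodes = list(links)
    dist = {a: {b: (0 if a == b else float("inf")) for b in nodes}
            for a in nodes}
    for a in links:
        for b, cost in links[a].items():
            dist[a][b] = min(dist[a][b], cost)
    changed = True
    while changed:                      # iterate until tables converge
        changed = False
        for a in nodes:
            for neighbor, cost in links[a].items():
                for b in nodes:
                    if cost + dist[neighbor][b] < dist[a][b]:
                        dist[a][b] = cost + dist[neighbor][b]
                        changed = True
    return dist

# Example: three field units; unit1 and unit3 are out of direct radio
# range and can reach each other only via unit2.
links = {"unit1": {"unit2": 1},
         "unit2": {"unit1": 1, "unit3": 1},
         "unit3": {"unit2": 1}}
```

A real peer-to-peer radio protocol must also handle neighbor discovery, link failure, and the bandwidth overhead of the update messages themselves, which is exactly the efficiency concern raised below.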
Rajeev Jain, of the University of California, Los Angeles, suggested two main deficiencies in terms of communications or networking technologies in a situation where relief officials arrive carrying laptop computers: (1) portable computing technology is not as well integrated with wireless communications technology as it should be, and (2) wireless communications systems still often rely on a wireline backbone for networking.2 These factors imply that portable computers cannot currently be used to set up a peer-to-peer network if the backbone fails; radio modem technology has not yet advanced to a point where it can provide an alternative.3 In mobile situations, people using portable computers need access to a wireline infrastructure to set up data links with another computer even if they are in close proximity. In addition, portable cellular phones cannot communicate with each other if the infrastructure breaks down. Jain concluded that both of these problems must be solved by developing technologies that better integrate portable computers with radio modems and allow peer-to-peer networks to be set up without wireline backbones, by using bandwidth-efficient transmission technologies.
Peer-to-peer networking techniques involve network configuration, multiple access protocols, and bandwidth management protocols. Better protocols need to be developed in conjunction with an understanding of the wireless communications technology so that bandwidth is utilized efficiently and the overhead of self-organization does not reduce the usable bandwidth drastically (the current situation in packet radio networks). Bandwidth is at a premium because of the large volume of information required in a crisis and because, although data and voice networks can be deployed using portable wireless technology, higher and/or more flexibly usable bandwidths are needed to support video communication. For example, images can convey vital information much more quickly than words, which can be important in crises or remote telemedicine. If paramedics need to communicate a diagnostic image of a patient (such as an electrocardiogram or x-ray) to a physician at a remote site and receive medical instructions, the amount of data that must be sent exceeds the capabilities of most wireless data communications technologies for portable computers. Technologies are now emerging that support data transmission rates in the tens of kilobits per second, which is sufficient for still pictures but not for full-motion video of more than minimal quality. A somewhat higher bandwidth capability could support a choice between moderate-quality full-motion video and high-quality images at a relatively low image or frame rate (resulting in jerky apparent motion). Another example relates to the usefulness of broadcasting certain kinds of data, such as full-motion video images of disaster conditions from a helicopter to workers in the field; traffic helicopters of local television stations often serve this function. 
However, if terrestrial broadcast capabilities are disabled, it could be valuable to use a deployable peer-to-peer network capability to disseminate such pictures to many recipients, potentially by using multicast technology.
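The bandwidth arithmetic behind the paramedic example can be made concrete. The link rate reflects the "tens of kilobits per second" figure above; the image and frame sizes are illustrative assumptions.

```python
# Back-of-the-envelope transmission times over a wireless link in the
# tens of kilobits per second, illustrating why still images are
# feasible but full-motion video is not.

LINK_KBPS = 20                        # "tens of kilobits per second"

def seconds_to_send(kilobytes: float, kbps: float = LINK_KBPS) -> float:
    """Time to transmit a file of the given size over the link."""
    return kilobytes * 8 / kbps       # 8 bits per byte

still_xray_kb = 100                   # assumed compressed still image
print(seconds_to_send(still_xray_kb))  # 40 seconds: slow but workable

# Even modest video (10 frames/s of assumed 10-KB compressed frames)
# needs 800 kbps, forty times this link's capacity.
video_kbps_needed = 10 * 10 * 8
print(video_kbps_needed / LINK_KBPS)
```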
The statement of James Beauchamp, of the U.S. Commander in Chief, Pacific Command, quoted in Chapter 1 underscored the low probability that all individuals or organizations involved in a crisis response will have interoperable radios (voice or data), especially in an international operation or one in which groups are brought together who have not trained or planned together before. Self-organizing networks that allowed smooth interoperation would be very useful in civilian and military crisis management and thus could have a high payoff for research. The lack of such technologies may be due partly to the absence of commercial applications requiring rapid configuration of wireless communications among many diverse technologies.
One purpose of the Department of Defense's (DOD's) Joint Warrior Interoperability Demonstrations (JWIDs; discussed in Chapter 1) is to test new technologies for bridging gaps in interoperability of communications equipment. The SpeakEasy technology developed at Rome Laboratory, for example, is scheduled to be tested in an operational exercise in the summer of 1996 during JWID '96.4 SpeakEasy is an effort sponsored by DARPA and the National Security Agency to produce a radio that can emulate a multitude of existing commercial and military radios by implementing previously hardware-based waveform-generation technologies in software. Such a device should be able to act as if it were a high-frequency (HF) long-range radio, a very high frequency (VHF) air-to-ground radio, or a civilian police radio. Managing a peer-to-peer network of radios that use different protocols, some of which can emulate more than one protocol, is a complex problem for network research that could yield valuable results in the relatively near term.
Network management helps deliver communications capacity to whoever may need it when it is needed. This may range from more effective sharing of network resources to priority overrides (blocking all other users) as needed. Network management schemes must support making decisions and setting priorities; it is possible that not all needs will be met if there simply are not enough resources, but allocations must be made on some basis of priority and need. Experimentation is necessary to understand better the architectural requirements with respect to such aspects as reliability, availability, security, throughput, connectivity, and configurability.
A network manager responding to a crisis must determine the state of the communications infrastructure. This means identifying what is working, what is not, and what is needed and can be provided, by taking into account characteristics of the network that can and should be maintained. For example, the existing infrastructure may provide some level of security. Then it must be determined whether it is both feasible and reasonable to continue to provide that level of security. Fault tolerance and priorities for activities are other characteristics of the network that must similarly be resolved.
In addition to network management tools to assess an existing situation, tools are needed to incorporate new requirements into the existing structure. For example, there may be great variability in the direction of data flow into and out of an area in which a crisis has occurred—say, between command posts and field units. During some phases, remote units may be used for data collection to be transmitted to centralized or command facilities that in turn will need only lower communication bandwidth to the mobile units.
Adaptive network management can help increase the capability of the network elements, for example, by making the communications and computation able to run efficiently with respect to power consumption. Randy Katz has observed that wireless communication removes only one of the tethers on mobile computing; the other tether is electrical power (Katz, 1995). Advances in lightweight, long-lived battery technology and hardware technologies, such as low-power circuits, displays, and storage devices, would improve the performance of portable computers in a mobile setting. A possibility that is related directly to network management is the development of schemes that adapt to specific kinds of communications needs and incorporate broadcast and asymmetric communications to reduce the number and length of power-consuming transmissions by portable devices. For example, Katz observes that if a mobile device's request for a particular piece of information need not be satisfied immediately, the request can be transmitted at low power and low bandwidth. The response can be combined with those to other mobile devices, which are broadcast periodically to all of the units together at high power and bandwidth from the base stations. If a particular piece of information such as weather data is requested repeatedly by many users, it can be rebroadcast frequently to eliminate the need for remote units to transmit requests.
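The batching-and-broadcast scheme Katz describes can be sketched as follows. The class, method names, and data items are hypothetical; the sketch only illustrates the idea that one high-power broadcast can answer many low-power requests.

```python
# Sketch of asymmetric communication to save power on mobile devices:
# devices send small, low-power requests; the base station batches the
# responses and answers them all in one periodic high-power broadcast.
# Popular items can be rebroadcast so devices need not transmit at all.

from collections import Counter

class BaseStation:
    def __init__(self, data):
        self.data = data              # item -> current value
        self.pending = []             # queued (device, item) requests
        self.popularity = Counter()   # tracks frequently requested items

    def request(self, device, item):
        """Low-power uplink: the device only queues a short request."""
        self.pending.append((device, item))
        self.popularity[item] += 1

    def broadcast_cycle(self):
        """High-power downlink: one broadcast answers every queued
        request, plus the single most popular item to date."""
        wanted = {item for _, item in self.pending}
        if self.popularity:
            wanted.add(self.popularity.most_common(1)[0][0])
        self.pending.clear()
        return {item: self.data[item] for item in wanted}

bs = BaseStation({"weather": "rain", "map": "grid-7"})
bs.request("unitA", "weather")
bs.request("unitB", "weather")   # duplicate request, answered once
```

Note that the two weather requests cost two short uplink transmissions but only one downlink broadcast, which is the source of the power saving.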
Priority policy is a critical issue in many applications; the need for rapid deployment and change in crisis management illustrates the issue especially clearly. Priority policy is the set of procedures and management principles implemented in a network to allocate resources (e.g., access to scarce communications bandwidth) according to the priority of various demands for those resources. Priority policy may be a function of the situation, the role of each participant, their locations, the content being transmitted, and many other factors. The dynamic nature of some crises may be reflected in the need for dynamic reassignment of such priorities. The problem is that one may have to change the determination of which applications (such as life-critical medical sensor data streams) or users (such as search and rescue workers) have priority in using the communications facilities. Borrowing resources in a crisis may require reconfiguring communications facilities designed for another use, such as local police radio. A collection of priority management issues must be addressed:
- Who has the authority to make a determination about priorities?
- How are priorities determined?
- How are priorities configured? Configuration needs to be secure, but also user friendly, because the people performing it may not be network or communications experts.
- How are such priorities provided by the network and related resources?
- How will the network perform under the priority conditions assigned?
The last is a particularly difficult problem for network management. Michael Zyda, of the Naval Postgraduate School, identified predictive modeling of network latency as a difficult research challenge for distributed virtual environments, for which realistic simulation experiences set relatively strict limits on the latency that can be tolerated, implying a need for giving priority to those data streams.
One suggestion arising in the workshops was a priority server within a client-server architecture to centralize and manage evolving priorities. This approach might allow for the development of a multilevel availability policy analogous to a multilevel security policy. A dynamically configurable mechanism for allocating scarce bandwidth on a priority basis could enable creation of the "emergency lane" over the communications infrastructure that crisis managers at the workshops identified as a high-priority need. If such mechanisms were available they could be of great use in managing priority allocation in other domains such as medicine, manufacturing, and banking. In situations that are not crises, however, one might be able to plan ahead for changes in priority, and it is likely that network and communications expertise might be more readily available.
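The core of such a priority server might resemble the following sketch: requests for bandwidth are granted most-urgent-first until capacity runs out, approximating an "emergency lane." The priority levels, user names, and capacity figures are illustrative assumptions.

```python
# Sketch of priority-based bandwidth allocation: a heap orders
# requests by priority (lower number = more urgent), and requests are
# granted in that order until the link's capacity is exhausted.

import heapq

def allocate(capacity_kbps, requests):
    """requests: list of (priority, user, kbps) tuples.
    Returns the list of users granted bandwidth, most urgent first."""
    heap = list(requests)
    heapq.heapify(heap)               # tuples compare by priority first
    granted = []
    while heap and capacity_kbps > 0:
        priority, user, kbps = heapq.heappop(heap)
        if kbps <= capacity_kbps:     # skip requests that no longer fit
            granted.append(user)
            capacity_kbps -= kbps
    return granted

# Search-and-rescue traffic displaces routine traffic on a scarce link.
grants = allocate(100, [(1, "rescue-video", 80),
                        (2, "medical-telemetry", 20),
                        (3, "routine-email", 50)])
```

A deployable server would also need secure, authenticated reconfiguration of the priority table, per the questions listed above.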
Victor Frost, of the University of Kansas, discussed the challenges of meeting diverse priority configuration within a network that integrates voice with other services:
Some current networks use multilevel precedence (MLP) to ensure that important users have priority access to communications services. The general idea for MLP-like capabilities is that during normal operations the network satisfies the performance requirements of all users, but when the network is stressed, higher-priority users get preferential treatment. For voice networks, MLP decisions are straightforward: accept, deny, or cut off connections.
However, as crisis management starts to use integrated services (i.e., voice, data, video, and multimedia), MLP decisions become more complex. For example, in today's systems an option is to drop low-precedence calls. In a multimedia network, not all calls are created equal. For example, dropping a low-precedence voice call would not necessarily allow for the connection of a high-precedence data call. MLP-like services should be available in future integrated networks. Open issues include initially allocating and then reallocating network resources in response to rapidly changing conditions in an MLP context. In addition, the infrastructure must be capable of transmitting MLP-like control information (signaling) that can be processed along with other network signaling messages. There is a need to develop MLP-like services that match the characteristics of integrated networks.
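A sketch (not from the report) of the kind of preemption decision Frost describes: admitting a high-precedence call in an integrated network may require dropping several lower-precedence calls of assorted types and bandwidths, not simply cutting off one voice circuit. Precedence levels and bandwidth figures are illustrative assumptions.

```python
# Sketch of MLP-like admission control for an integrated network:
# preempt the least important active calls, one at a time, until the
# new higher-precedence call fits within link capacity.

def admit(new_call, active_calls, capacity):
    """Calls are dicts with 'precedence' (lower = more important) and
    'kbps'. Mutates active_calls; returns (admitted, preempted)."""
    used = sum(c["kbps"] for c in active_calls)
    preempted = []
    # Only strictly lower-precedence calls are candidates for
    # preemption, least important first.
    victims = sorted((c for c in active_calls
                      if c["precedence"] > new_call["precedence"]),
                     key=lambda c: -c["precedence"])
    for victim in victims:
        if used + new_call["kbps"] <= capacity:
            break                      # enough room has been freed
        active_calls.remove(victim)
        preempted.append(victim)
        used -= victim["kbps"]
    if used + new_call["kbps"] <= capacity:
        active_calls.append(new_call)
        return True, preempted
    return False, preempted

calls = [{"precedence": 3, "kbps": 64},    # routine voice call
         {"precedence": 2, "kbps": 384}]   # medium-precedence video
ok, dropped = admit({"precedence": 1, "kbps": 128}, calls, 512)
```

In this example, dropping the one routine voice call frees just enough capacity, so the medium-precedence video call survives; a voice-only MLP scheme could not reason about such mixed-bandwidth trade-offs.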
An ability to configure priorities, however, will require a much better understanding of what users actually need. Victor Frost also observed,
Unfortunately, defining application-level performance objectives may be elusive. For example, users would always want to download a map or image instantaneously, but would they accept a [slower] response? A 10-minute response time would clearly be unacceptable for users directly connected to a high-speed network; but is this still true for users connected via performance-disadvantaged wireless links? . . . Performance-related deficiencies of currently available computing and communications capabilities are difficult to define without user-level performance specifications.
Security is essential to national-scale applications such as health care, manufacturing, and electronic commerce. It also is important to crisis management, particularly in situations where an active adversary is involved or sensitive information must be communicated. Many traditional ideas of network security must be reconsidered for these applications in light of the greater scale and diversity of the infrastructure and the increased role of nonexperts.
To begin with, the nature of security policies may evolve. Longer-term research on new models of composability of policies will be needed as people begin to communicate more frequently with other people whom they do not know and may not fully trust. In the shorter term, new security models are needed to handle the new degree of mobility of users and possibly organizations. The usability, or user acceptability, of security mechanisms will assume new importance, especially for mechanisms that inconvenience legitimate use severely. New perspectives may be required for setting the boundaries of security policies on criteria other than physical location.
Composability of Security Policies
As organizations and individuals form and re-form themselves into new and different groupings, their security policies must also be adapted to the changes. Three reorganization models—partitioning, subsumption, and federation—may be used, and each may engender changes in security policies. The following are simplistic descriptions, but they capture the general nature of changes that may occur. Partitioning involves a divergence of activity where unanimity or cooperation previously existed. In terms of security, partitioning does not appear to introduce a new paradigm or new problems. In contrast, subsumption and federation both involve some form of merging or aligning of activities and policies. Subsumption implies that one entity plays a primary role, while at least one other assumes a secondary role. Federation, on the other hand, implies an equal partnering or relationship. Both subsumption and federation may require that security policies be realigned, while perhaps seeking ways to continue to support previous policies and mechanisms. Both models of joining may be found in crisis management, as local emergency services agencies provide radio networks that other organizations brought in from outside must interact with and/or assume control over.
If policies and mechanisms are to be subsumed, the problems for security become significantly more difficult to address than in the past. In this case, if a unified top-level policy is to be enforced that is a composite of several others, interfaces among them—or, more abstractly, definitions of the policies, abstraction, and modularity—will be necessary to allow for exchange in controlled and well-known ways. It is only through such formal definitions that the composition of such activities can be sufficiently trustable to allow for the provision of a top-level composite of security policies and mechanisms.
A perhaps even more difficult problem is peer-level interaction within a federated model, in which neither domain's security policy takes clear precedence over the other. Such interaction will become more common as alliances are formed among organizations and individuals who are widely distributed. As virtual networks are set up in conjunction with temporary relationships, there is a continued need for security during any coordinated activities within the affiliation. Thus, the security mechanisms required by each participant must collaborate in ways that do not impede the coordination of their activities. Since there is no domination model in this case, coordination and compromise may be necessary. Again, these problems will be helped by research that provides better modularity and abstraction in order to formalize the relationships and interactions.
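One conservative way to compose policies in a federated model, where no domain dominates, is to permit an action only if every participating domain's policy permits it. The sketch below illustrates that intersection rule; the domain names, actions, and user roles are hypothetical.

```python
# Sketch of federated security policy composition: with no dominant
# domain, an action is allowed only when all domains' policies agree
# (a conservative intersection of permissions).

def federated_permit(action, user, policies):
    """policies: one function (action, user) -> bool per federated
    domain. Every domain must independently permit the action."""
    return all(policy(action, user) for policy in policies)

# Two hypothetical domain policies in a joint crisis response:
fire_dept = lambda action, user: action in {"read_map", "send_status"}
red_cross = lambda action, user: action != "send_status" or user == "liaison"

print(federated_permit("read_map", "worker1", [fire_dept, red_cross]))    # True
print(federated_permit("send_status", "worker1", [fire_dept, red_cross])) # False
```

The intersection rule never violates either domain's policy, at the cost of sometimes denying actions one partner would have allowed; negotiating away that conservatism is part of the coordination-and-compromise problem noted above.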
Mobility of Access Rights
In many, perhaps all, of the national-scale applications, users can be expected to move from one security policy domain or sphere to another and have a need to continue to function (e.g., carrying a portable computer from the wireless network environment of one's employer into that of a customer, supplier, or competitor). In some cases, the mobile user's primary objective will be to interact with the new local environment; in others, it will be to continue activities within the original or home domain. Most likely, the activities will involve some of both. In the first case, the user can be given a completely new identity with accruing security privileges in the new environment; alternatively, an agreement can be reached between the two domains, such that the new one trusts the old one to some degree, using that as the basis for any policy constraints on the user. This requires reciprocal agreements of trust between any relevant security domains. It is even possible to envisage cascading such trust, in either a hierarchical trust model or something less structured in which a mesh of trust develops with time,
supporting transitive trust among domains. There is significant work to be done in such an area.
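The idea of cascading trust through a mesh of agreements can be made concrete with a small sketch. Here trust is a directed relation among domains, and domain A trusts domain C transitively if a chain of pairwise agreements connects them; the domain names are illustrative assumptions, not part of any real system.

```python
# Hypothetical sketch: transitive trust in a mesh of security domains.
# A trusts C if there is a chain of pairwise trust agreements A -> B -> ... -> C.

from collections import deque

def trusts(trust_edges, source, target):
    """Breadth-first search over directed trust agreements."""
    seen, frontier = {source}, deque([source])
    while frontier:
        domain = frontier.popleft()
        if domain == target:
            return True
        for neighbor in trust_edges.get(domain, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return False

mesh = {
    "county_ems": ["state_emergency_office"],
    "state_emergency_office": ["federal_agency"],
}
print(trusts(mesh, "county_ems", "federal_agency"))  # True: trust cascades
print(trusts(mesh, "federal_agency", "county_ems"))  # False: trust is directed
```

Note that the relation is directed: a domain may extend trust without receiving it, which is one reason the policy questions around cascaded trust remain open research problems.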
Mobile users who want to connect back to their home domain from a foreign one also have several alternatives. It is likely that the local domain will require some form of authentication and authorization of users. The remote domain might either accept that authentication, based on some form of mutual trust between the domains, or require separate, direct authentication and authorization from the user. In addition, such remote access may raise problems of exposure of activities, such as lack of privacy, greater potential for masquerading and spoofing, or denial of service, because all communication must now be transported through environments that may not be trusted.
If the user is trying to merge activities in the two environments, it is likely that a merged authentication and authorization policy will be the only rational solution. It is certainly imaginable that such a merged or federated policy might still be implemented using different security mechanisms in each domain, as long as the interfaces to the domains are explicit so that a composite can be created.
Usability of Security Mechanisms
Usability in a security context means not only that both system and network security must be easy for the end users (such as rescue workers or bank customers and officers) to use, but also that the exercise of translation from policy into configuration must be achievable by people in the field who are defining the policies and who may not be security experts. If security systems cannot be used and configured easily by people whose main objectives are completing other tasks, the mechanisms will be subverted. According to Daniel Duchamp, "Two obvious points . . . need considerable work. First, for disasters especially, technology should intrude as little as possible on the consciousness of field workers. Second, all goals should be achieved without the need for system administrators." Users often do not place a high priority on the security of the resources they are using, because the threats do not weigh heavily against the objective of achieving some other goal. Thus, the cost (including inconvenience) to these users must be commensurate with the perceived level of utility. As Richard Entlich, of the Institute for Defense Analyses, observed, "Creating a realistic way of providing security at each node involves not only technical issues, but a change in operational procedures and user attitudes." Ideally, technological designs and approaches should reinforce those needed changes on the part of users.
Unfortunately, the problems of formulating security policy are even more difficult to address with computational and communications facilities. Policy formation, especially when it involves merging several different security domains, is extremely complex. It must be based on the tasks to be achieved, the probability of subversion if security policy constraints are too obstructive, and
the capabilities of the mechanisms available, especially when merging of separate resources is necessary.
Discovery of Resources
Crisis management highlights the need for rapid resource discovery. Resources may be electronic, such as information or services, or they may be more tangible, such as computers, printers, and wires used in networks. First, one must determine what resources are needed. Then, perhaps with help from information networks, one might be able to discover which resources are local and, if those are inadequate, whether some remote resources may be able to address an otherwise insoluble problem. An example of this latter situation would be finding an expert on an unusual bacterial infection that appears to have broken out in a given location.
In crises, some of the tools mentioned above for network management and reorganization in the face of partial failures may also help to identify which local computing, communications, and networking resources are functional. If high-performance computing is necessary for a given task, such as additional or more detailed weather forecasting or geological (earthquake) modeling, discovering computing and network facilities that are remote and accessible via adequately capable network connections might be invaluable.
Another architectural requirement common to several of the application areas is the ability to create virtual subnetworks. The virtual "subnet" feature allows communities to be created for special purposes. For example, in manufacturing, the creation of a virtual subnet for a company and its subcontractors might simplify the building of applications, such as a shared engineering design tool. It would allow a global or national corporation to operate as though it had a private subnet. It might provide similar features for any community, such as a network of hospitals that has a need to exchange patient records.
A virtual subnet will appear to applications and supporting software as if communications are happening on a separate network that actually is configured within a larger one. In essence, the virtual subnet capability allows a policy or activity boundary concept to be made evident in the network model as a subnet. At present, virtual subnets are generally used to reflect administrative domains in which a single consistent set of usage and access policies is enforced.
The possibility of defining a subnet for crisis management in terms of security and priority access has already been suggested. Another potentially useful way to define a boundary around a subnet would be to control the flow of information passing into that subnet by using priority-based filtering
mechanisms. This would be done to reserve scarce bandwidth and storage within the subnet for only the most valuable information.
In order to make virtual subnets useful, there must be automated ways of creating them within the Internet or the broader national or global information infrastructure. This implies understanding the policies to be enforced on such a subnet with respect to, for example, usage and security, and being able to both recognize and requisition resources to create and manage subnets. It may mean provision of various services within the network in such a way that those services can be provided effectively to subnets. Examples of these might be trusted encryption services, firewalls, protocol conversion gateways, and others. A virtual subnet must have all the characteristics of a physical subnet, while allowing its members to be widely distributed physically.5
By providing application- or user-level community boundary models down into the network, one might create a more robust, survivable environment in which to build applications. Both advances in technology development and more fundamental research on architectural models for subnets are needed to automate support for creating such subnets in real time and on a significantly larger scale than is currently supported.
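One way to picture a virtual subnet as a policy boundary is the following minimal sketch, in which a subnet object admits members only from trusted domains and applies priority-based filtering at its boundary to conserve scarce bandwidth. The class, field names, and threshold are assumptions for illustration, not a proposed design.

```python
# Hypothetical sketch: a virtual subnet as a policy boundary whose members
# may be widely distributed physically. The subnet decides who may join and
# filters inbound traffic by priority to conserve scarce bandwidth.

class VirtualSubnet:
    def __init__(self, name, min_priority, trusted_domains):
        self.name = name
        self.min_priority = min_priority  # traffic below this priority is dropped
        self.trusted_domains = set(trusted_domains)
        self.members = set()

    def admit(self, node, domain):
        """Enforce the subnet's membership policy."""
        if domain in self.trusted_domains:
            self.members.add(node)
            return True
        return False

    def accept_traffic(self, sender, priority):
        """Priority-based filtering at the subnet boundary."""
        return sender in self.members and priority >= self.min_priority

crisis_net = VirtualSubnet("flood_response", min_priority=5,
                           trusted_domains={"county_ems", "red_cross"})
crisis_net.admit("laptop_12", "county_ems")
print(crisis_net.accept_traffic("laptop_12", priority=7))  # True
print(crisis_net.accept_traffic("laptop_12", priority=2))  # False: below threshold
```

Automating the creation of such boundaries across the Internet, with real usage and security policies attached, is the research problem the text describes; this sketch shows only the shape of the policy checks involved.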
COMPUTATION: DISTRIBUTED COMPUTING
The networked computational and mass storage resources needed for national-scale application areas are necessarily heterogeneous and geographically distributed. A geographically remote, accessible metacomputing resource, as envisioned in the Crisis 2005 scenario in Chapter 1, implies network-based adaptive links among available people (using portable computers and communications, such as personal digital assistants) to large-scale computation on high-performance computing platforms. The network connecting these computing and storage resources is the enabling technology for what might be termed a network-intensive style of computation. Allen Sears, of DARPA, summarized this idea as "the network is the computer"; that is, computation to address a user's problem may routinely take place out on a network, somewhere other than the user's location.
Crisis management is a good example of a network-intensive application. People responding to crises could benefit from larger-scale mass storage and higher computation rates than are typically available in the field, for example, to gain the benefits of high-performance simulation performed away from the crisis location.6 The technical implication of network-intensive computing for crisis management is not merely a massive computational capability, but rather an appropriately balanced computing and communications hierarchy. This would integrate computing, storage, and data communications across a scale from lightweight portable computers in the field to remote, geographically distributed high-performance computation and mass storage systems for database and simulation
support. Research in many areas, such as mobility and coordination of resources and management of distributed computing, is needed to achieve this balanced hierarchy.
Modeling and Simulation
High-performance computation may be used to simulate complex systems, both natural and man-made, for many applications. Networks can make high-performance computation resources remotely accessible, enabling sharing of expensive resources among users throughout the nation. Applications of modeling and simulation to crisis management include the prediction of severe storms, flooding, wildfire evolution, toxic liquid and gas dispersion, structural damage, and other phenomena. As discussed in Chapter 1, higher-quality predictions than are available today could save lives and reduce the cost of response significantly.
Grand Challenge activities under the High Performance Computing and Communications Initiative (HPCCI) have been a factor in advancing the state of the art of modeling and simulation (CSTB, 1995a; OSTP, 1993, 1994a; NSTC, 1995). The speed of current high-performance simulation for many different applications, however, continues to need improvement. Lee Holcomb, of the National Aeronautics and Space Administration (NASA), observed, for example, that it is currently infeasible for long-term climate change models to involve the coupling of ocean and atmospheric effects, because of inadequate speed of the models for simulating atmospheric effects (which change much more rapidly than ocean effects and therefore must be modeled accordingly). In addition, whereas fluid dynamics models are able to produce very nice pictures of airflow around aircraft wings and to calculate lift, they are not able to model drag accurately, which is the other basic flight characteristic required in aircraft design. Holcomb summarized, "We have requirements that go well beyond the current goals of the High Performance Computing Program."
The urgency of crises imposes a requirement that may pertain more strictly in crisis management than in other applications such as computational science: the ability to run simulations at varying scales of resolution is crucial to being able to make appropriate trade-offs between the accuracy of the prediction and the affordability and speed of the response. Kelvin Droegemeier, of the University of Oklahoma, described work on severe thunderstorm modeling at the university's Center for the Analysis and Prediction of Storms (CAPS), including field trials in 1995 that demonstrated the ability to generate and deliver high-performance modeling results within a time frame useful to crisis managers. For areas within 30 km of a Doppler radar station, microscale predictions have been made at a 1-km scale and can predict rapidly developing events, such as microbursts, heavy rain, hail, and electrical buildup, on 10- to 30-minute time scales. At scales of 1 to more than 10 km, the emergence and intensity of new thunderstorms, cloud ceiling, and visibility have been predicted up to two hours in advance, and
the evolution (e.g., movement, change in intensity) of existing storms has been forecast three to six hours in advance. Rescaling the model thus allows greater detail to be generated where it is most needed, in response to demands from the field.7
As Droegemeier noted, time is critical for results to be of operational value:
These forecasts are only good for about six hours. This means you have to collect the observational data, primarily from Doppler radars; retrieve from these data various quantities that cannot be observed directly; generate an initial state for the model; run the model; generate the forecast products; and make forecast decisions in a time frame of 30 to 60 minutes because otherwise, you have eaten up a good portion of your forecast period. It is a very timely problem that absolutely requires high-performance computing and communications. If you can't predict the weather significantly faster than it evolves, then the prediction is obviously useless.
When high performance is required, adding complexity at various scales of prediction may not be worth the cost in time or computer resource usage. For example, the CAPS storm model could predict not only the presence of hail, but the average size of the hailstones; however, the cost is probably beyond what one would be willing to pay computationally to have that detail in real time. Because the model's performance scales with added computing capacity, more detailed predictions can in principle be made if enough computational resources can be coordinated to perform them.8
Crisis managers also require a sense of the reliability of data they work with—the "error bars" around simulation results. To achieve this, an ensemble of simulations may be run using slightly different initial conditions. Ensemble simulation is especially important for chaotic phenomena, where points of great divergence from similar input conditions may not be readily apparent. Ensemble simulation is ideally suited for running in parallel, because the processes are essentially identical and do not communicate with or influence each other. The difficult problem is identifying how to alter the initial conditions. As Droegemeier noted, Monte Carlo simulation optimizes these variations to give the best results, but depends on a knowledge of the natural variability of the modeled phenomena that is not always available (e.g., in the case of severe thunderstorm phenomena at the particular scales CAPS is modeling). The infrequency of large crises makes it difficult to gain this understanding of natural variability in some cases. More broadly, it impedes verifying models of extraordinary events. As Robert Kehlet, of the Defense Nuclear Agency, said, "We are in the awkward position of not wanting to have to deal with a disaster, but needing a disaster to be able to verify and validate our models."
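The ensemble technique can be illustrated with a deliberately simple stand-in for a real model. In this hypothetical sketch, the same toy model is run from slightly perturbed initial conditions and the spread of outcomes serves as the "error bar"; the logistic map, perturbation size, and ensemble size are all illustrative assumptions. Because the members are independent, the loop parallelizes trivially.

```python
# Illustrative sketch of ensemble simulation: run the same toy model many
# times from slightly perturbed initial conditions and report the spread
# of outcomes. The model is a stand-in, not a real storm model.

import random
import statistics

def toy_model(initial_value, steps=50):
    """A deliberately sensitive iterated map standing in for a simulation."""
    x = initial_value
    for _ in range(steps):
        x = 3.9 * x * (1.0 - x)  # logistic map in its chaotic regime
    return x

def run_ensemble(base_condition, n_members=20, perturbation=1e-4, seed=0):
    rng = random.Random(seed)
    outcomes = [toy_model(base_condition + rng.uniform(-perturbation, perturbation))
                for _ in range(n_members)]
    return statistics.mean(outcomes), statistics.stdev(outcomes)

mean, spread = run_ensemble(0.4)
print(f"ensemble mean {mean:.3f}, spread {spread:.3f}")
```

A wide spread signals that the forecast is sensitive to its initial conditions and should be treated with correspondingly low confidence, which is exactly the information crisis managers need alongside the prediction itself.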
Besides resources to perform the computations, remote modeling and simulation also implies the need for adequate network capability to transport input data to the model and distribute results to the scene. Input data collection requirements may be most demanding if large amounts of real-time sensor data are
involved (see the section "Sensors and Data Collection" below). Sensors will ideally send compressed digitized data in packets that are compatible with existing high-speed networks. However, the observation by Egill Hauksson, of the California Institute of Technology, that high-speed network costs remain too high for nonexperimental applications suggests that additional network research and deployment could be necessary to make this a practical reality for crisis management.
Don Eddington, of the Naval Research and Development Center, outlined a model, tested in the JWID exercises, for performing and/or integrating the results of simulations at "anchor desks." Anchor desks, located away from the front line of crisis, could be staffed with people expert at running and interpreting simulations, who could disseminate results to the field when conditions warrant (e.g., a major change in the situation is detected). This model reduces the amount of network traffic below that required by full-time connection from the field to the remote high-performance computing platform. Distributing results can be done by simply distributing a map or picture of the simulation result.
However, if information is to be integrated with other data available to workers at the scene, or if complex three-dimensional visualizations of the results are called for, a picture or map may not suffice and a complete data file must be sent. (Needs for information integration and display are discussed in the next two sections.) This implies higher-bandwidth connections and greater display capabilities on the front line user's platform. Ultimately, finding the optimal balance of resources for various kinds of crises will require experimentation in training exercises and actual deployments. It should also be influenced by social science research on how crisis managers actually use information provided to them.
Mobility of Computation and Data
Efficiency and performance typically demand that a computation be carried out near its input and output data. Although the traditional solution is to move the data to the computation, sometimes the computation requires so much data so quickly that it is better to move the computation to the data. Since the appropriate software may not already reside on the target system, an executable or interpretable program may have to be transmitted across the network and executed remotely. This extends the meaning of the term relocatable beyond the ability of programmers to port code easily from one platform to another to the ability of code to operate in a truly platform-independent manner in response to urgent demands.
In some circumstances, achieving high performance requires that the application software be optimized specifically for the machine on which it is to operate, which usually requires recompilation of the application. For this approach to have the desired effect, the compilation environment must be able to tailor the application to the specific target machine. This tailoring will not work unless the
application is written in a machine-independent implementation language and it can be compiled on each target machine to achieve performance comparable to the best possible on that machine using the same algorithm.
This problem—compiler and language support for machine-independent programming—is one of the key challenges in high-performance computation. Although languages such as High Performance Fortran (HPF) and standard interfaces like the Message Passing Interface (MPI) are excellent first steps for parallel computing, the machine-independent programming problem remains an important subject of continuing research. Comments from Lee Holcomb indicate that although progress has been made, research on machine-independent programming remains crucial to high-performance computing in all areas, not just crisis management:
I think [programming for high-performance computing] is getting better. I think many of the machines coming out today, as opposed to the ones that were produced, say, a year and a half to two years ago, provide a much better environment. But when you ask a lot of computational scientists, who have spent their whole life porting the current [code] over to one machine and then on to the next, when you give them the third machine to port it over to and have to retune it, they lose a lot of interest and enthusiasm.
An ability to relocate computation rapidly will require dynamic binding of code at run time to common software and system services (e.g., input-output, storage access). This implies a need for further development and standardization of those services (e.g., through common application programming interfaces; APIs) such that software can be written to take advantage of them.
However, software applications that were not originally written to be relocatable may require a wrapper to translate their interfaces for the remote system. In manufacturing applications, such wrappers are prewritten, which is often a costly, labor-intensive process. Research on generic methods enabling more rapid construction of wrappers for software applications—ultimately, producing them "on the fly," as might be required in a crisis—was identified by workshop participants as potentially valuable but currently quite challenging. Advances in wrapper generation for software applications would enable more reuse of software and would benefit many areas in addition to crisis management. However, such advances will require basic research leading to an ability to model, predict, and reason about software systems composed of heterogeneous parts that is far beyond current capabilities. These advances could be more generally relevant to many aspects of software systems, as discussed below in the section "Software System Development."
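The role of a wrapper can be sketched as a simple adapter that translates between a common, expected interface and a legacy application's idiosyncratic one. The class names, interfaces, and numbers below are hypothetical, intended only to show the shape of the translation that automated wrapper generation would have to produce.

```python
# Illustrative sketch of a wrapper: an adapter that lets a legacy application
# with its own interface be invoked through a common interface expected on a
# remote system. The names here are assumptions, not a real API.

class LegacyFloodModel:
    """Existing application with an idiosyncratic string-based interface."""
    def execute(self, params_string):
        rainfall = float(params_string.split("=")[1])
        return f"flood_depth={rainfall * 0.3:.1f}"

class SimulationWrapper:
    """Translates the common interface run(dict) -> dict to the legacy one."""
    def __init__(self, legacy_app):
        self.legacy_app = legacy_app

    def run(self, inputs):
        raw = self.legacy_app.execute(f"rain={inputs['rainfall_mm']}")
        key, value = raw.split("=")
        return {key: float(value)}

result = SimulationWrapper(LegacyFloodModel()).run({"rainfall_mm": 50})
print(result)
```

Writing such an adapter by hand is straightforward for one known application; the research challenge the text identifies is generating adapters like this automatically, "on the fly," for applications whose interfaces were never designed with composition in mind.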
Storage Servers and Meta-Data
Crisis management applications employ databases of substantial size. For example, workshop participants estimated that a database of the relevant
infrastructure (e.g., utilities, building plans) of Los Angeles requires about 2 terabytes. Not all of it must be handled by any one computer at one time; however, all of it may potentially have to be available through the response organization's distributed communications. In addition, a wide variety of data formats and representations occur and must be handled; this may always be the case because of the unpredictability of some needs for data in a crisis. Reformatting data rapidly through services such as those discussed in the section "Information Management" can be computationally intensive and require fast storage media.
Comprehensive provisions must also be made for storing not only data, but collateral information (meta-data) needed to interpret the data. Besides concerns appropriate to all distributed file systems (authentication, authorization, naming, and the like), these involve issues of data validity, quality, and timeliness, all of which are needed for reliable use of the data, and semantic self-description to support integration and interoperability.
To customize information handling for particular applications, storage server software should be able to interpret and respond to the meta-data. Workshop participants suggested that in crisis management, for example, a scheme could be developed to use meta-data to limit the use of scarce bandwidth and to minimize storage media access time while accommodating incoherence of data distributed throughout the response organization. To conserve bandwidth, a central database system located outside the immediate crisis area could maintain a copy of the information stored in each computer at the scene of the crisis. Instead of replicating whole databases across the network when new information alters, contradicts, or extends the information in either copy, a more limited communication could take place to restore coherence between copies or at least provide a more consistent depiction of the situation. A "smart" coherence protocol could relay only changes in the data, or perhaps an executable program to accomplish them. Relevant meta-data for making these determinations might include, for example, time of last update for each data point, so that new data can be identified, and an estimate of quality, to avoid replacing older but "good" data with newer "less good" data.
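A "smart" coherence protocol of the kind described above might use per-record meta-data to decide which updates are worth relaying. The following hypothetical sketch compares two copies of a database and selects only the remote records whose timestamps and quality estimates justify overwriting the local copy; the field names and values are assumptions for illustration.

```python
# Hypothetical sketch of a "smart" coherence protocol: instead of shipping
# whole databases, sites exchange only records whose meta-data (last-update
# time, quality estimate) justify replacing the other copy.

def merge_records(local, remote):
    """Return the update set: remote records that should overwrite local ones.

    A remote record wins only if it is newer AND at least as high in quality,
    so newer "less good" data never displaces older "good" data.
    """
    updates = {}
    for key, rec in remote.items():
        mine = local.get(key)
        if mine is None or (rec["updated"] > mine["updated"]
                            and rec["quality"] >= mine["quality"]):
            updates[key] = rec
    return updates

field_copy = {
    "bridge_17": {"updated": 100, "quality": 0.9, "status": "open"},
    "shelter_3": {"updated": 120, "quality": 0.8, "status": "full"},
}
central_copy = {
    "bridge_17": {"updated": 150, "quality": 0.9, "status": "closed"},  # newer, same quality
    "shelter_3": {"updated": 130, "quality": 0.5, "status": "open"},    # newer but lower quality
}
delta = merge_records(field_copy, central_copy)
print(sorted(delta))  # only bridge_17 needs to be relayed
```

Only the single changed record crosses the network, conserving the scarce bandwidth the text describes, while the quality check prevents a degraded reading from displacing a trustworthy one.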
Besides resource conservation, a beneficial side effect of this coherence scheme would be the creation of a fairly accurate and up-to-date representation of the entire crisis situation, valuable for coordination and decision making. Modeling the coherence and flow of information into, within, and out of the crisis zone could be incorporated into a system that would continuously search for (and perhaps correct) anomalies and inconsistencies in the data. It could also support collaboration and coordination among the people working on a response operation by helping crisis managers know what information other participants have available to them.
Anomaly Detection and Inference of Missing Data
High-performance computing can be used for filling in missing data elements (through machine inference that they are part of a computer-recognizable pattern), information validation, and data fusion in many national-scale applications. For example, crisis data are often incomplete or simply contradictory. Simulation could be used to identify outlier data, flagging potential errors that should be verified. Higher computational performance is required to correct or reconstruct missing data from complex dynamic systems, interpolating information such as wind speeds and directions or floodwater levels through machine inference. Incorrect data—perhaps derived from faulty sensors, taken from out-of-date or incorrect databases, or deliberately introduced by an active adversary—could be detected and corrected by computers in situations where the complexity or volume of the data patterns would make it difficult for a human to notice the error. Ordinarily, the absence of key information requires users to make intuitive judgments; tools that help cope with gaps in information are one element of what workshop participants called "judgment support" (see the section "User-Centered Systems" below in this chapter).
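A minimal form of the outlier flagging described above can be sketched as follows. The example uses a robust statistic (the median absolute deviation) to flag readings far from what neighboring sensors report; the station names, values, and threshold are illustrative assumptions, and a real system would use model-based predictions rather than a simple median.

```python
# Minimal sketch of outlier flagging: flag readings that diverge sharply
# from the consensus of comparable observations, as a stand-in for the
# simulation-based validation described in the text.

def flag_outliers(readings, threshold=3.0):
    """Flag readings far from the median, scaled by a robust spread estimate."""
    ordered = sorted(readings.values())
    median = ordered[len(ordered) // 2]
    deviations = sorted(abs(v - median) for v in readings.values())
    mad = deviations[len(deviations) // 2] or 1.0  # median absolute deviation
    return {sensor for sensor, v in readings.items()
            if abs(v - median) / mad > threshold}

wind_speeds = {"stn_a": 12.1, "stn_b": 11.8, "stn_c": 12.4,
               "stn_d": 58.0, "stn_e": 12.0}
print(flag_outliers(wind_speeds))  # the anomalous station stands out
```

A flagged reading would then be verified rather than discarded outright, since in a crisis an extreme value may be a faulty sensor or may be the event itself.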
The widespread presence of semantic meta-data could enhance data mining and inference for detecting errors in databases. Data mining in high-performance systems has been effective in other applications, for example, in finding anomalous credit card and medical claims; new applications such as clinical research are also anticipated (see Box 2.2). However, the nature of crises is such that data being examined for anomalies may be of an unanticipated nature and may not be fully understood. There is a challenge for research in identifying the right types of meta-data that could make data mining and inference over those unanticipated data possible.
Sensors and Data Collection
More widespread use of networked sensors could generate valuable inputs for crisis management, as well as remote health care and manufacturing process automation. The variety of potentially useful sensors is particularly broad in crisis management, including environmental monitors such as those deployed in the Oklahoma MesoNet or the NEXRAD (Next Generation Weather Radar) Doppler radar system; video cameras that have been installed to enhance security or monitor vehicle traffic; and structural sensors (as in "smart" buildings, bridges, and other structures networked with stress and strain sensors).
Some imagery, such as photographs of a building before it collapsed or satellite photographs showing the current extent of a wildfire, are potential input data for simulation. Timely access to and sharing of these data require high-performance communication, including network management, both to and from
Historically, challenges posed by medical problems have motivated many advances in the fields of statistics and artificial intelligence. Traditionally, researchers in both fields have had to make do with relatively small medical datasets that typically consisted of no more than a few thousand patient records. This situation will change dramatically over the next decade, by which time we anticipate that most health care organizations will have adopted computerized patient record systems. A decade from now, we can expect that there will be some 100 million and eventually many more patient records with, for example, a full database size of 10 terabytes, corresponding to 100 text pages of information for each of 100 million patients. Functionalities needed in the use and analysis of distributed medical databases will include segmentation of medical data into typical models or templates (e.g., characterization of disease states) and comparison of individual patients with templates (to aid diagnosis and to establish canonical care maps). The need to explore these large datasets will drive research projects in statistics, optimization, and artificial intelligence. . . .
Care providers and managers will want to be able to rapidly analyze data extracted from large distributed and parallel databases that contain both text and image data. We anticipate that . . . significant performance issues . . . will arise because of the demand to interactively analyze large (multi-terabyte) datasets. Users will want to minimize waste of time and funds due to searches that reveal little or no relevant information in response to a query, or retrieval of irrelevant, incorrect or corrupted datasets.
SOURCE: Davis et al. (1995), as summarized at Workshop III by Joel Saltz, of the University of Maryland.
the crisis scene. Moreover, models could be designed to take real-time sensor inputs and modify their parameters accordingly to accomplish a more powerful capability to predict phenomena. As Donald Brown, of the University of Virginia, noted, the nonlinearity of many real-world phenomena poses challenges for modeling; learning how to incorporate these nonlinearities into models directly from sensors could improve the performance of models significantly.
Sometimes a sensor designed for one purpose can be used opportunistically for another. For example, an addressable network of electric utility power usage monitors could be used to determine which buildings still have power after an earthquake, and which of those buildings with power are likely to have occupants. A similar approach could be taken using the resources of a residential network service provider. Workshop participants suggested that security cameras also provide opportunities for unusual use; with ingenuity it may be possible to estimate the amplitude and frequency of an earth tremor or the rate at which rain falls by processing video images. Given the high cost of dedicated sensor
networks and the infrequency of crises, technology to better exploit existing sensors opportunistically could facilitate their use.
People carrying sensors might be another effective mode of sensor network deployment. Robert Kehlet noted that field workers could wear digital cameras on their helmets; personal geographic position monitors could be used to correlate the video data with position on a map. Physical condition monitors on workers in dangerous situations could hasten response if someone is injured.
Research is needed on architectures for processing real-time information from a large and scalable number of inputs.9 The problem is likely amenable to parallel processing, as demonstrated on a smaller scale in research described by Jon Webb, of Carnegie Mellon University, on machine vision synthesized from large numbers of relatively inexpensive cameras. A highly decentralized architecture, perhaps using processors built into the sensors themselves (sometimes characterized as "intelligence within the network"), might be a highly effective way to conserve bandwidth and processing; sensors could detect from their neighbors whether a significant change in overall state is occurring and could communicate that fact to a central location, otherwise remaining silent. There could be value in research and development toward a network designed such that, in response to bandwidth or storage constraints in the network, discrete groups of sensors perform some data fusion before passing their data forward; an adaptive architecture could permit this feature to adjust to changing constraints and priorities.
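The report-only-on-significant-change behavior can be sketched with a hypothetical sensor node that stays silent unless a reading departs both from its neighbors' consensus and from its own last report. The class, threshold, and values are assumptions for illustration.

```python
# Hypothetical sketch of "intelligence within the network": a sensor node
# forwards a reading to the central site only when it differs significantly
# from what neighboring sensors already report, conserving bandwidth.

class SensorNode:
    def __init__(self, name, threshold=5.0):
        self.name = name
        self.threshold = threshold
        self.last_reported = None

    def observe(self, value, neighbor_consensus):
        """Report only when the reading departs from the local consensus and
        from this node's own last report by more than the threshold."""
        novel_vs_neighbors = abs(value - neighbor_consensus) > self.threshold
        novel_vs_self = (self.last_reported is None
                         or abs(value - self.last_reported) > self.threshold)
        if novel_vs_neighbors and novel_vs_self:
            self.last_reported = value
            return ("REPORT", self.name, value)
        return None  # stay silent

node = SensorNode("strain_gauge_9")
print(node.observe(20.0, neighbor_consensus=19.0))  # within consensus: silent
print(node.observe(40.0, neighbor_consensus=19.0))  # significant change: reported
```

In an adaptive architecture the threshold itself could be raised or lowered by the network in response to changing bandwidth constraints and priorities, as the text suggests.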
Distributed Resource Management
Network-intensive computing places unusual stress on conventional computer system management and operation practice. Describing the general research challenge, Randy Katz said,
We tend to forget about the fact that [the information infrastructure] won't be just servers and clients, information servers or data servers. There are going to be compute-servers or specialized equipment out there that can do certain functions for us. It will be interesting to understand what it takes to build applications that can discover that such special high-performance engines exist out there, split off a piece of themselves, execute on them, and recombine when the computation is done.
Because significant remote computing and storage resources may be necessary, standardized services for resource allocation and usage accounting are important. Other important issues are enforcing the proper use of network resources, determining the scale and quality of service available, and establishing priorities among the users and uses. Mechanisms are needed to address these issues automatically and dynamically. Operating system resource management is weak in this area because it treats tasks more or less identically. For example, many current
network-aware batch systems are configured and administered manually and support no rational network-wide basis for cost determination.
Dennis Gannon, of Indiana University, suggested the value of continued development of network resource management tools as follows: "High-performance computing . . . should be part of the fabric of the tools we use. It should be possible for a desktop application at one site to invoke the resources of a supercomputer or a specialized computing instrument based on the requirements of the problem. A network resource request broker should provide cost-effective solutions based on the capabilities of compute servers." He pointed to the Information Wide-Area Year (I-WAY) experimental network as a useful early demonstration of such capabilities.10
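A broker of the kind Gannon describes can be sketched in miniature (the server names, capacities, and costs below are invented for illustration): given a problem's resource requirements, the broker selects the least costly registered server that satisfies them.

```python
# Hypothetical sketch of a network resource request broker: requests are
# expressed in terms of the problem's needs, not a particular machine.

def choose_server(servers, required_gflops, required_mem_gb):
    """Return the lowest-cost server meeting the request, or None if none can."""
    candidates = [s for s in servers
                  if s["gflops"] >= required_gflops and s["mem_gb"] >= required_mem_gb]
    return min(candidates, key=lambda s: s["cost_per_hour"], default=None)

servers = [
    {"name": "desktop",       "gflops": 1,    "mem_gb": 4,    "cost_per_hour": 0.0},
    {"name": "cluster-a",     "gflops": 200,  "mem_gb": 128,  "cost_per_hour": 12.0},
    {"name": "supercomputer", "gflops": 5000, "mem_gb": 2048, "cost_per_hour": 300.0},
]

print(choose_server(servers, 50, 64)["name"])     # cheapest adequate: cluster-a
print(choose_server(servers, 1000, 512)["name"])  # only the supercomputer suffices
```

A real broker would also have to weigh network proximity, current load, and priority among users, which is where the accounting and policy mechanisms discussed above come in.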
Software System Development
To the extent that it improves capabilities for integrating software components as they relocate and interact throughout networks, research enabling a network-intensive style of computing may be helpful in addressing a long-standing, fundamental problem for many application areas, that of large software system development. Speaking about electronic commerce systems, Daniel Schutzer, of Citibank, said succinctly, "The programming bottleneck is still there." DARPA's Duane Adams described the problem as follows:
Many of our application programs [at DARPA] are developing complex, software-intensive systems. For example, we are developing control systems for unmanned aerial vehicles (UAVs) that can fly an airplane for 24 or 36 hours at ranges of 3,000 miles from home; we are developing simulation-based design systems to aid in the design of a new generation of ships; and we are developing complex command and control systems. These projects are using very few of the advanced information technologies that are being developed elsewhere in [D]ARPA—new languages, software development methodologies and tools, reusable components. So we still face many of the same problems that we have had for years.
This raises some interesting technology problems. Are we working on the right set of problems, and are we making progress? How do we take this technology and actually insert it into projects that are building real systems? I think one of the biggest challenges we face is building complex systems. We have talked about some of the problems. One of them is clearly a scaling problem . . . scaling to the number of processors in some of the massively parallel systems [and] . . . scaling software so you can build very large systems and incrementally build them, evolve them over time.
Software reuse through integration of existing components with new ones is necessary to avoid the cost of reproducing functionality for new applications from scratch. Building large systems often needs to be done rapidly, and because most large systems have a long, evolutionary lifetime, they must be designed to
change. However, these are not easy challenges. Distributed object libraries such as those facilitated by the Common Object Request Broker Architecture (CORBA; discussed in the next section, "Information Management") may be useful, but more developed frameworks and infrastructure are needed to make them fully usable in the building of applications by large distributed teams of people. Basic tools to support scalable reuse of components, to catalogue and locate them, and to manage versioning are still primitive.
It is clear that getting even currently available system-building technologies and methods into actual use in the software development enterprise is a major challenge. Changing the work practices of organizations takes time; however, there may be ways in which collaboration technology can make it easier to incorporate the available techniques into work practices more smoothly. A collaboration environment that allows software development teams to manage the complex interactions among their activities could reap benefits across the spectrum of applications. Dennis Gannon identified the need to design a "problem-solving environment" technology that provides an infrastructure of tools to allow a distributed set of experts and users to build, for example, large and complex computer applications. Participants in Workshop I developed a subjective report card rating the current state of the art in computing environments.
In the absence of a deeper understanding of large, distributed software systems, however, new tools are not likely to improve the situation. Decades of experience with software engineering indicate that the problems are difficult—they are not solvable purely by putting larger teams of engineers to work or by making new tools and techniques available (CSTB, 1992, pp. 103-107; CSTB, 1989). Barbara Liskov, of the Massachusetts Institute of Technology, cited the need for a good model of distributed computation on which to develop systems and reason about their characteristics—a software infrastructure, not just a programming language or a collection of tools, that would support a way of thinking about programs, how they communicate, and their underlying memory model (see Box 2.3). A consistent software infrastructure model of computation could form the basis not only for building systems using that model but for reasoning about their correctness and performance as they are being built. It would be extremely useful for system developers to be able to predict the performance, fault tolerance, or other specified features of a system composed from parts whose properties are known.

BOX 2.3

At Workshop III, Barbara Liskov observed:

"People have to write programs that run on these [large-scale, distributed] systems. Applications need to be distributed, and they have got to work, and they must do their job with the right kind of performance. . . . These applications are difficult to build. One of the things I was struck by in the conversations today was the very ad hoc way that people were thinking of building systems. It was just a bunch of stuff that you connect together—this component meshes with that component. You know, we can't build systems that way. And the truth is we hardly know how to build systems that worked on the old kind of distributed network. . . .

"We have a real software problem. If I want to build an application where I can reason about its correctness and its performance under a number of adverse conditions, what I need is a good model of computation on which to base that system, so that I have some basis for thinking about what it is doing. I think what we need is a software infrastructure, and I don't mean by this a programming language and I also don't mean some bag of tools that some manufacturers make available. I mean some way of thinking about what programs are, what their components are, where these components live, how they communicate, what kind of access they have to shared memory, what kind of model of memory it is, whether there is a uniform model, whether it is a model where different pieces of the memory have different fault tolerance characteristics, what is the fault tolerance model of the system as a whole, what kinds of atomicity guarantees are provided, and so on. We don't have anything approaching this kind of model for people to build their applications on today."
This problem of composability of software components is very difficult and requires fundamental research. Increased understanding, however, could support a valuable increase in the ability to build systems driven by application needs. Dennis Gannon said, "We should be able to have software protocols that would allow us to request a computing capability based on the problem specification, not based on machine architectures." This will be especially crucial as discrete machine architectures become less fixed with network-centered computing. For example, Vinton Cerf observed that in network-intensive computing, the buses of the traditional computer architecture are replaced in some
respects with network links of a reliability that is unpredictable and often less than perfect. There must also be a way of representing the cost, reliability, and bandwidth trade-offs of various network links in a way that software can understand and act upon, so they can be optimized according to the needs of the problem at hand. These fundamental issues of computation represent a difficult but potentially very valuable avenue for investigation.
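As a toy illustration of predicting a composed system's properties from the known properties of its parts (the component names and figures below are invented, and real composition models are far richer), consider components arranged in series: end-to-end availability is the product of the component availabilities, and end-to-end latency is their sum.

```python
# Illustrative sketch: predicting properties of a serial composition
# from the declared properties of its parts.

def compose_serial(components):
    """Predict availability and latency of a pipeline of components in series."""
    availability = 1.0
    latency_ms = 0.0
    for c in components:
        availability *= c["availability"]   # all parts must be up
        latency_ms += c["latency_ms"]       # delays accumulate
    return {"availability": availability, "latency_ms": latency_ms}

pipeline = [
    {"name": "sensor feed",    "availability": 0.99,  "latency_ms": 20.0},
    {"name": "wide-area link", "availability": 0.95,  "latency_ms": 80.0},
    {"name": "fusion server",  "availability": 0.999, "latency_ms": 15.0},
]
result = compose_serial(pipeline)
print(round(result["availability"], 4))  # predicted end-to-end availability
print(result["latency_ms"])              # predicted end-to-end latency (ms)
```

Even this toy model makes Cerf's point concrete: the unreliable wide-area link dominates the composed system's availability, so a reasoning framework must expose such link properties to the software that depends on them.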
INFORMATION MANAGEMENT: FINDING AND INTEGRATING RESOURCES
In the past decade, there have been important transitions in information management technologies used in large organizations. This is usually characterized as a shift from centralized to more distributed resources, but a more accurate characterization may be a better balancing between centralized and distributed control of information production, location, and access. Technologies such as client-server architectures and distributed on-line transaction processing systems have enabled this more effective balancing. The rebalancing is an ongoing activity at all levels of organizational structure, from central databases to individual and group-specific resources.
This situation, difficult as it is within a single organization, becomes much more complex with the scale up to national, multiorganizational applications. This section considers the information management challenges posed by national-scale applications, with particular emphasis on crisis management. It examines several important issues and trends in information management and suggests additional challenges.
Information management involves a broad range of resources with different purposes, such as traditional databases (typically relational), digital libraries, multimedia databases (sometimes used in video servers), object request brokers (such as those in CORBA), wide-area file systems (such as the Network File System and Andrew File System), corporate information webs based on groupware and/or the World Wide Web, and others. Besides relational tables, conventional types of information objects can include multimedia objects (images, video, hypermedia), structured documents (possibly incorporating network-mobile, executable software components, or "applets"), geographical coordinate systems, and application- or task-specific data types. It is useful to classify these information management resources into four organizational categories: (1) central institutional resources, (2) individual desktop resources, (3) group resources, and (4) ubiquitous resources such as the communications network and e-mail service.
Central resources include institutional databases, digital libraries, and other centrally managed information stores. These typically have regular schemas; extensive support for concurrency and robust access; and supporting policy frameworks to maintain (or at least monitor) quality, consistency, security,
completeness, and other attributes. Data models for institutional resources are evolving in several ways (such as the evolution from relational to object-relational databases), but these models are meant to support large-scale and uniform data management applications.
Individual resources consist of ad hoc structures. These resources may be in a process of evolving into more regular structures of broader value to an organization (a process often called upscaling). Alternatively, they may be individual resources that differentiate and provide a competitive edge to the individual and so are unlikely to be shared.
Group resources can include scaled-down and specialized institutional resources as well as ad hoc shared resources. This suggests a continuum from ad hoc ephemeral individual resources through group resources to robust managed institutional resources. Examples of group resources are engineering design repositories, software libraries, and partially formulated logistics plans.
The final class of resources, which may be called ubiquitous resources, consists of shared communications and information services on a communications network, including services such as electronic mail, newsgroups, and the World Wide Web. These services exist uniformly throughout an organization and, unlike the other classes of resources, generally do not reflect organizational hierarchy.
This classification of resources provides a useful framework for examining broad trends in information management and considering, particularly, the special problems associated with national-scale applications, such as the following:
- Information integration. In many of these applications, information must be integrated with other information in diverse formats. This may include integration of diverse access control regimes to enable appropriate sharing of information while simultaneously maintaining confidentiality and integrity. It can also include integration of institutional, group, and personal information. Related to the integration problem is the issue of information location—how can information be indexed and searched to support national-scale applications?
- Meta-data and types. Shared objects in very large-scale applications can have a rich variety of types, and these types can be very complex. An example of a family of complex types is the diversity of representations and formats for image information. How can objects be shared and used when their types are evolving, perhaps not at the same pace as the applications software that uses them? Also, there is an evolving view of information objects as aggregations of information, computation, and structure. How will this new view affect information management more broadly? Related to this is the more general issue of meta-data: descriptive information about data, including context (origin, ownership, etc.) as well as syntactic and semantic information. Meta-data approaches are needed that support modeling of quality and other attributes through an
integration process. This could include integration of information that may appear to be inconsistent due to quality problems.
- Production and value. A final information product can be derived through a series of steps involving multiple information producers and organizations. This involves addressing the development of models for the kinds of steps that add value to a product beyond the information integration problem mentioned above.
- Distribution and relocation. The linking of information resources at all levels into national-scale applications places great stress on a variety of distributed computing issues such as robustness and survivability, name management, and flexible networking. In addition, there is the issue of adaptivity—the interplay of network capability and applications behavior.
Before examining these four issues in greater detail, it is useful to point out some general trends in information management that are part of the evolution already under way to national-scale applications.
First, the ongoing shift over the past decade from central mainframe resources to more distributed client-server configurations is giving way to a steady migration of both resources and control over resources within organizations. This suggests that the main challenge is to better enable this shift as an ongoing process, rather than as a one-time effort. This steady flux is sustained by the emergence of ad hoc groups that establish and manage their own resources (which must later be integrated with others), by a continual change and improvement in information management technologies, and by structural change within organizations. A military joint task force and a civilian crisis action team are examples of ad hoc groups that both establish their own resources and rely on a broad range of institutional resources. In other words, we are just beginning to explore the interplay among institutional information resources, individual ad hoc information resources, and communications and information services such as electronic mail and the World Wide Web.
Second, the complexity and quantity of information, the range and diversity of sources, and the range of types and structures for information are all increasing rapidly, as is the need to assimilate and exploit information rapidly. The problem is not an overload of information, but rather a greater challenge to manage it effectively. Also, as noted above, the nature of the information items is changing: they have more explicit structure, more information about their type, more semantic information, and more computational content. There are also increasingly stringent requirements to manage intellectual property protection and support commerce in information items.
Finally, there is greater interconnectivity and heterogeneity both within and among organizations. This enables more complex information pathways, but it also creates greater challenges to the effective management of information. Related to this trend is the rapidly increased extent to which information users are
becoming information producers. The World Wide Web presents the most compelling evidence of this; when barriers are reduced sufficiently, greater numbers of people will make information available on the network. When electronic commerce technologies become widely used, in the relatively near future, this will create a rich and complex marketplace for information products.
Integration and Location
National-scale applications involve large numbers of participating organizations with multiple focal points of organizational control and multiple needs for information. They often involve solving information management problems that rely on multiple sources of data, possibly including legacy databases that are difficult to reengineer. This creates a problem of information integration in which multiple information resources, with different schemas, data representations, access management schemes, locations, and other characteristics, may have to be combined to solve queries. As discussed in Chapter 1, sometimes this information can be preassembled and integrated in response to mutually agreed-upon, anticipated needs; however, this is not always feasible. Strategies that make integration feasible are needed to meet the short-term press of crises, and they may well have utility in reducing costs and otherwise facilitating information integration in other, less time-sensitive applications, which Chapter 1 discusses with respect to digital libraries.
Information integration is an area of active research aimed at introducing advances over traditional concepts of wrappers and mediators. A "wrapper" for a database provides it with a new virtual interface, enabling the database to appear to have a particular data model that conforms to a user's requirement for which the database may not have been designed. A "mediator" provides a common presentation for a schema element that is managed differently in a set of related databases. A mediator can thus translate different users' requests into the common presentation, which multiple wrappers sharing that presentation can then translate into forms understood by the resources they interact with (i.e., "wrap"). Thus, mediators and wrappers give users a uniform way to access a set of databases integrated into a system, so that they appear as a single virtual aggregate database. In the past, much of this work has been performed on a laborious, ad hoc basis; more general-purpose approaches, such as the Stanford-IBM Manager of Multiple Information Sources (TSIMMIS; see Box 2.4), aim at producing mediation architectures of more general use.
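The wrapper/mediator pattern can be sketched as follows (the sources, schemas, and field names are hypothetical): each wrapper presents a legacy source under a common virtual schema, and a mediator fans a query out across the wrapped sources and merges the results into a single answer, so the set of databases appears as one virtual aggregate.

```python
# Minimal sketch of wrappers and a mediator over two "legacy" sources
# that store personnel records in different layouts.

legacy_csv = [("Smith", "J", "555-0101")]                    # tuples: last, first, phone
legacy_dict = [{"fullName": "Ada Lovelace", "tel": "555-0199"}]

def wrap_csv(rows):
    """Wrapper: tuple layout -> common schema {name, phone}."""
    return [{"name": f"{first} {last}", "phone": phone} for last, first, phone in rows]

def wrap_dict(rows):
    """Wrapper: dictionary layout -> common schema {name, phone}."""
    return [{"name": r["fullName"], "phone": r["tel"]} for r in rows]

def mediator(query_name, wrapped_sources):
    """Mediator: answer one name query against the virtual aggregate database."""
    return [rec for source in wrapped_sources for rec in source
            if query_name.lower() in rec["name"].lower()]

sources = [wrap_csv(legacy_csv), wrap_dict(legacy_dict)]
print(mediator("lovelace", sources))  # found via the dictionary-style source
```

In practice the hard work lies in designing the common schema and the mappings, and in doing so for sources far messier than these two; the sketch only shows the division of labor between wrapper and mediator.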
Most research now under way focuses on how a virtual aggregate database can be engineered for a set of existing databases. This involves developing data models and schemas suitable for the virtual aggregate, and mappings among the models and schemas for the component databases and the common data model and schema elements. When this is to be done on more than an ad hoc basis, methods are needed to represent the aggregate schemas. When legacy databases
are involved, reverse engineering of those databases may be necessary to determine their schemas. This can be risky, because there are often hidden assumptions and invariants that must be respected if a legacy database is to remain useful. As Yigal Arens, of the University of Southern California, discussed, the information integration problem becomes more difficult when queries to the aggregate database need to be carried out efficiently (subject to a time constraint), creating research challenges for query optimization at the aggregate level.
New approaches in research on information integration are beginning to yield results, but scaling up to national or global scale will significantly complicate the information integration problem. For example, when multiple organizations are involved, access control issues become more important and also more difficult. Just as new schemas are required for the aggregate to reconcile multiple schemas, aggregate access control and security models may also have to be developed. Also, information integration may be complicated by distributed computing issues—for example, a set of databases may be interconnected intermittently or over a low-capacity link, which would affect the way query processing is carried out. This is a familiar issue in distributed databases that becomes more difficult in a heterogeneous setting.
Richer data models have been developed for specialized uses, such as object databases for design applications or information retrieval databases for digital libraries. When these kinds of information assets must be integrated with more traditional databases, the information integration problem can become much more complicated. One way to address this problem is to develop common reusable wrapper and mediator elements that can be adapted easily to apply in a wide range of circumstances.
Applications such as crisis management increase the difficulty of information integration by introducing the need to rapidly integrate a set of databases whose integration was not previously contemplated. The accounts of information management in crisis situations that were presented in the workshops focused on ad hoc information integration solutions designed to meet very specific needs. For example, geographic databases, land use and utility databases, real estate tax databases, and other databases from a variety of sources are necessary to gather the information needed to rapidly process damage claims related to natural disasters such as storms and earthquakes. This suggests that there is value in anticipating this kind of integration and developing, in advance, a repertoire of task-specific common schemas and associated mediators for legacy databases. This hybrid approach to integration has appeal in that it supports incremental progress toward common schemas when they can be agreed upon; when common schemas cannot be arrived at, mediators can be developed to support interoperability. This also suggests that information integration provides techniques that may be applicable to more general (and less approachable) information fusion problems.11
BOX 2.4

Wrappers and mediators are not new technologies; they have been implemented in an ad hoc fashion for many years. One of the original motivations for work on wrappers was the desire to make legacy programs and information sources (such as databases) accessible to diverse requesting applications across networks. This required laborious, ad hoc production of wrappers that translate requests from users' applications into queries and other commands that the wrapped resources can interpret and will respond to correctly.

Disagreement among workshop participants and additional inputs solicited for this report illustrate the perhaps inevitable breadth of perspectives about what does or does not constitute a new research idea. Some contributors were pessimistic about the likelihood of solving complex integration problems through wrappers and mediators. They suggested, for example, that years of experience have shown that for integration to work well, applications must be written in the expectation that their output will be used as another application's input, or vice versa—leaving unaddressed the problem of integrating legacy programs and information sources that were not written with reuse in mind.1 Others accepted the truth of this observation, but interpreted it as an opportunity for fundamental research, pointing to recent research aimed at developing architectures within which generic techniques may be found for more rapidly and reliably building software components to integrate diverse resources, including legacy resources. Gio Wiederhold has described one example in this vein, a three-layer mediation architecture consisting of the basic information sources and their wrappers; a mediation layer that adds value by merging, indexing, abstracting, etc.; and the users and their applications that need the information (Wiederhold, 1992).

There is a range of research challenges to make such an architecture broadly useful. For example, models for representing diverse information sources and languages for interacting with them must accommodate not only sources with a well-defined schema (e.g., the relational model used in many databases), but others such as text files, spreadsheets, and multimedia.2 Automatic or semiautomatic generation of wrappers would be a significant contribution; this is a serious challenge that requires identifying and representing not only the syntactic interfaces but also the semantic content and assumptions of information sources. Some research has focused on rule-based tools for generating wrappers.

Complementary to research on representing characteristics of sources is the formal representation of domain-specific knowledge that users may need to access and explore. This representation could facilitate generation of mediators optimized for understanding requests and translating them into searches that draw upon and integrate multiple information sources, interacting with each source through a wrapper. Yigal Arens, of the University of Southern California, discussed current research on applying a variety of artificial intelligence techniques to partially automate the creation of mediators for specific applications.3 In this approach, a model is constructed to describe not only the structure and content of a set of information sources, but also the knowledge domain about which the sources have information. The mediator translates user queries related to that domain into search strategies, which it then implements. Changes in the range of information sources available (e.g., addition of new sources) can be accommodated by changing the domain model, rather than rebuilding the mediator.

In addition to integration, there is the related issue of information location. Searching within a database or a specific digital library depends upon first finding the appropriate database or digital library. As Eliot Christian, of the U.S. Geological Survey, observed:
One of the fundamental issues in information discovery is that one cannot afford to digest all available information and so must rely on abstractions. Yet, the user of information may be working in a context quite different from what the information provider anticipated. While cataloging techniques can characterize a bibliographic information resource statically, I would like to see a "feature extraction" approach that would support abstraction of information resources based more closely on the user's needs at the moment of searching. Natural language processing may help in the direction of search based on knowledge representations, but the more general problem is to support a full range of pattern matching to include imagery and numeric models as well as human language. . . .
To me, the most immediate problem is that it is very difficult to find and retrieve information from disparate information sources. Although some progress has been made in building consensus on presentation issues through the likes of Web browsers, tools for client-based network search are conspicuously absent. With server-based searching, one can only search for information in fairly narrow and pre-determined domains, and then only with the particular user interface that the information source thought to provide.
For critical national-scale applications, approaches to this information resource location problem must go beyond the opportunistic searching and browsing characteristic of the Internet. Even when information resources are diverse, if they may have to be used in critical applications—particularly those with urgent deadlines—there would be benefit from registering them and their characteristics
in an organized manner. With improvements, for example, in schema description techniques, this could make the information integration problem more approachable as well.
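Such a registry might be sketched as follows (the resource names and characteristics below are invented): information resources are registered along with descriptive characteristics, and applications locate suitable resources by matching against those characteristics rather than by opportunistic browsing.

```python
# Hypothetical sketch of an organized registry of information resources
# and their characteristics, supporting location for critical applications.

registry = []

def register(name, domain, schema_type, update_frequency):
    """Record a resource and its declared characteristics."""
    registry.append({"name": name, "domain": domain,
                     "schema_type": schema_type,
                     "update_frequency": update_frequency})

def locate(domain=None, schema_type=None):
    """Return registered resources matching the requested characteristics."""
    return [r for r in registry
            if (domain is None or r["domain"] == domain)
            and (schema_type is None or r["schema_type"] == schema_type)]

register("county-parcels", "land use", "relational", "monthly")
register("quake-feeds", "seismology", "time series", "continuous")
print([r["name"] for r in locate(domain="seismology")])
```

The registered characteristics are, in effect, meta-data about whole resources; richer schema descriptions in the registry are what would make the subsequent integration step more approachable.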
Information location also relates to the distributed computing issues raised above, since one approach involves dispatching not just passive queries to information sources, but active information "agents" that monitor and interact with information stores on an ongoing basis. Information agents may also deploy other information agents, increasing the challenges (both to the initial dispatcher of the agents and to the various willing hosts) of monitoring and managing large numbers of deployed agents.
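A minimal sketch of such an agent follows (the class, keys, and condition are illustrative): rather than issuing a one-shot query, the agent carries a standing condition and reports only when a monitored information store satisfies it.

```python
# Toy "information agent": monitors an information store on an ongoing
# basis and notifies its dispatcher only when its standing condition holds.

class InformationAgent:
    def __init__(self, key, predicate):
        self.key = key
        self.predicate = predicate    # standing condition to watch for

    def check(self, store):
        """Inspect the store once; return a notification or None."""
        value = store.get(self.key)
        if value is not None and self.predicate(value):
            return {"key": self.key, "value": value}
        return None

agent = InformationAgent("shelter_capacity", lambda v: v < 100)
print(agent.check({"shelter_capacity": 450}))  # condition not met: None
print(agent.check({"shelter_capacity": 60}))   # notifies: capacity running low
```

In a real deployment the `check` step would run at the host holding the store, which is precisely what raises the monitoring and management challenges noted above when agents number in the thousands or deploy further agents themselves.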
Meta-Data and Types
Information is becoming more complex, is interpreted to a greater extent, and supports a much wider range of issues. Evidence of the increase in complexity is found in (1) the growing demand for enriched data models, such as enhancements to the relational model for objects and types; (2) the adoption of various schemes for network-based sharing and integration of objects, such as CORBA; (3) the development of databases that more fully interpret objects, such as deductive databases; (4) the rapid growth in commercial standards and repository technology for structured and multimedia objects; and (5) the integration of small software components, such as applets, into structured documents.
One important approach to managing this increased complexity is the use of explicit meta-data and type information. William Arms, of the Corporation for National Research Initiatives, observed, "Very simple, basic information about information is, first of all, a wonderfully important building block and [second,] . . . a much more difficult question than anybody really likes to admit."
Multimedia databases, for example, typically maintain separate stores for the encoded multimedia material and the supporting meta-data. Meta-data provide additional information about an object, beyond the content that is the object itself. Any attribute can be managed as meta-data. For example, in a multimedia database, meta-data could include index tags, information about the beginnings and endings of scenes, and so on. Meta-data can also include quality information. In crisis management applications, this is crucial, since there are some cases where many of the raw data (40 percent, in David Kehrlein's commercial GIS example discussed in Chapter 1) are inaccurate in some respect. As David Austin, of Edgewater, Maryland, noted, "Often, data are merged and summarized to such an extent that differences attributable to sources of varying validity are lost." Separately distinguishable meta-data about the reliability of sources can help users identify and manage around poor-quality data.
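Keeping reliability meta-data separately distinguishable can be sketched simply (the records and reliability figures below are invented): each record carries its source and an estimated reliability as meta-data, and users filter on that meta-data without losing it from the result.

```python
# Sketch: source-reliability meta-data kept alongside, but distinct from,
# the data themselves, so poor-quality records can be managed around.

records = [
    {"parcel": "A-12", "damage": "severe", "meta": {"source": "field team",    "reliability": 0.9}},
    {"parcel": "B-07", "damage": "minor",  "meta": {"source": "phoned report", "reliability": 0.4}},
    {"parcel": "C-31", "damage": "severe", "meta": {"source": "aerial photo",  "reliability": 0.7}},
]

def trusted(records, min_reliability):
    """Filter on meta-data while keeping it attached to each record."""
    return [r for r in records if r["meta"]["reliability"] >= min_reliability]

print([r["parcel"] for r in trusted(records, 0.7)])  # the two well-sourced records
```

Had the records been merged and summarized without the `meta` field, as Austin warns, the distinction between the field team's report and the phoned-in one would have been lost.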
Types are a kind of meta-data that provide information on how objects can be interpreted. In this regard, type information is like the more usual database schema. Types, however, can be task specific and ad hoc. Task specificity
means, for example, that the particular consensus types in the Multi-purpose Internet Mail Extension (MIME) hierarchy are a small subset of the types that could be developed for a particular application.
Because of this task specificity, the evolution of types presents major challenges. For example, the type a user may adopt for a structured document typically evolves over a period of months or years as a result of migration from one desktop publishing system to the next. Either the user resists migration and falls behind technology developments, or the user must somehow manage a set of objects with similar, but not identical, types. One approach to this problem is to create dedicated type servers, which serve up type information and related capabilities (e.g., conversion mechanisms that allow objects to be transformed from one type to another).
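The notion of a type server can be illustrated with a minimal registry that maps pairs of type names to conversion functions; the type names and the converter shown are hypothetical.

```python
# Minimal sketch of a "type server": a registry that serves up
# conversion mechanisms between similar, but not identical, types.

class TypeServer:
    def __init__(self):
        self._converters = {}  # (source_type, target_type) -> function

    def register(self, source, target, converter):
        self._converters[(source, target)] = converter

    def convert(self, obj, source, target):
        if source == target:
            return obj
        try:
            return self._converters[(source, target)](obj)
        except KeyError:
            raise ValueError(f"no conversion from {source} to {target}")

server = TypeServer()
# Hypothetical converter between two generations of a document type.
server.register("report/v1", "report/v2", lambda doc: {**doc, "schema": "v2"})

old_doc = {"schema": "v1", "body": "damage assessment"}
new_doc = server.convert(old_doc, "report/v1", "report/v2")
```

A user migrating between desktop publishing systems could then manage a mixed collection of similar types by requesting conversions on demand, rather than converting every object at once.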
A related issue is the evolution of structured objects to contain software components. The distinction between structured documents and assemblies of software components has been blurring for some time, and this trend will complicate further the effective management of structured objects. For example, because a structured object can contain computation, it is no longer benign from the standpoint of security. An information object could threaten confidentiality by embodying a communications channel back to another host, or it could threaten integrity or service access due to computations it makes while within a protected environment. Many concepts are being developed to address these problems, but their interplay with broader information management issues remains to be worked out. This issue also reinforces the increasing convergence between concepts of information management and concepts of software and computation.
Production and Value
National-scale applications provide many more opportunities for information producers to participate in an increasingly rich and complex information marketplace. Every educator, health care professional, and crisis management decision maker creates information, and that information has a particular audience. Technology to support the efficient production of information and, more generally, the creation of value in an information value chain is becoming increasingly important in many application areas and on the Internet in general.
The World Wide Web, even in its present early state of development, provides evidence of the wide range of kinds of value that can be provided beyond what is normally thought of as original content. For example, among the most popular Web services are sites that catalog and index other sites. Many sites are popular because they assess and evaluate other sites. Services are also emerging for brokering of information, either locating sites in response to queries or locating likely consumers of produced specialty information. Because of the speed of the electronic network, many steps along the way from initial producer to end consumer of information can be performed very efficiently.
Related to these concepts of information value are new information services. For example, there are several candidate services that support commerce in information objects. Because information objects can be delivered rapidly and reliably, they can support commerce models that are very different from models for physical objects. In addition, services are emerging to support information retrieval, serving of complex multimedia objects, and the like. The profusion of information producers on the Web also creates a need for a technology that enables successful small-scale services to scale up to larger-scale and possibly institutional-level services. National-scale applications such as crisis management complicate this picture because they demand attention to quality and timeliness. Thus the capability of an information retrieval system, for example, may be measured in terms of functions ranging from resource availability (for meeting a deadline) to precision and recall.
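The precision and recall measures mentioned above have standard definitions, sketched here in Python with an invented set of documents:

```python
def precision_recall(retrieved, relevant):
    """Standard information-retrieval measures.

    precision: fraction of retrieved items that are relevant
    recall:    fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Illustrative query: 3 of the 4 returned documents are relevant,
# and 3 of the 5 relevant documents were found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d7", "d9"])
```

In a crisis setting, these quality measures would be weighed against resource availability, since a precise answer that misses a deadline may be worthless.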
Distribution and Relocation
As noted above, distributed information resources may have to be applied, in the aggregate, to support national-scale applications. In these applications, there can be considerable diversity that must be managed. The distributed information resources can be public or private, with varying access control, security, and payment provisions. They can include traditional databases, wide-area file systems, digital libraries, object databases, multimedia databases, and miscellaneous ad hoc information resources. They can be available on a major network, on storage media, or in some other form. They also can include a broad range of kinds of data, such as structured text, images, audio, video, multimedia, and application-specific structured types.
For many applications, these issues can interact in numerous ways. For example, when network links are of low capacity or are intermittent, in many cases it may be acceptable to degrade quality. Alternatively, relative availability, distribution, and quality of communications and computing resources may determine the extent to which data and computation migrate over the distributed network. For example, low-capacity links and limited computing resources at the user's location may suggest that query processing is best done at the server; but when clients have significant computing resources and network capacity is adequate, then query processing, if it is complex, could be done at the client site. When multiple distributed databases cooperate in responding to queries, producing aggregated responses, this resource-balancing problem can become more complex; when atomicity and replication issues are taken into account, it can become even more difficult.
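The client-versus-server placement decision can be caricatured as a simple cost comparison; the units and the cost model below are deliberately crude assumptions, meant only to make the trade-off concrete.

```python
def choose_query_site(link_kbps, client_mips, server_mips, result_kb, data_kb):
    """Decide whether a query should run at the server or the client.

    Rough heuristic: ship only the (small) result over a slow link, but
    consider shipping raw data to a capable client when bandwidth allows.
    Processing cost is approximated as data volume divided by compute rate.
    """
    # Process at the server, then ship only the result.
    server_time = data_kb / server_mips + result_kb * 8 / link_kbps
    # Ship the raw data, then process at the client.
    client_time = data_kb * 8 / link_kbps + data_kb / client_mips
    return "server" if server_time <= client_time else "client"
```

A real distributed query optimizer would also weigh atomicity, replication, and multi-database aggregation, which is where the problem becomes genuinely hard.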
In crisis management, resource management and availability issues take on new dimensions. In a crisis, complex information integration problems may yield results that go into public information kiosks. When communications are intermittent or resource constrained, caching and replication techniques must
respond to levels of demand that are unanticipated or are changing rapidly. Can data replicate and migrate effectively without direct manual guidance and intervention? This is more difficult when there are data quality problems or when kiosks support direct interaction and creation of new information.
USER-CENTERED SYSTEMS: DESIGNING APPLICATIONS TO WORK WITH PEOPLE
Research on natural, intuitive user interface technologies has been under way for many years. Although significant progress has been made, workshop participants indicated that a more comprehensive view of the human-computer interface as part of larger systems must be developed in order for these technologies to yield the greatest benefit. Allen Sears observed, "The fact that humans make . . . errors, the fact that humans are impatient, the fact that humans forget—these are the kinds of issues that we need to deal with in integrating humans into the process. The flip side of that . . . is that humans, compared to computers, have orders-of-magnitude more domain-specific knowledge, general knowledge, common sense, and ability to deal with uncertainty."
System designs should focus on integrating humans into the system, not just on providing convenient human-computer interfaces. The term "system" today commonly refers to the distributed, heterogeneous networks, computers, and information that users interact with to build and run applications and to accomplish other tasks. A more useful and accurate view of the user-system relationship is of users as an integral part of the total system and solution space. Among other advantages, this view highlights the need for research integrating computing and communications science and engineering with advances in the understanding of user and organizational characteristics from the social sciences.
Human-centered Systems and Interfaces
Traditional human-computer interface research embraces a wide array of technologies, such as speech synthesis, visualization and virtual reality, recognition of multiple input modes (e.g., speech, gesture, handwriting), language understanding, and many others.12 All applications can benefit from easy and natural interfaces, but these are relative characteristics that vary for different users and settings. A basic principle is that the presentation should be as natural to use as possible, to minimize demands on those with no time or attention to spare for learning how to use an application. This does not necessarily imply simplicity; an interface that is too simple may omit capabilities the user needs, leading to frustration.
In addition, designers of interfaces in large-scale applications with diverse users cannot depend on the presence of a particular set of computing and communications resources, so the interfaces must be adaptable to what is available. The
network-distributed nature of many applications requires attention to the scaling of user interfaces across a range of available platforms, with constraints that are diverse and—especially in crises—unpredictable. Constraints include power consumption in portable computers and communications bandwidth. For example, it is important that user interfaces and similar services for accessing a remote computing resource be usable, given the fidelity and quality of service available to the user. An additional focus for research in making interface technologies usable in national-scale applications is reducing their cost.
Crisis management, however, highlights the need to adapt not only to available hardware and software, but also to the user. Variations in training and skills affect what users can do with applications and how they can best interact with them. As David Austin observed:
Training is also critical; people with the proper skill mix are often in short supply. We have not leveraged the technology sufficiently to deliver short bursts of training to help a person gain sufficient proficiency to perform the task of the moment. . . .
[What is needed is] a system that optimizes both the human element and the information technology element using ideas from the object technology world. In such a system, a person's skills would be considered an object; as the person gained and lost skill proficiency over his career, he would be trained and given different jobs [so that he could be part of] a high-performance work force able to match any in the world. The approach involves matching a person with a job and at the same time understanding the skill shortfalls, training in short bursts, and/or tutoring to obtain greater proficiency. As shortfalls are understood by the person, he or she can task the infrastructure to provide just-in-time, just-enough training at the time and place the learner wants and needs it.
In addition, because conditions such as stress and information overload can vary rapidly during a crisis, there would also be value in an ability to monitor the user's performance (e.g., through changes in response time or dexterity) and adapt in real time to the changing capabilities of users under stress. By using this information, applications such as a "crisis manager's electronic aide" could adjust filtering and prioritization to reduce the flood of information given to the user. Improvements in techniques for data fusion in real time among sensors and other inputs would enhance the quality of this filtering. Applications could also be designed to alter their presentation to provide assistance, such as warnings, reminders, or step-by-step menus, if the user appears to be making increasing numbers of errors.
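One way such monitoring might work is sketched below; the thresholds and the two-level filtering scheme are invented for illustration.

```python
from collections import deque

class PerformanceMonitor:
    """Hypothetical monitor: watches a user's response times and errors
    and suggests how aggressively to filter incoming information."""

    def __init__(self, baseline_seconds, window=10):
        self.baseline = baseline_seconds
        self.samples = deque(maxlen=window)  # recent response times
        self.errors = deque(maxlen=window)   # 1 if the action was an error

    def record(self, response_seconds, was_error=False):
        self.samples.append(response_seconds)
        self.errors.append(1 if was_error else 0)

    def filter_level(self):
        """Return 'normal' or 'aggressive' filtering for the current user."""
        if not self.samples:
            return "normal"
        avg = sum(self.samples) / len(self.samples)
        error_rate = sum(self.errors) / len(self.errors)
        # Slowed reactions or rising errors suggest stress or overload.
        if avg > 1.5 * self.baseline or error_rate > 0.2:
            return "aggressive"
        return "normal"
```

A "crisis manager's electronic aide" could consult such a monitor before deciding how much information, and in what form, to present.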
The focus of these opportunities is inherently multidisciplinary. To achieve significant advances in the usability of applications, improvements in particular interface techniques can be augmented by integrating multiple, complementary technologies. Recent research in multimodal interfaces has proceeded from the recognition that no single technique is always the best for even a single user, much less for all users, all the time, and that a combination of techniques can be
more effective than any single one. Learning how to optimize the interface mode for any given situation requires experimentation, as well as building on social science research in areas such as human factors and organizational behavior.
Recognizing that the ideal for presentation of information to the user is in a form and context that is understandable, workshop participants noted that in some applications a visual presentation is called for. Given adequate performance, an immersive virtual reality environment could benefit applications such as crisis management training, telemedicine, and manufacturing design. In crisis management training especially, a realistic recreation of operational conditions (such as the appearance of damaged structures, the noise and smoke of fires and storms, the sound of explosions) can help reproduce—and therefore train for—the stress-inducing sensations that prevail in the field. Because response to a crisis is inherently a collaborative activity, simulations should synthesize a single, consistent, evolving situation that can be observed from many distinct points of view by the team members.13
Don Eddington identified a common perception of the crisis situation as a feature that is essential to effective collaboration. A depiction of the geographic neighborhood of a crisis can provide an organizing frame of reference. Photographs and locations of important or damaged facilities, visual renderings of simulation results, logs of team activity, locations of other team members, notes—all can attach to points on a map. Given adequate bandwidth and computing capacity, another way to provide this common perception might be through synthetic virtual environments, displaying a visualization of the situation that could be shared among many crisis managers. (The Crisis 2005 scenario presented in Box 1.3 suggests a long-range goal for implementing this concept such that a crisis manager could be projected into a virtual world optimized to represent the problem at hand in a way that enhances the user's intuition.) Research challenges underlying such visualizations include ways to integrate and display information from diverse sources, including real observations (e.g., from field reports or sensors) and simulations. Variation in both the equipment and the skills of different users may prevent displaying precisely the same information to everyone; presumably, some minimal common elements are necessary to enable collaboration. Determining precisely what information and display features should be common to all collaborators is an example of the need for technology design to be complemented with multidisciplinary research in areas such as cognition and organizational behavior.
Collaboration and Virtual Organizations
Because people work in groups, collaboration support that helps them communicate and share information and resources can be of great benefit. Crisis management has a particularly challenging need: an instant bureaucracy to respond effectively to a crisis. In a crisis, there is little prior knowledge of who will
be involved or what resources will be available; nevertheless, a way must be found to enable them to work together to get their jobs done. This implies assembling resources and groups of people into organized systems that no one could know ahead of time would have to work together. Multiple existing bureaucracies, infrastructures, and individuals must be assembled and formed into an effective virtual organization. The instant bureaucracy of a crisis response organization is an even more unpredictable, horizontal, and heterogeneous structure than is implied by traditional command and control models of military organizations in warfare—themselves a complex collaboration challenge. Crisis management collaboration must accommodate this sort of team building rapidly; thus, it provides requirements for developing and opportunities for testing collaboration technologies that are rapidly configurable and support complex interactions.
One relatively near-term opportunity is to develop and use the concept of anchor desks (discussed above, in the section "Distributed Computing"). The concept has been tested in technology demonstrations such as JWID (see Chapter 1); field deployment in civilian crises could be used to stress the underlying concepts and identify research needs. Anchor desks can provide a resource for efficient, collaborative use of information, particularly where multiple organizations must be coordinated. They represent a hybrid between decentralized and centralized information management. Each anchor desk could support a particular functional need, such as logistics or weather forecasting. A crisis management anchor desk would presumably be located outside the crisis zone, for readier access to worldwide information sources and expertise; however, it would require sufficient communication with people working at the scene of the crisis to be useful to them, as well as the ability to deliver information in scalable forms appropriate to the recipient's available storage and display capabilities (e.g., a geographic information system data file representing the disaster scene for one, a static map image for another, a text file for a third).
An anchor desk could not only integrate data from multiple sources, but also link it with planning aides, such as optimized allocation of beds and medicines and prediction of optimal evacuation routes implemented as electronic overlays on geographic information systems, with tools involving a range of artificial intelligence, information retrieval, integration, and simulation technologies. An anchor desk could also house a concentration of information analysts and subject matter experts (e.g., chemists, as envisioned in the Crisis 2005 scenario); computing resources for modeling, simulation, data fusion, and decision support; information repositories; and others.
Anchor desks could provide services to support cross-organizational collaboration, such as tools for rapidly translating data files, images, and perhaps even human languages into forms usable by different groups of people. Furthermore, the anchor desk might not be physically at one place; a logically combined, but physically separated, collection of networked resources could perform the
same function, opening the possibility for multiple ways of incorporating the capability into the architecture of the crisis response organization. The set of technologies implied by this sort of anchor desk could serve to push research not only in each technology, but also in tools and architectures for integrating these capabilities, such as whiteboards and video-conferencing systems that scale for different users' capacities and can correctly integrate multiple security levels in one system.
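Delivering information "in scalable forms" amounts to content negotiation against recipient capabilities; the format names, bandwidth thresholds, and capability flags below are illustrative assumptions.

```python
FORMATS = [
    # (format, minimum bandwidth in kbps, requires GIS software)
    ("gis-data", 512, True),
    ("map-image", 64, False),
    ("text-summary", 2, False),
]

def select_format(bandwidth_kbps, has_gis):
    """Pick the richest representation the recipient can actually use."""
    for fmt, min_bw, needs_gis in FORMATS:
        if bandwidth_kbps >= min_bw and (has_gis or not needs_gis):
            return fmt
    return "text-summary"  # fall back to the least demanding form
```

An anchor desk could apply such a policy per recipient, sending a GIS data file to a well-equipped command center and a text file to a responder on a low-capacity link.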
Nevertheless, information must be integrated not only at remote locations such as command centers and anchor desks, but also at field sites. David Kehrlein, of the Office of Emergency Services, State of California, noted, "Solutions require development of on-site information systems and an integration of those with the central systems. If you don't have on-site intelligence, you don't know a lot."
The most powerful component of any system for making decisions in a crisis is a person with knowledge and training. However, crisis decision making is marked by underuse of information and overreliance on personal expertise in an environment that is turbulent and rich in information flows. The expert, under conditions of information overload, acts as if he or she has no information at all. Providing access to information is not enough. The ability to evaluate, filter, and integrate information is the key to its being used.
Filtering and integrating could be done separately for each person on that person's individual workstation. However, a more useful approach for any collaborative activity would be to integrate and allocate information within groups of users. (In fact, information filtering at the boundary of a linked group of users could be one of the most important services performed by the virtual subnets discussed above in the section "Networking"; filters could help individuals and groups avoid information-poor decision making in an information-rich environment.) Information integration techniques such as those discussed in the section "Information Management" are generally presented in terms of finding the best information from diverse sources to meet the user's needs. The flip side of this coin is the advantage of being able to cull the second-best and third-best information, reducing the unmanageable flood.
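Culling all but the best information per topic, at the boundary of a group, can be sketched as follows; the report fields and scores are hypothetical.

```python
def cull_to_best(items, topic_key, score_key, keep=1):
    """Group-boundary filter: for each topic, pass only the top-scoring
    item(s) through to the group, culling the second- and third-best."""
    by_topic = {}
    for item in items:
        by_topic.setdefault(item[topic_key], []).append(item)
    passed = []
    for topic_items in by_topic.values():
        topic_items.sort(key=lambda i: i[score_key], reverse=True)
        passed.extend(topic_items[:keep])
    return passed

reports = [
    {"topic": "roads", "score": 0.9, "text": "verified closure list"},
    {"topic": "roads", "score": 0.4, "text": "secondhand rumor"},
    {"topic": "shelters", "score": 0.7, "text": "county shelter census"},
]
filtered = cull_to_best(reports, "topic", "score")
```

Such a filter running at a virtual subnet boundary would reduce the flood reaching each group without hiding whole topics from view.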
A set of special needs of crisis management, which may have significant utility in other application areas as well, can be captured in the concept of judgment support. A crisis manager often makes intuitive judgments in real time that correspond to previously undefined problems without complete contingency plans. This should be contrasted with traditional notions of decision support, which are associated with a more methodical, rule-based approach to previously defined and studied problems. Judgment support for crisis management could
rely on rule-based expert systems to some extent, but the previously defined problems used to train these systems will necessarily be somewhat different from any given crisis. Workshop participants suggested a need for automated support comparing current situations with known past cases. To achieve this automation, however, much better techniques are required for abstractly representing problems, possible solutions, and the sensitivity of predicted outcomes to variations, gaps, and uncertain quality in available information.
The last point is particularly important for crises, because it is inevitable that some of the information the judgment maker relies upon will be of low quality. Two examples are the poor quality of maps that crisis management experts remarked on in the workshops and the rapid rate of change in some crises that continually renders knowledge about the situation obsolete. The technology for representing problem spaces and running computations on them must therefore be able to account for the degree of uncertainty about information. Moreover, data may not always vary in a statistically predictable way (e.g., Gaussian distribution). In some kinds of crises, data points may be skewed unpredictably by an active adversary (e.g., a terrorist or criminal), by someone attempting to hide negligence after an accident, or by unexpected failure modes in a sensor network.
Another reason the challenge of representing problems may be particularly difficult in crisis management is that the judgments needed are often multidimensional in ways that are inherently difficult to represent. James Beauchamp's call for tools to help optimize not only the operational and logistical dimensions of a foreign disaster relief operation, but also the political consequences of various courses of action, illustrates the complexity of the problem. Even presenting the variables in a way that represents and could allow balancing among all dimensions of the problem is not possible with current techniques. By contrast, the multidimensional problem discussed in Chapter 1 (see the section "Manufacturing")—simulating and optimizing trade-offs among such facets as product performance parameters, material costs, manufacturability, and full product life-cycle costs—although extremely complex computationally, is perhaps more feasible to define in terms with which computer models can work.
If a problem can be represented adequately, a judgment support system should be able to assist the judgment maker by giving context and consequences from a multidimensional exploration of the undefined problem represented by the current crisis. This context construction requires automated detection and classification of issues and anomalies, identifying outlier data points (which could represent errors, but could also indicate emerging new developments), and recognizing relationships between the current situation and previously known cases that may have been missed by or unknown to the crisis manager.
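Recognizing relationships to past cases is, at its simplest, a similarity search over abstract problem representations; the feature vectors and case names below are invented for illustration.

```python
import math

def similarity(a, b):
    """Cosine similarity between two feature dictionaries that abstractly
    represent crisis situations."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar_case(current, past_cases):
    """Retrieve the known past case most like the current situation."""
    return max(past_cases, key=lambda c: similarity(current, c["features"]))

past = [
    {"name": "prior flood", "features": {"flooding": 1.0, "evacuees": 0.6}},
    {"name": "prior earthquake", "features": {"structural": 1.0, "fires": 0.5}},
]
current = {"flooding": 0.8, "evacuees": 0.9}
match = most_similar_case(current, past)
```

The hard research problems noted above lie not in the retrieval step itself but in constructing representations that also capture gaps and uncertainty in the available information.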
Because judgments are ultimately made by people, not computers, technologies intended to support making judgments must be designed for ease of use and with an ability to understand and take into account the capabilities and needs of the user. To a great extent, of course, it is up to the user to ask for the information
he or she needs, but a model of what knowledge that individual already has could be used to alter the system's information integration and presentation approaches dynamically. Another special application for crisis management is monitoring the decision maker, because of the stress and fatigue factors that come into play. Performance monitors could detect when the user's performance is slipping, by detecting slowed reaction time and onset of errors. This information could guide a dynamic alteration in the degree of information filtering, along with variations in the user interface (such as simpler menu options). These capabilities could be of more general value. For example, they could assist in assessing the effectiveness of multimedia training and education tools in schools and continuing-education applications.
Of course, to be useful, a monitoring capability would have to be integrated properly with the way users actually use systems. For example, users will ignore a system that instructs them to get some rest when rest is not an option. Instead, it might be valuable for a system to switch to a step-by-step interface oriented toward standard operating procedures when the user shows signs of tiring. Human factors research provides useful insights, including some that are of generic usefulness. However, needs will always vary with the context of specific applications, implying the strong necessity for researchers and application users to interact during testing and deployment of systems and design of new research programs (Drabek, 1991).
Partridge, Craig, and Frank Kastenholz, "Technical Criteria for Choosing IP the Next Generation (IPng)," Internet Request for Comments 1726, December 1994. Available on line from http://www.cis.ohio-state.edu/hypertext/information/rfc.html.
A description of the proposed demonstration is available on line at the JWID '96 home page, http://www.spawar.navy.mil.
In addition, the coarser-grained simulation can be used to provide dynamically consistent boundary conditions around the areas examined in finer detail. The model, called the Advanced Regional Prediction System, is written in Fortran and designed for scalability. See Droegemeier (1993) and Xue et al. (1996). See also "The Advanced Regional Prediction System," available on line at http://wwwcaps.uoknor.edu/ARPS.
A CAPS technical paper explains that "although no meteorological prediction or simulation codes we know of today were designed with massive parallelism in mind, we believe it is now possible to construct models that take full advantage of such architecture." See "The Advanced Regional Prediction System: Model Design Philosophy and Rationale," available on line at http://wwwcaps.uoknor.edu/ARPS.
Details about I-WAY are available on line at http://www.iway.org.
One key data fusion challenge involves data alignment and registration, where data from different sources are aligned to different norms.