3
Self-configuration and Adaptive Coordination

Many of the anticipated applications of networked systems of embedded computers (EmNets) will be realized only if the systems are capable of configuring and reconfiguring themselves automatically. This chapter focuses on mechanisms needed to achieve automatic reconfiguration. In many EmNets, individual nodes will need to assemble themselves into a networked system, find available resources on the network, and respond to changes in their desired functionality and in the operating environment with little human intervention or guidance.1

A set of basic underlying mechanisms will be required to ensure that EmNets are self-configuring and adaptive. For example, components will need to be able to discover other resources on the network and communicate with them. Systems will need to be able to sense changing environmental conditions or changing system capabilities and respond appropriately so that the entire system, as well as individual components, can operate as effectively and efficiently as possible. Both software and hardware adaptability will be important; EmNets will consist not only of elements that can change their software but also of those that take advantage of reconfigurable computing technologies to adapt limited hardware to the operating environment.

1. This requirement is central to DARPA's self-healing minefield program, for example. For more information on this program, see <http://www.darpa.mil/ato/programs/apla/contractors.html>.

Many EmNets will contain components that are constrained in terms of their physical size, amount of memory available, and/or availability of local energy sources. For these system components, both the need for efficiency and the constraints on how it is achieved will be more severe than is the case for more traditional distributed computing systems. Efficient system designs will exploit higher-capacity and resource-rich components where they exist in the overall system and will exploit the redundancy provided by deploying large numbers of inexpensive components.

Many current efforts do not focus on systems that operate under these kinds of constraints. Work on the design of personal digital assistants (PDAs) and cell phones, for example, does not need to take into account very large numbers of interacting elements, distributed control, severe energy constraints, or the kinds of physical coupling that many EmNets must accommodate. Approaches taken in the design of smart spaces for homes or office environments are relevant, but such systems generally have more infrastructure to support them than many of the EmNets discussed here.

This chapter examines approaches to providing the mechanisms needed to support self-configuration and adaptive coordination of EmNets. The first section defines these key concepts. The second discusses the elements of self-configuration and adaptive coordination in existing distributed systems, serving as a primer on the state of the art. The final section of this chapter outlines the research needed to realize the vision for robust, scalable EmNets.

TERMINOLOGY

Self-configuration (sometimes referred to as reconfiguration) and adaptive coordination (sometimes referred to as adaptation) refer to the spectrum of changes that a system makes to itself in response to occurrences in its environment and internally. Neither of these terms is meant to convey infinite flexibility. The changes that self-configuration and adaptive coordination induce in a system should always be within the constraints of the system's planned functionality (admittedly, one such change might be to modify the functionality of the system). For the purposes of this report, the terms self-configuration and adaptive coordination differ with respect to the frequency and degree of change they induce in or respond to from the EmNet. Making a sharp distinction between the two is not as important as recognizing that some techniques are more relevant to one than to the other. In the rest of this chapter the terms are distinguished in order to highlight the techniques that are more appropriate for each.

Self-configuration involves the addition, removal, or modification of elements contained in an EmNet, along with the resulting process of establishing interoperability among the components and locating essential services (such as data aggregation nodes in sensor networks). Put another way, self-configuration is the process of interconnecting available elements into an ensemble that will perform the required functions at the desired performance level. As such, self-configuration changes the composition of an EmNet and may alter the distribution of functionality across the components that make up the system or may even alter the system's overall function based on which components are available.

Adaptive coordination involves changes in the behavior of a system as it responds to changes in the environment or system resources. For example, to achieve a long lifetime, a system may need mechanisms by which nodes can mediate their actions based on the density of redundant components. Nodes with redundant capabilities might be programmed to alternate responsibility for a given task in the style of sentry duty rotation. Similarly, EmNets could implement multiple levels of service, depending on locally perceived conditions or detected events. Thus, adaptive coordination refers to changes in operational parameters that are made because of variations in available resources or load. Included in these resources are available energy, computational resources, and communication bandwidth. In general, adaptive coordination induces less dramatic changes in system architecture than does self-configuration and does not alter the system's function. The two processes often occur on different time scales. Adaptive coordination tends to take place more quickly than does self-configuration, with a very short lag time between the moment a change is detected in the operating environment and the time the system adapts its behavior.

Another dimension to bear in mind is the level at which the configuration or adaptive coordination occurs. This level can range from reconfigurable hardware to operating systems and run-time environments all the way to application-specific code. Levels vary in the extent of the effect of the reconfiguration and/or adaptive coordination as well as in the amount of code that needs to be stored or retrieved to make the change.

A crucial facility that must accompany EmNets' ability to adaptively reconfigure themselves is the facility for self-monitoring. Despite some of the most rigorous testing in existence, many of today's highly complex systems are prone to failure when reconfigured. Telephone switching systems, for example, have suffered severe outages when new software is brought online. Yet this report suggests that EmNets must be able to change along many distinct axes, perhaps without an expert present. New system testing and software update technology will have to be developed. Meeting this challenge has proven to be very difficult, even in more conventional systems; EmNets intensify this need. They will have to be able to convey their current operational state to their users. As argued elsewhere in this study, establishing that state requires far more than just tallying hardware resources. An EmNet will require a way to monitor how well it is performing and to compare this result against its goals; it will also require a means for reporting such information to users.2

2. A long-term trend of diminishing margins against the goal could alert the users to the system's need for attention, for example.

The nature of the configuration or adaptive coordination depends heavily on the type of application the EmNet supports. In automobiles, for example, the focus of self-configuration would probably be on accommodating the heterogeneity of system components introduced to, and removed from, the system continuously as, for example, the people, conditions, equipment, and procedures vary. Unlike more standard computer networks, such embedded monitoring networks must be built assuming that there is no professional system administration, such that the configuration is highly (if not completely) automatic. Further complicating such networks are two typical requirements (as, for example, would be needed for automobile control): that the overall network be capable of making certain service guarantees and that some operations (such as notifications of life- or safety-threatening events) take precedence over other forms of network traffic.

In sensor networks that might be used for precision agriculture or environmental monitoring, system composition will vary less because the application is more constrained, while more attention must be paid to adapting the nodes' operational parameters to unpredictable and varying environmental conditions. This is particularly challenging and critical in energy-constrained devices that must minimize their expenditure of communications resources on overhead functions and in which opportunistic listening can be relatively expensive because of the dependence on power-consuming communication resources (for example, a radio or other wireless communications device). Extensive capabilities that incorporate both adaptive coordination and reconfiguration will be required in systems such as those used on a battlefield, where changes in both the environment and system makeup can occur rapidly yet certain service guarantees are absolutely required.

SELF-CONFIGURATION AND ADAPTIVE COORDINATION IN DISTRIBUTED SYSTEMS

This section discusses the elements of self-configuration and adaptive coordination in existing distributed systems. These elements include the notion of service discovery, as well as the critical issues of interfaces and interoperability. The discussion is primarily applicable to self-configuration; however, it is likely that adaptive coordination will require similar elements (e.g., mobile code). This background is useful in preparing to analyze the issues posed by EmNets. How EmNets differ from other types of distributed systems will become clearer as the analysis proceeds; later in this chapter, research challenges in these areas are examined. In general, EmNets present more extreme versions of the problems encountered in distributed systems, but they also pose a few unique problems of their own, such as low power requirements.

Discovery in Distributed Systems

Automatic self-configuration requires the ability to interoperate with new and old system components without human intervention. System components must be able to automatically discover each other and the services they represent. Building on the interface concepts of network configuration, wire protocols, and code mobility, this subsection discusses the issues involved in device and service discovery and how they relate to self-configuration.

How entities on an existing network communicate is generally viewed as the interoperability problem. How those entities find each other, advertise their own services, or join the network is generally taken to be a separate problem, referred to as the discovery problem. Generally, the discovery problem can be divided into four parts:

- How does a network entity join the physical network; that is, how is it authorized and given a network address and a network identity?
- Once an entity is on the network and wishes to provide a service to other entities on the network, how does it indicate that willingness?
- If an entity is looking for a service on the network, how does it go about finding that service?
- How does geographic location affect the services an entity can discover or select for use?

Joining the Network

In traditional computing networks, the task of joining a system to a network has been done by hand: A system administrator configures the system with a particular network identity and then updates the appropriate routing and/or naming tables with the information needed to find the new member of the network. As networks have been scaled up, techniques have been introduced that allow the partitioning of the large network into smaller subnets and the propagation of (manually entered) bootstrapping information from the subnets to the larger networks. However, the advent of larger networks and networks that have little or no professional administration (such as those in the home or in networks of embedded systems) has led to an interest in automating this bootstrapping mechanism.

Mechanisms that automate the joining to a network have been around for some time. The Apollo Domain system, for example, allowed a node (workstation or server) to be connected to the network by finding a location broker with which the new node registered. Then, having completed this registration, the new node could be found by any other node in the network. The Appletalk protocol enabled not only computers but also peripheral devices, such as printers, to join the network and be found automatically by other entities in the network. However, these mechanisms have been confined to particular (proprietary) networks and have not been generally adopted, especially in networks of smaller, embedded systems. One reason is that such mechanisms are based on resource-rich environments as opposed to the resource- and energy-constrained environments that many embedded systems and most EmNets must contend with.

The actual mechanism most generally used for such bootstrapping tends to be conditioned (if not fully determined) by the physical network to which the device is attached. In an Ethernet Transmission Control Protocol (TCP)/Internet Protocol (IP) environment, for example, the Dynamic Host Configuration Protocol (DHCP) is commonly used to hand out addresses to entities that are connected to the network. A part of the Universal Plug and Play (UP&P) specification is a mechanism allowing devices to self-assign a network address to themselves on networks where DHCP is not present. For IEEE 1394 (otherwise known as Firewire), however, a very different mechanism is needed because the network itself will produce the equivalent of a bus interrupt when a new device is plugged in, thus informing every other device of the presence of a new entity. Networks designed for cell phone use have yet another way of allowing the phone to be recognized in the cell. The roaming function allows a phone to register its new location with a central database that then tells the phone's home location how to reroute calls.

The range of services achievable by automatic discovery and joining mechanisms is in part determined by whether nodes have unique identifiers or whether at boot time they are literally identical. Joining the network entails locating essential services as well as obtaining network-level address and routing information. Existing mechanisms make use of multicast3 and well-known service-location addresses to bootstrap this process.

3. Multicast describes communication on a network between a single sender and multiple targeted receivers.
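
To make the bootstrapping sequence sketched above concrete, the following is a minimal, hypothetical illustration in Python of the order of operations a node might follow when joining a network: request an address from a DHCP-style server, fall back to a self-assigned address if no server answers, and then query a well-known multicast address for a directory of essential services. The function names and the in-memory "network" are invented for illustration; this is not the actual DHCP, UP&P, or SLP wire format.

```python
import random

# Hypothetical, in-memory stand-ins for the network services discussed above.
DHCP_SERVER = {"available": True}          # toggle to simulate a network without DHCP
SERVICE_DIRECTORY = {"time-sync": "10.0.0.7", "data-aggregator": "10.0.0.9"}

def request_address_via_dhcp():
    """Simulate a DHCP-style exchange; returns an address or None if unanswered."""
    if DHCP_SERVER["available"]:
        return "10.0.0." + str(random.randint(20, 250))
    return None

def self_assign_address():
    """Fallback: self-assign a link-local style address (as UP&P allows when DHCP is absent)."""
    return "169.254." + str(random.randint(1, 254)) + "." + str(random.randint(1, 254))

def discover_services_via_multicast():
    """Simulate a multicast query to a well-known service-location address."""
    return dict(SERVICE_DIRECTORY)

def join_network(node_id):
    address = request_address_via_dhcp() or self_assign_address()
    services = discover_services_via_multicast()
    return {"node": node_id, "address": address, "services": services}

if __name__ == "__main__":
    print(join_network("sensor-42"))
```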

Advertising and Finding Services

The problem of advertising a service once a physical connection to the network has been established has been approached in a number of different ways. Perhaps the most common approach in general computing systems has been naming and directory services, in which the service that wishes to advertise itself creates an entry in a naming service or a directory service that allows others who know the name or description of the service (or at least the type of service) to get a reference to the new offering. Such mechanisms generally assume that there is a human being somewhere in the loop, because both naming systems and directory servers are string based, with the meaning of the string left to the user. When programs look for services, they need to know the name or description under which the service is registered. Some directory services have evolved rather complex ontologies in the form of description schemas to allow such programmatic access to services.

A different approach has been taken by service traders and the Jini system (Arnold and Waldo, 2000), in which services are identified by the interfaces they support. In a traditional trader system (such as those found in the Distributed Computing Environment (DCE)4 or the Common Object Request Broker Architecture (CORBA)5 trading service), a service registers itself by indicating what interfaces it supports; clients look up a service by asking for a reference to something that supports a particular interface. If more than one object has been registered that implements a given interface, then any of the objects can be returned by such a query. In the Jini lookup service, services register by their Java language type; they can be returned to any client asking for something that is at least an instance of the requested class (for example, the returned object might be a subclass of the requested class).

The problem of how an entity finds the place to advertise its services is not always addressed by the systems described above; most naming or directory systems consider this problem to be part of the general bootstrapping mechanism and assume that it is dealt with in some fashion outside their scope. The Service Location Protocol (SLP) is a mechanism that enables either clients or services to find a service directory. Essentially, the entity interested in finding a service directory (either to register a service or find one that has been registered) issues a multicast request that will return the address of a service-finding service.

4. DCE is an industry-standard software technology for setting up and managing computing and data exchange in a system of distributed computers.
5. CORBA is an architecture and specification for creating, distributing, and managing distributed program objects in a network.

This service supports a well-known interface that allows querying for a service directory, which is much like a standard directory service in which services can be registered under a description or found if they match a description. The Jini system is similar to SLP in that it begins (on TCP/IP networks) with a multicast request to the local network neighborhood. Rather than returning a directory of service locators, however, the Jini multicast request returns a reference that implements the interface to a Jini lookup service (including the stub code, or driver, allowing communication with the service) that can be used by the service provider (or client) to access that lookup service directly. Universal Plug and Play (UP&P) also makes use of a multicast request, but in UP&P what is multicast is a description (in the form of a Universal Resource Locator (URL) indicating where the description can be found) of the device that is joining the network. All entities that might want to use such a device must watch for such a multicast, and based on the description they will determine if they have the code needed to communicate with that device. There is no central repository of services in the UP&P mechanism. Bluetooth's service discovery protocol (SDP) is specifically for Bluetooth communications; it focuses on discovering services available from or through Bluetooth devices and can coexist with other service discovery protocols.

Not all basic networking systems support multicast, so any extension of these types of service-finding protocols to such networks would require that some other bootstrapping mechanism be used to find the initial repository of descriptions or objects. This mechanism could be as simple as a conventionally agreed-upon URL that would be used to identify such a repository or a well-known name of some other form. Such approaches would need to find a way of preventing the entity with the conventional name from becoming a single point of failure (or they would need to determine that such a single point of failure was acceptable in the particular application). Other networks might allow entirely different approaches. An example of this is IEEE 1394 (Firewire), in which, as mentioned previously, attaching a device to the network generates a wire-level interrupt to all other devices attached to the network. On such a network, the service repository could simply notice when a new device was attached to the wire and send to that device the information needed to connect to the service repository.
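
The trader-style and Jini-style matching described above, in which a lookup request is satisfied by any registered object that implements the requested interface (including a subclass), can be illustrated with a small sketch. This is an illustrative analogue in Python, not the Jini or CORBA trading APIs; the class and method names are invented.

```python
class Printer:
    """An interface that services may implement and advertise."""
    def print_document(self, text): raise NotImplementedError

class ColorPrinter(Printer):
    """A subclass of the requested interface; it still matches a lookup for Printer."""
    def print_document(self, text): return "color: " + text

class LookupService:
    def __init__(self):
        self._registry = []

    def register(self, service):
        # A service advertises itself by handing the lookup service an object.
        self._registry.append(service)

    def lookup(self, interface):
        # Return any registered service that is at least an instance of the
        # requested type, in the spirit of a trader or Jini-style lookup.
        for service in self._registry:
            if isinstance(service, interface):
                return service
        return None

registry = LookupService()
registry.register(ColorPrinter())
match = registry.lookup(Printer)            # a ColorPrinter satisfies a request for Printer
print(match.print_document("hello"))
```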

Location

For systems deployed in the physical infrastructure, a service's location (either absolute or relative to another entity) may determine how it is used or even selected. The mapping between physical location and network connectivity is important. (See Chapter 2 for a discussion of the technologies that enable the determination of geographic location.) In wired or hybrid networks, two devices that are physically close may be, in fact, quite distant in terms of network communication. For example, a desktop personal computer (PC) and a cell phone may both be network-enabled, but for them to communicate, packets must travel through many network segments, including the building's network, the link between the building and local backbone, the connection between the backbone and the cellular phone company, another hop to the appropriate base station, and finally, from the base station to the phone itself. Thus, when a device needs to determine, for example, the closest printer, network proximity is not at all likely to be an accurate measure.

Geographic location is intimately connected to discovery. If each device knows its own geolocation and can provide that information to the discovery servers, then it may be possible to answer the question about "closeness" during the discovery phase. Access to services may also be based on location. If one assumes physical security measures permit a user to enter a particular space, then services locally available in that space can be put at that user's disposal without requiring further authentication. Without location information, users would have to obtain access to the local networks, with accompanying security risks. Thus, location can be quite useful in optimizing service discovery as well as in connecting the physical and virtual worlds so that security measures in one can be applied in the other.

In other types of EmNets, particularly resource-constrained, wireless networks, network organization needs to correspond more closely with geography in order to be efficient in its use of scarce energy resources (since communication over longer distances consumes significantly more energy). In these systems, geolocation may serve as a building block for organization of the network itself, for example, through the use of geographic routing (Karp and Kung, 2000).
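
A sketch of the "closest printer" decision discussed above, assuming each discovered service reports its geolocation: the client ranks candidates by physical distance rather than by network hop count. The coordinates and service entries are hypothetical.

```python
import math

def distance(a, b):
    """Planar Euclidean distance; a real deployment might use geodetic distance instead."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def closest_service(client_location, candidates):
    """Pick the candidate whose advertised geolocation is physically nearest."""
    return min(candidates, key=lambda svc: distance(client_location, svc["location"]))

printers = [
    {"name": "printer-lobby",  "location": (0.0, 12.0), "hops": 9},
    {"name": "printer-lab",    "location": (3.0, 4.0),  "hops": 2},
    {"name": "printer-remote", "location": (80.0, 5.0), "hops": 1},  # few hops, but far away
]

print(closest_service((2.0, 3.0), printers)["name"])   # printer-lab
```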

Interfaces and Interoperability

Both self-configuration and adaptive coordination require interfaces, or standardized ways of communicating between components. An interface is simply a convention that is agreed to outside the scope of the communication of interest but that permits the communication to occur. These interoperability agreements can exist at every level of system abstraction, including electrical, signaling, transport, network, and application levels. Moreover, these agreements extend to code mobility and application adaptation. When EmNets communicate, they must assemble a collection of information that will be interpretable by the receiver. This information may include not only data but also code that the receiver can execute to interpret the data, process it in some way, or forward it to other entities. The format of the information must comply with the interface on which both entities agree in advance.

At the lowest level, interoperability requires the assembling of information (data and code) into a sequence of bits that will be properly interpreted by receivers on the network. At higher levels, this means supporting an abstract machine for which the sender can include instructions within the information it sends. If there is agreement with the receiver on the execution semantics of these instructions, this serves as a powerful model for extending the functions that each device is capable of performing. That is, it becomes possible to move code from one entity to another so that functionality can be modified and extended in ways not predicted by those who originally deployed the device. Other levels of interoperability include transport protocols (e.g., TCP/IP) that permit a sequence of network packets to be generated and reassembled at the other end, as well as remote procedure calls (RPC) and remote method invocations (RMI) that permit one entity to execute an operation on another by sending parameter data and having the result returned.

How interoperability is to be achieved is often one of the major design decisions that needs to be made for networked systems.6 In traditional distributed systems, methods such as DCE, RPC, and CORBA are implemented to pass a method or procedure identifier to the receiver to indicate the code that is to be invoked on the data by the receiver. Parameters are linearized and included in the RPC packet. More specialized systems can make either or both of these classes of information (procedure identifier and input parameter data) implicit. In a simple system in which data are sent from embedded sensors to a central processing node, only the data need be transmitted, because the operation to be performed on the data is known by the receiving node. In some publish/subscribe systems, even the data that triggered the notification of an event need not be explicitly passed, because the notification itself is enough to indicate the data that triggered the notification.

6. This discussion describes interoperability from the perspective of systems that use a call-return or remote-procedure-call model of communication. Networks can also be set up to communicate through message passing by using events in a publish/subscribe fashion or by using various forms of shared memory with adaptive coordination technologies. At some level, however, all of these communication approaches are equivalent with respect to the problems discussed. Although the exact details of the problems may vary from one approach to another, the basic outlines of the problems and the solutions are similar in all of these approaches.

In a more complex, ad hoc sensor network, intermediate nodes between the originator and its final destination may aggregate the data. Thus, the interpretation of the data may change as it travels from node to node. Each node may want to indicate to the next how to properly interpret and process each data item.

The remainder of this section discusses address configuration, wire protocols, and code mobility as illustrative examples of key interface and interoperability concepts.

Address Configuration

One of the most familiar types of self-configuration is the process by which new devices are added to local area networks. The Dynamic Host Configuration Protocol (DHCP) performs this function on IP networks. A device new to the network must obtain a new IP address in order to have packets routed to it appropriately. A DHCP server for a network allocates a set of IP addresses to acceptable visitors for a limited period of time. DHCP servers do not have to be present on every subnetwork but must be reachable through the standard routing mechanisms. A device finds a DHCP server using a discovery message that is propagated by the network routers to a nearby DHCP server. The server then responds with the IP address for the device to use. This address may be dynamically allocated or determined based on the physical address of the device's network interface card (providing a mechanism for mobile devices to store and later retrieve their network parameters). Devices can easily determine if they need to obtain an address using DHCP if their request packets are not acknowledged. This is an indication that the IP address being used is no longer compatible with the network at which the device is now located.

The DHCP packet format provides a standard interface for devices to use in connecting in a new network environment, thus ensuring interoperability at the level of IP packets. The servers' functions provide a higher-level interface that provides addresses only to authorized visitors and only for limited periods of time.

Wire Protocols

The most common way of ensuring interoperability is to define a standard protocol that all entities on the network will use to identify operations and convert to and from their own internal data representations to the data representation used on the wire. Each entity on the network contains some code that performs this conversion. In a standard RPC system, the code used by a client for this purpose is called the stub code and the corresponding code on the server side is called the skeleton.
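
The conversion performed by stub and skeleton code can be sketched as follows: the caller's stub linearizes a procedure identifier and its parameters into an agreed-upon byte format, and the receiver's skeleton decodes them and dispatches to the local implementation. The framing below (a length-prefixed JSON payload) is purely illustrative and is not the DCE, CORBA, or Jini wire format.

```python
import json
import struct

def marshal_call(procedure_id, args):
    """Stub side: linearize the procedure identifier and parameters into wire bytes."""
    payload = json.dumps({"proc": procedure_id, "args": args}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload   # 4-byte length prefix, then payload

def unmarshal_call(wire_bytes):
    """Skeleton side: recover the procedure identifier and parameters from wire bytes."""
    (length,) = struct.unpack("!I", wire_bytes[:4])
    message = json.loads(wire_bytes[4:4 + length].decode("utf-8"))
    return message["proc"], message["args"]

# Hypothetical server-side dispatch table (the skeleton choosing which local code to run).
HANDLERS = {"report_reading": lambda sensor, value: f"{sensor}={value}"}

wire = marshal_call("report_reading", ["node-7", 21.5])
proc, args = unmarshal_call(wire)
print(HANDLERS[proc](*args))    # node-7=21.5
```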

BOX 3.2 Cooperative Behavior and Control

A possible approach to distributed control is directed diffusion. Directed diffusion amounts to controlling a system by means of activation and inhibition messages, the sum of which can either reinforce or discourage a course of action.1 As an example, consider a sensor network in which multiple nodes have access to the outside world through a specialized node with long-range communications capabilities and that communicates to the rest of the nodes by passing messages from one node to another (that is, via multihop connections). If several nodes observe an event, then directed diffusion can help determine which nodes should be involved in deciding whether to report the event, which one should do the processing, and what information should flow to the long-range link given a desire to minimize energy expenditures.

If latency (delay) in making a decision is not an issue and the probability of a node accurately detecting an event is related to the strength of the signal it receives relative to background noise (the signal to noise ratio, or SNR), then the nodes can wait a period of time based on the SNR before alerting or inhibiting neighbors. The node that receives the signal with the highest SNR will send its alert first, communicating a message to the long-range link and sending short inhibition signals to its neighbors. The other nodes then avoid transmitting their decisions or activating one another to engage in cooperative detection. If the signal at the node with the highest SNR is still below the threshold for reporting the event, the node could instead activate its neighbors, asking for their decisions and the relative certainty of those decisions. These activation messages will propagate outward with reduced intensity (that is, they will require progressively higher certainties to respond), and nodes with higher SNRs will reply sooner. When enough replies have been combined to allow the original node to make a decision with the desired level of certainty, that node can issue inhibition signals to its neighbors while propagating its decision to the long-range link.

This procedure progresses through several distinct phases of operation: detection of a stimulus, formation of subnetworks of communicating nodes, gathering and processing of information, destruction of subnetworks, and long-range communication of results. To minimize energy expenditures, it avoids using complicated set-up signals to establish subnetworks, instead employing the natural decay of communications signals with distance to establish a perimeter. Although perhaps failing to pick the optimal fusion center or routing of information, this approach can dramatically reduce the overall amount of sensor information transmitted within the system and help conserve energy. Varied behavior can be obtained with a few control signals (with feedback), with no need to designate a central controller before the procedure starts. Of course, the long-range link could also serve as a master node, commanding different thresholds to become active or inhibiting their behavior. In this way, behaviors can be adapted over time to meet changing global objectives. Human operators could perform this adaptive coordination, but as understanding of the system grows, networks could be designed with increased autonomy.

NOTE: Some work in this area has been done by the chair of this study committee (Intanagonwiwat et al., 2000).

1. This approach is similar to that used by ants for a variety of highly complicated functions, such as establishing trails to food and removing them when the food supply dwindles. Successive use of a trail reinforces it, but small random deviations that provide a more direct route to a food can alter (e.g., straighten) the trail and lead to increased energy efficiency. Other signals can terminate an activity and focus attention on other tasks.
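
A minimal simulation of the SNR-based hold-off described in Box 3.2, using invented numbers: each detecting node waits a time inversely related to its SNR, the first node to fire reports to the long-range link, and its inhibition message silences neighbors that have not yet fired. This sketches only the coordination pattern; it is not the directed diffusion implementation of Intanagonwiwat et al. (2000), and all thresholds are hypothetical.

```python
REPORT_THRESHOLD_SNR = 6.0     # hypothetical SNR needed to report without cooperative help

def hold_off_time(snr):
    """Higher SNR gives a shorter wait, so the most confident node alerts first."""
    return 10.0 / max(snr, 0.1)

def simulate_event(observations):
    """observations: {node_id: snr} for every node that detected the stimulus."""
    inhibited = set()
    report = None
    # Nodes "fire" in order of their hold-off times (shortest wait first).
    for node, snr in sorted(observations.items(), key=lambda kv: hold_off_time(kv[1])):
        if node in inhibited:
            continue                          # already silenced by a neighbor's inhibition
        if snr >= REPORT_THRESHOLD_SNR:
            report = (node, snr)              # send the decision to the long-range link
            inhibited.update(observations)    # short inhibition signal to the neighbors
            break
        # Below threshold: this node would instead activate its neighbors and fuse
        # their replies before deciding (omitted here for brevity).
    return report

print(simulate_event({"n1": 4.2, "n2": 9.1, "n3": 7.5}))   # ('n2', 9.1)
```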

of how groups of machines with different capabilities could be organized to perform a set of activities that are presented to the rest of the system as a single unit. Similar hierarchical organizations have been used in more traditional systems, but they are not based on the capabilities of the individual components in the manner described above. How to adapt the overall system configuration (or subsystem configuration) to maximize the information obtained while minimizing the use of scarce resources is a promising area for future research. Some systems may benefit from decentralized control schemes, which also require further research and analysis. The minimum number of bits that must be communicated to make a reliable decision is unknown for all but the simplest of problems involving more than one sensor node. Given the high power cost of communications, it would be useful to know what the threshold is and thus to learn whether particular algorithms are anywhere near optimal. (For a discussion of local computation vs. communication as related to EmNets, see Box 3.4.)

If the processing problem is cast as a rate-distortion problem, in which (weighted) false alarm and missed detection probabilities constitute the distortion and the communications energy takes the role of rate, then additional questions can be explored. For instance, what is the effect of array density on the rate-distortion region for a given communications and signal propagation law and set of source statistics? This is a deep and difficult problem (for example, under what conditions is there a convex rate-distortion region?), but its solution could have a large payoff. Preliminary progress has been achieved with simple versions of this problem, but a huge problem space remains to be explored.

The interaction between a system element and its neighboring elements is not typically considered in control theory but is essential to modeling EmNets. The interaction between a node and its immediate neighbors is critical, because physical stimuli are often correlated to physical space and because the communications costs and latencies to near neighbors are likely to be less than average. Centralized control rules can be devised for such a group, but the complexity of the decision-making process, even for a relatively small collection of nodes, will demand some decentralization and probably hierarchy as well. Layered control hierarchies are notoriously difficult to optimize, but perhaps by scaling to large numbers designers can reduce the demand for efficient use of the individual components. In any scheme, the fundamental issue of stability arises. Once the design moves away from centralized control, the theory for characterizing the system and guaranteeing stability is not well developed.

BOX 3.3 Control Theory

EmNets bring together two established research communities: distributed systems and control. Control is a rich research area that studies how to use feedback to optimize the behavior of electromechanical systems. Control has its roots in simple servo control systems but is now used in the design and operation of a wide class of electronic and electromechanical systems. Often these systems have hundreds of processors and components from multiple vendors. Some of these systems run chemical plants, manufacturing plants, and even buildings. By bringing together these two areas, EmNets create a number of new research areas.

Control theory is used to solve a number of difficult problems. For example, in flight control systems, the dynamics of the plane are carefully studied, creating an optimal controller for this system. Often this controller is combined with a number of estimators that produce an estimate of what the measured parameters should be. The estimator can be used to provide input from sensors that might not be read each cycle (for example, the computation might require 25 data points while only 10 are being collected at any given time) or to check that the current model of the system represents the actual system. In some highly critical situations, banks of estimators can be used to model how the system would behave under various fault conditions. During normal operation, these estimators will poorly match the system, but under a fault condition one of these estimators might become a better match than the original system. Thus, when a fault does occur (such as the loss of an engine in an aircraft), that fault's estimator has current information and can be used to update the control equations for the plane, to allow it to continue to function at some reduced performance until the error is repaired.

Rather than using a fixed system model, model predictive control adapts the system model and the control formulation. It solves an optimal control problem at each step, using current sensor data and measured system performance. This type of control was initially used in large-scale refineries, where cycle times are very long (tens of minutes), providing sufficient time for the required computation.

Both types of system rely on getting sensor measurements at fixed time increments. While networks are often used in control systems, their properties are not considered in the problem formulation. For high-performance control loops, sensors are given logically separate networks (or even physically separate wire) to collect the data, making variable packet delay and possible data loss nonissues. In addition, in almost all cases the control algorithm is centralized and not run in a distributed fashion. The long cycle time of many process control systems makes the issue of networks in these systems uninteresting, and in any case existing technology meets the requirements of these systems. While robust operation is critically important, with commands being issued to individual pumps, valves, heaters, and the like (in a factory setting), the long cycles provide time to consider and reject outlying data, and every actuator is likely to have a secondary sensor for redundancy and prediction checking.

While the notion of fixed time samples is fundamental to most control theory, there are some methods that might migrate to network-based systems more easily. One possibility is to use Lyapunov methods, where the idea is for each unit to greedily minimize a value function that serves as a coordinator. This transposes to asynchronous systems very nicely. In general, the actions of each unit would have to be coordinated carefully (simple examples show that activating two feedback systems simultaneously can lead to disastrous loss of performance or instability), but if there is a value function that each is separately minimizing, the actions are automatically coordinated.

To the standard control issues EmNets add the issues of resource constraints, distributed systems, and networks. In control environments, networks are assumed to be stable, not to lose information, and not to have delays. All of these assumptions are likely to be violated at some point for EmNets, posing new research challenges.

NOTE: The committee thanks Stephen Boyd of Stanford University for his guidance in developing this description.

Note that actuation, signal processing, and communications (or more likely, a combination of these) all raise fundamental questions of resource allocation in response to a physical stimulus. Accordingly, a solution in any one of these domains may well have applications to all the rest. The problem of cooperation thus appears to offer an excellent opportunity for multidisciplinary research; there are probably lessons to be learned from diverse disciplines, with a potentially high payoff. (An example of an area in which multidisciplinary approaches are used is distributed robotics, described in Box 3.5.)

BOX 3.4 Local Computation Versus Communication

One of the design choices that must be made in EmNets is the balance between local computation and the communication of data back to a more centralized processing node. In other words, to what extent should an individual node process the data it has collected or been sent when it also has the option of communicating raw, unprocessed data to another node for processing? This issue is particularly important in EmNets that operate with limited stores of energy and must therefore minimize energy consumption. It is extremely important in systems that rely on wireless communications to transport data because of the energy requirements of wireless systems. Many sensor networks will be in this category, as will mobile elements of other EmNets, such as smart spaces.

The high energy consumption of wireless communications systems leads to unique conclusions about the distribution of tasks in the distributed embedded system network. For example, in a typical wireless sensor network, the network's task is to identify events that occur in the network environment and communicate these occurrences to a remote user. Conventionally, this would be done by transmitting received sensor information to a remote asset for processing. EmNets composed of many distributed devices become collectively more capable if significant computation is performed locally, with the goal of recognizing local events and communicating only event identification codes, as opposed to complete data sets.

As an example of the trade-off between computation and communication in an EmNet, consider a wireless sensor system that is distributed over a large surface. Communication between devices occurs between nodes in a multihop architecture in which information is passed from the source node to the destination node by traveling through a number of intermediate, proximate nodes. Under these conditions, the power transmitted from any one node declines rapidly as the distance from the transmitting node increases.1,2 The severe decay of wireless communications has a profound influence on the balance between communication and computation resources. System designers must decide between communicating data directly for remote processing or performing local processing and communicating a shorter message, or perhaps none at all, to a remote node. The energy required to transmit even short messages could power significant amounts of computational processing locally. The large computation budget is available for potentially quite powerful information processing that could reduce the amount of information that needs to be communicated. Hence, considerable design and development effort will need to be directed to the deployment of EmNets that leverage powerful local computation and cooperative processing to identify local events and even command local action. Low-power wireless embedded systems will therefore create demands for a rich set of novel network and distributed computing solutions that have not been previously needed in conventional wireline systems.

1. See, for example, Parsons (1992) as a starting point into the total body of literature dealing with propagation in personal/mobile environments.
2. See also Sohrabi et al. (1998).
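
The trade-off in Box 3.4 can be made concrete with a back-of-the-envelope comparison: shipping a full block of raw samples over several radio hops versus processing the block locally and transmitting only a short event code. The energy constants below are placeholders chosen only to illustrate the calculation, not measured costs.

```python
# Hypothetical per-unit energy costs (arbitrary units), for illustration only.
E_TX_PER_BYTE_PER_HOP = 1.0    # radio transmission cost per byte per hop
E_CPU_PER_SAMPLE      = 0.01   # local processing cost per raw sample

def energy_ship_raw(num_samples, bytes_per_sample, hops):
    """Send the raw data block to a remote node for processing."""
    return num_samples * bytes_per_sample * hops * E_TX_PER_BYTE_PER_HOP

def energy_process_locally(num_samples, event_code_bytes, hops):
    """Process locally, then transmit only a short event identification code."""
    return num_samples * E_CPU_PER_SAMPLE + event_code_bytes * hops * E_TX_PER_BYTE_PER_HOP

raw   = energy_ship_raw(num_samples=1000, bytes_per_sample=2, hops=5)
local = energy_process_locally(num_samples=1000, event_code_bytes=4, hops=5)
print(f"raw transmission: {raw:.0f}   local processing + event code: {local:.0f}")
# With these placeholder constants, local processing wins by a wide margin.
```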

Collaborative Processing

A sensor network is an example of an EmNet that illustrates the benefits of using system architectures and adaptive coordination to improve overall system performance in the face of stringent resource constraints. Sensor networks generally require constant vigilance by at least a subset of the sensors so that desired events can be reliably detected. At the same time, the system must avoid generating false alarms when a particular event has not occurred. Sensor networks can employ a power-conserving hierarchical detection scheme to meet these objectives. For example, individual sensors may use energy-efficient procedures for detecting acoustic, magnetic, infrared, or other forms of energy and then attempt to make a detection decision independently. If the sensor cannot reliably make a decision, it could employ some processing and sensing to seek information from nearby sensors. These processes involve larger expenditures of energy, especially if the sensor and its neighbors must communicate. Additional processing, using a large neural network or some other sophisticated procedure, could be used to provide greater assurance if necessary. In the worst case, raw data might be transmitted back to a remote site where human operators analyze the data and determine whether an event has been detected. This step consumes large amounts of energy and must be avoided, except when absolutely necessary.

As this example illustrates, there are trade-offs to be made with regard to the extent of processing to be conducted by individual sensors and the amount of information communicated among them. In many applications, there will be no events to report much of the time and no need to apply the most expensive algorithm, which is transmitting data to human operators for analysis. But there may be too many circumstances in which the least expensive detection algorithm will fail. A processing hierarchy can lead to huge reductions in energy consumption while assuring the required level of reliability.

Processing hierarchies are intertwined with networking and data storage issues. How long and where data are stored (or queued) will differ at different levels in the hierarchy; the decision on whether to communicate with neighboring nodes, and which ones, will depend on the signal-processing task. The amount of energy consumed by communications and the degree to which energy is scarce will affect the processing strategy (that is, the willingness to communicate and whether processing is centralized or distributed). All of this, in turn, depends on the physical constraints that the system faces, allowing the physical layer to intrude. Given the amount of energy needed to communicate a short message, it often pays to process the data locally to reduce the volume of traffic and make use of multihop routing and advanced communications techniques, such as coding, to reduce energy consumption.
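
The power-conserving processing hierarchy described above might be organized along the following lines. The thresholds, relative energy figures, and function names are hypothetical; the point is only the escalation order: decide locally when possible, consult neighbors when needed, and fall back to shipping raw data only as a last resort.

```python
LOCAL_CONFIDENT = 0.9   # confidence at which a node decides on its own (hypothetical)
FUSED_CONFIDENT = 0.75  # confidence at which a fused neighborhood decision is accepted

def detect_event(local_confidence, neighbor_confidences):
    """Return (decision, method, relative_energy_cost) using the cheapest adequate level."""
    # Level 1: cheap local decision (event clearly present or clearly absent).
    if local_confidence >= LOCAL_CONFIDENT or local_confidence <= 1 - LOCAL_CONFIDENT:
        return (local_confidence >= LOCAL_CONFIDENT, "local", 1)
    # Level 2: query nearby sensors and fuse their opinions (more energy).
    fused = (local_confidence + sum(neighbor_confidences)) / (1 + len(neighbor_confidences))
    if fused >= FUSED_CONFIDENT or fused <= 1 - FUSED_CONFIDENT:
        return (fused >= FUSED_CONFIDENT, "neighbor fusion", 10)
    # Level 3: last resort, transmit raw data for remote or human analysis (most energy).
    return (None, "raw data to remote site", 100)

print(detect_event(0.95, []))              # decided locally
print(detect_event(0.6, [0.8, 0.85]))      # decided after consulting neighbors
print(detect_event(0.55, [0.5, 0.6]))      # escalates to raw data transmission
```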

BOX 3.5 Distributed Robotics

Distributed robotics is the study of algorithms for the control and coordination of groups or teams of robots. A multirobot group is a superb example of a networked embedded system that embodies challenges in control, communication, and coordination as it faces uncertainty in sensing and action, unexpected failures, and a dynamic environment. The notion of a single, centralized controller coordinating a distributed robot group is considered untenable, as it is neither scalable nor robust. Thus, control must be distributed to the individual robots, which must communicate and adapt as necessary to produce globally efficient behavior of the system as a whole.

Several key methodologies are relevant to multirobot control, as they are to individual robot control. Reactive control involves the lookup and execution of precompiled, stateless collections of rules, with no looking into the past or planning for the future. Deliberative control uses centralized world models and planning but scales poorly with the complexity of the control problem and the group size. Hybrid control attempts a compromise between reactive and deliberative approaches by employing both and compromising between them as necessary; this is a dominant paradigm in robotics. The other dominant paradigm is behavior-based control, which is of particular relevance in distributed robotics.

Behavior-based controllers consist of collections of behaviors, time-extended processes or control laws that achieve and maintain goals. For example, "avoid obstacles" maintains the goal of preventing collisions, and "go home" achieves the goal of reaching some destination. Behaviors can be implemented in software or hardware and as processing elements or as procedures. Each behavior can take inputs from the robot's sensors (for example, camera, ultrasound, infrared, tactile) and/or from other behaviors in the system and send outputs to the robot's effectors (for example, wheels, grippers, arm, speech) and/or to other behaviors. Thus, a behavior-based controller is a structured network of interacting behaviors. Behaviors themselves embed state and can form arbitrary representations when networked together. Thus, behavior-based systems are not limited in their expressive and learning capabilities, and they are well known for their real-time response and scalability. The metaphor of a robot being controlled by a collection of behaviors scales well to systems of robots being themselves behavior collections. Currently, behavior-based control appears to be the de facto standard for distributed multirobot control, owing to its robust and scalable properties.

As EmNets evolve to include actuation and mobility, lessons can be learned from the area of distributed robotics. The significant open problems in distributed robot control include the synthesis and analysis of adaptive group behavior, group coordination, and communication strategies that facilitate dynamic, run-time, efficient resource allocation within a distributed system. Distributed robots need to be self-configuring and will usually be unattended. Latency is also an important concern for both types of systems. Both are likely to interact with humans at some points or on some level, and it may be the case that usability and interaction issues will overlap. However, the constraints on EmNets differ in some ways. Many EmNets will have severe power limitations, whereas many distributed robots may be large enough to incorporate more than adequate battery power. In addition, EmNets will probably consist of many more components and nodes than distributed robots would need to incorporate.

NOTE: The committee thanks Maja Mataric and Gaurav Sukhatme of the University of Southern California for their guidance in developing this description.

Collaborative processing can extend the effective range of sensors and enable new functions. For example, consider the problem of target location. With a dense array of networked sensors, one means for tracking the position of an object (for example, a target or a detected event) is for all nodes that detect a disturbance to make a report. The centroid of the reporting nodes is one possible estimate of the position of the target. This approach requires the exchange of very few bits of information per node. Much more precise position estimates can be achieved with a technique called beam forming, in which individual sensors exchange information about detected events and the time they were detected. Although this approach consumes more energy, it offers several benefits: higher quality data for subsequent classification decisions, long-range position location, and even some self-location and calibration possibilities for the nodes.12 In some applications, sparse clusters of nodes that use beam-forming techniques might be preferable to dense deployment of less-intelligent nodes, or it might be better to enable both sets of functions. For example, a dense network of less-intelligent sensors deployed in conjunction with a less-dense array of intelligent nodes could capture information on demand for beam forming. Such collaborative processing can be regarded as a further extension of the signal processing hierarchy to multiple nodes, with the collaboration being extremely expensive in terms of energy use but performed only rarely, such that its marginal energy cost may be acceptable.

12. See, for example, Parsons (1992) as a starting point into the total body of literature dealing with propagation in personal/mobile environments.
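
A sketch of the low-cost centroid estimate mentioned above: each reporting node contributes only its own position and, optionally, a received signal amplitude as a weight, and the estimated target position is the weighted average of the reporters. The coordinates and amplitudes below are made up.

```python
def centroid_estimate(reports):
    """reports: list of (x, y, amplitude). Returns the amplitude-weighted centroid."""
    total_weight = sum(a for _, _, a in reports)
    x = sum(px * a for px, _, a in reports) / total_weight
    y = sum(py * a for _, py, a in reports) / total_weight
    return (x, y)

# Nodes that detected the disturbance, with the signal amplitude each one observed.
reports = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (0.0, 10.0, 2.0)]
print(centroid_estimate(reports))   # estimate is pulled toward the stronger detections
```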

Key to any network collaboration is the idea of synchronization among elements of the network. Synchronization depends on both the accuracy of the local clocks and the ability of the network to coordinate local clock accuracy. Both long- and short-term clock drift are important for providing various levels of functionality. For spread-spectrum communication, high-accuracy clock synchronization with the received signal is necessary to decode the information sent. However, only relative synchronization is needed for node-to-node communication, because the propagation delay is not quantified at each node. In addition to enabling communication, coordinated synchronization is important as a means to enhance power savings, enable collaborative sensing, and allow multisensor self-location.

Local power requirements on a remote EmNet must be reduced to the bare minimum needed to supply continuous sensing and a minimum level of event detection, while incorporating functionality to expend power as needed for communications or more intensive processing. This is appropriate for situations in which the frequency of events is expected to be high enough that every EmNet in a network needs to be ever vigilant. For longer-lifetime sensors in environments with a lower event probability, support communication and processing may be set up to operate intermittently. If the network is operating in a form of TDMA communication, then for low-latency event reporting, each sensor must stay synchronized. In addition, to coordinate sensing times and enable coherent collaborative processing, each EmNet needs to be synchronized to a global time scale. Thus, clock drift on each sensor limits the length of noncommunication between sensors or the power savings achievable by powering down the radio. Additionally, if a sensor field is put in a somnolent state in which only selected sensors are powered down, total network power savings will be greater if the multiple sensors coordinate their sleep time (requiring synchronization) as opposed to randomly powering down to provide a reduced alert state overall.

Collaborative sensing (by, for example, using beam-forming algorithms) benefits from synchronizing all the sensing inputs. The combining of results from multiple sensors at different locations to counter jamming, enhance resolution, or enable distributed sensing requires relative timing information. On the coarsest scale, timing is required to coordinate which event occurs where. Finer resolution of timing allows recognizing coordinated events by coherently combining results from multiple sensors, thereby fully realizing the utility of a distributed sensor system. In fact, the effective resolution of coherent combinations of inputs from multiple sensors is limited by the time synchronization of the sensors.
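
To illustrate how clock drift bounds the benefit of powering down the radio, here is a small calculation under assumed numbers: a node that must wake within a guard interval of the agreed rendezvous time can sleep no longer than the guard budget divided by its drift rate, and nodes that share a rendezvous schedule sleep (and save energy) together rather than powering down at random. All parameters are hypothetical.

```python
def max_sleep_seconds(drift_ppm, guard_ms):
    """Longest sleep before accumulated drift could exceed the wake-up guard interval."""
    drift_s_per_s = drift_ppm * 1e-6
    return (guard_ms / 1000.0) / drift_s_per_s

def rendezvous_schedule(start_s, period_s, count):
    """Shared wake-up times: every node sleeps between these agreed instants."""
    return [start_s + i * period_s for i in range(count)]

drift_ppm = 50          # assumed crystal drift: 50 microseconds of error per second
guard_ms  = 10          # assumed guard interval around each rendezvous
sleep_limit = max_sleep_seconds(drift_ppm, guard_ms)
print(f"max sleep before resynchronization: {sleep_limit:.0f} s")   # 200 s with these numbers
print(rendezvous_schedule(start_s=0, period_s=min(sleep_limit, 180), count=4))
```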

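The claim above, that the effective resolution of coherent combining is limited by time synchronization, can be seen in a small simulation in which several nodes observe the same narrowband signal and their outputs are averaged (the degenerate case of delay-and-sum combining with zero steering delays). The signal parameters, noise level, and error magnitudes are illustrative assumptions, and NumPy is used purely for convenience.

```python
# Averaging a common narrowband signal observed at several nodes, each with an
# independent time-stamping (synchronization) error. Signal, noise, and error
# magnitudes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
fs, f0 = 10_000, 500                 # sample rate and source frequency (Hz)
t = np.arange(0, 0.1, 1 / fs)        # 100 ms observation window
n_sensors = 8
reference = np.cos(2 * np.pi * f0 * t)

def coherent_level(sync_error_std_s):
    """In-phase signal level recovered after averaging the nodes' outputs."""
    outputs = []
    for _ in range(n_sensors):
        timing_error = rng.normal(0.0, sync_error_std_s)
        signal = np.cos(2 * np.pi * f0 * (t - timing_error))
        noise = rng.normal(0.0, 1.0, t.size)
        outputs.append(signal + noise)
    combined = np.mean(outputs, axis=0)
    # Correlate against the reference; roughly 1.0 when perfectly synchronized.
    return 2 * np.mean(reference * combined)

for err_s in (0.0, 50e-6, 500e-6, 2e-3):
    print(f"sync error {err_s * 1e6:6.0f} us -> coherent level {coherent_level(err_s):.2f}")
```

As the assumed timing error grows toward a substantial fraction of the signal period, the coherent gain collapses, which is the sense in which synchronization bounds the achievable resolution of a distributed sensor system.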
Programming EmNets to achieve significant collaborative processing raises some of the same challenges as are faced in parallel computing and distributed databases. Neither model adequately addresses the combined constraints of EmNets, however. For example, in contrast to parallel computing environments, the data in an EmNet will be inherently distributed. And in contrast to distributed databases, EmNets are much more resource constrained. An assumption in distributed databases is that moving the data from place to place is relatively inexpensive. In EmNets, the emphasis will be on performing the processing where the data are located. Some techniques from each of these models may prove useful, however, and their applications to EmNets merit further investigation.

Finally, the cooperative and collaborative nature of EmNets might frequently create requirements for configuration actions that are implemented across all or nearly all the nodes in a network. If a system is self-configuring, at times there may be a need to clearly identify the subsets of the system that have changed or been upgraded. This is referred to as a need for "atomicity," in which the system as a whole is considered a single, atomic (indivisible) entity. Specifically, the configuration of network protocols or security functions may be an action that must be applied with complete assurance across all nodes in a network. Errors in configuration for one node in a vast area may have unbounded impact. Atomicity of some kind may be needed when a change must be collective or coordinated, but it might not be achievable using standard techniques because there is no enumeration or unique identification of individual components. Moreover, there is a possibility that not all elements need to be upgraded; some components may be disconnected or obstructed for significant periods of time. If a piece of the system is changed, there must be a way for the system to detect whether the resulting final state is workable. How does one determine that enough components have been upgraded to take on the new behavior? How do old components detect that they should change their behavior when they encounter new ones?
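As a concrete illustration of the point above that EmNets favor moving processing to the data rather than the data to the processing, the following sketch compares the bytes a node would transmit when forwarding raw samples against transmitting a locally computed summary. The sample counts, record sizes, and the choice of a simple per-hour summary are all illustrative assumptions.

```python
# Shipping raw data versus in-network processing: a node that reports a local
# summary transmits far less than one that forwards every sample.
# Sample counts and record sizes are illustrative assumptions.

samples_per_hour  = 3600          # one reading per second
bytes_per_sample  = 8             # timestamp + value
bytes_per_summary = 16            # e.g., mean, max, and count for the hour

raw_bytes     = samples_per_hour * bytes_per_sample
summary_bytes = bytes_per_summary

print(f"raw forwarding:     {raw_bytes} bytes/hour")        # 28800
print(f"in-network summary: {summary_bytes} bytes/hour")    # 16
print(f"reduction factor:   {raw_bytes / summary_bytes:.0f}x")
```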

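The two questions that close the atomicity discussion above have no settled answer in the report; the sketch below is offered only as one illustrative possibility, in which a new configuration spreads epidemically while each node activates it only after a local quorum of the neighbors it has heard from also carry it. The class names, the quorum fraction, and the minimum-sample rule are assumptions made for the example.

```python
# Illustrative (not prescriptive) sketch: configuration versions spread
# epidemically, but a node activates the new behavior only after a quorum of
# the neighbors it has heard from also carry that version.
# Names, the 2/3 quorum, and the minimum-sample rule are assumptions.
from dataclasses import dataclass, field

QUORUM = 2 / 3       # fraction of heard neighbors that must carry the new version
MIN_SAMPLE = 3       # do not judge the quorum on fewer neighbors than this

@dataclass
class Node:
    node_id: int
    carried: int = 1                              # newest version stored locally
    active: int = 1                               # version whose behavior is in force
    seen: dict = field(default_factory=dict)      # neighbor id -> version last heard

    def beacon(self):
        return (self.node_id, self.carried)

    def on_hear(self, neighbor_id, neighbor_carried):
        self.seen[neighbor_id] = neighbor_carried
        self.carried = max(self.carried, neighbor_carried)     # epidemic spread
        if self.carried > self.active and len(self.seen) >= MIN_SAMPLE:
            carriers = sum(1 for v in self.seen.values() if v >= self.carried)
            if carriers / len(self.seen) >= QUORUM:
                self.active = self.carried                     # adopt the new behavior

# Tiny simulation: node 0 is upgraded out of band; the rest adopt only once
# enough of the neighbors they have heard from carry version 2.
nodes = [Node(i) for i in range(5)]
nodes[0].carried = nodes[0].active = 2
for _ in range(3):                                # a few gossip rounds
    for a in nodes:
        for b in nodes:
            if a is not b:
                a.on_hear(*b.beacon())
print([(n.node_id, n.carried, n.active) for n in nodes])
```

In this illustration, components that are disconnected or obstructed simply activate later, once they eventually hear from enough upgraded neighbors; it is one way to live with the absence of true atomicity across an unenumerated population, not a recommendation of the committee.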
SUMMARY

Self-configuration involves the addition, removal, or modification of elements in an EmNet and the subsequent process of establishing interoperability. In contrast, adaptive coordination addresses changes in the behavior of a system as it responds to changes in the environment or system resources (such as remaining energy). Together, these processes are critical for creating robust and scalable unattended EmNets. The state of the art in self-configuration is fairly well developed, with well-understood approaches to address assignment, service discovery, and mobile code. However, significant research progress is needed to achieve automatic self-configuration among large numbers of distributed nodes while still conforming to well-defined trust and failure models, which are critical to embedded systems applications. Adaptive coordination is a well-developed discipline for centralized systems, and distributed coordination is widely applied outside of embedded applications (for instance, in Internet applications and protocols), but there is much work to be done in the area of distributed adaptive coordination to support embedded applications. Promising directions include techniques for exploiting system redundancies, localized processing, and collaborative signal processing. Such techniques are particularly critical for unattended, resource-constrained systems.

REFERENCES

Abelson, Harold, Don Allen, Daniel Coore, Chris Hanson, George Homsy, Thomas F. Knight, Jr., Radhika Nagpal, Erik Rauch, Gerald Jay Sussman, and Ron Weiss. 2000. "Amorphous computing." Communications of the ACM 43(5). Also available as MIT Artificial Intelligence Memo 1665, August 1999.

Arnold, Ken, and Jim Waldo, eds. 2000. The Jini Specifications, 2nd ed. Cambridge, Mass.: Addison-Wesley.

Corson, M. Scott, and Joe Macker. 1997. Presentation of draft entitled "Mobile Ad Hoc Networks: Routing Protocol Performance Issues and Evaluation Considerations." IETF RFC 2501.

Fall, K., and S. Floyd. 1996. "Simulation-based comparisons of Tahoe, Reno, and SACK TCP." Computer Communication Review 26(3):5-21.

Floyd, S., V. Jacobson, C. Liu, S. McCanne, and L. Zhang. 1997. "A reliable multicast framework for light-weight sessions and application level framing." IEEE/ACM Transactions on Networking 5(6):784-803. An earlier version of this paper appeared in ACM SIGCOMM '95, August 1995, pp. 342-356.

Gun Sirer, Emin, Robert Grimm, Brian Bershad, Arthur Gregory, and Sean McDirmid. 1998. "Distributed virtual machines: A system architecture for network computing." Eighth ACM SIGOPS European Workshop.

Intanagonwiwat, Chalermek, Ramesh Govindan, and Deborah Estrin. 2000. "Directed diffusion: A scalable and robust communication paradigm for sensor networks." Proceedings of the Sixth Annual International Conference on Mobile Computing and Networks (MobiCOM 2000), Boston, Mass. Available online at <http://lecs.cs.ucla.edu/~estrin/papers/diffusion.ps>.

Jacobson, V. 1988. "Congestion avoidance and control." ACM SIGCOMM '88.

Karp, B., and H.T. Kung. 2000. "GPSR: Greedy perimeter stateless routing for wireless networks." Proceedings of the Sixth Annual International Conference on Mobile Computing and Networks (MobiCOM 2000).

McQuillan, J., I. Richer, and E. Rosen. 1980. "The new routing algorithm for the ARPANET." IEEE Transactions on Communications 28(5):711-719.

Mullender, Sape. 1992. "Kernel support for distributed systems." Distributed Systems, 2nd ed. S. Mullender, ed. Cambridge, Mass.: Addison-Wesley.

Parsons, David. 1992. The Mobile Radio Propagation Channel. New York: John Wiley & Sons.

Sohrabi, K., and G.J. Pottie. 1999. "Performance of a novel self-organization protocol for wireless ad-hoc sensor networks." IEEE VTS 50th Vehicular Technology Conference 2:1222-1226.

Sohrabi, Katayoun, Gregory J. Pottie, and Bertha Manriquez. 1998. "Near-ground wideband channel measurement in 800-1000 MHz." IEEE 1998 Vehicular Technology Conference.

Yu, H., D. Estrin, and R. Govindan. 1999. "A hierarchical proxy architecture for Internet-scale event services." Proceedings of WETICE '99, June.