Models of Computation
As discussed in Chapter 2, advances in circuit design, packaging, power management, and networking (especially wireless networking) provide the components needed to construct large networked systems of embedded computers (EmNets) for a wide range of applications. The opportunities are, in fact, overwhelming, because these components will be incorporated into systems of increasing complexity on which society will depend in unprecedented ways. The effort needed to design systems so that they can be maintained, configured, and trusted will be substantial. If EmNets are to be designed in a principled way rather than being assembled using techniques determined on a case-by-case basis and specialized to the system being built, computational models will be needed to provide a conceptual framework in which the designs can be created, thought about, and tested.
Designers of complex systems use a range of conceptual models to help them construct and reason about systems. These conceptual models are built out of a set of abstractions that hide those aspects of the system that are considered to be either irrelevant or sufficiently unimportant. By not being part of the model, these irrelevant or unimportant aspects need not be thought about in the design of the system, and a variety of ways of implementing the abstractions they correspond to can be used when constructing the system. Thus, the right computational model will simplify the system as well as allow different implementations of the design. Further, the computational model provides the designer with the conceptual mechanisms that allow trading off one aspect of a design against other aspects. When given the appropriate abstractions, the designer of a system can decide to maximize certain features of the system at the cost of others, or decide to design a system that trades functionality in one area for functionality of some other part of the system.
The adequacy of a computational model is determined by two measures. The first measure is the suitability of the abstractions that have been chosen: They should allow those aspects that are important to the system to be represented in the model and not require the designer to think about those aspects of the system that are not important. The second measure of adequacy is the implementability of the computational model on the environment it is meant to encompass. A model may incorporate abstractions that make the design of a system easy, but that is no help if the abstractions cannot be implemented in the target technology of the system. On the other hand, a set of abstractions might be straightforward to implement but not allow the designer to focus on the properties that are needed, because the abstractions do not simplify the system enough to make the design tractable, or they might simplify it in the wrong way, making it impossible to attain some important aspect of the design.
Computational models are not required to build working systems. Indeed, since one of the questions that needs to be answered in evaluating a computational model is whether it is possible to implement the abstractions of the model, some systems must be built before a model is completely fleshed out and fully validated. In particular, functioning EmNets have been and will continue to be built without complete computational models for them. However, without such models, these systems must be built in an ad hoc fashion, and problems that are not addressed by the existing models must be addressed while the system is being constructed. These problems need to be solved anew by each system implementation, making the process more costly and more time consuming. In short, coherent, well-thought-out computational models will eliminate these problems and will facilitate analysis of systems (for example, to ensure trustworthiness) as they evolve over time.
A number of existing computational models might be applicable to EmNets. Because these systems are built with multiple processors used for a particular task, models of parallel computation could be extended to them. EmNets also share characteristics with storage area networks and distributed databases, so models that have been used in those arenas could also provide insights. However, the computational model most often used in thinking about an EmNet treats it as a distributed system, focusing on the interaction of computation and communications. In distributed systems, these models describe both how the various processors carry out the computation and how they communicate with one another.
Because all computational models are really contracts—that is, particular abstractions can be used given that they can be adequately implemented and particular functionality can be reflected in the abstractions—it is important to examine the models when the problem domain, the properties that the system needs to maintain, or the hardware configuration changes. All of these changes come into play with the design of EmNets. Hence, it is important to ask the following questions:
What abstractions used in traditional computational models might be applied to EmNets, and are those abstractions rich enough to allow a model that is sufficient for the properties that are needed in EmNets?
Are there new abstractions that must be created, either in addition to or replacing those of a traditional computational model, when computational models for EmNets are built?
Is it possible, given the abstractions that can form a coherent and adequate computational model for EmNets, to implement those abstractions in the technology that will be used for EmNets?
This chapter examines these and other key modeling issues. The first section provides a primer in models of computation. The second section examines the models of computation already developed and in use for describing distributed computing systems. The third section identifies ways EmNets might strain or require extensions in existing models and describes potentially fruitful avenues of inquiry that could lead to the development of new or enhanced models appropriate to these systems. The last section suggests an overall approach to pursuing this type of research.
WHAT ARE MODELS OF COMPUTATION?
Existing computational models function at many different levels of abstraction; often, high-level abstractions build on simpler ones. The abstractions can involve data, computation, and communication. The most familiar computing model is probably that of a sequential processor, which states that the output of the system can be modeled by a simple sequential execution of the instructions in the program. Although almost all processors execute instructions in parallel to enhance performance, and some modern processors execute instructions out of order (see Chapter 2), the computational model used by programmers assumes that the processors obey a set of constraints that allow this simple, sequential computational model to be retained.
Computational models evolve over time, as abstractions are introduced to eliminate unnecessary details and clarify the important design
points of the systems being modeled. In the early days of computer science, the data aspects of a computational model were thought of in low-level terms, such as bit strings or integers. Such low-level abstractions were often tied to particular machine architectures, which used data words of different sizes, and they were difficult to use in different environments or to reason about. This led initially to the notion of simple data types (for example, 8-bit bytes and byte streams) and ultimately to the introduction of the higher-level data abstractions that are used today, such as abstract data types or objects.
Abstract data types, rather than focusing on the data structure implementation, model the data and operations at a higher level in terms of the desired response. One way of implementing these abstract types is through objects, which represent information in terms of the operations (often called methods) that can be performed on that information and which associate with that information the code needed to manipulate it. Thus, rather than representing a geometric point as a pair of integers indicating the x and y coordinates, an object representation would define methods that returned the x and y coordinates and would allow the point to be drawn or moved. How the object actually represents the information is left up to the implementation of the object (for example, it can use a pair of integers, polar coordinates, or some other scheme). Such objects allow functionally equivalent representations of information to be treated as identical from the point of view of the user, allowing the user or a higher-level model to concern itself with the use of information rather than the representation of it.
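The point example can be made concrete with a short Python sketch (the class and method names here are our own, purely illustrative): two classes with different internal representations satisfy the same interface, so code that uses them cannot tell which representation is in play.

```python
import math


class CartesianPoint:
    """Stores a point as rectangular (x, y) coordinates."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def x(self):
        return self._x

    def y(self):
        return self._y

    def moved(self, dx, dy):
        # Returns a new point; callers never see the internal fields.
        return CartesianPoint(self._x + dx, self._y + dy)


class PolarPoint:
    """Stores the same abstraction as a radius and an angle."""

    def __init__(self, r, theta):
        self._r = r
        self._theta = theta

    def x(self):
        return self._r * math.cos(self._theta)

    def y(self):
        return self._r * math.sin(self._theta)

    def moved(self, dx, dy):
        # Convert through rectangular coordinates, then back to polar.
        x, y = self.x() + dx, self.y() + dy
        return PolarPoint(math.hypot(x, y), math.atan2(y, x))
```

Any code written against only x(), y(), and moved() works identically with either class, which is precisely the sense in which functionally equivalent representations are treated as identical.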
Computational models for distributed computing have followed a similar evolution. Early models were concerned with the communication of data from one cooperating computer to another. For example, the Open Software Foundation's Distributed Computing Environment (DCE) Remote Procedure Call (RPC) system (Zahn et al., 1990) centered on describing data and communicating them from one machine to another, no matter what the internal representation of those data might be. Abstract data types in the form of interfaces were introduced in the Common Object Request Broker Architecture (CORBA) (Object Management Group, 1991), allowing definitions of the types of information that could be exchanged from machine to machine without reference to the way the machine represented or computed that information. Object-based systems, such as Modula-3 Network Objects (Birrell et al., 1994) or Java Remote Method Invocation (Wollrath et al., 1996), allow objects and associated methods to be communicated from one machine to another in the system. These systems can be seen as extensions of the techniques used on a single machine, adding the communication aspect to the model for the distributed system case.
Such innovations represent important progress because they allow a change in the level of detail, from how bits or other groups of entities are managed, to behavior that can be depended on by the rest of the system. This shift enables a modular decomposition of functionality that is critical for keeping system complexity under control. Thanks to these additional layers of abstraction, reasoning about the system needs to take into account only the information supplied by the abstract data type or object, not how that information is represented in the underlying execution engine. This specification of what information is supplied (or required) acts as an interface, stating only what is necessary for the information and not the incidental features of the particular representation of that information. As discussed previously, an increase in the level of abstraction of the interfaces on which the system relies also greatly reduces system fragility, because a system can adapt and change some of the lower-level mechanisms while maintaining the higher abstractions needed for system operation.
By supplying these abstractions, the computational model also limits what can be expressed within the computing model. Each abstraction limits the detail that is considered important in the model, simplifying reasoning about the system at the price of limiting the vocabulary of the designer. When applying a computational model for one discipline, such as distributed computing, to the domain of EmNets, the overriding question is whether the trade-off between abstraction and expressive power has been accomplished correctly. If not, the computational model will need to be extended or replaced by one that gives the proper vocabulary to the designer of the systems in the new domain.
Whether or not a particular computing model can be implemented is often determined by the set of presuppositions on which the model is based. Building an abstraction may require certain properties in the underlying system that are not explicitly part of the model. For example, one of the major differences between the distributed computing model articulated by the CORBA abstractions and the model articulated in Java Remote Method Invocation is the latter’s ability to pass objects, including their methods, from one participant in the network to another. This, in turn, is implementable because the system presupposes the existence of the Java Virtual Machine on all members of the system, allowing both behavior and data to be passed in the distributed system. The CORBA system does not make this presupposition, so it can only allow the passing of behavior in very limited circumstances, since a general model of mobile behavior, while useful, would be unimplementable.
Models of computation also allow the precise definition of notions of resource complexity. In more conventional systems, this has often meant time, space, and communications bandwidth. In EmNets, trade-offs between energy, latency, memory, processing, bandwidth, and persistent storage will be necessary. As algorithms are constructed to work within the computational models created for EmNets, it will be necessary to evaluate them with respect to these various complexities and the trade-offs between them.
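One way such multi-dimensional resource complexities might be compared is sketched below in Python. The cost profiles, units, and weighting scheme are invented for illustration, not drawn from any real system; the point is that a single algorithm is rarely best along every axis at once.

```python
# Hypothetical cost profiles for two algorithms solving the same task.
# The approximate version saves energy and time but needs a larger
# sample buffer. Units (millijoules, milliseconds, bytes) and all
# numbers are illustrative only.
COSTS = {
    "exact_aggregate":  {"energy_mJ": 40.0, "latency_ms": 120.0, "memory_B": 512},
    "approx_aggregate": {"energy_mJ":  6.0, "latency_ms":  15.0, "memory_B": 4096},
}


def weighted_cost(profile, weights):
    """Collapse a multi-dimensional resource profile into one scalar.

    The weights express the design trade-off: an energy-starved node
    weights energy_mJ heavily, a real-time task weights latency_ms.
    """
    return sum(weights[axis] * profile[axis] for axis in weights)


def cheapest(costs, weights):
    """Pick the algorithm with the lowest weighted cost."""
    return min(costs, key=lambda name: weighted_cost(costs[name], weights))
```

Under an energy-dominated weighting the approximate algorithm wins; under a memory-dominated weighting the exact one does, which is the kind of evaluation the text calls for.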
DISTRIBUTED COMPUTING MODELS: CURRENT PRACTICE
While there are several models for distributed computing, nearly all of them are based on one of two underlying abstractions: distributed objects and distributed shared memory. Both provide a basis for understanding computing systems in which elements are distributed across a network and, as such, can offer a starting point for thinking about EmNets. Other models can be built on top of these basic models, offering higher levels of abstraction when necessary. These two models, however, form an expressive base that is carried through in the models built on top of them. If these basic models lack a way of expressing concepts that are needed for thinking about EmNets, models built on top of them will be unable to add the concepts at a higher level. If these basic models cannot be implemented in the environment presented by EmNets, it will not be possible to implement computational models built on top of them. As will be seen, both models have serious deficiencies when used as a base for EmNets.
As interesting as the concepts used in building the traditional computational models of distributed systems are the concepts that have been abstracted out of such models. The traditional model has concentrated on the mechanisms for passing information from one network component to another (RPC, message passing, shared memory). However, the traditional model has abstracted away notions such as communication timing, resource use, and memory requirements for the underlying system. These are not important concepts in traditional distributed systems, since those systems assume that the entities that are connected by the network are sufficiently powerful computers, plugged into an adequate source of long-term power, with few limits in terms of the amount of memory available or the ability to store persistent information. However, a number of these concepts that do not appear in traditional computational models of distributed systems are vital to the design and understanding of EmNets. A similar example has to do with the failure models that have been developed for distributed systems (Schneider, 1993), which range over a variety of ways in which the communication between systems can fail but have a simple model of failure in terms of the components of the system themselves. This simple model of failure may be inadequate for EmNets,
where there are likely to be large numbers of networked systems that may fail (or turn themselves off) often.
Differences such as these call into question the use of traditional distributed computing models in the domain of EmNets. At the very least, it seems clear that certain concepts that have been abstracted out of the computational model for other kinds of systems will need to be added to reach a model that is adequate for reasoning about EmNets. The rest of this section elaborates on some of the assumptions made in traditional models and explores why such assumptions may not be adequate for EmNets.
Both distributed shared memory and distributed objects are based on attempts to abstract away many of the details of the communication needed in a distributed system. Some systems achieve this by assuming a robust network that can deliver information to the desired destination; others attempt to mask communication failures, or to reflect such failures to the next layer or even to the application. The goal, in both cases, is to allow the system designer to concentrate on the way the system works without having to worry about the reliability of the underlying communication framework.
In the distributed objects model, the entire system is composed of objects, or combinations of information and the functions or methods used to manipulate and access that information. These objects can reside on different machines; in some of these systems, the objects can migrate during the computation. In this model, objects are created with the knowledge of how to communicate with certain other objects (that is, they are provided with references to these objects when they are created) or types of objects (that is, they are provided with references to these objects as the result of a method call), and they do so by calling the methods associated with those objects. When objects call the methods of other objects, the object being called can be on either the same machine as the caller or on a different machine. The call mechanism abstracts away the details of the communication needed to make a remote call, thus simplifying the model. This means that in the implementation of the model, the call mechanism must handle all the communication issues, such as dealing with an unreliable network by retrying the call as appropriate. Some systems try to supply a call mechanism that can deal with all forms of failed communication, but some forms of failure break this abstraction. Other systems attempt to reflect such failures to the caller, perhaps by an error message indicating communication failure. However, in all of these systems the
assumption is that communication rarely fails and that the cost of communication is at worst the time it takes for the communication to take place.
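A toy Python sketch of such a call mechanism follows (all names are hypothetical): a proxy masks transient communication failures by retrying, and reflects the failure to the caller only after a bound is exceeded, which is exactly where the abstraction breaks.

```python
class CommunicationError(Exception):
    """Raised when the call mechanism gives up masking a failure."""


class RemoteProxy:
    """Stands in for an object that lives on another machine.

    `transport` is any callable that performs one attempt at the remote
    method call and may raise ConnectionError; here it is simulated.
    """

    def __init__(self, transport, max_attempts=3):
        self._transport = transport
        self._max_attempts = max_attempts

    def call(self, method, *args):
        for _attempt in range(self._max_attempts):
            try:
                return self._transport(method, *args)
            except ConnectionError:
                continue  # mask the failure by retrying
        # The abstraction can no longer hide the network: reflect the
        # failure to the caller.
        raise CommunicationError(
            f"{method} failed after {self._max_attempts} attempts")


def flaky_transport(failures_before_success):
    """Simulated unreliable network that drops the first N attempts."""
    state = {"left": failures_before_success}

    def transport(method, *args):
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("packet lost")
        return f"{method}({', '.join(map(str, args))}) -> ok"

    return transport
```

The caller writes `proxy.call("read_sensor", 7)` as if the object were local; whether that is acceptable for EmNets depends on whether failures really are rare and cheap, as the text notes.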
In the distributed shared memory model, individual computation units do not communicate directly with one another. Instead, an area of memory is provided to each unit and made to appear to be common to all units. Computation units use this area of shared memory to communicate indirectly, by calling methods of objects in this shared system state. A typical way of using this model is to make the objects in this system state very simple, so that their only methods are read and write; but the model can also be applied to objects that allow any kind of method. Note that this technique does not require an actual area of physical memory to be shared by all computation elements; rather it is an abstraction of a possible, more complex interconnection network that provides this illusion. As in the case of the distributed object model, the communication mechanism must “do the right thing” in the presence of network problems and failures and convey the right information to users when problems cannot be masked. The shared memory model attempts to present a model to the programmer in which there is no communication, only the invocation of methods on local (but shared) objects. With such a model, either the underlying system must be able to mask all communication failures from the participants or the computational model of shared memory must be compromised to allow information about such possible failures to be visible. Implementing the model without accommodating failures requires a network that can be made as reliable as memory access, and again the cost of communication is represented as (at most) increased latency in the access to shared memory.
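The shared memory model can be illustrated in miniature as follows (a single-process Python stand-in; a real implementation must distribute, replicate, and handle the failures discussed above). The two computation units never call each other; they coordinate only through the shared state.

```python
class SharedMemory:
    """A local stand-in for the distributed shared memory abstraction.

    Each computation unit sees only read() and write(); whether a cell
    lives in local RAM or on another node across a network is hidden.
    A real implementation must replicate and synchronize the cells and
    either mask network failures or surface them, which is where the
    model strains in an EmNet setting.
    """

    def __init__(self):
        self._cells = {}

    def write(self, address, value):
        self._cells[address] = value

    def read(self, address, default=None):
        return self._cells.get(address, default)


def producer(mem):
    # One computation unit publishes a reading into the shared state.
    mem.write("sensor/temperature", 21.5)


def consumer(mem):
    # Another unit observes it; the units never communicate directly.
    return mem.read("sensor/temperature")
```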
Other models can be and have been built on top of one of these two models. An example is the class of models built on the idea of a shared whiteboard, which can be seen as an extension of either the shared memory model or the distributed object model. In such systems, there is a single shared repository of information objects that is accessible to all participants in the distributed system, and communication involves writing information into such spaces and allowing it to be read out by some other member of the distributed system. The shared space can be viewed as shared memory with special access operations or as a special type of distributed object. In either case, the new model is a further abstraction on one of the more basic models. Rather than adding new concepts to the model, it builds new abstractions on the old models. Lessons may also be drawn from higher-level parallel programming models, such as NESL (see <http://www.cs.cmu.edu/~scandal/nesl.html>), BSP (the Bulk Synchronous Parallel computing model; see <http://www.bsp-worldwide.org/>), and HPF (High Performance Fortran; see <http://www.crpc.rice.edu/HPFF/>), where a rich set of aggregate operations is provided to the programmer and compiled down into code for the constituent nodes and components. However, with EmNets the collection of nodes may be unstructured, constantly changing, and oriented toward real time. This problem is also related to database query processing, if one views the data being collected from the pool of sensors as a fine-grained distributed database. This view is attractive because data are identified by key rather than by address. However, the model for EmNets will not be working with regular tables and records but with a changing collection of data streams, where aggregate query operations must be spread across many tiny nodes and must be placed as close as possible to the data so as to minimize energy-consuming communication. A third and related viewpoint is that EmNets are an extremely fine-grained tuple space, as in Linda or JavaSpaces (Freeman et al., 1999). (Linda is a language for parallel programming in which communication occurs by inserting and retrieving tuples, collections of data referenced by a name, in a shared area; for more information, see the Linda Group at <http://www.cs.yale.edu/Linda/linda.html>.) Linda-like systems can be seen as a shared whiteboard in which a particular naming system is used, extended to deal with both communication and concurrency issues. Many operation sequences take place in the tuple space concurrently, with associative operations utilizing the inherent parallelism. A unique element of EmNets is the opportunity to exploit redundancy in an adaptive fashion to manage density and power utilization.

The hardware design community employs discrete-event concurrency models (as implemented primarily in Verilog and VHDL) to design highly reliable and understandable concurrent systems. Synchronous models, which originated in the hardware community, are arguably among the most powerful concurrency abstractions by virtue of their ability to handle complexity in understandable ways. These models have spread to software design, as embodied in languages such as Esterel (see <http://www.esterel.org/>) and Lustre (see <http://www-verimag.imag.fr/SYNCHRONE/lustre-english.html>). Even within the software world, abstractions such as process networks, port-based objects, I/O automata, functional languages, rendezvous-based models (such as CSP or CCS), and data-flow models all provide abstractions for use in their particular problem domains. All of these models, however, are built on top of either the RPC model or the shared object models, and similar limitations with respect to EmNets apply.
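The Linda-like tuple-space viewpoint mentioned above can be sketched minimally in Python. The operation names follow Linda's conventional out/rd/in vocabulary, but everything else here is a simplification: a single process, nonblocking operations, and an ordinary list as the store, where real tuple spaces add blocking reads, distribution, and concurrency control.

```python
class TupleSpace:
    """A toy Linda-style tuple space (single process, nonblocking).

    Tuples are matched associatively, by content rather than by
    address: a None field in a template acts as a wildcard.
    """

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Insert a tuple into the space."""
        self._tuples.append(tup)

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def rd(self, template):
        """Read (copy) a matching tuple, leaving it in the space."""
        for tup in self._tuples:
            if self._match(template, tup):
                return tup
        return None

    def inp(self, template):
        """Remove and return a matching tuple, if any."""
        for i, tup in enumerate(self._tuples):
            if self._match(template, tup):
                return self._tuples.pop(i)
        return None
```

The associative matching is what makes the data-by-key (rather than data-by-address) view natural: a consumer asks for `("temp", None, None)` without knowing which node produced the reading.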
NEW MODELS FOR NETWORKED SYSTEMS OF EMBEDDED COMPUTERS
EmNets have many of the characteristics of traditional distributed computing systems, since they are collections of computing elements connected by networks attempting to perform some task in a cooperative fashion. However, EmNets are made up of components that have characteristics very different from those that make up traditional distributed computing systems, components whose limitations make it difficult to implement the standard abstractions of the traditional models. Because of the way EmNets will be used, the design trade-offs made for those systems will often be very different from those made in the design of standard distributed systems, requiring the introduction of new concepts and abstractions to allow thinking about appropriate balance.
A computational model is useful only when the abstractions in the model can be implemented in the technology for which the model is constructed. A useful computational model must also allow the designer to reason about the characteristics of the system that are important. In EmNets a number of characteristics are important that are not present in the standard computational models for distributed systems and that make it difficult to construct the abstractions common in computational models of distributed systems. These characteristics include the following:
Reasoning about time and location. Since EmNets will often interact with the physical world in a way that satisfies real-time constraints, designers will require a model that has reified the notion of time and allows making design trade-offs concerning timely response. The tight coupling of EmNets to the physical world allows those systems to make use of notions of location, colocation, and proximity that are not possible in standard computational models of distributed systems. Because of this coupling, the functioning of EmNets often depends on inputs or requires outputs that are not modeled by an exchange of information between parts of the distributed system. Thus, a computational model in which behavior of the overall networked system is defined by the information exchanged between the computing elements of the system cannot be implemented in EmNets tightly coupled to the physical world.
Resource limitations. The limited resources of an EmNet, both the resources available on the computing elements themselves and the ability of those elements to communicate, will require a computational model in which the use of those resources becomes part of the model. Notions such as memory limitations, energy conservation, and access to persistent storage cannot be abstracted away but must be an explicit part of the design of EmNets. A computational model that assumes an environment without such constraints will not be implementable in EmNets.
Heterogeneity. EmNets are built out of components that show a high degree of heterogeneity. Some of the components will make use of traditional computing elements with persistent storage and abundant energy supplies and will be connected by wired networks with high reliability and bandwidth. Other components will be built with specialized processors having limited processing power, will have limited or no persistent storage, will be connected using low-bandwidth wireless networking, and will have limited, self-contained power supplies. A computational model that does not allow differentiating the kinds of nodes that will be used to construct these systems will not be able to conserve the limited resources available to the lowest-level members of the network nor will it be able to capitalize on the power of the most competent members of the system.
Nonexpert users. Since EmNets will often be operated by nonexpert or casual users who have only a superficial understanding of the technology, the failure of such systems will need to be communicated to those users in ways that allow the failure to be understood and appropriately responded to. The computational model will need to have a rich failure model, allowing designers to decide which of the failures can be dealt with by the system and which will need to be reflected to the users. Unless the various kinds of failures in such systems are part of the conceptual model, designing a system with such failure models will be difficult or impossible.
Many redundant components. The ability to produce large numbers of similar components cheaply will allow some EmNets to introduce levels of redundancy and scope that are not possible with more conventional computational models for distributed systems.
Long lifetimes. Since EmNets will often be designed for a lifetime that exceeds the lifetime of any one of the components, the need to reason and design around in-process upgrades of the system requires a computational model unlike those used in more conventional distributed systems. In effect, this means that the already high degree of heterogeneity in these systems will also have a time element, with the components changing over time as well as from place to place within the particular system. This will require more than just the kinds of reconfiguration and adaptation talked about in Chapter 3; it will also require a computational
model in which the abilities of the various parts of the system can be queried and reacted to within the software of the system.
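The heterogeneity and queryability points above can be made concrete with a small Python sketch in which each node advertises a capability profile that software can query before assigning work. The field names and values are invented for illustration; a real EmNet would need an agreed-upon, evolvable schema for such profiles.

```python
# Hypothetical capability profiles. In a long-lived EmNet, nodes
# added years apart would report very different values, so software
# must query rather than assume.
NODES = [
    {"id": "gateway-1", "powered": True,  "ram_kB": 65536, "radio_kbps": 10000},
    {"id": "mote-17",   "powered": False, "ram_kB": 4,     "radio_kbps": 19},
    {"id": "mote-42",   "powered": False, "ram_kB": 4,     "radio_kbps": 19},
]


def capable_nodes(nodes, min_ram_kB=0, needs_power=False):
    """Return the ids of nodes meeting a task's stated requirements."""
    return [
        n["id"]
        for n in nodes
        if n["ram_kB"] >= min_ram_kB and (n["powered"] or not needs_power)
    ]
```

A heavyweight aggregation task would be routed only to the tethered gateway, conserving the limited resources of the battery-powered motes while capitalizing on the most competent members of the system.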
As has been emphasized throughout this report, no single aspect of EmNets is unique to the emerging field. Other systems have had real-time constraints. Other systems have been built from small, resource-limited components. And other systems have had to interact with the physical world.9 All of these systems have been based on, or formed the basis of, a computational model that has addressed some of the needs of the computational model for EmNets. What makes developing the computational model for EmNets unique is not any particular aspect of the model, but the combination of large numbers of networked components, resource limitations on those components, duration of deployment, connection to the physical world, and richness of potential connectivity. The mission-critical and, sometimes, life-critical nature of these systems makes a coherent computational model for these systems a high priority for the academic and industrial research communities.
In the next sections, the committee identifies areas in which the computational model can make use of information or needs to allow for reification if it is to account for the unique combination of features and requirements presented by EmNets. The computational models that arise for EmNets may not include all of the areas that are discussed, or they may include features that are not included in the discussion. What follows are the features that appear at this point to be the most promising for enriching a computational model for EmNets.
Models with Resource Constraints
An immediate challenge in creating a computational model adequate for EmNets is to determine the right level of abstraction. As discussed above, existing distributed system computational models abstract away performance issues, both on a node and in the network, and are concerned with the order of events but not their timing. This simplification is often useful but sometimes hides too much information. For example, one way of handling diversity in a system with a long lifetime is to run a virtual machine (VM) on each node. Although this provides an environment in which code can run on any node, it completely prevents the application from determining the available resources of that node. One of the critical problems is to find new, low-level models that extend the VM notion to allow designers, and even applications, to reason about resources. The difficulty is how to accomplish this while maintaining a general framework that is simple enough to be useful. If applications need to select an algorithm given the current resource constraints, determining which algorithm to run should not consume more resources than are saved by the algorithm selection.

9. Distributed control systems (discussed in Chapter 3) have operated distributed infrastructures such as the electric grid, pipelines, and railroads that (1) are closely tied to the physical world, (2) must cope with location, and (3) operate under time and resource constraints. However, in each of these examples, the layout has been predetermined and the interaction with the physical world tightly prescribed. The physical coupling discussed in this report is of a much tighter nature (for example, chips embedded in everyday objects with which the user interacts directly, rather than with the computer system to which they are connected). In addition, such systems are generally tethered (that is, connected directly to easily replenished sources of power and to communications infrastructure) and do not have the power limitations under which many EmNets will have to operate.
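The algorithm-selection point above can be made concrete with a small sketch. The node model, the battery threshold, and the two algorithms are invented for illustration; the essential property is that the selection step itself is a single comparison, far cheaper than the work it avoids.

```python
# Illustrative sketch only: a node picks a cheap approximate algorithm or an
# expensive exact one based on its (hypothetical) reported battery level.

def mean_exact(samples):
    """Exact mean: touches every sample (more computation and energy)."""
    return sum(samples) / len(samples)

def mean_approx(samples, stride=4):
    """Approximate mean: subsamples to save energy at some accuracy cost."""
    picked = samples[::stride]
    return sum(picked) / len(picked)

def choose_mean(samples, battery_fraction, low_battery=0.2):
    """The selection is one comparison, so deciding which algorithm to run
    costs far less than the resources the decision can save."""
    if battery_fraction < low_battery:
        return mean_approx(samples)
    return mean_exact(samples)
```

In a real resource-aware VM, the "battery_fraction" input would come from an interface the model exposes to applications; defining that interface is exactly the open problem described above.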
Resource constraints also affect issues such as data abstraction. Data abstraction will continue to be important for EmNets, as will the grouping of abstractions into type hierarchies to allow families of related types of objects and the use of various design patterns to hide implementation details. Such abstractions will be needed to hide the particular types of computing elements used in EmNets (which promise to change radically and rapidly over the foreseeable future) while still allowing reuse of computational models, system designs, and (in some cases) software. It may be necessary to redefine certain data abstractions to give applications in this new domain access to the additional data they need to carry out their functions. The abstractions may need to provide ways for higher levels to negotiate different qualities of service (for example, time to carry out specific methods on this object at this time) or performance trade-offs (for example, speed of communication versus resolution of data provided). Memory constraints can also drive work on finding simpler ways of implementing these data abstractions.
Resource constraints in the network also will stretch current computational models. In the two common distributed system models, communication is abstracted almost completely out of the problem. Although this greatly simplifies reasoning about the system, it seems unlikely that these models will be rich enough to support EmNets. Both models buy simplicity at the cost of considerable complexity in the underlying system; it is not clear that this trade-off will be correct for the small components and subsystems that will constitute EmNets. More troubling than the need for richer high-level models is the possibility that the low-level models for the different communication layers will need to change, too, to reflect the resource constraints and poor link reliability of wireless nodes. The ways that networks are formed, messages routed, and participants described have evolved for networks of stable, stationary computational
elements. Researchers need to explore whether the networking layers on which these abstractions are built are correct for EmNets. If not, researchers need to explore how these models can be extended to allow additional information to be available for the communication layers, or available in a simpler form for the application, without making the model so complex that it is no longer useful.
Models Dealing with Failures
To design a reliable system, the designer needs a model that includes the types of failure that the system can experience, so that the design can respond to those failures. Some failures can be handled by the system itself; other failures can only be dealt with by the application, and still others will need to be reflected to the user. Failures may compromise security, safety, and/or reliability. Standard formal models of distributed computing identify failures of the components (such as crash or fail-stop failures); failures in the communication infrastructure; and Byzantine failures, in which a component can act in random fashion (including acting like a nonfailed component that sends incorrect information). Actual systems rarely deal with all of these failure models but vary by which failures they try to handle and which are exposed to the application. Examples of such failure models are provided in Box 5.1. However, these models were developed based on the assumption that a component that fails cannot be replaced, and that failure is generally rare or limited in scope. In the case of EmNets, in which the components are low cost and limited in their resources and functionality, different forms of failure may need to be accounted for within the system. A component may fail for a finite period of time, for example, shutting itself down to conserve energy. A network may fail because of limits on bandwidth, allowing some information to be passed from component to component but not allowing the throughput needed for the propagation of all relevant information. These types of failures may require a richer failure model than has typically been provided up to now.
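The richer failure vocabulary suggested above can be caricatured in a few lines. The taxonomy and the recovery rule here are hypothetical, chosen only to show how a model might distinguish permanent failures from the temporary, resource-driven failures peculiar to EmNets.

```python
# Hypothetical taxonomy of failure types an EmNet model might distinguish.
from enum import Enum, auto

class Failure(Enum):
    CRASH = auto()      # node stops and never returns
    BYZANTINE = auto()  # node keeps running but may send incorrect data
    DORMANT = auto()    # node off for a finite period (e.g., saving energy)
    THROTTLED = auto()  # link is up, but below the throughput the app needs

def is_recoverable_locally(failure):
    """A dormant node or throttled link may heal on its own; a crashed or
    Byzantine node needs replacement or exclusion. Purely illustrative."""
    return failure in (Failure.DORMANT, Failure.THROTTLED)
```

The point of such a taxonomy is that an application written against it can plan to wait out a DORMANT neighbor rather than treating every silence as a crash.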
Responses to such failures may also follow an unusual path in EmNets. Whereas a component failure in a standard distributed system might require failover to some replicated component or the election of a new leader in a master/slave replication scheme, such a failure in an EmNet might require only that information be obtained from a different component of the system. In an EmNet that has large numbers of nodes gathering information, the failure of some nodes might be handled by estimation techniques using the information gathered from the remaining nodes. (This has obvious implications for the reliability and survivability of these systems.) Similarly, a network failure may require finding a different neighbor to use as a pathway for the information (consider, for example, communications routing in the Internet). The capability of the overall system to adapt to failure rather than simply replace the failed component with an equivalent offers a new route to failure recovery that cannot be taken in more traditional systems but can be exploited in the circumstances offered by EmNets. This possibility opens up a number of interesting data modeling questions for EmNets, as discussed in the next section.

BOX 5.1 Failure Models: CORBA and Java RMI

Failure models can have a significant effect on the overall computational model for a system. The introduction of a failure type into a failure model may make the building of an application more complex than it would be with a less complete model, but the resulting application may be more reliable because it can survive failures that are not part of the simpler model. These differences can be illustrated by comparing systems with different types of failure models.

As an example of the simpler model, the Object Management Group's Common Object Request Broker Architecture (CORBA) includes a remote procedure call (RPC) system in which communication failure was not originally part of the computational model. Calls could be made from objects on one machine to objects on a different machine, and it was assumed that the communication infrastructure would ensure that the call would be made and, if expected, a value returned. In later versions of the system, the failure model was enhanced by introducing the notion of an exception that would be thrown when communication failed. The programmer using the system was not required to handle this exception; if an exception was thrown and no part of the program receiving the exception was explicitly designed to deal with the communication failure, then the client program would simply fail.

CORBA can be contrasted with the model found in the Java Remote Method Invocation (Java RMI) system. RMI is also an RPC-style system, allowing an object on one machine to make calls to objects on a different machine. However, the RMI system requires that any method that can be implemented as a remote call be declared as possibly throwing a special exception that indicates a communication failure. Further, this exception must be handled by the calling code; if there is no exception handler, then the calling program will not compile. How the exception is handled will be application specific. In some cases, the client may simply shut itself down. In other cases, the client may try to find an equivalent service provider, roll back some internal state, or contact an administrator to determine the cause of the communication failure. Thus, the notion of communication failure is part of the RMI computational model in a way it is not in the CORBA model.

As a result, programs written using RMI are somewhat more complex than those using CORBA in the sense that RMI programs must contain code to deal with communication failures, whereas programs with similar functionality written using the CORBA system need not. The RMI programs containing this extra code are also more robust, in the sense that they will survive failures in a network that would cause termination in the equivalent CORBA-based program.
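The CORBA/RMI contrast described in Box 5.1 can be caricatured in a few lines. All names here are invented; the sketch only mimics, in Python, the difference between a call style that lets communication failure go unhandled and one that forces the caller to say what happens on failure, as Java's checked exceptions do.

```python
# Caricature of the two failure models in Box 5.1. All names are invented.

class CommunicationError(Exception):
    """Stands in for a network-level failure during a remote call."""

def corba_style_call(remote, *args):
    """Failure surfaces as an exception the caller is free to ignore;
    an unprepared client dies when the network does."""
    return remote(*args)

def rmi_style_call(remote, on_comm_failure, *args):
    """The caller cannot even phrase the invocation without supplying a
    failure response, mirroring Java RMI's mandatory exception handling."""
    try:
        return remote(*args)
    except CommunicationError:
        return on_comm_failure()

def flaky(x):
    """A hypothetical remote endpoint whose link is down."""
    raise CommunicationError("link down")
```

The extra argument is the Python analogue of the extra code that makes RMI programs more complex, and more robust, than their CORBA counterparts.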
New Data Models
The kinds of systems that will be built with EmNets present a number of programming model problems. While these problems are not entirely new, they arise in a unique environment that makes traditional solutions to the problems difficult or impossible to use.
A key question is how to model the information gathered by an EmNet. Because many of the components are assumed to be unreliable, some will inevitably fail, and when they do, other parts of the system must be able to take over critical functions or compensate for the failure in some other fashion. In addition, if the components recover or are replaced, they need to continue doing what they were doing before, which may well include knowing some of the information gathered over time. All of these requirements imply a need for persistent data. Furthermore, the ability to have one component take over for another argues for a persistent state that is not stored at the component. One promising approach would be to model the system as if components were largely stateless, with a robust storage device in the network. Although a direct implementation of this approach would lead to a single point of failure and high cost, it is possible to distribute this store among the elements that maintain this abstraction and can tolerate failures in the nodes and networks. The computational model presented in such a system has at least two levels of memory. The first, which is not persistent but which is common at the leaves of the network, requires programming techniques that guard against the loss of information. The second, found in the interior of the network, stores the information in a persistent fashion. One of the interesting programming questions in such a system is how much processing should take place at the leaf nodes of the system. The components will be able to do some computation, and the more the amount of raw data available to the sensors can be reduced before sending it to the rest of the network, the more bandwidth is conserved. However, such computation means that power is being used at the edges of the network and that failures may result in the loss of the data. These sorts of trade-offs can only be made in a computational model that reflects
the two levels of memory and allows reasoning about the costs and benefits of the design choices in such a system. Whether these methods are appropriate for EmNets is an open research question. Research is also needed to determine if this information must be handled by explicit programming or if it can be made automatic, and to learn what requirements and costs are associated with automated backup replication and archiving.
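The two-level memory model described above can be sketched concretely. Everything here is an assumption for illustration: the interior store is a stand-in for a distributed, fault-tolerant abstraction, and the averaging step stands in for whatever data reduction the leaf performs to conserve bandwidth.

```python
# Sketch of the two-level memory model: leaves hold volatile readings and
# push reduced data to a (conceptually) robust interior store. Names invented.

class InteriorStore:
    """Stands in for the persistent, replicated interior of the network."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class LeafNode:
    """A leaf buffers readings (volatile memory) and flushes them inward.
    Anything not yet flushed is lost if the leaf fails."""
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.buffer = []
    def sense(self, value):
        self.buffer.append(value)
    def flush(self):
        # Reduce raw data before sending it on, to conserve bandwidth:
        if self.buffer:
            self.store.put(self.name, sum(self.buffer) / len(self.buffer))
            self.buffer = []
    def fail(self):
        self.buffer = []   # volatile state is gone; the interior survives
```

The trade-off discussed in the text lives in `flush`: reducing data at the leaf saves bandwidth but spends leaf energy, and any reading still in `buffer` when `fail` occurs is lost.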
Explicit programming to generate a consistent, persistent memory is made more difficult because of issues having to do with concurrency and failures. When information is spread over a set of machines that can fail independently and are connected by a network that also can fail, it is difficult to coordinate changes in that information to ensure global consistency. Further, as different parts of the system manipulate the same information, it is possible that changes are made at inopportune times, giving inconsistent views of the system. A computational model traditionally used to deal with these issues involves the notion of a transaction. In a transactional model, a coordination convention is introduced to ensure that in all but the most extreme of failure conditions, either all the operations in the transaction are completed or none of them is. It is not possible for some to be completed while others fail. Further, the transactional model introduces concurrency controls that ensure that each view of the system is consistent and that all parts of the system will view changes as happening in the same sequence. In systems supporting this model, one need not worry about what happens if a failure occurs half-way through the operation; in addition, transactions ensure that the intermediate state of the atomic collection cannot be seen by other operations in the system.
The transactional model is an example of a computational abstraction that makes the job of the application programmer much easier, at the price of increasing the complexity of the underlying system. The transactional model of memory is very powerful because it simplifies reasoning about many types of interactions; however, implementing a transactional model of memory is quite complex and may not be possible on all of the various kinds of nodes found in EmNets. These implementation issues may make a pure transactional memory model too expensive to be used in the design of EmNets, and it might be possible to create a compromise model for these systems. Some weaker notion, with fewer guarantees but also without some of the implementation problems that accompany the transactions mentioned above, might be developed both to maintain consistency in the persistent state and to accomplish some of the application tasks.
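The all-or-nothing guarantee at the heart of the transactional model can be shown in miniature. This sketch deliberately omits the concurrency control and durability that make real transactional systems expensive, which is precisely why a weaker notion of this kind might be implementable on small EmNet nodes.

```python
# Minimal all-or-nothing update: the essence of the transactional model,
# without concurrency control or durability. Illustrative only.

def apply_transaction(state, operations):
    """Apply every operation to a copy; commit only if all succeed.
    `operations` is a list of functions that mutate the dict or raise."""
    trial = dict(state)
    try:
        for op in operations:
            op(trial)
    except Exception:
        return state          # abort: the caller's view is unchanged
    state.clear()
    state.update(trial)       # commit: all effects appear at once
    return state
```

No caller ever observes the half-finished `trial` state, which is the property that makes reasoning about failure midway through an operation unnecessary.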
The transactional model is also an example of how a single abstraction can be introduced into a computational model to greatly facilitate the design of reliable systems. Currently, there is no such unifying and simplifying abstraction in the computational model for EmNets, and one is sorely needed. It might be some variation of the transaction abstraction, or it might be a completely different computational construct. The only way to develop it is to encourage research in a number of different directions to find one that bears fruit.
The two-level model of memory leads naturally to a shared memory model of communication, described earlier. But to make a big, persistent store work flexibly, methods of naming the contained objects are needed. One particularly interesting research question deals with the intentional naming of objects—providing a name for an object that is related to its function or other attributes. This naming structure might have significant advantages in systems with high redundancy levels, in which similar data are collected by many different devices. Isolating information so that not all of the information obtained by every component is available to every other component may also require hierarchical or partitioned memory models, in which the placement of information determines which components can access it.
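Intentional naming can be sketched as a lookup keyed by attributes rather than identities. The directory structure and attribute names below are invented; the point is that any of several redundant devices can satisfy a request, which is what makes the scheme attractive in high-redundancy EmNets.

```python
# Sketch of intentional naming: objects are found by descriptive attributes
# (what they are, where they are), not by node identity. API is invented.

def register(directory, attrs, value):
    """Record a value under a set of descriptive attributes."""
    directory.append((attrs, value))

def lookup(directory, **query):
    """Return every value whose attributes match all query terms, so any
    of several redundant devices can satisfy the same request."""
    return [value for attrs, value in directory
            if all(attrs.get(k) == want for k, want in query.items())]
```

A partitioned or hierarchical memory model, as mentioned above, would amount to restricting which components are allowed to call `lookup` on which directories.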
The programming models used by these systems may depart from the familiar in radical ways, or they may take familiar programming notions and apply them in ways that they have not been applied before. Many EmNets are highly event- and datacentric. Especially in sensor networks, users may be more interested in receiving information about a particular event that has been detected (for example, a chemical concentration exceeding a particular threshold) or in receiving a particular set of data (for example, the chemical concentration in a particular geographical region) than in receiving information from a particular node (for example, the chemical concentration reported by sensor number 1234). This may also be true in a smart space in which users wish to send data to the nearest network element, to an element with particular characteristics (for example, high-bandwidth communications capability), or to the nearest element to which they have a direct line of sight. This sort of capability becomes even more important in dynamic systems in which nodes, resources, obstacles, and event triggers themselves move around in unpredictable ways. It implies that many EmNets will need to be designed with a focus on naming and operations on data elements instead of naming and operations on node identities. Event-driven programming is common in areas like user interfaces, where the program is driven by events generated by the user. These techniques, which share the quality of reacting to occurrences in the physical world, are generally not applied in the context of a network, but may provide a fertile area of information exchange between practitioners of different fields of computer science and other disciplines.
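The event- and data-centric style described above can be illustrated by a subscription keyed on the data itself, a predicate over readings, rather than on the identity of the producing node. The class and field names are invented for illustration.

```python
# Sketch of data-centric delivery: a subscriber names a condition on the
# data (e.g., a concentration threshold), not the node producing it.

class EventBus:
    def __init__(self):
        self.subscriptions = []   # (predicate, callback) pairs
    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))
    def publish(self, reading):
        """Any node may publish; delivery is routed by content, so which
        sensor produced the reading is irrelevant to the subscriber."""
        for predicate, callback in self.subscriptions:
            if predicate(reading):
                callback(reading)
```

Replacing a failed sensor requires no change on the subscriber side, since the subscription never mentioned the sensor in the first place.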
Models of Trust
Trust issues enter into the computational model of EmNets for many reasons (see Chapter 4), including the likelihood of changes in the set of entities that make up those systems and the likelihood that such systems will make use of mobile code. Both may require adding to a model for EmNets notions of trust that traditionally lie outside conventional computational models.
In the case of mobile code, the environment into which code is moved will often need to establish a trust relationship with that code. This cannot be done by interacting with the code, since by the time such an interaction could happen, the imported code would already have been loaded into the host environment and would probably have had access to at least one thread of control. Waiting until this point to establish a trust relationship with the imported code is dangerous, since the code could already have damaged the host system. The mechanisms for establishing trust may in fact reside in the underlying system and be reflected in the computational model only as additional failures that can occur for security reasons. However, the computational model may need to be enriched beyond that to allow setting various limits on the power of imported code. What will be required for trusting mobile code is not clear; what is clear is that research into the establishment of such trust relationships is needed.
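One way to picture the point that trust must be established before imported code gains a thread of control is a loader that decides from the code's credentials alone. The trust anchors, permission set, and package layout below are all hypothetical.

```python
# Sketch: trust is decided before imported code ever runs, from its
# credentials, and its granted powers are limited up front. Names invented.

ACCEPTED_SIGNERS = {"vendor-key-1"}   # hypothetical trust anchors
GRANTABLE = {"read_sensor"}           # hypothetical permission whitelist

def load_mobile_code(package):
    """Refuse to instantiate code from an untrusted signer; by the time
    code is running, it is too late to decide whether to trust it."""
    if package.get("signer") not in ACCEPTED_SIGNERS:
        raise PermissionError("untrusted mobile code")
    granted = package.get("requested", set()) & GRANTABLE
    return {"code": package["code"], "granted": granted}
```

Real mobile-code systems use cryptographic signatures rather than a set lookup, but the ordering, credential check first, execution later, is the property the text argues for.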
Beyond the trusting of mobile code is the reestablishment of trust when members of the system are replaced, repaired, or upgraded. The discussions of reconfiguration in Chapter 3 only go as far as to allow the establishment of communication and cooperation between such nodes; they are essentially questions of how we can make such nodes work together. The questions surrounding the reestablishment of a trust relationship are fundamentally different in that they involve the set of circumstances under which such working together is not allowed to occur. However, the decision whether or not to trust either new (mobile) code or new elements of the EmNet will need to be part of the computational model.
Models for Concurrency
EmNets are inherently concurrent systems, that is, they are collections of entities that operate independently but attempt to cooperate on a common task. There are no particularly good programming models for concurrent programming; in fact, the general wisdom is to avoid the need for concurrency whenever possible. Concurrency in programs tends to be programmed directly. For example, an active object might begin with a
single sequence of instruction execution and as part of that execution, create other, independent sequences of instruction execution. These would occur either in another processor or on another machine, in a logically separate process scheduled by the operating system of a single machine, or in a separate thread of execution in the same process, scheduled by the underlying operating system or by some library. If these so-called threads of execution are cooperating, they must do so by communicating or sharing some information. Access to the communication paths or shared information is generally coordinated explicitly by the programmer, using mechanisms such as locks and semaphores. However, this type of explicit synchronization is a well-known cause of bugs, the most common of which involves a single thread assuming that a piece of shared information cannot be changed over some period of time by any of the other threads of execution, even though no lock is held on the information.
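The bug class just described, checking shared state under a lock but acting on it after the lock is released, can be shown side by side with the correct pattern. The counter is invented for illustration; the race in the unsafe version manifests only under actual concurrent execution.

```python
# The check-then-act bug: a thread verifies shared state under the lock,
# then acts after releasing it, assuming nothing changed in between.
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def unsafe_increment_if_below(self, limit):
        with self.lock:
            ok = self.value < limit   # check under the lock...
        if ok:                        # ...but act after releasing it:
            self.value += 1           # another thread may have intervened

    def safe_increment_if_below(self, limit):
        with self.lock:               # check and act in one lock hold
            if self.value < limit:
                self.value += 1
```

Single-threaded, both versions behave identically, which is exactly why such bugs survive testing and surface only in deployment.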
Similar explicit approaches to concurrency control, such as shared job queues that coordinate work among the different threads of execution, are also limited in scale or prone to programmer error. Systems that attempt to hide or deal with these issues automatically have been designed around small networks of very large machines, and it is not at all clear that the same principles apply to large networks of very small machines.
Of further concern, almost all existing ways of dealing (programmatically) with concurrency introduce the possibility of very large time delays when there is competition for a resource, a pattern that runs counter to the need of EmNets for predictable, real-time performance. Given that attachment to the real world is a requirement and that it entails known performance parameters, it follows that the usual ways of dealing with concurrency are not applicable to EmNets. An additional constraint in EmNets is the need to support the model on very small system components (for example, 8-bit processors with very limited programming and storage).
There are, however, methods that might be applied to concurrency within EmNets. Optimistic or wait-free algorithms may be applicable in these systems. In addition, some of the techniques of control systems—in which constant approximations are made of future states that are then compared to the actual results—can cut down the requirements for concurrent access to information. This is an open area of research both within the EmNet community and within the larger programming community, and results from both communities should be studied for their applicability to the problems of concurrency in EmNets.
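The optimistic approach mentioned above can be sketched as a versioned read-compute-commit loop. In a real system the compare-and-set step must be atomic (in hardware or in the runtime); this single-threaded sketch only shows the shape of the protocol.

```python
# Sketch of an optimistic update: read a version, compute without holding
# any lock, and commit only if the version is unchanged, retrying otherwise.

class VersionedCell:
    def __init__(self, value):
        self.value = value
        self.version = 0
    def read(self):
        return self.version, self.value
    def compare_and_set(self, expected_version, new_value):
        # In a real system this check-and-write must be atomic.
        if self.version != expected_version:
            return False              # a concurrent writer won; retry
        self.value = new_value
        self.version += 1
        return True

def optimistic_update(cell, fn, max_retries=10):
    for _ in range(max_retries):
        version, value = cell.read()
        if cell.compare_and_set(version, fn(value)):
            return True
    return False
```

Because no lock is held while `fn` runs, a slow or stalled participant cannot impose the unbounded delays that make lock-based schemes problematic for real-time EmNets, though a heavily contended cell can force retries.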
Models of Location
As noted before, a defining characteristic of EmNets is their connection not only to other computing systems but also to the physical world. Because of this connection, there is a mapping from many of the members of such a network to a particular location in three-dimensional space, namely the location at which the system interacts with the world.10 There are also spatial relationships among the various elements of the EmNets themselves. By adding location information to the basic computational model, it may be possible to invent new algorithms, techniques, or configurations that exploit this additional information to make advances in reliability, trust, or functionality. A number of location-based concepts could be of interest, including absolute location, proximity, relative distance, and relative motion. Whether some or all of these are needed or relevant is an open question that needs to be addressed. In addition, the layers at which location should become part of the model, and the interfaces used to gain access to that information, need to be investigated. Such an approach exploits the observation that many EmNets are event- or data-centric: What matters is not the precise part of the EmNet that is performing some computation but rather the sensing of some occurrence or the computing of some data by any member of the assemblage.
Traditional networked systems have tended to be closed in the sense that interactions take place among members of the system, with little or no connection to the physical world (other than, perhaps, the users of the networked systems and the physical artifacts that are explicitly—and only—part of the system itself). Because of this, such systems were often based on topological principles that abstracted over the physical location of network members and relied only on the connectedness relations between the members. By introducing into the equation the physical location of the elements of EmNets, one can expand the vocabulary for network relationships to include concepts such as proximity, distance, and a host of geometric relationships. This vocabulary (and the information that it allows one to describe) can be used to produce new algorithms that can minimize energy use or maximize computing power in a particular area. It also allows the naming of areas where information is to be gathered rather than the nodes in the network that gather that information, and ultimately the naming of the information itself rather than the sensors that receive that information.11
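A location-enriched model can be sketched with a few geometric primitives: tasks name a point or a region of physical space, and the system resolves that name to whichever nodes happen to qualify. The node representation is invented for illustration.

```python
# Sketch: nodes carry physical coordinates, and tasks name a point or a
# region rather than a node identity. Node layout is invented.
import math

def distance(a, b):
    return math.dist(a, b)

def nearest(nodes, point):
    """Pick whichever node is closest to a physical point of interest."""
    return min(nodes, key=lambda n: distance(n["pos"], point))

def in_region(nodes, center, radius):
    """Name an area, not the sensors: every node within `radius` of
    `center` qualifies to carry out the task."""
    return [n for n in nodes if distance(n["pos"], center) <= radius]
```

Concepts such as relative motion would extend this sketch with node velocities, which is one of the open modeling questions noted above.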
Similarly, nodes often use information about their location in three-dimensional space to determine their action (for example, which sensor should be tasked to monitor a particular geographical region or which is the nearest switch that should operate a particular piece of networked audiovisual equipment). Traditional computing systems have not needed such information, so support for geolocation information is relatively weak. Robotics is the best example of a computer science discipline that has faced this problem, and work in this field demonstrates the difficulty of the task (see Chapter 3 for a discussion of distributed robotics). Particular technological approaches for supporting geolocation are discussed in Chapter 2; however, even given the existence of geolocation systems, additional effort is needed to define and refine the abstractions used by application and system developers as they work with geolocation.
CONDUCTING RESEARCH ON MODELS AND ABSTRACTIONS
Computational models are not developed in a vacuum. The computational model for EmNets will evolve as applications of the technology are developed. Full applications need not be completed before this activity can move forward, although enough of a prototype needs to be developed that new models can be tried, measured, and evaluated for their relevance and completeness in the new set of environments and with the new set of assumptions that EmNets present. As experience in building these applications is gained, designers will discover which abstractions are useful, which ones hide information that needs to be visible, and what types of connections between the abstractions will allow people to model and reason about the types of EmNets that they want to build.
Research in this area will require a delicate balance between, on the one hand, application development and underlying system construction and, on the other, the building of the computational model. Although some driving applications will be needed to test the work, the goal needs
to be the construction of underlying systems that can be used with multiple applications. The underlying system should be the instantiation of a computational model that presents the right set of abstractions for reasoning about the overall system infrastructure as well as the particular application. Thus, the building of the application should not be viewed as the end goal of the research but rather as a means for identifying those parts of the model and infrastructure that can be applied more broadly than the application at hand.
As these models are built, run-time environments based on them can also be developed, and this, in turn, will make it easier to develop applications using the models. The development of environments based on the models will allow the application programmers to develop systems based on the models more quickly and researchers to evaluate and modify both the models and the environments more quickly. This scenario forms a positive feedback loop in which run-time environments built to reflect models allow more rapid application development, which in turn allows more complete evaluation of the models. Such a cycle can lead to rapid evolution of the model and the run-time environment in response to the rapid development of applications; however, the initial stages of building this loop will be lengthy relative to the later stages and seemingly chaotic as well, as basic assumptions are tested and computational models are in significant flux.
This is not to say that the initial inquiries into computational models and their associated run-time environments will be completely unstructured. There are a number of areas in which it seems clear even at this early stage that fruitful investigation can be undertaken. One such area of investigation is the network model itself. During the past 20 years, both industry and academic researchers have worked with a computational model exemplified by the Open Systems Interconnection (OSI) seven-layer reference model. This model describes a set of abstractions defined by the interface presented by each of the layers, giving a modular structure to the model of the network. In addition, the model requires that each layer obtain information only from the layer immediately below it and provide information only to the layer immediately above it. The end result is a set of models of a network, each providing more functionality (but at a higher cost) than the layer below. Changes in any layer are isolated in that layer, because each layer is defined by an interface, which by remaining the same, insulates the layer above from changes. (See Box 5.2 for more details on the OSI model.) Clearly, the OSI seven-layer model will be unsatisfactory for EmNets, which seem to require something more lightweight. Such networks may need different abstractions at various layers, requiring that different interfaces be defined for the modular constructs. The strict layering of the OSI model may hide infor-
The Open Systems Interconnection (OSI) seven-layer model is a standard taxonomic description of networks and a universal reference model for communication protocols. The model is promoted by the International Organization for Standardization, a worldwide federation of national standards bodies from some 100 countries. The seven layers, together with some examples of the types of network entities that occupy each layer, are as follows (top to bottom):
7. Application (network file system (NFS), file transfer protocol (FTP), hypertext transfer protocol (HTTP));
6. Presentation (extensible markup language (XML), ASCII, Java serialization, COM);
5. Session (Sun remote procedure call (RPC), DCE RPC, Internet Inter-ORB protocol (IIOP), remote method invocation (RMI));
4. Transport (transmission control protocol (TCP), user datagram protocol (UDP));
3. Network (Internet protocol (IP));
2. Data link (wire formats for messages); and
1. Physical (wires, signaling).
The standard world of computers on a network is largely homogeneous at levels 3 and 4, permitting great (and largely transparent) diversity at layers 1 and 2 and great diversity at the higher levels. This is effectively a computational model of the network, specifying (at each layer) the interface to the information at that layer, the information that has to be provided to the next layer up, and what guarantees are made by an entity at a particular layer. Each layer acts as an abstraction over the actual workings of the network, with each piece of functionality built on more basic layers. Those underlying layers can change without affecting the upper layers because they are defined by strong interfaces, which do not change from implementation to implementation.
It seems unlikely that this set of abstractions will suffice for EmNets. For example, an EmNet application might need access to the physical layer for information about power in order to save energy, or to the network layer in order to do some creative routing. The current strict layering, however, restricts access to the information needed by EmNets (for example, information about specific nodes or components); accordingly, some relaxation of the layering may be a fruitful area for research. As the chapter points out, new models and abstractions are needed to handle the unique constraints and challenges that EmNets present.
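A sketch of the kind of cross-layer access the text imagines might look as follows; the names (`PhysicalInfo`, `choose_next_hop`) and the cost numbers are hypothetical, not part of any real stack. The application consults physical-layer state (battery level, per-link energy cost) when making a routing choice, which is exactly what strict layering forbids.

```python
# Hypothetical cross-layer interface: the application reaches below the
# transport layer for radio/battery state and lets that influence routing.
# All names and numbers here are invented for illustration.

from dataclasses import dataclass

@dataclass
class PhysicalInfo:
    battery_level: float   # fraction of charge remaining, 0.0-1.0
    tx_cost: dict          # neighbor id -> estimated energy per packet

def choose_next_hop(info: PhysicalInfo, neighbors):
    """Prefer cheap links; when the battery is low, be stricter still."""
    budget = 1.0 if info.battery_level > 0.5 else 0.4   # tighten when low
    affordable = [n for n in neighbors if info.tx_cost[n] <= budget]
    pool = affordable or neighbors   # fall back to cheapest if none fits
    return min(pool, key=lambda n: info.tx_cost[n])

# With a low battery, the expensive neighbor "A" is ruled out.
info = PhysicalInfo(battery_level=0.3, tx_cost={"A": 0.9, "B": 0.35})
print(choose_next_hop(info, ["A", "B"]))
```

The design question the sketch raises is the one in the text: exposing `PhysicalInfo` to the application breaks the layering's information hiding, so any relaxation has to decide which lower-layer facts are safe to surface.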
It should be noted that once models of computation are defined and prototypes have been implemented, significant work will be needed in the design and analysis of algorithms that work within the new models for EmNets. Algorithms that optimize for certain resources, for example, and give near-optimal trade-offs among the various relevant resources will be very important. Designing and implementing algorithms that can both solve the problems EmNets will pose and be implementable within the constrained environments in which EmNets will operate is likely to be a significant challenge. In addition, the question of how the quality of service might degrade in the presence of partial information (a likely scenario, since bandwidth or resource constraints may make it impossible to have all the information) may well need to be answered. Current work on this sort of question deals with, for example, time-space trade-offs for computation and trade-offs between the quality of the solution and the precision of the input data. EmNets present yet more kinds of trade-offs that will need to be addressed.
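One way to make the quality/resource trade-off concrete is a sketch like the following, under the assumption (invented here for illustration) that bandwidth limits how many sensor nodes can report. The network-wide average is estimated from a random fraction of the readings, and the error shrinks as more of the reporting budget is spent.

```python
# Illustrative sketch of computing with partial information: estimate the
# mean of 1,000 simulated sensor readings from only a fraction of them,
# as if bandwidth limited how many nodes could report.

import random

random.seed(0)
true_readings = [20 + random.gauss(0, 2) for _ in range(1000)]
true_mean = sum(true_readings) / len(true_readings)

def estimate_mean(readings, fraction):
    """Average a random subset; `fraction` models the bandwidth budget."""
    k = max(1, int(len(readings) * fraction))
    sample = random.sample(readings, k)
    return sum(sample) / k

for fraction in (0.01, 0.1, 0.5):
    est = estimate_mean(true_readings, fraction)
    print(f"budget {fraction:4.0%}: estimate {est:.2f} "
          f"(error {abs(est - true_mean):.2f})")
```

The interesting design choice is the one the text names: the answer's quality is traded directly against the resource spent gathering inputs, and an EmNet algorithm must make that trade explicitly rather than assume complete information.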
Finally, the examples discussed in this chapter share a characteristic—each identifies an assumption of the current computing model for networks that will not hold in the coming world of EmNets and proposes an alternative to that computing model based on a more reasonable assumption. As people attempt to build applications of EmNets, it will be important for them to identify suspicious assumptions or counterproductive abstractions in the current computing model, and to think of alternatives that can be built into the infrastructure for the application. Many more assumptions and abstractions will be identified than have been listed here. Funding agencies should watch for patterns in which researchers identify a doubtful assumption or abstraction, replace it with another that seems more useful in the context of the application, and determine if the new assumption or abstraction can be used in other applications.