5

Trustworthy Systems from Untrustworthy Components

It is easy to build a system that is less trustworthy than its least trustworthy component. The challenge is to do better: to build systems that are more trustworthy than even their most trustworthy components. Such designs can be seen as "trustworthiness amplifiers." The prospect that a system could be more trustworthy than any of its components might seem implausible. But classical engineering is full of designs that accomplish analogous feats. In building construction, for example, one might find two beams that are each capable of supporting a 200-pound load being laminated together to obtain an element that will support in excess of 400 pounds.

Can this sort of thing be done for trustworthiness of computing components, services, and systems? For some dimensions of trustworthiness it already has. Today, many computing services are implemented using replication; multiple processors must fail before the service becomes unavailable, so the service is more reliable than any single component processor. Secrecy, another dimension of trustworthiness, provides a second example: encrypting an already encrypted text, but with a different key, can (although not always; see Menezes et al., 1996) increase the effective key length, hence the work factor for conducting a successful attack. Again, note how design (multiple encryption, in this case) amplifies a trustworthiness property (secrecy).
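
The key-length argument can be made concrete with a toy cipher. The sketch below (Python; the CTR-style keystream construction and all names are invented for illustration, not a real cryptosystem) cascades two encryptions under independently chosen keys, so a brute-force attacker must contend with both keys; as the caveat in Menezes et al. warns, the combined work factor is not always the product of the two key spaces (e.g., because of meet-in-the-middle attacks):

    import hashlib
    import itertools
    import os

    def keystream(key: bytes, n: int) -> bytes:
        # Toy CTR-style keystream: SHA-256(key || counter), truncated to n bytes.
        out = b""
        for ctr in itertools.count():
            if len(out) >= n:
                return out[:n]
            out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()

    def xor_cipher(key: bytes, text: bytes) -> bytes:
        # Encryption and decryption are the same XOR operation.
        return bytes(a ^ b for a, b in zip(text, keystream(key, len(text))))

    # Cascade encryption under two independent keys: recovering the
    # plaintext now requires (up to the caveats above) defeating both.
    k1, k2 = os.urandom(16), os.urandom(16)
    ciphertext = xor_cipher(k2, xor_cipher(k1, b"attack at dawn"))
    assert xor_cipher(k1, xor_cipher(k2, ciphertext)) == b"attack at dawn"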

Replication and multiple encryption amplify specific dimensions of trustworthiness. But the existence of these techniques and others like them also suggests a new approach for implementing networked information system (NIS) trustworthiness: a system's structure, rather than its individual components, should be the major source of trustworthiness. This chapter explores that theme. By pointing out connections between what is known for specific trustworthiness dimensions and what is needed, the intent is to inspire investigations that would support a vision of trustworthiness by design. Detailed descriptions of specific research problems would be premature at this point; too little is known. Accordingly, this chapter is more abstract than the other technical chapters in this volume. Getting to the point where specific technical problems have been identified will itself constitute a significant step forward.

REPLICATION AND DIVERSITY

Diversity can play a central role in implementing trustworthiness. The underlying principle is simple: some members of a sufficiently diverse population will survive any given attack, although different members might be immune to different attacks. Long understood in connection with the biological world, this principle can also be applied for implementing fault tolerance and certain security properties, two key dimensions of trustworthiness.

Amplifying Reliability

A server can be viewed abstractly as a component that receives requests from clients, processes them, and produces responses. A reliable service can be constructed using a collection of such servers. Each client request is forwarded to a sufficient number of servers so that a correct response can be determined, even if some of the servers are faulty. The forwarding may be performed concurrently, as in active replication (Schneider, 1990), or, when failures are restricted to more benign sorts, serially (forwarding to the next server only if the previous one has failed), as in the primary-backup approach (Alsberg and Day, 1976). This use of replication amplifies the reliability of the components.

Observe that the amplification occurs whether or not the servers employed are especially reliable, provided the servers fail independently. The failure-independence requirement is actually an assumption about diversity. Specifically, in this context, "attacks" correspond to server failures, and failure-independence of servers is equivalent to positing a server population with sufficient diversity so that each attack fells only a single server. Processors that are physically separated, powered from different sources, and communicating over narrow-bandwidth links approximate such a population, at least with respect to random hardware failures. So, this replication-based design effectively amplifies server fault tolerance against random hardware failures.
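
As a concrete illustration, the sketch below (Python; the replica stubs and the quorum parameter are hypothetical) forwards each request to every replica concurrently, in the style of active replication, and accepts an answer only when enough replicas agree to outvote the faulty ones:

    from collections import Counter
    from concurrent.futures import ThreadPoolExecutor

    def replicated_call(replicas, request, quorum):
        # Forward the request to all replicas concurrently and accept the
        # answer returned by at least `quorum` of them; up to
        # len(replicas) - quorum faulty replicas are thereby masked.
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            responses = list(pool.map(lambda call: call(request), replicas))
        answer, votes = Counter(responses).most_common(1)[0]
        if votes < quorum:
            raise RuntimeError("no quorum: too many replicas disagree")
        return answer

    # Hypothetical replicas: two behave correctly, one is faulty.
    replicas = [lambda q: q.upper(), lambda q: q.upper(), lambda q: "junk"]
    print(replicated_call(replicas, "ping", quorum=2))  # -> "PING"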

Error-correcting codes, used to tolerate transient noise bursts during message transmissions, and alternative-path routing, used to tolerate router and link outages, can also be viewed in these terms: reliability is achieved by using replicas that fail independently.

Notice, however, that replication can diminish another aspect of trustworthiness, privacy, because replicating a service or database increases the number of locations where the data can be compromised (Randell and Dobson, 1986). Use of selective combinations of secret sharing and cryptographic techniques (so-called threshold cryptography) may, in some cases, reduce the exposure (DeSantis et al., 1994). And replication is not the only example in which techniques for enhancing one aspect of trustworthiness can adversely affect another.

Design and implementation errors in hardware or software components are not so easily tolerated by replication. The problem is that replicas of a single component define a population that lacks the necessary diversity. This is because attacks are now the stimuli that cause components to encounter errors and, since all replicas share design and implementation errors, a single attack will affect all replicas. However, if differently designed and implemented components were used, the necessary diversity would be present in the population. This approach was first articulated in connection with computer programming by Elmendorf,1 who called it "fault-tolerant programming" (Elmendorf, 1972), and subsequently it has been refined by researchers and employed in a variety of control applications, including railway and avionics (Voges, 1988). However, the approach is expensive: each program is developed and tested independently N times and by separate development teams. More troubling than cost, though, are the experimental results that raise questions about whether separate development teams do indeed create populations with sufficient diversity when these teams start with identical specifications (Knight and Leveson, 1986). See Ammann and Knight (1991) for an overall assessment of the practical issues concerning design diversity.

1 Dionysius Lardner in 1834 also pointed out the virtues of this approach to computing. See Voges (1988), page 4, for the Lardner quote: "The most certain and effectual check upon errors which arise in the process of computation is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."
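
In outline, N-version (fault-tolerant) programming runs independently developed implementations of one specification and votes on their results. The sketch below (Python; the three median routines are hypothetical stand-ins for versions built by separate teams) shows the voting harness:

    from collections import Counter
    import statistics

    # Three "independently developed" implementations of one specification
    # (computing the median of an odd-length list); real N-version
    # programming would use separate teams, languages, and tool chains.
    def median_v1(xs): return sorted(xs)[len(xs) // 2]   # sort, take middle
    def median_v2(xs): return statistics.median_low(xs)  # library routine
    def median_v3(xs): return max(xs)                    # a buggy version

    def n_version(versions, xs):
        # Vote across versions: a design error confined to one version is
        # outvoted, which identical replicas could never achieve.
        answer, votes = Counter(v(xs) for v in versions).most_common(1)[0]
        if votes <= len(versions) // 2:
            raise RuntimeError("no majority: versions disagree too much")
        return answer

    print(n_version([median_v1, median_v2, median_v3], [7, 1, 5]))  # -> 5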

There are circumstances, however, in which replication can amplify resilience to software design and implementation errors. Program execution typically is determined not only by input data but also by other aspects of the system state. And, as a result of other system activity, the system state may differ from one execution of a given program to the next, causing different logic to be exercised in that program. Thus, an error that causes one execution of the program to fail might not be triggered in a subsequent execution, even for the same input data. Experiences along these lines have been reported by programmers of Tandem systems, in which system support for transactions makes it particularly easy to build software that reruns programs after apparent software failures (Gray and Reuter, 1997). Further supporting experiences are reported in Huang et al. (1995), who show that periodic server restarts decrease the likelihood of server crashes. Interestingly, it is this same phenomenon that gives rise to so-called Heisenbugs (Gray and Reuter, 1997): transient failures that are difficult to reproduce because they are triggered by circumstances beyond the control of a tester. Particularly troubling are Heisenbugs that surface only after a tester adds instrumentation to facilitate debugging a system.
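
The rerun-after-failure tactic used on those Tandem systems reduces, in essence, to a retry loop around a transaction. A minimal sketch (Python; the transaction body, attempt count, and backoff values are illustrative):

    import time

    def run_with_retry(txn, attempts=3, backoff=0.1):
        # Rerun an apparently failed program: because the surrounding
        # system state differs from one execution to the next, a Heisenbug
        # that felled the first run often does not recur on the second.
        for i in range(attempts):
            try:
                return txn()
            except Exception:
                if i == attempts - 1:
                    raise                     # genuine, repeatable failure
                time.sleep(backoff * 2 ** i)  # let the system state drift

Note that this helps only against transient faults; a deterministic design error fails identically on every rerun.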

Amplifying Security

Diversity not only can amplify reliability, but it can also be used to amplify immunity to more coordinated and hostile forms of attack. For such attacks, simple replication of components provides no benefit. These attacks are not random or independent; after successfully attacking one replica, an attacker can be expected to target other replicas and repeat that attack. A vulnerability in one replica constitutes a vulnerability for all replicas, and a population of identical replicas will lack the necessary diversity to survive. But a more diverse population, even though its members might each support the same functionality, can provide a measure of immunity from attacks.

The diversity necessary for deflecting hostile attacks can be viewed in terms of protocols, interfaces, and their implementations. Any attack will necessarily involve accessing interfaces, because attacks exploiting vulnerabilities in standard protocols can be viewed as attacks against an interface. The attack will succeed owing to vulnerabilities associated with the semantics of those interfaces or because of flaws in the implementation of those interfaces. Different components or systems that provide the same functionality might do so by supporting dissimilar interfaces, by supporting similar interfaces having different implementations, or by supporting similar interfaces having similar implementations. With greater similarity comes increased likelihood of common vulnerabilities. For example, in UNIX implementations from different vendors, there will be some identical interfaces (because that is what defines UNIX) with identical implementations, some identical interfaces in which the implementations differ, and some internal interfaces that are entirely dissimilar. A Windows NT implementation is less similar to a UNIX system than another UNIX system would be. Thus, a successful attack against one UNIX implementation is more likely to succeed against the other UNIX implementations than against Windows NT. Unfortunately, realities of the marketplace and the added complexities when diverse components are used in building a system reduce the practicality of aggressively employing diversity in designing systems.

Findings

1. Replication and diversity can be employed to build systems that amplify the trustworthiness of their components. Research is needed to understand the limits and potential of this approach. How can diversity be added to a collection of replicas? How can responses from a diverse set of replicas be combined so that responses from corrupted components are ignored?

2. Research is also needed to understand how to measure similarities between distinct implementations of the same functionality and to determine the extent to which distinct implementations share vulnerabilities.

MONITOR, DETECT, RESPOND

Monitoring and detection constitute a second higher-level design approach that can play a role in implementing trustworthiness: attacks or failures are allowed to occur, but they are detected, and a suitable and timely response is initiated. This approach has been applied both to security and to fault tolerance. Its use for fault tolerance is broadly accepted, but its role in providing security is somewhat controversial. Physical plant security typically is enforced by using such a combined approach: locks keep intruders out, while alarms, video surveillance cameras, and the threat of police response not only serve as deterrents but also enable the effects of an intrusion to be redressed. This combined approach is especially attractive when shortcomings in prevention technology are suspected. For example, in addition to antiforgery credit card technology and authorization codes for each transaction, credit card companies monitor and compare each transaction with profiles of past cardholder activity. A combined approach may be even more cost-effective than solely deploying prevention technology of sufficient strength.
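
The credit card example combines prevention (the authorization code) with detection (a profile comparison). A minimal sketch of the detection half (Python; the profile fields and thresholds are invented for illustration):

    def flag_transaction(txn, profile):
        # The card and authorization code may both be valid (prevention
        # passed), yet the transaction is still compared with the
        # cardholder's past activity and flagged if it deviates.
        alerts = []
        if txn["amount"] > 5 * profile["mean_amount"]:
            alerts.append("amount far above this cardholder's norm")
        if txn["country"] not in profile["usual_countries"]:
            alerts.append("charge from an unusual country")
        return alerts

    profile = {"mean_amount": 40.0, "usual_countries": {"US"}}
    print(flag_transaction({"amount": 900.0, "country": "RO"}, profile))
    # -> both alerts fire; a human analyst decides on the response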

Limitations in Detection

Whatever the benefits, the monitor-detect-respond approach is limited by the available detection technology; response is not possible without detection. For example, when this approach is used for security, the detection subsystem must recognize attacks (and report them) or must recognize acceptable behavior (and report exceptions) (Lunt, 1993). To recognize attacks, the detection subsystem must be imbued with some characterization of those attacks. This characterization might be programmed explicitly (perhaps as a set of pattern-matching rules for some aspect of system behavior) or derived by the detection subsystem itself from observing attacks. Notice that whatever means is employed, new attacks might go unrecognized. Systems that recognize acceptable behavior employ, in effect, some model for that behavior. Again, whether the model is programmed explicitly or generated by observing past acceptable behavior, the detection subsystem can be fooled by new behavior: for example, the worker who stays uncharacteristically late to meet a deadline.

With only approximate models to drive the detection subsystem, some attacks might not be detected and some false alerts might occur. Undetected attacks are successful attacks. And with false alerts, one detection problem is simply transformed into another one, with false alerts being conveyed to human operators for analysis. An operator constantly dealing with false alerts will become less attentive and less likely to notice a bona fide attack. Attackers might even try to exploit human frailty by causing false alerts so that subsequent real attacks are less likely to attract notice.

Any detection subsystem must gather information about the system it is monitoring. Deploying the necessary instrumentation for this surveillance may require modifications to existing system components. That, however, could be difficult with commercial off-the-shelf components, since their internals are rarely available for view or modification. It also may become increasingly difficult if there is greater use of encryption for preserving confidentiality of communications, since that restricts the places in the system where monitoring can be performed. Data must be collected at the right level, too. Logs of low-level events might be difficult to parse; keeping only logs of events at higher levels of abstraction might enable an attack to be conducted below the level of the surveillance. A final difficulty with using the monitor-detect-respond approach to augment prevention mechanisms is its implicit reliance on prevention technology. The surveillance and detection mechanisms must be protected from attack and subversion.
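
The two detection styles just described can be caricatured in a few lines (Python; the signatures, the working-hours model, and the sample events are all invented). Each exhibits the corresponding blind spot: a novel attack matches no signature, while legitimate-but-unusual behavior trips the behavioral model:

    import re

    SIGNATURES = [
        # Explicit characterizations of known attacks; a new attack that
        # matches none of these patterns goes unrecognized (a miss).
        ("path traversal", re.compile(r"\.\./")),
        ("password file grab", re.compile(r"/etc/passwd")),
    ]

    def match_signatures(event: str):
        return [name for name, pattern in SIGNATURES if pattern.search(event)]

    def outside_acceptable_hours(login_hour: int) -> bool:
        # A model of acceptable behavior (logins between 08:00 and 18:59);
        # the worker staying late for a deadline triggers a false alert.
        return login_hour not in range(8, 19)

    print(match_signatures("GET ../../etc/passwd"))  # both signatures fire
    print(outside_acceptable_hours(23))              # True: a (perhaps false) alert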

Response and Reconfiguration

For the monitor-detect-respond paradigm to work, a suitable response must be available to follow up the detection of a failure or attack. When it is failures that are being detected, system reconfiguration to isolate the faulty components seems like a reasonable response. For systems whose components are physically close, solutions for this system-management problem are understood reasonably well. But for systems spanning a wide-area network, like a typical networked information system (NIS), considerably less is known. The problem is that communication delays now can be significant, giving rise to open questions about trade-offs involving the granularity and flexibility of the system-management functions that must be added to implement reconfigurations. And there is also the question of how to integrate partitions once they can be reconnected.

When hostile attacks are being detected, further concerns come into play. Isolating selected subsystems might be the sensible response, but knowing how and when to do so requires additional research into how to design an NIS that can continue functioning, perhaps in a degraded mode, once partitioned. Having security functionality be degraded in response to an attack is unwise, though, since the resulting system could then admit a two-phase attack: the first phase causes the system to reconfigure and become more vulnerable to attack; the second phase exploits one of those new vulnerabilities. Finally, system reconfiguration mechanisms also must be protected from attacks that could compromise system availability. Triggering the reconfiguration mechanism, for example, could be the basis for a denial-of-service attack.

Perfection and Pragmatism

The monitor-detect-respond paradigm is theoretically limited by, among other things, the capabilities of the detection subsystem that it employs. This is more of a problem for attack monitoring than for failure monitoring. Specifically, a failure detector for a given system is unlikely to grow less effective over time, whereas an attack detector will grow less effective because new attacks are constantly being devised. Other common defensive measures, such as virus scanners and firewalls, are similarly flawed in theory but useful nevertheless.

There is nothing wrong with deploying theoretically limited solutions. What is known as "defense in depth" in the security community argues for using a collection of mechanisms so that the burden of perfection is placed on no single mechanism. One mechanism covers the flaws of another. Implicit in defense in depth, however, is a presumption about coverage: an attack that penetrates one mechanism had better not penetrate all of the others. Unfortunately, this coverage presumption is not easily discharged; attack detectors are never accompanied by useful characterizations of their coverage, partly because no good characterizations exist for the space of attacks.

Analogous to the error bars and safety factors that structural engineers employ, security engineers need ways to understand the limitations of their materials. What is needed can be seen as another place where research into a "theory of insecurity" (advocated in Chapter 4) would have value, by providing a method by which vulnerabilities could be identified and their system-wide implications understood.

Findings

1. Monitoring and detection can be employed to build systems that amplify the trustworthiness of their components. But research is needed to understand the limits and potential of this approach.

2. Limitations in system monitoring technology and in technology to recognize events, like attacks and failures, impose fundamental limits on the use of monitoring and detection for implementing trustworthiness. For example, the limits and coverage of the various approaches to intruder and anomaly detection are not well understood.

PLACEMENT OF TRUSTWORTHINESS FUNCTIONALITY

In traditional uniprocessor computing systems, functionality for enforcing security policies and tolerating failures is often handled by the kernel, a small module at the lowest level of the system software. That architecture was attractive for three reasons:

• Correct operation of the kernel (hence, security and fault-tolerance functionality for the entire system) depended on no other software and, therefore, could not be compromised by flaws in other system software.
• Keeping the kernel small facilitated understanding it and gaining assurance in the entire system's security and fault-tolerance functionality.
• By segregating security and fault-tolerance functionality, both of which are subtle to design and implement, fewer programmers with those skills were required, and all programmers could leverage the efforts of the few.

Whether such an architecture is suitable for building an NIS seems less clear. For such a system to be scalable and to tolerate the failure of any single component, the "kernel" would have to span some of the network infrastructure and perhaps multiple processors. And, because NIS components are likely to be distributed geographically, ensuring unimpeded access to a "kernel" might force it, too, to be geographically distributed.

A "kernel" that must span multiple, geographically distributed processors is not likely to be small or easily understood, making alternative architectures seem more attractive. For example, an argument might be made for placing security and fault-tolerance functionality at the perimeter of the system, so that processors minimize their dependence on network infrastructure and other parts of the system.

An effort was made, associated with the Trusted Network Interpretation (the so-called Red Book) of the Trusted Computer System Evaluation Criteria (TCSEC), to extend the "kernel" concept, for the security context, from a single computer to an entire network (U.S. DOD, 1987). According to the Red Book, there was a piece of the "kernel" in each processing component, and communication between components was assumed to be secure. This approach was found to be infeasible for large networks or even relatively small nonhomogeneous ones.

Too few NISs have been built, and even fewer have been carefully analyzed, for any sort of consensus to have emerged about what architectures are best or even about what aspects of an NIS and its environment are important in selecting an architecture. The two extant NISs discussed in Chapter 2, the public telephone network (PTN) and the Internet, give some feel for viable architectures and their consequences. A proposed third system under discussion within government circles, the so-called minimum essential information infrastructure (MEII), gives insight into difficulties and characteristics associated with specifying a sort of "kernel" for an NIS. Therefore, the remainder of this section reviews these three systems and architectures. While only a start, this exercise suggests that further research in the area could lead to insights that would be helpful to NIS designers.

Public Telephone Network

The PTN is structured around a relatively small number of highly reliable components. A single modern telephone switch can handle all of the traffic for a town with tens of thousands of residents; long-distance traffic for the entire country is routed through only a few hundred switches. All of these switches are designed to be highly available, with downtime measured in small numbers of minutes per year. Control of the PTN is handled by a few centrally managed computers. The end systems (telephones) do not participate in PTN management and are not expected to have processing capacity.

The use of only a small number of components allows telephone companies to leverage their scarce human resources. PTN technicians are needed to operate, monitor, maintain, test, and upgrade the software in only a relatively small number of machines.

Having centralized control simplifies network-wide load management, since the state of the system is both accessible and easily changed. But the lack of diversity and centralization does little to prevent widespread outages. First, shared vulnerabilities and common-mode failures are more than a possibility; they have already occurred. Second, after propagating only a short distance (i.e., through a relatively small number of components), a failure or attack can affect a significant portion of the system.

As discussed in Chapter 2, the PTN maintains state for each call being handled. This, in turn, facilitates resource reservations per call that enable quality-of-service guarantees per call: a connection, once established, receives 56 kbps (kilobits per second) of dedicated bandwidth. But establishing a connection in the PTN is not guaranteed. If a telephone switch does not have sufficient bandwidth available, then it will decline to process a call. Consequently, existing connections are in no way affected by increases in offered load.2

2 If the call is declined by a switch, then the call may be routed via other switches, or it may be declined altogether by returning a busy signal to the call initiator.
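
The switch's admission decision can be sketched as code (Python; the Switch class and capacity figures are a toy model, not real switch logic, though the 56-kbps per-call reservation follows the text):

    class Switch:
        # Toy model of PTN-style per-call resource reservation: a call is
        # admitted only if 56 kbps can be dedicated to it for its lifetime;
        # otherwise it is declined. Admitted calls are therefore untouched
        # by later increases in offered load.
        def __init__(self, capacity_kbps: int):
            self.free_kbps = capacity_kbps

        def admit_call(self, rate_kbps: int = 56) -> bool:
            if self.free_kbps < rate_kbps:
                return False              # decline (busy signal), never degrade
            self.free_kbps -= rate_kbps   # reserve dedicated bandwidth
            return True

        def release_call(self, rate_kbps: int = 56) -> None:
            self.free_kbps += rate_kbps

    switch = Switch(capacity_kbps=112)    # toy capacity: room for two calls
    assert switch.admit_call() and switch.admit_call()
    assert not switch.admit_call()        # third call declined; first two unaffected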

Internet

The Internet, by and large, exemplifies a more distributed architecture than the PTN. It is built from thousands of routers that are run by many different organizations and (as a class) are somewhat less reliable than telephone switches. Control in the Internet is decentralized, and delivery of packets is not guaranteed. Routers communicate with each other to determine the current network topology and automatically route packets, or discard them for lack of resources. The end systems (i.e., hosts) are responsible for transforming the Internet's "best effort" service into something stronger, and hosts are assumed to have processing capacity for this purpose.

The reliability of the Internet comes from the relatively high degree of redundancy and absence of centralized control. To be sure, any given end system on the Internet experiences lower availability than, for instance, a typical telephone. However, the network as a whole will remain up despite outages. No single make of computer or operating system is run everywhere in the Internet, though many share a common pedigree. Diversity of hardware and software protects the Internet from some common-mode design and implementation failures and contributes to the reliability of the whole. But the Internet's routing infrastructure is built using predominantly Cisco routers, with Bay and a few other companies supplying the rest. In that regard, the Internet is like the PTN, which relies largely on switches from Lucent, with Nortel, Siemens, and a few others supplying the rest.

With protocol implementations installed in the tens of millions of end systems, it is relatively difficult to install changes to the Internet's protocols. This, then, is one of the disadvantages of an architecture that depends on end-system processing. Even installing a change in the Internet's routers is difficult because of the large number of organizations involved.

As discussed in Chapter 2, the Internet's routers, by design, do not maintain state for connections; indeed, connections are known only to the end systems. Different packets between a pair of end systems can travel different routes, and that provides a simple and natural way to tolerate link and router outages. The statelessness of the Internet's routers means that router memory capacity limits neither the number of end systems nor the number of concurrently open connections. However, there is a disadvantage to this statelessness: routers are unable to offer hosts true service guarantees, and the service furnished to a host can be affected by increases in load caused by other hosts.

In addition to supporting end-system scaling, the statelessness of the Internet helps avoid a problem often associated with distributed architectures: preserving constraints that link the states of different system components. Preservation of constraints, especially when outages of components must be tolerated, can require complex coordination protocols. Note that consistency constraints do link the routing tables in each of the Internet's routers. But these are relatively weak consistency constraints and are, therefore, easy to maintain. Even so, the Internet experiences routing-state maintenance problems, known as "routing flaps." (Routing response is dampened to help deal with this problem, at the level of the Border Gateway Protocol.) State per connection would be much harder to maintain because of the sheer numbers and the short-lived nature of the connections.

Minimum Essential Information Infrastructure

A minimum essential information infrastructure (MEII) is a highly trustworthy communications subsystem: a network whose services are immune to failures and attacks. The notion of an MEII was originally proposed in connection with providing support for NISs that control critical infrastructures.3 The MEII essentially was to be a "kernel" for many, if not all, NISs.

3 According to Anderson et al. (1998), the term "MEII" is credited to Rich Mesic, a RAND researcher who was involved in a series of information-warfare exercises run by RAND starting in 1995.

The study committee believes that implementing a single MEII for the nation would be misguided and infeasible. An independent study conducted by RAND (Anderson et al., 1998) also arrives at this conclusion. One problem is the incompatibilities that inevitably would be introduced as nonhardened parts of NISs are upgraded to exploit new technologies. NISs constantly evolve to exploit new technology, and an MEII that did not evolve in concert would rapidly become useless.

A second problem with a single national MEII is that "minimum" and "essential" depend on context and application (see Box 5.1), so one size cannot fit all. For example, water and power are essential services. Losing either in a city for a day is troublesome, but losing it for a week is unacceptable, as is having either out for even a day for an entire state. A hospital has different minimum information needs for normal operation (e.g., patient health records, billing and insurance records) than it does during a civil disaster. Finally, the trustworthiness dimensions that should be preserved by an MEII depend on the customer: local law enforcement agents may not require secrecy in communications when handling a civil disaster but would in day-to-day crime fighting.

Despite the impracticality of having a single national MEII, providing all of the trustworthiness functionality for an NIS through a "kernel" could be a plausible design option. Here are likely requirements:

• The "kernel" should degrade gracefully, shedding less essential functions if necessary to preserve more essential functions. For example, low-speed communications channels might remain available after high-speed ones are gone; recent copies of data might in some cases be used in place of the most current data.4 (A sketch of such shedding appears below.)
• The "kernel" should, to the extent possible, be able to function even if all elements of the infrastructure are not functioning. An example is the PTN, whose essential components have backup battery power enabling them to continue operating for a few hours after a power failure and without telephone company emergency generators (which might not be functioning).
• The "kernel" must be designed with restart and recovery in mind. It should be possible to restore the operation, starting from nothing, if necessary.

4 Applications that depend on a gracefully degrading MEII must themselves be able to function in the full spectrum of resource availability that such an MEII might provide.

Note that neither the PTN nor the Internet exhibits all three of these characteristics, although the PTN probably comes closer than the Internet.5 The development of a "kernel" exhibiting all three of the characteristics might well require new research, and an attempt to build such a "kernel" could reveal technical problems that are not, on the surface, apparent. Implementing an NIS using such a "kernel" could also be a useful research exercise, since it might reveal other important characteristics the "kernel" should possess.

5 There is some question as to whether the PTN can be disconnected and then restarted from scratch.
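
The graceful-degradation requirement above can be made concrete with a small sketch (Python; the service list, priorities, and shedding policy are invented for illustration, not drawn from any actual "kernel" design):

    SERVICES = [
        # (priority, name): a lower number means more essential. The
        # entries are hypothetical; a real "kernel" would rank its own
        # functions along these lines.
        (0, "emergency-traffic"),
        (1, "voice"),
        (2, "bulk-data"),
        (3, "video"),
    ]

    def shed_load(available_fraction: float):
        # Graceful degradation: as capacity shrinks, shed the least
        # essential services first instead of failing outright.
        keep = max(1, round(available_fraction * len(SERVICES)))
        return [name for _, name in sorted(SERVICES)[:keep]]

    print(shed_load(1.0))   # all four services are offered
    print(shed_load(0.5))   # -> ['emergency-traffic', 'voice']
    print(shed_load(0.1))   # -> ['emergency-traffic']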

An alternative vision of the specification for a trustworthy "kernel" is as a computer network (hardware, communications lines, and software) that has a broad spectrum of operating modes. At one end of the spectrum, resource utilization is optimized; at the other end, entered in response to an attack, routings are employed that may be suboptimal but more trustworthy because they use diverse and replicated routings. In the more conservative mode, packets might be duplicated or fragmented6 by using technology that is effective for communicating information even when a significant fraction of the network has been compromised.7

Notice that for such a multimode MEII implementation to be viable, it must possess some degree of diversity. Thus, there might well be a point after which hardening by using trustworthy components should defer to design goals driven by diversity. Second, detecting the occurrence of an attack is a prerequisite to making an operating-mode change that constitutes a defense in this MEII vision. Tools for monitoring the global status of the network thus become important, especially since a coordinated attack might be recognized only by observing activity in a significant fraction of the network.

6 See, for example, Rabin (1989).

7 Note that this multimode scheme implements resistance to attacks by using techniques traditionally used for supporting fault tolerance, something that seems especially attractive because a single mechanism is then being used to satisfy multiple requirements for trustworthiness. On the other hand, single mechanisms do present a common failure mode risk.
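
A minimal sketch of such a mode switch (Python; the Route type, costs, and the duplicate-across-all-routes policy are invented; a real design might instead fragment packets with an information-dispersal scheme such as Rabin's):

    from dataclasses import dataclass

    @dataclass
    class Route:
        name: str
        cost: float
        compromised: bool = False   # unknown to the sender

        def deliver(self, packet: bytes):
            # A compromised route silently drops traffic in this toy model.
            return None if self.compromised else packet

    def send(packet: bytes, routes, under_attack: bool) -> bool:
        # Optimized mode: use only the cheapest route. Conservative mode:
        # duplicate the packet across all diverse routes, so delivery
        # succeeds as long as any one route survives.
        chosen = routes if under_attack else [min(routes, key=lambda r: r.cost)]
        return any(r.deliver(packet) == packet for r in chosen)

    routes = [Route("fiber", cost=1.0, compromised=True),
              Route("satellite", cost=9.0)]
    assert not send(b"msg", routes, under_attack=False)  # cheap route is down
    assert send(b"msg", routes, under_attack=True)       # duplication gets through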

A third plausible architecture for supporting trustworthiness functionality is to use some sort of a service broker that would monitor the status of the communications infrastructure. This service broker would sense problems and provide information to restore service dynamically, interconnecting islands of unaffected parts of the communications infrastructure. For example, it might be used in commandeering for priority uses some unaffected parts that normally operate as private intranets.

Findings

1. Attempting to build a single MEII for the nation would be misguided and a waste of resources because of the differing requirements of NISs.

2. Little is known about the advantages and disadvantages of different NIS system architectures and about where best to allocate in a system the responsibility for trustworthiness functionality. A careful analysis of existing systems would be one way to learn about the trustworthiness consequences of different architectures.

3. The design of systems that exhibit graceful degradation has great potential, but little is known about supporting or exploiting such systems.

NONTRADITIONAL PARADIGMS

Other less architecturally oriented design approaches have been investigated for amplifying trustworthiness properties, most notably amplifying fault tolerance. These approaches are more algorithmic in flavor. Further research is recommended to develop the approaches and to better understand the extent and domain of their applicability.

Self-stabilization, for example, has been used to implement system services that recover from transient failures (Schneider, 1993). Informally, a self-stabilizing algorithm is one that is guaranteed to return to some predefined set of acceptable states after it has been perturbed and to do so without appealing to detectors or centralized controllers of any sort. For example, some communications protocols depend on the existence of a token that is passed among participants and empowers its holder to take certain actions (e.g., send a message). A self-stabilizing token-management protocol would always return the system to the state in which there is a single token, even after a transient failure causes loss or duplication of the token. More generally, the design of network management and routing protocols could clearly benefit from a better understanding of control algorithms having similar convergent properties. The goal should be control schemes that are robust by virtue of the algorithm being used rather than the robustness of individual components.
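
Dijkstra's K-state token ring is the classic example of such a protocol (the report does not single it out; the simulation below is an illustrative Python rendering). Machine 0 holds the token when its state equals its left neighbor's; every other machine holds it when its state differs from its left neighbor's. From any corrupted starting state, the ring converges to exactly one token, with no detector or central controller:

    import random

    def count_tokens(s):
        # Machine 0 is privileged iff s[0] == s[-1]; machine i > 0 iff
        # s[i] != s[i-1]. A legitimate state has exactly one privilege.
        return int(s[0] == s[-1]) + sum(s[i] != s[i - 1] for i in range(1, len(s)))

    def stabilize(n=5, k=6, sweeps=50):
        # K-state self-stabilizing token ring (k > n guarantees convergence).
        s = [random.randrange(k) for _ in range(n)]  # arbitrary perturbed start
        for _ in range(sweeps):
            for i in range(n):                       # a fair schedule of moves
                if i == 0 and s[0] == s[-1]:
                    s[0] = (s[0] + 1) % k            # machine 0 passes the token on
                elif i > 0 and s[i] != s[i - 1]:
                    s[i] = s[i - 1]                  # machine i takes the token
        return s

    assert count_tokens(stabilize()) == 1   # lost or duplicated tokens are gone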

It may also be possible to develop a science base for algorithms that amplify resilience or other dimensions of trustworthiness by relying on group behavior. Metaphors and observations about the nature of our natural world (flocking birds, immunological systems,8 and crystalline structures in physics) might provide ideas for methods to manage networks of computers and the information they contain. The design approaches outlined above (population diversity and monitor-detect-respond) have clear analogies with biological concepts. Studying the organization of free markets and game theory for algorithmic content might be another source of ideas. Of course, there are significant differences between an NIS and the natural world; these differences might restrict the applicability of natural group behavior algorithms to NISs. For example, the actions and behaviors of natural systems arise not from deterministic programming but from complex, sometimes random, interactions of the individual elements. Instead of exhibiting the desirable robust behaviors, collections of programmed computers might instead become synchronized or converge in unintended ways. Clearly, research is needed to establish what ideas can apply to an NIS and to understand how they can be leveraged. See Anderson et al. (1998) for a discussion of how biological metaphors might be applied to the design of an MEII.

8 With regard to the immunology metaphor, sophisticated attacks are like biological weapons, which have always proven effective in overcoming natural immunity.

Finding

A variety of research directions involving new types of algorithms (self-stabilization, emergent behavior, biological metaphors) have the potential to be useful in defining systems that are trustworthy. Their strengths and weaknesses are not well understood, and further research is called for.

REFERENCES

Alsberg, P.A., and J.D. Day. 1976. "A Principle for Resilient Sharing of Distributed Resources," pp. 627-644 in Proceedings of the 2nd International Conference on Software Engineering. Los Alamitos, CA: IEEE Computer Society Press.

Ammann, P.E., and J.C. Knight. 1991. "Design Fault Tolerance," Reliability Engineering and System Safety, 32(1):25-49.

Anderson, Robert H., Phillip M. Feldman, Scott Gerwehr, Brian Houghton, Richard Mesic, John D. Pinder, and Jeff Rothenberg. 1998. A "Minimum Essential Information Infrastructure" for U.S. Defense Systems: Meaningful? Feasible? Useful? Santa Monica, CA: RAND National Defense Research Institute, in press.

DeSantis, A., Y. Desmedt, Y. Frankel, and M. Yung. 1994. "How to Share a Function Securely," pp. 522-533 in Proceedings of the 26th ACM Symposium on the Theory of Computing. New York: ACM Press.

Elmendorf, W.R. 1972. "Fault-Tolerant Programming," pp. 79-83 in Proceedings of the 2nd International Symposium on Fault-tolerant Computing (FTCS-2). Los Alamitos, CA: IEEE Computer Society Press.

Gray, James, and Andreas Reuter. 1997. Transaction Processing: Concepts and Techniques. San Mateo, CA: Morgan Kaufmann Publishers.

Huang, Yennun, Chandra Kintala, Nick Kolettis, and N. Dudley Fulton. 1995. "Software Rejuvenation: Analysis, Module, and Applications," pp. 381-390 in Proceedings of the 25th Symposium on Fault-tolerant Computing. Los Alamitos, CA: IEEE Computer Society Press.

Knight, J.C., and Nancy G. Leveson. 1986. "An Experimental Evaluation of the Assumption of Independence in Multi-version Programming," IEEE Transactions on Software Engineering, 12(1):96-109.

Lunt, Teresa F. 1993. "A Survey of Intrusion Detection Techniques," Computers and Security, 12(4):405-418.

Menezes, Alfred J., Paul C. Van Oorschot, and Scott A. Vanstone. 1996. Handbook of Applied Cryptography. CRC Press Series on Discrete Mathematics and Its Applications. Boca Raton, FL: CRC Press, October.

Rabin, M.O. 1989. "Dispersal of Information for Security, Load Balancing, and Fault Tolerance," Communications of the ACM, 36(2):335-348.

Randell, B., and J. Dobson. 1986. "Reliability and Security Issues in Distributed Computing Systems," pp. 113-118 in Proceedings of the Fifth Symposium on Reliability in Distributed Software and Database Systems. Los Alamitos, CA: IEEE Computer Society Press.

Schneider, Fred B. 1990. "Implementing Fault-tolerant Services Using the State Machine Approach: A Tutorial," ACM Computing Surveys, 22(4):299-319.

Schneider, Marco. 1993. "Self-stabilization," ACM Computing Surveys, 25(1):45-67.

U.S. Department of Defense (DOD). 1987. Trusted Network Interpretation of the Trusted Computer System Evaluation Criteria, NCSC-TG-005, Library Number S228,526, Version 1, the "Red Book." Ft. Meade, MD: National Computer Security Center.

Voges, Udo. 1988. Software Diversity in Computerized Control Systems. Vol. 2 in the series Dependable Computing and Fault Tolerance Systems. Vienna, Austria: Springer-Verlag.