The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

OCR for page 211
ROBUSTNESS AND TRANSPARENCY IN INTELLIGENT SYSTEMS
Randall Davis

INTRODUCTION
Developing and building a space station will confront problems of significant complexity in an extraordinarily demanding environment. The station's size and complexity will make necessary the extensive use of automation for monitoring and control of critical subsystems, such as life support. The station's complexity, along with the novelty of space as an environment, means that not all contingencies can be anticipated. Yet the hostility of the environment means the consequences of failure can be substantial. In such situations, robustness and transparency become essential properties of the systems we develop. A system is robust to the degree that it has the ability to deal with unanticipated events. A system is transparent to the degree that its operation can be made comprehensible to an observer.

This paper is concerned with these two properties, robustness and transparency, from a number of perspectives. We claim that they are crucial to the space station undertaking (and indeed to any situation with similar levels of complexity and similar consequences of failure). We argue that they are fundamental properties of models and of system designs based on those models. As a result, robustness and transparency cannot easily be grafted on afterward; they must be considered at the outset and designed in. We explore how this might happen, i.e., how these two properties translate into constraints on system design, and describe a number of research efforts that may lead to a better understanding of how such design might be accomplished.

It is useful at this point to establish some simple vocabulary. By "system" or "device" we mean the hardware whose behavior we wish to understand and control. The power distribution system, for example, would include all the cables, batteries, fuel cells, solar arrays, switches, etc., that supply power to the station.
By "model" we mean a description of that hardware that will allow us to analyze, interpret, diagnose, and guide its behavior. When expressed explicitly, it is typically written in terms of schematics, performance curves, engineering drawings, etc. The model also may be implicit in a program designed to monitor the hardware, or it may exist in the mind of the human doing the same job. In any case it provides the basic framework used to understand the device. While we speak broadly of systems and models, our concern here is for the most part with systems of physical devices and engineering models of them; much of what we say is likely to carry over to software as well. Models of human behavior and social systems are largely beyond what we attempt to do here.

Unanticipated Events: Motivation and the ...
Because much of what we discuss is motivated by the difficulties of dealing with unanticipated events, it is worth taking a moment to consider what they are and why they are important. By unanticipated events we mean any occurrence requiring a response that has not been previously planned for and analyzed, and for which no appropriate response has been determined. One compelling example might occur if the life support system monitors present a collection of readings that indicate a malfunction but do not match any known pattern of misbehavior. The readings need to be analyzed and an appropriate response initiated, yet this cannot be done "by the book"; it requires that we reason through what could have happened to produce such readings.

The importance of such events arises from their inevitability, due to both the complexity of the space station and the novelty of the environment. Unanticipated events and interactions are a fact of life for complex, large-scale systems because the number of different kinds of things that can go wrong is so vast, and our ability to do exhaustive formal analyses of fault events has rather modest limits. Space is a sufficiently novel environment that we have no comprehensive catalog of standard fault models that can be checked ahead of time.

Unanticipated Events: Example
During STS-2, the second space shuttle mission, an interesting sequence of events led at one point to the recognition that a fuel cell was failing and later to the realization that in its degraded state it could conceivably explode. This sequence of events helps to illustrate both the inevitability of unanticipated events and the kinds of knowledge and reasoning needed to deal with them.

Some brief background will help make the events comprehensible. The basic function of the 3 fuel cells (Figure 1) is to produce electricity by combining hydrogen and oxygen in a carefully controlled reaction using potassium hydroxide as a catalyst. The combustion product is water, removed from the cell by the water removal system (Figure 2): damp hydrogen enters the condenser at the right, pulled along by the flow produced by the motor and pump at left. The motor also turns a separator that pushes condensed water droplets toward the walls of the separator, where they accumulate due to surface tension (recall this is a 0g environment). The now drier hydrogen returns to the fuel cell, while the annulus of water continually being formed at the separator is picked up and guided to the water storage area. A meter at the outlet monitors water pH, checking for contamination (e.g., potassium hydroxide from the fuel cells), since the water is intended for consumption.

FIGURE 1 The fuel cell and water separation system.

In very much abbreviated form, the sequence of events leading to early mission termination of STS-2 proceeded as follows (Eichhoefer, 1985):
Prelaunch: During pre-launch activities, the fuel cell pH meters register high. Interpretation: Familiar, unexplained anomaly.
At various times oxygen and hydrogen flow meters read high; at one point oxygen flow goes off-scale. Interpretation: Sensors malfunctioning.

FIGURE 2 Details of the water separation system. Source: Gerald Eichhoefer (July 1985).

+ 3:00 Fuel cell 1 (FC1) begins shedding load; the other two assume more load. Interpretation: FC1 may be failing. Controllers consider purging FC1. Degraded performance suggests possible flooding; high pH also suggests flooding; purging will remove water. Purging FC1 rejected: purged KOH might solidify, blocking the purge line that is common to all 3 cells.
+ 3:25 Crew asked to test pH manually. If the sensor is correct, potable water may be getting contaminated by KOH.
+ 4:25 Crew too busy with other duties to perform test.
+ 4:40 FC1 off-loads significantly. Interpretation: Clear failure.
+ 4:51 FC1 isolated from remainder of electrical system and shut down.
+ 5:48 Mission evaluation room recognizes new failure mode for the cell in the current situation. Once it is shut down, pressure slowly drops, but can drop at different rates on each side. If the pressure differential becomes large enough, gas bubbles from one side can cross to the other, possibly combining explosively.
+ 7:52 FC1 restarted with reactant valves closed; reactants consumed and voltage in cell drops to 0.

Post-mission analysis of the fuel cell and water separator revealed that the pH meter had been working correctly and that a small particle blocked the nozzle in the water separator of cell 1, preventing water removal to the storage area. The water backed up first in the separator and later in the cell, flooding the cell (hence the high pH), leading to performance degradation, consequent load shedding, and eventual failure.

Lessons From the Example
This example is useful for a number of reasons. It illustrates, first, robustness and transparency in the face of unanticipated events. The reasoning was robust in the sense that the blockage had not previously been anticipated, yet engineers were able to reason through how the device worked, and were able to recognize and predict a novel sequence of potentially serious consequences. The reasoning was transparent in the sense that the story above is comprehensible. Even given the very small amount of information in Figures 1 and 2 and the short description above, the description of the events "makes sense." Second, it suggests the difficulty of a priori identification and analysis of all failure modes and all the ways those failures may combine. Even with all the careful design, testing, and previous experience with fuel cell technology, a new mode of cell failure was encountered. Third, it illustrates the kind of knowledge and reasoning that was required to understand, diagnose, and repair the problem.
The knowledge involved information about structure (interconnection of parts) and behavior (the function of a component labeled "motor" or "pump"), supplied by the diagrams in Figures 1 and 2. Knowledge of basic chemistry and physics was also involved, used to understand the behavior of potassium hydroxide in solution and the notion of surface tension. Importantly, the reasoning relies on causal models, descriptions of devices and processes that capture our ordinary notion of what it means for one event to cause another (e.g., the motor causes the pump to turn, which causes the hydrogen and water to move through the condenser, etc.). The reasoning involved was of several varieties. The fourth event above, for instance, illustrates reasoning about behavior to predict consequences: if the cell is flooded, potassium hydroxide can get in the water, meaning it can get to the water separator and then into the water storage. Another form of reasoning involved working from observed symptoms to diagnoses and then to repair actions: if FC1 is shedding load, it is an indication of degraded performance, which suggests flooding. Flooding in turn suggests purging as a repair. Simple knowledge of connectivity and chemistry ruled out that action in the event above at + 3:00: it might have blocked the common purge line.

Finally, it offers a simple way of summarizing much of what this paper is about: while all of the reasoning above was done by people using their models of the devices in question, we suggest giving computers exactly the same sort of knowledge and reasoning abilities. As a result, they could serve as far more effective assistants. We believe this can be done by supplying them with something like the diagrams of Figures 1 and 2, with knowledge about structure and behavior, and with an understanding of causality, chemistry, physics, electronics, and more. In short, we need to give them the same understanding of how things work that we use in everyday engineering reasoning. Stating the aspiration is, of course, easy; execution is considerably more difficult, and this is clearly no small undertaking. In the remainder of this paper, we examine some of the research issues that arise in attempting to make this happen. How can we provide descriptions usable by a machine that are as rich as those in Figures 1 and 2? Consider, for example, how much knowledge is captured by the simple labels motor, pump, and condenser. How can we provide the kinds of reasoning abilities displayed above? How can we provide the ability to judiciously select the correct model for a given problem? Consider how our view shifted from one grounded in physics, to one oriented towards chemistry, to one grounded in electronics, as the need arose. How can we provide the ability to simplify a complex model, selecting just the "relevant" details?
Consider what a drastic, yet useful, simplification Figures 1 and 2 are of the actual devices. (Consider too what a misleading statement it was, above, to say "Even given the very small amount of information in Figures 1 and 2 ..., the description of the events makes sense." It makes sense precisely because the right level of detail was chosen. How might we get a machine to do that?) For that matter, how do human engineers do all these things?
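The two kinds of reasoning in the STS-2 example, predicting consequences forward along causal links and working backward from symptoms to candidate causes, can be sketched with a very small causal model. This is a hypothetical encoding for illustration only, not a description of any flight software: the node names (`nozzle_blocked`, `cell_flooded`, and so on), the tables `CAUSES`, `SHARED_BY`, and `REPAIR_RISKS`, and the functions `predict`, `explain`, and `repair_acceptable` are all invented here. It suggests how even a crude structural and causal description can support the inferences described above, including ruling out a purge because the purge line is shared by all three cells.

```python
from collections import deque

# Hypothetical causal links paraphrased from the STS-2 narrative.
# Each entry reads "cause -> list of direct effects".
CAUSES = {
    "nozzle_blocked": ["water_not_removed"],
    "water_not_removed": ["separator_backs_up"],
    "separator_backs_up": ["cell_flooded"],
    "cell_flooded": ["koh_in_water", "performance_degraded"],
    "koh_in_water": ["ph_high"],
    "performance_degraded": ["load_shed"],
}

# Hypothetical structural facts: which cells depend on a shared component.
SHARED_BY = {"purge_line": {"FC1", "FC2", "FC3"}}

# Hypothetical repair side effects: repair -> (component at risk, reason).
REPAIR_RISKS = {"purge_FC1": ("purge_line", "purged KOH might solidify")}


def predict(event):
    """Forward reasoning: every consequence reachable from `event`."""
    seen, queue = set(), deque([event])
    while queue:
        for effect in CAUSES.get(queue.popleft(), []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


def explain(symptoms):
    """Backward reasoning: causes whose predicted consequences
    include every observed symptom."""
    return sorted(c for c in CAUSES if set(symptoms) <= predict(c))


def repair_acceptable(repair, failed_cell):
    """Reject a repair whose side effect endangers a component that
    healthy cells also depend on."""
    risk = REPAIR_RISKS.get(repair)
    if risk is None:
        return True
    component, _reason = risk
    return SHARED_BY.get(component, set()) <= {failed_cell}


if __name__ == "__main__":
    print(explain({"ph_high", "load_shed"}))
    print(repair_acceptable("purge_FC1", "FC1"))
```

Here `explain({"ph_high", "load_shed"})` returns `['cell_flooded', 'nozzle_blocked', 'separator_backs_up', 'water_not_removed']`, every upstream condition consistent with both symptoms; choosing among them, or confirming that the pH sensor itself is correct, requires further observation. `repair_acceptable("purge_FC1", "FC1")` returns `False` because the purge line is common to all three cells.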

Unanticipated Events As A Focus
Unanticipated events like the blockage of the water separator are an appropriate focus for this paper because this symposium aims to identify research issues for future attention rather than incremental improvement to current practice. Some useful techniques already exist for simulation, fault insertion, and creation of error recovery procedures for foreseeable events. Additional work is in progress on techniques for error avoidance and on designing systems that are error tolerant. There is also a well-established approach to producing robustness through man-machine combinations: divide the work so that the more routine tasks fall to the machine and rely on the human for resourceful responses to atypical events. All of these are appropriate and important, and will continue to contribute to system design. But rich new issues arise in part by asking what relevant things we don't know how to do very well, or at all. From that perspective, unanticipated events present a set of interesting and important challenges, providing an appropriate focus for this paper.

They also lead to increased concern about transparency. Other rationales already exist for transparency, including giving users an understanding of the system's reasoning so they know when to rely on the conclusions, and the importance of keeping the system accessible to human comprehension and possible intervention. Dealing with unanticipated events adds further motivation, most visible in the question of system override: to determine whether a system's response is based on inappropriate assumptions (e.g., an inappropriate model), we need first to know what those assumptions are. Transparency helps make this possible.

Agenda
Our discussion now proceeds in three basic steps. First, to help make clear the difficulties involved in robustness, we explore briefly some non-solutions to the problem.
Second, we identify two broad categories of attack that are likely to offer some leverage on the problem: developing models and reasoning methods powerful enough to handle unanticipated events, and developing techniques for coping with situations where only imperfect models are available. Finally, we describe a number of specific research topics that will help to develop the models, methods, and techniques needed to produce robustness and transparency.

SOME NON-SOLUTIONS TO THE PROBLEM
Before proposing a new attack on a problem, it's worth asking whether the problem can be tackled with known techniques. We consider three plausible approaches and explore why each of them fails to provide the degree of robustness we believe is necessary.

One traditional approach is the use of man-machine combinations, relying on the human to handle non-routine situations. This is, of course, useful and can be quite effective over a range of problems. In the fuel cell problem of STS-2, for instance, routine monitoring was handled automatically, while exceptions were analyzed by human experts. It is also clear, however, that systems currently being designed and used are sufficiently complex that this will no longer be sufficient, unless we can make our automated assistants smarter. Some nuclear power and chemical processing plants, for instance, are complex enough that non-routine events lead to massive overload on human information handling abilities. So many alarms were triggered during the Three Mile Island accident, for instance, that not only was it effectively impossible to interpret them, but even detection became problematic as multiple alarms masked one another. Somewhat more immediately relevant, during shuttle mission STS-9 an alarm was triggered more than 250,000 times over 3 days, due to an unanticipated thermal sensitivity in a Spacelab remote acquisition unit, along with an oversight in user software. It is likely that similar and perhaps higher levels of complexity will be involved in the space station. As a result, we need to do more than rely on the human half of the team to handle all exceptions. We need to upgrade the ability of our machines to interpret, diagnose, and respond to unanticipated events, enabling man-machine combinations to remain effective in the face of complex systems and novel environments. A second route of attack on the problem might appear to be the creation of more reliable software through improved software engineering, program verification, or automatic programming. Unfortunately, all of these solve a problem different from the one at hand here.
The issue is illustrated in the figure below: techniques for production of reliable software all assist in ensuring that a program matches its specifications. Unanticipated events, however, will by definition not show up in the specifications. The problem here is not so much one of debugging code; it is the creation and debugging of the model and specifications.

[Figure: World, Specifications, and Code; program verification, software engineering, and automatic programming all address the relation between Specifications and Code.]

Finally, given its wide popularity, we might ask what expert system technology might be able to contribute to the difficulties we face. Here too the answer is that it has little to offer. The fundamental limitation in these systems arises from the character of the knowledge they use. Traditional expert systems gain their power by collecting empirical associations, if-then rules that capture the inferences human experts have learned through experience. We refer to them as empirical associations to indicate the character of the knowledge they capture: associations, typically between symptoms and diseases, gathered as a result of human experience. Importantly, those associations are typically heuristic rather than causal; i.e., they capture what experts have observed to happen without necessarily being able to explain why it should be so. A medical diagnosis system, for example, might have a rule of the form "a college student complaining of fatigue, fever, and sore throat is likely to have mononucleosis." The rule offers useful guidance even if the experts cannot provide a detailed causal (i.e., physiological) explanation for why the conclusion follows. Indeed the power of the technology comes in part from the assistance it provides in accumulating large numbers of fragmentary rules of thumb for tasks for which no well-defined causal theory exists.

One important consequence of this kind of knowledge, however, is a kind of brittleness. Current generation expert systems are idiots savants, providing impressive performance on narrowly defined tasks and performing well when the problem is exactly suited to the program's expertise. But performance can degrade quite sharply with even small variations in problem character. In general the difficulty arises from a lack of underlying theory: since the rules indicate only what conclusions follow and not why, the program has no means of dealing with cases that "almost" match the rule, or cases that appear to be "minor" exceptions. Indeed, such programs have no notion of what "almost" or "minor" could mean.

'FIGURING IT OUT'
Having reviewed some existing technology that does not appear capable of providing the degree of robustness needed, we turn now to considering what kinds of ideas and technologies would help solve the problem. The basic thrust of our argument is quite simple.
As size and complexity of systems increase, we see a decrease in the opportunity to do an exhaustive a priori analysis and pre-specify appropriate responses. The space station will likely be complex enough to preclude such analysis; the novelty of the environment increases the chance of unanticipated challenges. To deal with such situations we need a new approach to building intelligent systems, one based on a simple premise: when you can't say in advance what will happen, the ability to "figure out" how to respond becomes much more important. Where knowledge-based systems, for instance, "know" what to do because they have been given a large body of task-specific heuristics, we require intelligent systems capable of figuring out what to do. This ability should play a supporting role and is clearly not a replacement for existing approaches. Where we can anticipate and analyze, of course we should, and where we can construct effective fault-tolerant systems, we should. But as system complexity grows and the number and seriousness of unanticipated events increase, we need the flexibility and breadth of robust problem solving systems to deal with them. The key question, of course, is how to construct systems with this property. In the remainder of this paper we suggest several ways of looking for answers to that question.

MODELS AND ENGINEERING PROBLEM SOLVING
Faced with an unanticipated event in a complex system, a powerful way to figure out what to do is by reasoning from an understanding of the system, a model of "how it works." A behavioral model, for instance, can be of considerable help in dealing with complex software like an operating system. In dealing with a complex physical device, a model of structure and function (schematics and block diagrams), along with an understanding of causality, can be essential in understanding, interpreting, and debugging behavior. How might we proceed, for example, when faced with a set of sensor readings from the fuel cells that indicate malfunction but do not match any known pattern of misbehavior? The most robust solution appears to be grounded in knowing how it works, i.e., creating and using models that capture structure, behavior, and causality at an appropriate level of detail. We need to know what the component pieces are, how they each work, how they are interconnected, and so forth. We argue that, in the most general terms, the creation, selection, and use of appropriate models is the most powerful approach to the problem. It is in many ways the essence of engineering problem solving. Since, as we discuss in more detail below, models are abstractions, the process of model creation and selection is essentially one of deciding which abstraction to apply. Faced with a complex system to be analyzed, an engineer can bring to bear a powerful collection of approximations and abstractions. As a relatively simple example in electrical engineering, an engineer may decide to view a circuit as digital or analog, linear or non-linear.
But even to approach the problem as one of circuit theory means we have made the more basic assumption that we can model the circuit as if signals propagated instantaneously, and hence ignore electrodynamic effects. Models and their underlying abstractions are thus ubiquitous in this kind of problem solving. We believe that an important source of power in the problem solving of a good engineer is the ability to create, select, use, and understand the limits of applicability of such models. Consequently, we believe that a powerful approach to building robust problem solving programs is to identify and capture the knowledge on which that modeling ability is based. Similarly, a powerful approach to building transparent problem solving systems is to make that knowledge explicit in our programs. One general thrust of the research we suggest is thus broadly concerned with advancing our understanding of model creation, selection, and use, and demonstrating that understanding by creating programs capable of doing such things.
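The choice between digital and analog views of a circuit can be made concrete as two models of one measured voltage: the analog view keeps the number, while the digital view abstracts it to 0 or 1. The thresholds below are illustrative assumptions (they resemble classic TTL input limits but are not taken from the text), and `digital_view` is a hypothetical helper. The point of the sketch is that the digital abstraction rests on an assumption, that every signal sits clearly at one level or the other, and a program can report when that assumption fails instead of silently returning a bit.

```python
from typing import Optional

# Illustrative input thresholds in volts (assumed for this sketch).
V_LOW_MAX = 0.8   # at or below this, the digital view reads 0
V_HIGH_MIN = 2.0  # at or above this, the digital view reads 1


def digital_view(volts: float) -> Optional[int]:
    """Abstract an analog voltage to a logic level.

    Returns None when the voltage lies in the undefined band between
    the thresholds, i.e., when the digital model's assumption that
    every signal is clearly low or clearly high does not hold.
    """
    if volts <= V_LOW_MAX:
        return 0
    if volts >= V_HIGH_MIN:
        return 1
    return None


if __name__ == "__main__":
    # The analog view is simply the measured voltage itself.
    for volts in (0.2, 3.3, 1.4):
        level = digital_view(volts)
        note = "ok" if level is not None else "outside the digital model"
        print(f"{volts:.1f} V -> {level} ({note})")
```

Running it prints `0.2 V -> 0 (ok)`, `3.3 V -> 1 (ok)`, and `1.4 V -> None (outside the digital model)`.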

A second general thrust is made feasible by the fact that the space station is an engineered artifact, a device intended to accomplish a specific purpose, whose design is under our control. As a result, we can also ask: how can we design in such a fashion that dealing with unanticipated events is easier? That is, given the inevitability of encountering such events and the difficulty of reasoning about them in complex systems, how should we design so that the reasoning and analysis task becomes easier? We speculate, for instance, about what "design for comprehensibility" might mean. Other approaches we discuss that share the same basic mind set include understanding (and hence capturing in programs) "common sense" physical reasoning, and exploring the origins of robust problem solving in people, whose graceful degradation in performance is so markedly different from the behavior of automated systems. We refer to this set of approaches as "making the best situation" because they have in common the assumption that it is in fact possible to model the system, and they approach the problem by asking how we can facilitate model creation and use. But what about the alternative? How can we get robust behavior in situations where no effective model yet exists, or where the only available models are incomplete or insufficiently detailed for the task at hand? We term that set of alternatives "making the best of the situation," to suggest that, lacking a model to reason from, we have to fall back on some less powerful methods. Here we speculate very briefly about research in using multiple, overlapping but incomplete models.

MODELS AND PROGRAMS
Since much of our discussion is focused on models (creating them, using them, and determining their limitations), it is worth taking a moment to review briefly some of their fundamental properties.
Since we will for the most part be concerned with embodying those models in computer programs, it is similarly worth reviewing briefly the relation between models and programs, and understanding the role the computer plays in all this.

The Role of the Computer
Let's start with the role of the computer. Given the size and complexity of the space station, extensive use will have to be made of software to automate tasks like monitoring and control. Any such program inevitably embodies a model of the task at hand. Even a program as simple as one that monitors CO2 and displays a warning when the level exceeds a threshold has, implicit in it, a much simplified model of the sensing device, the environment (e.g., that CO2 is uniformly dispersed), what levels of CO2 are safe, etc. Since models and computer programs are often so closely intertwined, it

complex system and then develop an equally complex piece of software that attempts to monitor, interpret, and perhaps control it. Layers of complexity will only make it more difficult to deal with novel situations. Perhaps the simplest demonstration of the futility of this approach comes in dealing with events that may be outside the range of applicability of the program. The more complex the underlying system, the more complex the program needed to interpret it, i.e., the more complex the model of that system needs to be. And the more complex the model is, the more difficult it becomes to determine whether it is based on assumptions that do not hold for the current situation, and hence whether the current events are outside its range of applicability.

Second, if robustness and transparency are properties of models and systems, not properties of programs, it follows that they cannot be grafted on; they must be designed in. That is, we need to understand how to design in such a fashion that the resulting systems have those properties, and how to create models that have those properties. One of the research strategies we suggest in this paper is to turn this question around, and ask how the desire for systems with these two properties can be translated into constraints on system design. That is, is it possible to design in such a way that the resulting systems are easy to model robustly and transparently?

Robustness and Transparency in Models
We have argued that robustness and transparency are properties of systems and models rather than of programs, and that a primary route to resourceful systems is the creation of models with these properties. But that isn't easy. To see why not, we examine the kinds of things that commonly get in the way. Three common sources of failures of robustness are incompleteness, information overload, and incorrect level of detail. Models may be incomplete because information that should have been included was omitted.
A particularly relevant example arose in the Solar Max repair during Mission 41-C. The initial attempt to attach to the satellite failed because additional, undocumented hardware had been added to the satellite near the attachment point, preventing the mating of the satellite and the attachment device. The lesson here is the obvious one: you can't reliably figure out what to do if your picture of the device in question is incomplete. A second source of failure of robustness, information overload, occurs when the available information processing ability is overwhelmed by the amount of data or the size of the model. The data rate may be so high that it cannot be interpreted fast enough. The model itself may be so large that it outstrips the processing power available. The issue here is the same for man or machine: in either case the available processing power may be insufficient to use the model. The lesson here is the need to ensure that the models we build are computable with the power available.
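The CO2 monitor described under "The Role of the Computer" shows how such assumptions hide inside even trivial programs. A minimal sketch, assuming hypothetical names (`Co2Model`, `assess`) and placeholder numbers that are not real CO2 limits, makes three of those implicit assumptions explicit: that the sensors read within their valid range, that CO2 is uniformly dispersed (approximated here by requiring several sensors to agree), and that a single threshold separates safe from unsafe levels. When an assumption fails, the program says so rather than issuing a confident but possibly meaningless verdict, a small step toward transparency.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Co2Model:
    """Explicit assumptions behind a simple CO2 alarm.

    All numbers are hypothetical placeholders in arbitrary units,
    not actual habitability limits.
    """
    warn_threshold: float = 5.0                      # assumed safe/unsafe boundary
    max_spread: float = 1.0                          # assumed tolerance for "uniformly dispersed"
    sensor_range: Tuple[float, float] = (0.0, 20.0)  # assumed valid sensor range


def assess(readings: Sequence[float], model: Co2Model = Co2Model()) -> str:
    """Return a verdict, or name the assumption that failed."""
    if not readings:
        raise ValueError("need at least one reading")
    low, high = model.sensor_range
    if any(not (low <= r <= high) for r in readings):
        return "model not applicable: reading outside sensor range"
    if max(readings) - min(readings) > model.max_spread:
        return "model not applicable: CO2 not uniformly dispersed"
    mean = sum(readings) / len(readings)
    return "warning" if mean > model.warn_threshold else "normal"
```

For instance, `assess([2.0, 6.5])` reports that the dispersal assumption failed, instead of averaging the two readings to 4.25 and calling the situation normal.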

Information overload is frequently a result of the third source of failure: selecting the wrong level of detail, in particular choosing too low a level. Attempting to model the behavior of a digital circuit using quantum mechanics might be an interesting challenge, but would surely drown in detail. If, on the other hand, too high a level is chosen, the model omits relevant phenomena. For example, some circuit designs that are correct when viewed at the digital level may in fact not work due to effects that are obvious only when viewed at the analog level.

All of this leads us to a fundamental difficulty in designing and using models. Robustness depends in large measure on completeness of the model. Yet all models are abstractions, simplifications of the thing being modeled, so no model can ever be entirely complete. Nor in fact would we want it to be. Much of the power of a model arises from its assumption that some things are "unimportant details," causing them to be omitted. There is power in this because it allows us to ignore some phenomena and concentrate on others; it is this license to omit some things that reduces the information processing requirements of using the model to within tolerable levels. But there is as a result a fundamental tension between completeness (and attendant robustness) and complexity. If we make no simplifying assumptions we drown in detail; yet any simplifying assumption we make may turn out to be incorrect, rendering our model incomplete in some important way. This in turn raises interesting questions, further explored below, including how we select an appropriate model, i.e., an appropriate set of simplifying assumptions, and how we might recover in the event that we select one that is inappropriate.

RESEARCH TOPICS
In this section we discuss in broad terms a number of research topics relevant to the overall goal of building systems that are both robust and transparent.
For the most part, we proceed from the assumption that getting machines to assist in significant ways with reasoning about situations like the STS-2 fuel cell problem will require that they have appropriate models. We then ask how those models can be created and indeed how we can design the device from the outset in such a way that the model creation process is made simpler.

Model Selection and Creation

Selecting and creating models is perhaps the most fundamental issue in solving engineering problems and an important determinant of the robustness of the solution. It is a skill that is in some ways well known: it's what good engineers have learned to do through years of experience. The goal here is to understand that skill and experience well enough that it can be embodied in a program, allowing automated assistance in selecting and creating appropriate models.
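As a rough illustration of what embodying model-selection skill in a program might look like, the sketch below assumes that each candidate model can be summarized by the phenomena it deliberately ignores and by a relative computational cost. The `Model` class, the cost figures, and the phenomenon labels are hypothetical, chosen only to echo the digital/analog/quantum example above; nothing here is drawn from an existing system. Selection keeps only models that ignore nothing judged relevant (complete enough) and that fit the processing budget (computable), then prefers the cheapest, i.e., most abstract, survivor.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:
    """Hypothetical summary of a model: its name, a relative computational
    cost, and the phenomena it deliberately ignores (its simplifications)."""
    name: str
    cost: int
    ignores: frozenset = field(default_factory=frozenset)


def select_model(models, relevant, budget):
    """Return the cheapest model that ignores none of the `relevant`
    phenomena and whose cost fits within `budget`, or None if no model
    is both complete enough and abstract enough."""
    candidates = [
        m for m in models
        if not (m.ignores & relevant)   # complete enough for this task
        and m.cost <= budget            # abstract enough to be computable
    ]
    return min(candidates, key=lambda m: m.cost, default=None)


# Illustrative model family for a circuit; costs are made-up relative units.
MODELS = [
    Model("quantum", cost=10**9),
    Model("analog", cost=10**4, ignores=frozenset({"quantum effects"})),
    Model("digital", cost=10,
          ignores=frozenset({"quantum effects", "analog effects"})),
]

print(select_model(MODELS, {"logic values"}, budget=10**6).name)    # digital
print(select_model(MODELS, {"analog effects"}, budget=10**6).name)  # analog
```

The hard part, which the sketch simply assumes away, is producing the `relevant` set: deciding which phenomena matter is exactly the engineering judgment the text asks how to capture.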

In almost any design or analysis problem, the most basic question is how to "think about" the object in question, i.e., how to model it. Given the acknowledgment that all models are abstractions, it is futile (and as we have suggested, inappropriate) to seek perfect completeness and robustness. That in turn means that the modeling decision concerns what to pay attention to, i.e., what properties of the object are relevant to the task at hand and which can safely be ignored. Hence the goal is to find a model with two properties. First, it should be complete enough that it handles the important phenomena. Second, it should be abstract enough that it is computable and capable of producing a description at a useful level of detail (i.e., even if it were possible, it would be of little use to produce a picosecond, microvolt-level analysis of a circuit whose digital behavior is of interest). But naming the goal is easy; the research challenge is in finding a more precise understanding of what it means to "consider the task" and to determine when a model is "complete enough," "abstract enough," and at an appropriate level of detail. One possible route to understanding the nature and character of models is to define the kinds of abstractions commonly used in creating them. This might be done by determining what kinds of abstractions are commonly (and often implicitly) employed by engineers. What are the rest of the terms like digital, analog, linear, etc.? Is there just an unstructured collection of such terms or is there, as we would guess, some sort of organizing principle that can be used to establish an ordering on them? If so, we might be able to say more concretely what it means to proceed from a more abstract to a more precise model and might be able to develop programs capable of such behavior. It is unlikely that there is a simple, strict hierarchy that will allow us to move in a single, unambiguous direction.
Much more likely we will find a tangled graph of models; part of the task is to sort out the different kinds of interconnections likely to be encountered. A second possible route to understanding the nature of models arises from the simple observation that models ignore details. Perhaps then different kinds of models can be generated by selecting different combinations of details to ignore. The task here is to characterize different "kinds" of details; the ideal set of them would not only generate known models but might suggest additional models as well. By either of these routes (studying the kinds of abstractions used or the kinds of details ignored) we might be able to produce an array of different kinds of models. That brings us to the problem of model selection, determining which to use in a particular situation. Some assistance may be provided by knowing how the array of models is organized, i.e., what it means to be a "different kind of model." The difficulty arises in determining what the important phenomena are in the problem at hand and selecting a variety of model capable of dealing with them. How is it that a human engineer knows which approximations are plausible and which are likely to lead to error? It is unlikely that we will ever be able to guarantee that the knowledge used for model selection is flawless or that the models given to the program are flawless. We thus need to confront the problem of detecting and dealing with models that are inappropriately chosen for
the task at hand or that are incomplete in some relevant detail. Human engineers at times make the wrong selection or use a faulty model, yet are capable of detecting this and dealing with it. How might we get machines to do the same? Finally, note that progress on model selection will have an important impact on the loaded issue of system override. If, as we have argued, unanticipated events are inevitable, simply having a detailed model is not enough: events may occur that are outside the range of applicability of the model. This can be a particularly difficult problem because it concerns deciding "how to think about" the problem. We argue that override is fundamentally a decision that a particular model is inappropriate. Consider the example of a program monitoring and controlling life support. We might be tempted to override its decisions if they seem sufficiently different from our own, but why should they differ? The most basic answer seems to be that the model the program is using to interpret sensor readings is inappropriate, i.e., based on assumptions that are not valid in the current situation. The only objective way to discover this is by determining why that model was chosen, what approximations it embodies, and what the limitations are on those approximations. Since much of this information was used to make the model selection to begin with, leverage on the override problem can come from understanding model selection and, importantly, from making explicit both the model itself and the assumptions underlying it. This would give us reasonably objective grounds for the override decision, since the model and its underlying assumptions will be available, and can be examined and compared to the current situation. It also reminds us how important it is that such information be made explicit, rather than left implicit in the program code or the mind of the program author.
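One way to make the override argument concrete is to attach a model's simplifying assumptions to it as explicit, checkable statements, so that a disagreement between program and crew can be traced to a specific assumption the current situation violates. The sketch assumes such assumptions can be expressed as predicates over sensor readings; the `ExplicitModel` and `Assumption` classes, the reading names, and the thresholds are all hypothetical and illustrative, not values from any station design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Current sensor readings, keyed by (hypothetical) sensor name.
Readings = Dict[str, float]


@dataclass
class Assumption:
    description: str                    # human-readable, shown to the crew
    holds: Callable[[Readings], bool]   # test against current readings


@dataclass
class ExplicitModel:
    name: str
    assumptions: List[Assumption]


def violated_assumptions(model: ExplicitModel, readings: Readings) -> List[str]:
    """Return the descriptions of every assumption the current readings
    contradict. A non-empty list is an objective, inspectable reason to
    question (and possibly override) conclusions drawn from this model."""
    return [a.description for a in model.assumptions if not a.holds(readings)]


# Hypothetical life-support model; names and thresholds are illustrative only.
cabin_model = ExplicitModel(
    name="steady-state cabin atmosphere",
    assumptions=[
        Assumption("cabin is sealed (pressure above 95 kPa)",
                   lambda r: r["pressure_kpa"] > 95.0),
        Assumption("crew size equals the nominal value of 4",
                   lambda r: r["crew"] == 4),
    ],
)

print(violated_assumptions(cabin_model, {"pressure_kpa": 80.0, "crew": 4}))
# ['cabin is sealed (pressure above 95 kPa)']
```

Because each assumption carries a human-readable description, the same check that supports the override decision also gives an observer an explanation, which is the transparency property argued for here. Many real assumptions will not reduce to a simple threshold test; the sketch covers only those that do.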
Model Specification Needs To Be Less Trouble Than It Is Worth

We have repeatedly stressed the importance of models as a basis for robust reasoning about complex systems. But specifying those models is not an easy task, for several reasons. At the simplest level the issue is volume: there is an enormous amount of information to be captured. Existing design capture systems don't deal well with the problem because they don't make the information collection process easy enough, nor do they offer sufficient payoff once the information is entered to provide a motivation for doing it. They are in general more trouble than they're worth. For design changes in particular, it is today often easier simply to try out the change and then (maybe) go back and update the specification database. In the case of Solar Max, for instance, perhaps no one knew about the additional hardware because it had been added at the last minute and never documented. The problem of documenting code is similar: it's often easier to try it out, then document. Often the documentation never gets done because it simply isn't viewed as critical to the undertaking.

The problem is both organizational and technical. Organizational issues arise because design documentation is typically of least use to the original designer, who is most familiar with the object. There should be a value structure within the organization that makes clear the importance of supplying complete design specifications and emphasizes that, as in Solar Max, the consequences of even minor omissions can be serious. But there is a more radical position on this issue that is surely worth exploring. It ought to be impossible to create or modify a design without doing it via a design capture system. Put slightly differently, there should be a design capture system so useful that no one would think of proceeding without it. The thought is utopian but not so far afield as it might seem. Existing VLSI design tools, for example, provide sufficiently powerful functionality that no major design would be done without them. Even their basic functions (schematic capture and edit, design rule checking, simulation) provide sufficient payback to make them worth the trouble. Existing tools also illustrate important limitations: they capture the final result, but not the rationales, not the design process. An effective system would be one that was useful from the earliest "sketch on the back of an envelope" stage, and that captured (and aided) every step and decision along the way. The result would be a record that included not only the final design, but its intended functionality, all rationales for the design choices, etc. The technical problems in creating such a system include standard concerns about a good interface, such as ease of use and portability; paper is still hard to beat. But the issues go considerably deeper than that. Engineers find communication with each other possible in part because of a large shared vocabulary and base of experience.
Communication with a design capture system should be based on similar knowledge; the identification and representation of that knowledge is a sizable research task. The relevant vocabulary includes concepts about structure (shape, connectivity, etc.) and behavior (what the device should do). Both present interesting challenges. While connectivity is relatively straightforward, a compact and appropriate vocabulary for shape is not obvious. Behavior can sometimes be captured by equations or short segments of code, but descriptions in that form soon grow unwieldy and opaque. We need to develop a vocabulary for behavior capable of dealing with considerably more complex devices. There is also the problem of unspoken assumptions. If design capture systems simply transcribe what is expressed literally, forcing every fact to be made explicit, the description task will always be overwhelming. We need to understand and accumulate the knowledge and design conventions of engineers so that the system can make the relevant inferences about what was intended, even if not stated.
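A minimal sketch of the kind of record a design capture system might keep, assuming every design change passes through it: each component is stored with its intended behavior and the rationale for the choice, and the record can be compared against an as-built inventory to flag hardware nobody documented, the kind of gap that surfaced in the Solar Max repair. The class names, methods, and component names are hypothetical; a real system would also need the structural and behavioral vocabulary discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DesignDecision:
    component: str
    intended_behavior: str
    rationale: str


@dataclass
class DesignRecord:
    decisions: Dict[str, DesignDecision] = field(default_factory=dict)

    def record(self, component: str, intended_behavior: str, rationale: str) -> None:
        """Log a design decision at the moment it is made, so the rationale
        is captured along with the result."""
        self.decisions[component] = DesignDecision(component, intended_behavior, rationale)

    def why(self, component: str) -> Optional[str]:
        """Return the recorded rationale for a component, if any."""
        decision = self.decisions.get(component)
        return decision.rationale if decision else None

    def undocumented(self, as_built: Set[str]) -> Set[str]:
        """Components present in the hardware but absent from the record."""
        return as_built - set(self.decisions)


record = DesignRecord()
record.record("grapple pin", "mates with the capture device",
              "standard fixture for on-orbit servicing")
print(record.undocumented({"grapple pin", "late-added bracket"}))  # {'late-added bracket'}
```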

Design for: Testability, Diagnosability, Analyzability, Comprehensibility, Transparency, ...

We have argued that the complexity of the station and the novelty of the environment preclude an exhaustive a priori analysis of contingencies and require instead an ability to figure out what to do in the face of unanticipated events. We have suggested that this in turn is best facilitated by "knowing how things work," i.e., having a model of structure and behavior. The complexity of the systems we design clearly has an impact on both how easy it will be to create such models and how easy it will be to reason with them once they exist. Since we are in fact designing the station (rather than trying to model a naturally occurring system), it is worth asking what can be done at the design stage to facilitate model creation and model use.

Design for Testability

Design for testability is one relatively well known approach in this category. It acknowledges that newly manufactured devices have to be exhaustively tested to verify their correct operation before they are placed in service, and [...]. Substantial effort has been devoted to this in circuit design, with some success. Given the likely need for equipment maintenance and the difficulty of a house (station?) call by service technicians, it will be useful to design the station in such a way that basic diagnostic tests can easily be run on devices that may be malfunctioning. While well known concepts like ensuring that signals are observable and controllable are likely to carry over easily, part of the research task here lies in extending techniques developed for simple digital circuits to deal with much larger subsystems.

Design for Diagnosability

Design for diagnosability is a less well understood task.
Where testing involves methodically trying out all of the designed behaviors of the device, diagnosis is a process of reasoning from the observed symptoms of malfunction to identify the possibly faulty components. Diagnostic power is measured in part by discrimination ability: more powerful diagnostic reasoning techniques implicate fewer components. But some problems are inherently ambiguous: a device may be designed in such a way that the observed symptoms must correctly implicate a large number of different components. Design for diagnosability would involve designing in a way that avoids this situation. Put more positively, it would mean designing in ways that seek to minimize the number of components implicated by a malfunction. One very simple observation along this line can be made by considering the topology of the device: the only components that can be responsible for an observed symptom are those that are "causally connected" to it. In an electronic circuit, for example, the most obvious causal connections are provided by wires. More generally,
there must be some sequence of physical interactions by which the error propagates from its source to the point where it is observed. The fewer such interactions, the fewer candidate subcomponents. Simply put, this argues for "sparse (modular) designs," i.e., those with relatively few interconnections. Designs with unidirectional components (i.e., those that operate in a single direction and have distinct inputs and outputs, like logic gates and unlike resistors) also have smaller candidate sets. In devices with unidirectional components there is a single direction of causality, giving us a notion of "upstream" and "downstream" of the symptom. Only components that are upstream can be responsible for the symptom. Diagnosis also involves probing, i.e., taking additional measurements inside the device, as well as generating and running tests designed to distinguish among possible candidate subcomponents. We might also examine design styles that facilitate both of these tasks.

Designing for Analyzability, Comprehensibility, Transparency

Given our emphasis on being able to figure out what to do, perhaps the most fundamental thing to rely on is what might be called design for analyzability or comprehensibility. If we have to think about how the device works and reason through the possibly subtle effects of an unanticipated event, then let's at least make that easy to do. This may be little more than the traditional admonition to "keep it simple," here given the additional motivation of on-the-spot analysis and response. Simplicity in design will aid in making that easy; it may present additional virtues as well. Simplicity often produces transparency, an important component in people's willingness to accept automated assistance with critical tasks. Simplicity will help achieve NASA's design goal of allowing crews to intervene at low levels in any station subsystem.
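Returning to the diagnosability argument, the restriction to causally connected, upstream components can be sketched as a graph search. The sketch assumes the device is described by a hypothetical `feeds` mapping from each component to the components whose outputs drive its inputs; the example circuit (`a`, `b`, `c`, `d`, `g1` to `g4`) is invented for illustration. A breadth-first walk backward from the component where the symptom is observed collects the candidate set, and the `symmetric` helper models bidirectional (resistor-like) components by letting influence flow both ways.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical device description: component -> components whose outputs
# feed its inputs. a, b feed g1; g1, c feed g2; g2 feeds g4; d feeds g3.
EXAMPLE_FEEDS: Dict[str, List[str]] = {
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g4": ["g2"],
    "g3": ["d"],
}


def upstream_candidates(feeds: Dict[str, List[str]], symptom: str) -> Set[str]:
    """Breadth-first walk backward from the component where the symptom is
    observed; every component reached (plus the symptom's own component)
    is causally connected upstream and hence a diagnostic candidate."""
    candidates = {symptom}
    queue = deque([symptom])
    while queue:
        node = queue.popleft()
        for source in feeds.get(node, []):
            if source not in candidates:
                candidates.add(source)
                queue.append(source)
    return candidates


def symmetric(feeds: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Model bidirectional components: every connection lets influence
    flow both ways, so there is no upstream/downstream distinction."""
    both: Dict[str, Set[str]] = {k: set(v) for k, v in feeds.items()}
    for dst, sources in feeds.items():
        for src in sources:
            both.setdefault(src, set()).add(dst)
    return {k: sorted(v) for k, v in both.items()}


print(sorted(upstream_candidates(EXAMPLE_FEEDS, "g2")))
# ['a', 'b', 'c', 'g1', 'g2']
print(sorted(upstream_candidates(symmetric(EXAMPLE_FEEDS), "g2")))
# ['a', 'b', 'c', 'g1', 'g2', 'g4']
```

On the example, a symptom at `g2` implicates only `g2`, `g1`, `a`, `b`, and `c`; treating every connection as bidirectional adds the downstream `g4`, illustrating why unidirectional designs yield smaller candidate sets. Neither version implicates `g3` or `d`, which share no path with the symptom, in line with the argument for sparse, modular designs.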
Finally, simplicity may also produce robustness by assisting in determining when a model is inappropriate. We argued above that the override decision is part of the model selection process and could be facilitated by making explicit the simplifying assumptions underlying each model. Those assumptions might not always be specified completely; at times it may be necessary to determine what they are. This is likely to be easier if the model itself can be analyzed easily.

Robustness Requires Common Sense

Current expert systems are brittle in part because they lack common sense knowledge, that large collection of simple facts about the world that is shared across a culture. At the simplest it may include facts such as that physical objects have mass and take up space, that two things cannot occupy the same space at the same time, or that objects that are unsupported will fall. In the absence of such an underpinning of world
knowledge, the system must interpret its rules with complete literal-mindedness and can do little in situations in which the rules "almost" apply. Consider for example a rule in a medical diagnosis expert system specifying in part that "the patient is between 17 and 21 years old." Does the rule apply if the patient is 16 years 11 months old? How about 16 years 11.9 months? Our common sense knowledge of the world tells us that the human body doesn't change discontinuously, so the rule is probably still relevant. Compare this with a rule that says "If the postmark date is after April 15, then the tax return is late." Here we know, again from common sense knowledge, that there is in fact a discontinuity. Each of these chunks of common sense is simple enough and easily added to a system; the problem is finding and representing the vast collection of them necessary to support the kind of reasoning people do with so little effort. For engineering problem solving of the sort relevant to our concerns here there is another layer of what we might call engineering common sense that includes such facts as: liquids are incompressible; all objects are affected by gravitational fields, but not all objects are affected by electromagnetic fields; electromagnetic fields can be shielded; and so forth. Engineers also know large numbers of simple facts about functionality, such as what a valve does, and why a door is like a valve. The research task here is the identification, accumulation, organization, and interconnection of the vast numbers of simple facts that make up common sense (Lenat et al., 1986) and engineering common sense. Only with this body of knowledge will we be able to create systems that are more flexible and less literal-minded.

What is the Source of Human Robustness?

Since robustness in problem solving is a common trait of experienced engineers, we ought to take the obvious step of examining that behavior and attempting to understand its origins.
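The contrast drawn under Robustness Requires Common Sense, between the age-range rule and the tax-deadline rule, can be sketched in code, assuming the common-sense fact "the body changes continuously" is encoded as a graded applicability that fades near the range boundary, while the legal deadline stays a hard cutoff. The function names and the half-year `tolerance` are hypothetical choices for illustration, not a method from the paper.

```python
from datetime import date


def age_rule_applicability(age_years: float, low: float = 17.0,
                           high: float = 21.0, tolerance: float = 0.5) -> float:
    """Degree (0.0 to 1.0) to which an age-range rule applies. Encodes the
    common-sense fact that the body changes continuously: full strength
    inside [low, high], fading linearly to zero over `tolerance` years
    outside it (the tolerance value is a hypothetical choice)."""
    if low <= age_years <= high:
        return 1.0
    distance = low - age_years if age_years < low else age_years - high
    return max(0.0, 1.0 - distance / tolerance)


def return_is_late(postmark: date, deadline: date) -> bool:
    """A legal deadline is a genuine discontinuity: any day after it is late."""
    return postmark > deadline


print(round(age_rule_applicability(16 + 11 / 12), 2))       # 0.83
print(return_is_late(date(1987, 4, 16), date(1987, 4, 15)))  # True
```

For a patient of 16 years 11 months the age rule still applies strongly, whereas a return postmarked one day after the deadline is simply late. The code is only as good as the knowledge of which quantities are continuous, which is precisely the common sense the text argues must be collected.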
What is it that human engineers do, what is it that they know, that allows them to recognize and deal with inadequate models? Why is it that human behavior seems to degrade gracefully as problems become more difficult, rather than precipitously, as is the case with our current programs? Part of the answer may lie in the number and variety of models they can use, along with their body of common sense knowledge.

Multiple Models

Thus far our approach has focused on creating robustness by reasoning from detailed models. But how can we get robust behavior in situations where no effective model yet exists? One quite plausible reason for this would be incomplete information: even assuming we know all the limits of the models we have, selection of an appropriate one might
depend on a fact about the system or environment that we simply don't have yet. In this section, we speculate on one possible approach to such problems. One idea explored to some degree in the HEARSAY system (Erman et al., 1980) for speech understanding involves the use of multiple knowledge sources, each dealing with a slightly different body of knowledge. Our imperfect knowledge about the task (interpreting an utterance as a sentence) means that none of the knowledge sources can be guaranteed to be correct. The basic insight here is to employ a group of cooperating experts, each with a different expertise, in the hope that their individual weaknesses are distinct (and hence will in some sense be mutually compensated) but their strengths will be mutually reinforcing. A similar technique might be useful in engineering problem solving: lacking any one model believed to be appropriate, we might try using a collection of them that appear to be plausible and that have somewhat different conditions of applicability. Even given such a collection, of course, there remains the interesting and difficult problem of deciding how to combine their results when the outcomes are (as expected) not identical.

SUMMARY

We have argued that the complexity of the station and the novelty of space as an environment make it impossible to predict and analyze all contingencies in advance. The hostility of the environment means the consequences of failure are substantial. In such situations, robustness and transparency are essential properties of the systems developed. Systems are robust to the extent that they can deal with events that have not been specifically anticipated and analyzed. They are transparent to the extent that they can make their reasoning comprehensible to an observer. Given the inevitability of unanticipated events, robustness is best accomplished by "figuring out" what to do, rather than relying on a list of predetermined responses.
But "figuring out," the sort of analysis and reasoning routinely done by engineers, can only be done if you "know how it works," i.e., have a model of the device. We thus believe that a key source of power in engineering reasoning is the collection of models engineers use, along with the approximations and abstractions that underlie the models. One major thrust of research then should be directed toward understanding the processes of model creation, selection, and simplification. Given the serious consequences of working from incomplete information, a second major thrust should be devoted toward model and design capture. Existing systems for VLSI design are effective enough to make them essential tools, and hence effective in some aspects of design capture. We need to provide similar levels of tools for all varieties of design and need to understand how to capture design rationales as well as the final result of the design process.

Given the difficulty of the reasoning process even with complete information, we suggest turning the question around and asking what we can do at design time to make the reasoning task easier. We have speculated about what design for testability, diagnosability, and comprehensibility might mean, and suggest further exploration there as well. Finally, it appears that additional leverage on the problem is available from examining human performance to determine the source of robustness in our own problem solving behavior, and from compiling the large body of common sense knowledge that seems to be a source of graceful degradation in human problem solving.

ACKNOWLEDGMENTS

Support for the preparation of this report came in part from a research grant from Digital Equipment Corporation, from the Defense Advanced Research Projects Agency of the Department of Defense, under Office of Naval Research contract N00014-84-K-0124, and from a research grant from the Wang Corporation. This paper benefited significantly from comments on early drafts by Walter Hamscher, Brian Williams, Reid Simmons and Dan Weld.

NOTES

1. Rich and Waters, eds., Artificial Intelligence and Software Engineering, Morgan Kaufmann, 1986, is a recent survey of attempts to use AI approaches to this problem. It provides a historical overview and a wide-ranging view of the problem with extensive references. Also see the IEEE Transactions on Software Engineering.
2. Davis, Buchanan, Shortliffe, Production rules as a representation, Artificial Intelligence, February 1977, pp. 15-45, provides an early overview of MYCIN, the first purely rule-based expert system. Waterman, A Guide to Expert Systems, Addison-Wesley, 1986, [...] applications of the technology and provides a large set of examples and references.
3. Bobrow, ed., Qualitative Reasoning About Physical Systems, North-Holland, 1984, is the book version of the December 1984 issue of Artificial Intelligence, a special issue on that topic.
Nine articles illustrate the variety of models and tasks attacked, including diagnosis, design verification, behavior prediction, etc.

4. Relatively little work addresses this topic directly. Patil, Szolovits, and Schwartz, Causal understanding of patient illness in medical diagnosis, Proc Seventh Intl Joint Conf on AI, pp. 893-899, explores the combined use of three different kinds of models in diagnostic reasoning. Hobbs, Granularity, Proc Ninth Intl Joint Conf on AI, pp. 432-435, speculates on ways of producing coarser grained models from fine grained ones.
5. See the de Kleer, Williams, and Forbus articles in Bobrow, op. cit.
6. See, for example, Gentner and Stevens, Mental Models, Lawrence Erlbaum, 1983.
7. Breuer, A methodology for the design of testable large-scale integrated circuits, Report SD-TR-85-33, January 1985, Space Division, Air Force Systems Command, provides a wide-ranging overview of different testability techniques.

REFERENCES

Eichoefer, Gerald. 1985. MITRE Corp. Report. (Adapted from July 16 report.)
Lenat, Prakash, and Shepherd. 1986. Using common sense to overcome brittleness. AI Magazine, Winter, pp. 65-85.
Erman, Hayes-Roth, Lesser, and Reddy. 1980. The Hearsay-II speech understanding system: integrating knowledge to resolve uncertainty. Computing Surveys, June, pp. 213-254.